Class AbstractS3FileAccess
- All Implemented Interfaces:
FileAccess, AutoCloseable
- Direct Known Subclasses:
S3FileAccessV1, S3FileAccessV2
-
Field Summary
Fields
protected final AtomicReference<List<String>> allFilesCache: A thread-safe cache storing the list of all file paths within the module.
protected final String bucketName: The name of the S3 bucket to access.
COMMON_MODULE_FILES: Common module files that are frequently accessed and should be prefetched.
directoryListCache: Cache for directory listings to avoid repeated S3 API calls.
protected final ExecutorService executorService: Executor service for parallel operations, such as file existence checks and prefetching.
fileExistsCache: Cache for file existence checks to avoid repeated S3 API calls.
fileSizeCache: Cache for file sizes, used to avoid repeated S3 API calls for size checks.
protected static final int MAX_CACHE_SIZE: Maximum size of the small file cache.
protected String rootPath: The root path within the S3 bucket to access.
smallFileCache: Cache for small files (less than STREAMING_THRESHOLD) to avoid repeated S3 API calls.
protected static final long STREAMING_THRESHOLD: Threshold for streaming files instead of caching them in memory.
-
Constructor Summary
Constructors
protected AbstractS3FileAccess(String bucketName, String rootPath): Constructs an abstract S3FileAccess instance.
-
Method Summary
Methods
protected abstract boolean checkFileExistsOnS3(String path): Check if a file exists on S3 using the specific SDK implementation.
void clearCaches(): Clear all caches - useful for testing or when bucket contents change.
void close(): Closes the resource and performs necessary cleanup operations.
protected abstract String detectInternalRootDirectory(String rootPath): Detect the internal root directory using the specific SDK implementation.
fileExistsBatch(List<String> paths): Batch check if multiple files exist - much more efficient for module parsing.
boolean fileExistsInternal(String path): Checks if a file exists at the specified path with caching.
fullPath: Constructs the full S3 path by combining the root path with the relative path.
getAllFiles: Gets a list of all files in the module.
protected long getCachedFileSize(String path): Get the cached file size or fetch it from S3 if not cached.
getCacheStats: Retrieves statistics about various internal caches used in the class.
protected InputStream getFileContentsBase(String path): Base implementation for getting file contents with intelligent streaming/caching.
getFileContentsInternal: Gets the contents of a file as an InputStream with intelligent streaming/caching.
protected abstract long getFileSizeOnS3(String path): Get the size of a file on S3 using the specific SDK implementation.
protected abstract InputStream getInputStreamWrapper(InputStream stream, long fileSize): Wraps the provided input stream with additional processing or functionality.
getInternalRootDirectory: Determines the internal root directory within the S3 bucket with lazy initialization.
getRootPath: Retrieves the root path of the current instance.
protected abstract byte[] getS3ObjectAsBytes(String fullPath): Get the contents of a small S3 object as a byte array.
protected abstract InputStream getS3ObjectStream(String fullPath): Get a stream for a large S3 object.
long getTotalSize(): Gets the total size of all files in the module.
listFilesInternal(String directoryPath): Lists the files in the specified directory path with caching and pagination support.
listFilesOnS3(String directoryPath): List files in a directory on S3 using the specific SDK implementation.
void prefetchCommonFiles(): Prefetches common files that are not already present in the small file cache.
protected final void reconfigureRootPath(String newRootPath): Reconfigures the root path for the S3 file access.
void shutdown(): Shutdown the executor service when the instance is no longer needed.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface dev.jcputney.elearning.parser.api.FileAccess
fileExists, getFileContents, listFiles
-
Field Details
-
STREAMING_THRESHOLD
protected static final long STREAMING_THRESHOLD
Threshold for streaming files instead of caching them in memory. Files larger than this size will be streamed directly from S3.
-
MAX_CACHE_SIZE
protected static final int MAX_CACHE_SIZE
Maximum size of the small file cache. This limits the number of small files (less than STREAMING_THRESHOLD) that can be cached in memory.
-
COMMON_MODULE_FILES
Common module files that are frequently accessed and should be prefetched. This set contains file names that are typically present in SCORM/xAPI modules. -
bucketName
The name of the S3 bucket to access. This is used to construct the full S3 paths for file operations. -
executorService
Executor service for parallel operations, such as file existence checks and prefetching. This allows for efficient asynchronous operations without blocking the main thread. -
fileExistsCache
Cache for file existence checks to avoid repeated S3 API calls. This cache stores the existence status of files by their relative paths. -
directoryListCache
Cache for directory listings to avoid repeated S3 API calls. This cache stores the list of files in each directory path. -
smallFileCache
Cache for small files (less than STREAMING_THRESHOLD) to avoid repeated S3 API calls. This cache is used to store the contents of small files as byte arrays for quick access. -
fileSizeCache
Cache for file sizes, used to avoid repeated S3 API calls for size checks. This is particularly useful for large files where we want to avoid streaming the entire content just to get the size. -
allFilesCache
A thread-safe cache storing the list of all file paths within the module.
This cache is intended to improve performance for file-related operations by storing the result of a full scan of the S3 bucket or prefix. The cache is stored in an AtomicReference to ensure safe publication across threads once populated.
Modifications to this cache should be controlled to maintain data consistency across the class, particularly when the underlying S3 bucket contents change.
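The scan-once-then-publish behaviour described above could look roughly like the following. This is a minimal sketch and not the class's actual code: `AllFilesCacheSketch`, `scanBucket()`, and `scanCount` are hypothetical names, and the file names are placeholders.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch only; not the library's actual implementation.
class AllFilesCacheSketch {
  final AtomicReference<List<String>> allFilesCache = new AtomicReference<>();
  // Counts full scans so the effect of caching is observable.
  final AtomicInteger scanCount = new AtomicInteger();

  // Hypothetical stand-in for a full listing of the S3 bucket or prefix.
  List<String> scanBucket() {
    scanCount.incrementAndGet();
    return List.of("imsmanifest.xml", "index.html");
  }

  List<String> getAllFiles() {
    List<String> cached = allFilesCache.get();
    if (cached != null) {
      return cached;
    }
    List<String> scanned = List.copyOf(scanBucket());
    // If another thread published a list first, keep it; otherwise publish ours.
    return allFilesCache.updateAndGet(current -> current != null ? current : scanned);
  }

  // Resetting the reference forces the next call to rescan.
  void clearCaches() {
    allFilesCache.set(null);
  }
}
```

Two threads racing on an empty cache may both scan, but only one list is published, so later callers all see the same immutable list.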
-
rootPath
The root path within the S3 bucket to access. This is used to construct full paths for files and directories. It is lazily initialized to allow subclasses to set it up after their S3 client is ready.
-
-
Constructor Details
-
AbstractS3FileAccess
Constructs an abstract S3FileAccess instance.
- Parameters:
bucketName - The name of the S3 bucket to access.
rootPath - The root path within the S3 bucket to access.
-
-
Method Details
-
close
Closes the resource and performs necessary cleanup operations. This method ensures that the associated executor service, if not null, is properly shut down to release any resources tied to it.
- Specified by:
close in interface AutoCloseable
- Throws:
Exception - if an error occurs during the shutdown process
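Because the class implements AutoCloseable, an instance can be managed with try-with-resources. The sketch below shows one plausible shape for such a close method. `CloseSketch` is a hypothetical stand-in, and the pool size and the 5-second grace period are assumptions rather than documented values.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only; not the library's actual implementation.
class CloseSketch implements AutoCloseable {
  final ExecutorService executorService = Executors.newFixedThreadPool(2);

  @Override
  public void close() throws Exception {
    if (executorService != null) {
      executorService.shutdown();
      // Assumed grace period; force shutdown if tasks do not finish in time.
      if (!executorService.awaitTermination(5, TimeUnit.SECONDS)) {
        executorService.shutdownNow();
      }
    }
  }

  // Uses the resource in try-with-resources and reports whether the executor was shut down.
  static boolean demo() {
    CloseSketch access = new CloseSketch();
    try (CloseSketch resource = access) {
      resource.executorService.submit(() -> { });
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
    return access.executorService.isShutdown();
  }
}
```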
-
fileExistsInternal
Checks if a file exists at the specified path with caching.
- Specified by:
fileExistsInternal in interface FileAccess
- Parameters:
path - The path of the file to check (guaranteed to be non-null).
- Returns:
True if the file exists, false otherwise.
-
fileExistsBatch
Checks whether multiple files exist in a single batch, which is more efficient for module parsing than checking files one at a time.
- Specified by:
fileExistsBatch in interface FileAccess
- Parameters:
paths - List of file paths to check
- Returns:
Map from each path to a boolean indicating whether the file exists
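One way a batch check can combine the existence cache with the executor service is sketched below. It assumes the return type is `Map<String, Boolean>` (the summary above does not show it). `ExistsBatchSketch`, `remoteKeys`, and `remoteCalls` are hypothetical, with an in-memory set standing in for the bucket.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only; not the library's actual implementation.
class ExistsBatchSketch {
  final Map<String, Boolean> fileExistsCache = new ConcurrentHashMap<>();
  final ExecutorService executorService = Executors.newFixedThreadPool(4);
  final Set<String> remoteKeys;                         // stands in for the S3 bucket
  final AtomicInteger remoteCalls = new AtomicInteger(); // counts simulated S3 calls

  ExistsBatchSketch(Set<String> remoteKeys) {
    this.remoteKeys = remoteKeys;
  }

  // Hypothetical stand-in for the SDK-specific existence check.
  boolean checkFileExistsOnS3(String path) {
    remoteCalls.incrementAndGet();
    return remoteKeys.contains(path);
  }

  Map<String, Boolean> fileExistsBatch(List<String> paths) {
    // One task per distinct path; cached answers skip the remote call.
    Map<String, CompletableFuture<Boolean>> pending = new LinkedHashMap<>();
    for (String path : paths) {
      pending.computeIfAbsent(path, p -> CompletableFuture.supplyAsync(
          () -> fileExistsCache.computeIfAbsent(p, this::checkFileExistsOnS3),
          executorService));
    }
    // Block until every check has finished, preserving input order.
    Map<String, Boolean> result = new LinkedHashMap<>();
    pending.forEach((path, future) -> result.put(path, future.join()));
    return result;
  }
}
```

Running the remote check inside `computeIfAbsent` keeps the sketch short but holds a map bin lock during the call; a production implementation might prefer to check the cache first and write the result afterwards.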
-
prefetchCommonFiles
public void prefetchCommonFiles()
Prefetches common files that are not already present in the small file cache.
This method identifies files that are listed in the COMMON_MODULE_FILES collection but are not yet loaded into the small file cache. For each file that is missing from the cache, a prefetch task is initiated asynchronously using the provided executorService. All asynchronous tasks are executed concurrently, and the method blocks until all tasks complete.
Key logic details:
- Filters the COMMON_MODULE_FILES to identify files absent from the small file cache.
- Asynchronously prefetches missing files using independent tasks.
- Waits for all asynchronous tasks to complete before returning.
This method is designed to optimize the availability of commonly used files and reduce the latency during their access by loading them in advance.
- Specified by:
prefetchCommonFiles in interface FileAccess
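The filter, dispatch, and wait steps listed above can be sketched with CompletableFuture. This is an illustration under assumptions, not the class's source: `PrefetchSketch` and `load()` are hypothetical, and the two file names are guesses at typical SCORM/xAPI entries, since the real contents of COMMON_MODULE_FILES are not listed here.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

// Illustrative sketch only; not the library's actual implementation.
class PrefetchSketch {
  // Assumed file names; the real set's contents are not shown in this document.
  static final Set<String> COMMON_MODULE_FILES = Set.of("imsmanifest.xml", "tincan.xml");

  final Map<String, byte[]> smallFileCache = new ConcurrentHashMap<>();
  final ExecutorService executorService = Executors.newFixedThreadPool(4);

  // Hypothetical loader standing in for an S3 read.
  byte[] load(String path) {
    return path.getBytes(StandardCharsets.UTF_8);
  }

  void prefetchCommonFiles() {
    // 1. Keep only files missing from the cache; 2. start one async task per file.
    List<CompletableFuture<Void>> tasks = COMMON_MODULE_FILES.stream()
        .filter(name -> !smallFileCache.containsKey(name))
        .map(name -> CompletableFuture.runAsync(
            () -> smallFileCache.put(name, load(name)), executorService))
        .collect(Collectors.toList());
    // 3. Block until every prefetch task has completed.
    CompletableFuture.allOf(tasks.toArray(new CompletableFuture[0])).join();
  }
}
```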
-
listFilesInternal
Lists the files in the specified directory path with caching and pagination support.
- Specified by:
listFilesInternal in interface FileAccess
- Parameters:
directoryPath - The path of the directory to list files from (guaranteed to be non-null).
- Returns:
A list of file paths in the specified directory.
- Throws:
IOException - if there's an error accessing the directory or listing its contents.
-
getFileContentsInternal
Gets the contents of a file as an InputStream with intelligent streaming/caching.
- Specified by:
getFileContentsInternal in interface FileAccess
- Parameters:
path - The path of the file to get contents from (guaranteed to be non-null).
- Returns:
An InputStream containing the file contents.
- Throws:
IOException - if the file can't be read.
-
getInternalRootDirectory
Determines the internal root directory within the S3 bucket with lazy initialization.
- Returns:
The detected internal root directory or the original path if none is detected.
-
clearCaches
public void clearCaches()
Clears all caches. Useful for testing or when bucket contents change.
- Specified by:
clearCaches in interface FileAccess
-
getAllFiles
Gets a list of all files in the module.
This method scans the entire S3 bucket/prefix once and caches the results for subsequent calls, improving performance for file existence checks.
- Specified by:
getAllFiles in interface FileAccess
- Returns:
List of all file paths in the module
- Throws:
IOException - if there's an error accessing the S3 bucket
-
getCacheStats
Retrieves statistics about various internal caches used in the class.
- Returns:
A map where the keys are the cache names (e.g., "fileExistsCache", "directoryListCache", etc.) and the values are the respective sizes of these caches.
-
getTotalSize
Gets the total size of all files in the module.
This method calculates the sum of all file sizes in the module using the cached file sizes from the S3 bucket.
- Specified by:
getTotalSize in interface FileAccess
- Returns:
Total size of all files in bytes
- Throws:
IOException - if there's an error accessing file sizes
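Summing through a size cache, as described above, might be structured like this. The sketch is an assumption-laden illustration: `TotalSizeSketch`, `remoteSizes`, and `sizeLookups` are hypothetical, and the file list is passed in as a parameter here purely to keep the example self-contained.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only; not the library's actual implementation.
class TotalSizeSketch {
  final Map<String, Long> fileSizeCache = new ConcurrentHashMap<>();
  final Map<String, Long> remoteSizes;                   // stands in for S3 object metadata
  final AtomicInteger sizeLookups = new AtomicInteger(); // counts simulated S3 calls

  TotalSizeSketch(Map<String, Long> remoteSizes) {
    this.remoteSizes = remoteSizes;
  }

  // Hypothetical stand-in for the SDK call; returns 0 when the size is unavailable.
  long getFileSizeOnS3(String path) {
    sizeLookups.incrementAndGet();
    return remoteSizes.getOrDefault(path, 0L);
  }

  // Returns the cached size, fetching and caching it on first use.
  long getCachedFileSize(String path) {
    return fileSizeCache.computeIfAbsent(path, this::getFileSizeOnS3);
  }

  // Sums the sizes of the given files via the cache.
  long getTotalSize(List<String> allFiles) {
    return allFiles.stream().mapToLong(this::getCachedFileSize).sum();
  }
}
```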
-
shutdown
public void shutdown()
Shuts down the executor service when the instance is no longer needed.
-
fullPath
Constructs the full S3 path by combining the root path with the relative path.
- Specified by:
fullPath in interface FileAccess
- Parameters:
relativePath - The relative path within the module
- Returns:
The full S3 key path
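The exact normalization rules are not documented here, so the following is only a guess at how a root path and a relative path might be joined into an S3 key. `FullPathSketch` is hypothetical, and the slash-trimming behaviour is an assumption.

```java
// Illustrative sketch only; the real normalization rules may differ.
class FullPathSketch {
  static String fullPath(String rootPath, String relativePath) {
    // Assumed normalization: strip leading/trailing slashes from the root
    // and leading slashes from the relative path, then join with one "/".
    String root = rootPath == null ? "" : rootPath.replaceAll("^/+|/+$", "");
    String relative = relativePath.replaceAll("^/+", "");
    return root.isEmpty() ? relative : root + "/" + relative;
  }
}
```

Under these assumptions, `fullPath("modules/course1/", "/imsmanifest.xml")` yields `modules/course1/imsmanifest.xml`, and an empty root leaves the relative path unchanged.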
-
getRootPath
Retrieves the root path of the current instance.
- Specified by:
getRootPath in interface FileAccess
- Returns:
The root path as a String
-
reconfigureRootPath
Reconfigures the root path for the S3 file access. This method clears all internal caches and re-initializes the root path to the specified value.
- Parameters:
newRootPath - The new root path to set. This value will be normalized and stored as the root path.
-
getFileContentsBase
Base implementation for getting file contents with intelligent streaming/caching. Protected to allow subclasses to extend with additional functionality.
- Parameters:
path - The path of the file to get contents from (guaranteed to be non-null).
- Returns:
An InputStream containing the file contents.
- Throws:
IOException - If an error occurs while getting file contents.
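The streaming/caching decision can be pictured as follows: small files are served from (and added to) the in-memory cache, while large files bypass it. This is a rough sketch, not the actual method body. `StreamOrCacheSketch` and `fakeS3` are hypothetical, and the constant values are placeholders because the real ones are not given in this document.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only; not the library's actual implementation.
class StreamOrCacheSketch {
  // Assumed values; the real constants are not shown in this document.
  static final long STREAMING_THRESHOLD = 1024;
  static final int MAX_CACHE_SIZE = 2;

  final Map<String, byte[]> smallFileCache = new ConcurrentHashMap<>();
  final Map<String, byte[]> fakeS3 = new ConcurrentHashMap<>(); // stands in for the bucket

  InputStream open(String path) throws IOException {
    // Serve from the small file cache when possible.
    byte[] cached = smallFileCache.get(path);
    if (cached != null) {
      return new ByteArrayInputStream(cached);
    }
    byte[] object = fakeS3.get(path);
    if (object == null) {
      throw new IOException("No such object: " + path);
    }
    // Cache only files below the threshold, and only while the cache has room.
    if (object.length < STREAMING_THRESHOLD && smallFileCache.size() < MAX_CACHE_SIZE) {
      smallFileCache.put(path, object);
    }
    // A real implementation would return the SDK's streaming body for large files
    // instead of materializing the bytes first.
    return new ByteArrayInputStream(object);
  }
}
```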
-
getInputStreamWrapper
Wraps the provided input stream with additional processing or functionality.
- Parameters:
stream - The original input stream to be wrapped
fileSize - The size of the file associated with the input stream, in bytes
- Returns:
An InputStream instance which provides a wrapped version of the original input stream
-
getCachedFileSize
Get the cached file size or fetch it from S3 if not cached.
- Parameters:
path - The path of the file
- Returns:
The file size in bytes
-
checkFileExistsOnS3
Check if a file exists on S3 using the specific SDK implementation.
- Parameters:
path - The relative path to check
- Returns:
True if the file exists, false otherwise
-
getFileSizeOnS3
Get the size of a file on S3 using the specific SDK implementation.
- Parameters:
path - The relative path of the file
- Returns:
The file size in bytes, or 0 if an error occurs
-
listFilesOnS3
List files in a directory on S3 using the specific SDK implementation.
- Parameters:
directoryPath - The directory path to list
- Returns:
List of file paths
-
getS3ObjectAsBytes
Get the contents of a small S3 object as a byte array.
- Parameters:
fullPath - The full S3 key path
- Returns:
The file contents as bytes
- Throws:
IOException - if there's an error reading the file
-
getS3ObjectStream
Get a stream for a large S3 object.
- Parameters:
fullPath - The full S3 key path
- Returns:
An InputStream for the file contents
- Throws:
IOException - if there's an error opening the stream
-
detectInternalRootDirectory
Detect the internal root directory using the specific SDK implementation.
- Parameters:
rootPath - The current root path
- Returns:
The detected internal root directory
-