Class AbstractS3FileAccess

java.lang.Object
dev.jcputney.elearning.parser.impl.access.AbstractS3FileAccess
All Implemented Interfaces:
FileAccess, AutoCloseable
Direct Known Subclasses:
S3FileAccessV1, S3FileAccessV2

public abstract class AbstractS3FileAccess extends Object implements FileAccess, AutoCloseable
Abstract base class for S3 FileAccess implementations with common caching and optimization logic. This class provides the shared functionality between AWS SDK v1 and v2 implementations.
  • Field Details

    • STREAMING_THRESHOLD

      protected static final long STREAMING_THRESHOLD
      Threshold for streaming files instead of caching them in memory. Files larger than this size will be streamed directly from S3.
    • MAX_CACHE_SIZE

      protected static final int MAX_CACHE_SIZE
      Maximum number of entries in the small file cache. This limits how many small files (smaller than STREAMING_THRESHOLD) can be cached in memory.
    • COMMON_MODULE_FILES

      protected static final Set<String> COMMON_MODULE_FILES
      Common module files that are frequently accessed and should be prefetched. This set contains file names that are typically present in SCORM/xAPI modules.
    • bucketName

      protected final String bucketName
      The name of the S3 bucket to access. This is used to construct the full S3 paths for file operations.
    • executorService

      protected final ExecutorService executorService
      Executor service for parallel operations, such as file existence checks and prefetching. This allows for efficient asynchronous operations without blocking the main thread.
    • fileExistsCache

      protected final Map<String,Boolean> fileExistsCache
      Cache for file existence checks to avoid repeated S3 API calls. This cache stores the existence status of files by their relative paths.
    • directoryListCache

      protected final Map<String,List<String>> directoryListCache
      Cache for directory listings to avoid repeated S3 API calls. This cache stores the list of files in each directory path.
    • smallFileCache

      protected final Map<String,byte[]> smallFileCache
      Cache for small files (less than STREAMING_THRESHOLD) to avoid repeated S3 API calls. This cache is used to store the contents of small files as byte arrays for quick access.
    • fileSizeCache

      protected final Map<String,Long> fileSizeCache
      Cache for file sizes, used to avoid repeated S3 API calls for size checks. This is particularly useful for large files where we want to avoid streaming the entire content just to get the size.
    • allFilesCache

      protected final AtomicReference<List<String>> allFilesCache
      A thread-safe cache storing the list of all file paths within the module.

      This cache is intended to improve performance for file-related operations by storing the result of a full scan of the S3 bucket or prefix. The cache is stored in an AtomicReference to ensure safe publication across threads once populated.

      Modifications to this cache should be controlled to maintain data consistency across the class, particularly when the underlying S3 bucket contents change.

    • rootPath

      protected volatile String rootPath
      The root path within the S3 bucket to access. This is used to construct full paths for files and directories. It is lazily initialized to allow subclasses to set it up after their S3 client is ready.
  • Constructor Details

    • AbstractS3FileAccess

      protected AbstractS3FileAccess(String bucketName, String rootPath)
      Constructs an abstract S3FileAccess instance.
      Parameters:
      bucketName - The name of the S3 bucket to access.
      rootPath - The root path within the S3 bucket to access.
  • Method Details

    • close

      public void close() throws Exception
      Closes the resource and performs necessary cleanup operations. This method ensures that the associated executor service, if not null, is properly shut down to release any resources tied to it.
      Specified by:
      close in interface AutoCloseable
      Throws:
      Exception - if an error occurs during the shutdown process
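Because the class implements AutoCloseable, instances can be managed with try-with-resources. The following is a minimal sketch of that pattern using a hypothetical stand-in class, ExecutorBackedResource; the shutdown policy (wait five seconds, then force) is an assumption and not necessarily what close() does in this library.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a FileAccess implementation that owns an executor.
class ExecutorBackedResource implements AutoCloseable {
  final ExecutorService executorService = Executors.newFixedThreadPool(2);

  @Override
  public void close() throws Exception {
    // Assumed policy: stop accepting work, wait briefly, then force termination.
    if (executorService != null) {
      executorService.shutdown();
      if (!executorService.awaitTermination(5, TimeUnit.SECONDS)) {
        executorService.shutdownNow();
      }
    }
  }

  // Uses the resource in try-with-resources and reports whether the executor was shut down.
  static boolean demo() {
    ExecutorBackedResource resource = new ExecutorBackedResource();
    try (ExecutorBackedResource r = resource) {
      r.executorService.submit(() -> { }).get(); // run one trivial task
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
    return resource.executorService.isShutdown();
  }
}
```

In this sketch the executor is released when the try block exits, even if the body throws.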
    • fileExistsInternal

      public boolean fileExistsInternal(String path)
      Checks if a file exists at the specified path with caching.
      Specified by:
      fileExistsInternal in interface FileAccess
      Parameters:
      path - The path of the file to check (guaranteed to be non-null).
      Returns:
      True if the file exists, false otherwise.
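One way to picture the caching behaviour is a map lookup that falls back to the remote check only on a miss. This is a sketch under assumptions: CachedExistenceCheck and the remoteCalls counter are hypothetical, and the Predicate stands in for checkFileExistsOnS3.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Hypothetical sketch of a cached existence check; not the library's implementation.
class CachedExistenceCheck {
  private final Map<String, Boolean> fileExistsCache = new ConcurrentHashMap<>();
  private final Predicate<String> remoteCheck; // stands in for checkFileExistsOnS3
  final AtomicInteger remoteCalls = new AtomicInteger(); // counts cache misses

  CachedExistenceCheck(Predicate<String> remoteCheck) {
    this.remoteCheck = remoteCheck;
  }

  boolean fileExists(String path) {
    // computeIfAbsent invokes the remote check only when the path is not cached yet.
    return fileExistsCache.computeIfAbsent(path, p -> {
      remoteCalls.incrementAndGet();
      return remoteCheck.test(p);
    });
  }
}
```

Note that this sketch caches negative results too, so a file uploaded later stays "missing" until the cache is cleared.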
    • fileExistsBatch

      public Map<String,Boolean> fileExistsBatch(List<String> paths)
      Checks whether multiple files exist in a single batch, which is much more efficient for module parsing than checking each file individually.
      Specified by:
      fileExistsBatch in interface FileAccess
      Parameters:
      paths - List of file paths to check
      Returns:
      A map from each path to a boolean indicating whether the file exists
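A batch check of this kind can be sketched by submitting one existence check per distinct path to an executor and collecting the results. BatchExistenceCheck is a hypothetical helper, the Predicate stands in for the single-file check, and how the library actually batches its requests is not stated here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Predicate;

// Hypothetical sketch: run per-path existence checks in parallel.
class BatchExistenceCheck {
  static Map<String, Boolean> fileExistsBatch(
      List<String> paths, Predicate<String> singleCheck, ExecutorService executor) {
    // Start one asynchronous check per distinct path, preserving input order.
    Map<String, CompletableFuture<Boolean>> pending = new LinkedHashMap<>();
    for (String path : paths) {
      pending.computeIfAbsent(
          path, p -> CompletableFuture.supplyAsync(() -> singleCheck.test(p), executor));
    }
    // Wait for every check and collect the results into a path -> exists map.
    Map<String, Boolean> results = new LinkedHashMap<>();
    pending.forEach((path, future) -> results.put(path, future.join()));
    return results;
  }
}
```

Duplicate paths in the input are checked once and appear once in the returned map.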
    • prefetchCommonFiles

      public void prefetchCommonFiles()
      Prefetches common files that are not already present in the small file cache.

      This method identifies files that are listed in the COMMON_MODULE_FILES collection but are not yet loaded into the small file cache. For each file that is missing from the cache, a prefetch task is initiated asynchronously using the provided executorService. All asynchronous tasks are executed concurrently, and the method blocks until all tasks complete.

      Key logic details:
      - Filters the COMMON_MODULE_FILES to identify files absent from the small file cache.
      - Asynchronously prefetches missing files using independent tasks.
      - Waits for all asynchronous tasks to complete before returning.

      This method is designed to optimize the availability of commonly used files and reduce the latency during their access by loading them in advance.

      Specified by:
      prefetchCommonFiles in interface FileAccess
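The filter, submit, and wait steps described for prefetchCommonFiles can be sketched with CompletableFuture. PrefetchSketch and its loader parameter are hypothetical; the loader stands in for whatever reads a file's bytes from S3, and error handling for files that do not exist is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch of the prefetch flow; not the library's implementation.
class PrefetchSketch {
  final Map<String, byte[]> smallFileCache = new ConcurrentHashMap<>();

  void prefetch(Set<String> commonFiles, Function<String, byte[]> loader,
      ExecutorService executor) {
    // 1. Keep only the files that are not cached yet.
    // 2. Start one asynchronous load per missing file.
    List<CompletableFuture<Void>> tasks = commonFiles.stream()
        .filter(name -> !smallFileCache.containsKey(name))
        .map(name -> CompletableFuture.runAsync(
            () -> smallFileCache.put(name, loader.apply(name)), executor))
        .collect(Collectors.toList());
    // 3. Block until every load has completed.
    CompletableFuture.allOf(tasks.toArray(new CompletableFuture[0])).join();
  }
}
```

Files already in the cache are left untouched, so repeated calls only load what is still missing.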
    • listFilesInternal

      public List<String> listFilesInternal(String directoryPath) throws IOException
      Lists the files in the specified directory path with caching and pagination support.
      Specified by:
      listFilesInternal in interface FileAccess
      Parameters:
      directoryPath - The path of the directory to list files from (guaranteed to be non-null).
      Returns:
      A list of file paths in the specified directory.
      Throws:
      IOException - if there's an error accessing the directory or listing its contents.
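Listing with caching and pagination typically means following continuation tokens until the listing is exhausted, then storing the combined result. The sketch below assumes a hypothetical Page type and fetchPage function in place of the SDK-specific listing call; the real page shape depends on the AWS SDK version in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

// Hypothetical sketch of a cached, paginated directory listing.
class PaginatedListingSketch {
  // Assumed page shape: a batch of keys plus a continuation token (null when done).
  static final class Page {
    final List<String> keys;
    final String nextToken;

    Page(List<String> keys, String nextToken) {
      this.keys = keys;
      this.nextToken = nextToken;
    }
  }

  final Map<String, List<String>> directoryListCache = new ConcurrentHashMap<>();

  List<String> listFiles(String directoryPath, BiFunction<String, String, Page> fetchPage) {
    List<String> cached = directoryListCache.get(directoryPath);
    if (cached != null) {
      return cached; // cache hit: no remote calls
    }
    List<String> all = new ArrayList<>();
    String token = null;
    do {
      // Fetch the next page, passing the token from the previous one (null on first call).
      Page page = fetchPage.apply(directoryPath, token);
      all.addAll(page.keys);
      token = page.nextToken;
    } while (token != null);
    List<String> result = List.copyOf(all);
    directoryListCache.put(directoryPath, result);
    return result;
  }
}
```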
    • getFileContentsInternal

      public InputStream getFileContentsInternal(String path) throws IOException
      Gets the contents of a file as an InputStream with intelligent streaming/caching.
      Specified by:
      getFileContentsInternal in interface FileAccess
      Parameters:
      path - The path of the file to get contents from (guaranteed to be non-null).
      Returns:
      An InputStream containing the file contents.
      Throws:
      IOException - if the file can't be read.
    • getInternalRootDirectory

      public String getInternalRootDirectory()
      Determines the internal root directory within the S3 bucket with lazy initialization.
      Returns:
      The detected internal root directory or the original path if none is detected.
    • clearCaches

      public void clearCaches()
      Clears all caches. This is useful for testing or when bucket contents change.
      Specified by:
      clearCaches in interface FileAccess
    • getAllFiles

      public List<String> getAllFiles() throws IOException
      Gets a list of all files in the module.

      This method scans the entire S3 bucket/prefix once and caches the results for subsequent calls, improving performance for file existence checks.

      Specified by:
      getAllFiles in interface FileAccess
      Returns:
      List of all file paths in the module
      Throws:
      IOException - if there's an error accessing the S3 bucket
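The scan-once behaviour backed by an AtomicReference can be sketched as follows. AllFilesCacheSketch, the fullScan supplier, and the scans counter are hypothetical names for illustration; the supplier stands in for the full bucket or prefix scan.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch of a lazily populated, thread-safe "all files" cache.
class AllFilesCacheSketch {
  private final AtomicReference<List<String>> allFilesCache = new AtomicReference<>();
  private final Supplier<List<String>> fullScan; // stands in for the S3 scan
  final AtomicInteger scans = new AtomicInteger(); // counts how often the scan ran

  AllFilesCacheSketch(Supplier<List<String>> fullScan) {
    this.fullScan = fullScan;
  }

  List<String> getAllFiles() {
    List<String> cached = allFilesCache.get();
    if (cached != null) {
      return cached;
    }
    scans.incrementAndGet();
    List<String> scanned = List.copyOf(fullScan.get());
    // Publish the result unless another thread already published one; keep the first.
    return allFilesCache.updateAndGet(existing -> existing != null ? existing : scanned);
  }

  void clearCaches() {
    allFilesCache.set(null); // the next getAllFiles() call rescans
  }
}
```

In this sketch two threads racing on an empty cache may both scan, but they end up sharing a single published list.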
    • getCacheStats

      public Map<String,Integer> getCacheStats()
      Retrieves statistics about various internal caches used in the class.
      Returns:
      A map where the keys are the cache names (e.g., "fileExistsCache", "directoryListCache", etc.) and the values are the respective sizes of these caches.
    • getTotalSize

      public long getTotalSize() throws IOException
      Gets the total size of all files in the module.

      This method calculates the sum of all file sizes in the module using the cached file sizes from the S3 bucket.

      Specified by:
      getTotalSize in interface FileAccess
      Returns:
      Total size of all files in bytes
      Throws:
      IOException - if there's an error accessing file sizes
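Summing cached sizes can be sketched as a size lookup that fills the cache on a miss, followed by a sum over all known files. TotalSizeSketch is hypothetical, and the ToLongFunction stands in for getFileSizeOnS3.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToLongFunction;

// Hypothetical sketch of total-size calculation over a size cache.
class TotalSizeSketch {
  final Map<String, Long> fileSizeCache = new ConcurrentHashMap<>();

  long getCachedFileSize(String path, ToLongFunction<String> remoteSize) {
    // Ask the remote source only when the size is not cached yet.
    return fileSizeCache.computeIfAbsent(path, remoteSize::applyAsLong);
  }

  long getTotalSize(List<String> allFiles, ToLongFunction<String> remoteSize) {
    return allFiles.stream().mapToLong(p -> getCachedFileSize(p, remoteSize)).sum();
  }
}
```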
    • shutdown

      public void shutdown()
      Shuts down the executor service when the instance is no longer needed.
    • fullPath

      public String fullPath(String relativePath)
      Constructs the full S3 path by combining the root path with the relative path.
      Specified by:
      fullPath in interface FileAccess
      Parameters:
      relativePath - The relative path within the module
      Returns:
      The full S3 key path
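Combining a root path with a relative path into an S3 key might look like the sketch below. The normalization rules (dropping duplicate slashes at the join, treating an empty root as "no prefix") are assumptions for illustration; the library's exact rules are not described here, and S3PathJoin is a hypothetical helper.

```java
// Hypothetical sketch of root + relative path joining for S3 keys.
class S3PathJoin {
  static String fullPath(String rootPath, String relativePath) {
    String root = rootPath == null ? "" : rootPath;
    // Assumed normalization: strip trailing slashes from the root...
    while (root.endsWith("/")) {
      root = root.substring(0, root.length() - 1);
    }
    // ...and leading slashes from the relative path.
    String relative = relativePath;
    while (relative.startsWith("/")) {
      relative = relative.substring(1);
    }
    return root.isEmpty() ? relative : root + "/" + relative;
  }
}
```

With these assumptions, both fullPath("modules/course1", "index.html") and fullPath("modules/course1/", "/index.html") yield "modules/course1/index.html".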
    • getRootPath

      public String getRootPath()
      Retrieves the root path of the current instance.
      Specified by:
      getRootPath in interface FileAccess
      Returns:
      the root path as a String
    • reconfigureRootPath

      protected final void reconfigureRootPath(String newRootPath)
      Reconfigures the root path for the S3 file access. This method clears all internal caches and re-initializes the root path to the specified value.
      Parameters:
      newRootPath - The new root path to set. This value will be normalized and stored as the root path.
    • getFileContentsBase

      protected InputStream getFileContentsBase(String path) throws IOException
      Base implementation for getting file contents with intelligent streaming/caching. Protected to allow subclasses to extend with additional functionality.
      Parameters:
      path - The path of the file to get contents from (guaranteed to be non-null).
      Returns:
      An InputStream containing the file contents.
      Throws:
      IOException - If an error occurs while getting file contents.
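The streaming-versus-caching decision can be sketched as a size check against the threshold. Everything named here is an assumption for illustration: the constant values are made up (the real ones are not stated in this page), ObjectStore is a hypothetical interface standing in for getFileSizeOnS3, getS3ObjectAsBytes, and getS3ObjectStream, and checked exceptions are omitted for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of "stream large files, cache small ones".
class StreamOrCacheSketch {
  // Assumed values; the actual constants are not given in this documentation.
  static final long STREAMING_THRESHOLD = 5L * 1024 * 1024;
  static final int MAX_CACHE_SIZE = 100;

  final Map<String, byte[]> smallFileCache = new ConcurrentHashMap<>();

  // Hypothetical stand-in for the SDK-specific hooks.
  interface ObjectStore {
    long size(String path);
    byte[] readAllBytes(String path);
    InputStream openStream(String path);
  }

  InputStream getFileContents(String path, ObjectStore store) {
    byte[] cached = smallFileCache.get(path);
    if (cached != null) {
      return new ByteArrayInputStream(cached); // served from memory
    }
    if (store.size(path) > STREAMING_THRESHOLD) {
      return store.openStream(path); // large file: stream, never cache
    }
    byte[] bytes = store.readAllBytes(path);
    if (smallFileCache.size() < MAX_CACHE_SIZE) {
      smallFileCache.put(path, bytes); // small file: cache while there is room
    }
    return new ByteArrayInputStream(bytes);
  }
}
```

The size check before insertion is a simple, non-atomic bound; a production cache would more likely use an eviction policy.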
    • getInputStreamWrapper

      protected abstract InputStream getInputStreamWrapper(InputStream stream, long fileSize)
      Wraps the provided input stream with additional processing or functionality.
      Parameters:
      stream - the original input stream to be wrapped
      fileSize - the size of the file associated with the input stream, in bytes
      Returns:
      an InputStream instance which provides a wrapped version of the original input stream
    • getCachedFileSize

      protected long getCachedFileSize(String path)
      Get the cached file size or fetch it from S3 if not cached.
      Parameters:
      path - The path of the file
      Returns:
      The file size in bytes
    • checkFileExistsOnS3

      protected abstract boolean checkFileExistsOnS3(String path)
      Check if a file exists on S3 using the specific SDK implementation.
      Parameters:
      path - The relative path to check
      Returns:
      True if the file exists, false otherwise
    • getFileSizeOnS3

      protected abstract long getFileSizeOnS3(String path)
      Get the size of a file on S3 using the specific SDK implementation.
      Parameters:
      path - The relative path of the file
      Returns:
      The file size in bytes, or 0 if an error occurs
    • listFilesOnS3

      protected abstract List<String> listFilesOnS3(String directoryPath)
      List files in a directory on S3 using the specific SDK implementation.
      Parameters:
      directoryPath - The directory path to list
      Returns:
      List of file paths
    • getS3ObjectAsBytes

      protected abstract byte[] getS3ObjectAsBytes(String fullPath) throws IOException
      Get the contents of a small S3 object as a byte array.
      Parameters:
      fullPath - The full S3 key path
      Returns:
      The file contents as bytes
      Throws:
      IOException - if there's an error reading the file
    • getS3ObjectStream

      protected abstract InputStream getS3ObjectStream(String fullPath) throws IOException
      Get a stream for a large S3 object.
      Parameters:
      fullPath - The full S3 key path
      Returns:
      An InputStream for the file contents
      Throws:
      IOException - if there's an error opening the stream
    • detectInternalRootDirectory

      protected abstract String detectInternalRootDirectory(String rootPath)
      Detect the internal root directory using the specific SDK implementation.
      Parameters:
      rootPath - The current root path
      Returns:
      The detected internal root directory