Interface StateStore

  • All Known Implementing Classes:
    AbstractStateStore, LocalStateStore, MemoryStateStore, RawStateStore

    public interface StateStore
    The StateStore helps keep track of the extraction/processing state of a data application (extractor, data pipeline, contextualization pipeline, etc.). It is designed to keep track of watermarks to enable incremental load patterns. At the beginning of a run the data application typically calls the load() method, which loads the states from the remote store (which can either be a local JSON file or a table in CDF RAW), and during and/or at the end of a run, the commit method is called, which saves the current states to the persistent store.
    • Method Summary

      All Methods Instance Methods Abstract Methods 
      Modifier and Type Method Description
      void commit()
      Commit the current states to the persistent store.
      void deleteState​(String key)
      Delete the state(s) for a given id.
      void expandHigh​(String key, long value)
      Like setHigh(String, long), but only sets the state if the proposed state is higher than the current state.
      void expandLow​(String key, long value)
      Like setLow(String, long), but only sets the state if the proposed state is lower than the current state.
      OptionalLong getHigh​(String key)
      Get the high watermark state for a single id.
      OptionalLong getLow​(String key)
      Get the low watermark state for a single id.
      Optional<com.google.protobuf.Struct> getState​(String key)
      Get the set of states for a single id.
      boolean isOutsideState​(String key, long value)
      Check if a state is outside the stored state interval.
      Set<String> keySet()
      Returns a read-only set of all state keys.
      void load()
      Load the states from the persistent store.
      void setHigh​(String key, long value)
      Set/update the high watermark for a single id.
      void setLow​(String key, long value)
      Set/update the low watermark for a single id.
      boolean start()
      Start a background thread to perform a commit every maxUploadInterval.
      boolean stop()
      Stops the background thread if it is running and ensures the upload queue is empty by calling upload() one last time after shutting down the thread.
    • Method Detail

      • load

        void load()
           throws Exception
        Load the states from the persistent store.
        Throws:
        Exception
      • commit

        void commit()
             throws Exception
        Commit the current states to the persistent store. Will overwrite/replace previously persisted states.
        Throws:
        Exception
      • setHigh

        void setHigh​(String key,
                     long value)
        Set/update the high watermark for a single id.
        Parameters:
        key - The id to store the state of.
        value - The value of the high watermark.
      • expandHigh

        void expandHigh​(String key,
                        long value)
        Like setHigh(String, long), but only sets the state if the proposed state is higher than the current state.
        Parameters:
        key - key The id to store the state of.
        value - The value of the high watermark.
      • getHigh

        OptionalLong getHigh​(String key)
        Get the high watermark state for a single id.
        Parameters:
        key - The id to get the state of.
        Returns:
        The value of the high watermark. An empty optional in case the state does not exist.
      • setLow

        void setLow​(String key,
                    long value)
        Set/update the low watermark for a single id.
        Parameters:
        key - The id to store the state of.
        value - The value of the low watermark.
      • expandLow

        void expandLow​(String key,
                       long value)
        Like setLow(String, long), but only sets the state if the proposed state is lower than the current state.
        Parameters:
        key - key The id to store the state of.
        value - The value of the low watermark.
      • getLow

        OptionalLong getLow​(String key)
        Get the low watermark state for a single id.
        Parameters:
        key - The id to get the state of.
        Returns:
        The value of the low watermark. An empty optional in case the state does not exist.
      • isOutsideState

        boolean isOutsideState​(String key,
                               long value)
        Check if a state is outside the stored state interval. I.e. if a state is higher than the current high watermark or lower than the current low watermark. The method can be used for determining if a data object should be processed or not.
        Parameters:
        key - the id to test.
        value - the state to test.
        Returns:
        True if the state is outside of stored state or if the key is previously unseen.
      • getState

        Optional<com.google.protobuf.Struct> getState​(String key)
        Get the set of states for a single id. This will give you both the low and high watermark (if set) as a Struct. The returned Struct is a read-only view of the state values.
        Parameters:
        key - The id to get the state of.
        Returns:
        All the states in a Struct.
      • deleteState

        void deleteState​(String key)
        Delete the state(s) for a given id.
        Parameters:
        key - The id to delete the state(s) of.
      • keySet

        Set<String> keySet()
        Returns a read-only set of all state keys.
        Returns:
        An immutable Set of all state keys.
      • start

        boolean start()
        Start a background thread to perform a commit every maxUploadInterval. The default upload interval is every 20 seconds. If the background thread has already been started (for example by an earlier call to start() then this method does nothing and returns false.
        Returns:
        true if the upload thread started successfully, false if the background thread has already been started.
      • stop

        boolean stop()
        Stops the background thread if it is running and ensures the upload queue is empty by calling upload() one last time after shutting down the thread.
        Returns:
        true if the upload thread stopped successfully, false if the upload thread was not started in the first place.