Class EntityMatching


  • public abstract class EntityMatching
    extends Object
    This class represents the Cognite entity matching api endpoint. It provides methods for interacting with the entity matching services.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected static org.slf4j.Logger LOG  
    • Constructor Summary

      Constructors 
      Constructor Description
      EntityMatching()  
    • Field Detail

      • LOG

        protected static final org.slf4j.Logger LOG
    • Constructor Detail

      • EntityMatching

        public EntityMatching()
    • Method Detail

      • of

        public static EntityMatching of​(CogniteClient client)
        Construct a new EntityMatching object using the provided configuration. This method is intended for internal use--SDK clients should always use CogniteClient as the entry point to this class.
        Parameters:
        client - The CogniteClient to use for configuration settings.
        Returns:
        The entity matching api object.
      • predict

        public List<com.cognite.client.dto.EntityMatchResult> predict​(String modelExternalId,
                                                                      List<com.google.protobuf.Struct> sources,
                                                                      Collection<com.google.protobuf.Struct> targets)
                                                               throws Exception
        Matches a set of source entities with a set of targets via a given matching model. If either sources or targets are empty lists, the entity matcher will use the sources/targets from the model training. The default number of matches is 1, and the default score threshold used for matching is 0.
        Parameters:
        modelExternalId - The external id of the matching model to use.
        sources - A list of entities to match from. If the list is empty, the model training sources will be used.
        targets - A list of entities to match to. If the list is empty, the model training targets will be used.
        Returns:
        The entity matching results.
        Throws:
        Exception
      • predict

        public List<com.cognite.client.dto.EntityMatchResult> predict​(String modelExternalId,
                                                                      List<com.google.protobuf.Struct> sources,
                                                                      Collection<com.google.protobuf.Struct> targets,
                                                                      int numMatches)
                                                               throws Exception
        Matches a set of source entities with a set of targets via a given matching model. If either sources or targets are empty lists, the entity matcher will use the sources/targets from the model training. The default score threshold used for matching is 0.
        Parameters:
        modelExternalId - The external id of the matching model to use.
        sources - A list of entities to match from. If the list is empty, the model training sources will be used.
        targets - A list of entities to match to. If the list is empty, the model training targets will be used.
        numMatches - The maximum number of match candidates per source.
        Returns:
        The entity matching results.
        Throws:
        Exception
      • predict

        public List<com.cognite.client.dto.EntityMatchResult> predict​(String modelExternalId,
                                                                      List<com.google.protobuf.Struct> sources,
                                                                      Collection<com.google.protobuf.Struct> targets,
                                                                      int numMatches,
                                                                      double scoreThreshold)
                                                               throws Exception
        Matches a set of source entities with a set of targets via a given matching model. If either sources or targets are empty lists, the entity matcher will use the sources/targets from the model training.
        Parameters:
        modelExternalId - The external id of the matching model to use.
        sources - A list of entities to match from. If the list is empty, the model training sources will be used.
        targets - A list of entities to match to. If the list is empty, the model training targets will be used.
        numMatches - The maximum number of match candidates per source.
        scoreThreshold - The minimum score required for a match candidate.
        Returns:
        The entity matching results.
        Throws:
        Exception
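        The numMatches and scoreThreshold parameters described above can be illustrated with a small client-side sketch. It is not the SDK's or the service's implementation: the Candidate class and the select helper are hypothetical stand-ins, and the sketch assumes that the threshold is inclusive and that candidates are ranked by descending score. The no-argument overload mirrors the documented defaults (one match, threshold 0).

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class MatchFilterSketch {
    /** Hypothetical stand-in for a scored match candidate (not an SDK type). */
    static final class Candidate {
        final String target;
        final double score;

        Candidate(String target, double score) {
            this.target = target;
            this.score = score;
        }

        @Override
        public String toString() {
            return target + "=" + score;
        }
    }

    /** Keeps candidates scoring at least scoreThreshold, best first, at most numMatches. */
    static List<Candidate> select(List<Candidate> candidates, int numMatches, double scoreThreshold) {
        Comparator<Candidate> byScore = Comparator.comparingDouble(c -> c.score);
        return candidates.stream()
                .filter(c -> c.score >= scoreThreshold) // assumption: inclusive threshold
                .sorted(byScore.reversed())
                .limit(numMatches)
                .collect(Collectors.toList());
    }

    /** Mirrors the documented defaults: numMatches = 1, scoreThreshold = 0. */
    static List<Candidate> select(List<Candidate> candidates) {
        return select(candidates, 1, 0d);
    }

    public static void main(String[] args) {
        List<Candidate> candidates = List.of(
                new Candidate("pump-23", 0.91),
                new Candidate("pump-32", 0.64),
                new Candidate("valve-7", 0.12));
        System.out.println(select(candidates, 2, 0.5));
        System.out.println(select(candidates));
    }
}
```

        Raising scoreThreshold trades recall for precision, while numMatches only caps how many of the surviving candidates are returned per source.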
      • predict

        public List<com.cognite.client.dto.EntityMatchResult> predict​(long modelId,
                                                                      List<com.google.protobuf.Struct> sources,
                                                                      Collection<com.google.protobuf.Struct> targets)
                                                               throws Exception
        Matches a set of source entities with a set of targets via a given matching model. If either sources or targets are empty lists, the entity matcher will use the sources/targets from the model training. The default number of matches is 1, and the default score threshold used for matching is 0.
        Parameters:
        modelId - The internal id of the matching model to use.
        sources - A list of entities to match from. If the list is empty, the model training sources will be used.
        targets - A list of entities to match to. If the list is empty, the model training targets will be used.
        Returns:
        The entity matching results.
        Throws:
        Exception
      • predict

        public List<com.cognite.client.dto.EntityMatchResult> predict​(long modelId,
                                                                      List<com.google.protobuf.Struct> sources,
                                                                      Collection<com.google.protobuf.Struct> targets,
                                                                      int numMatches)
                                                               throws Exception
        Matches a set of source entities with a set of targets via a given matching model. If either sources or targets are empty lists, the entity matcher will use the sources/targets from the model training. The default score threshold used for matching is 0.
        Parameters:
        modelId - The internal id of the matching model to use.
        sources - A list of entities to match from. If the list is empty, the model training sources will be used.
        targets - A list of entities to match to. If the list is empty, the model training targets will be used.
        numMatches - The maximum number of match candidates per source.
        Returns:
        The entity matching results.
        Throws:
        Exception
      • predict

        public List<com.cognite.client.dto.EntityMatchResult> predict​(long modelId,
                                                                      List<com.google.protobuf.Struct> sources,
                                                                      Collection<com.google.protobuf.Struct> targets,
                                                                      int numMatches,
                                                                      double scoreThreshold)
                                                               throws Exception
        Matches a set of source entities with a set of targets via a given matching model. If either sources or targets are empty lists, the entity matcher will use the sources/targets from the model training.
        Parameters:
        modelId - The internal id of the matching model to use.
        sources - A list of entities to match from. If the list is empty, the model training sources will be used.
        targets - A list of entities to match to. If the list is empty, the model training targets will be used.
        numMatches - The maximum number of match candidates per source.
        scoreThreshold - The minimum score required for a match candidate.
        Returns:
        The entity matching results.
        Throws:
        Exception
      • predict

        public List<com.cognite.client.dto.EntityMatchResult> predict​(Collection<Request> requests)
                                                               throws Exception
        Matches a set of source entities with a set of targets via a given matching model. If either sources or targets are empty lists, the entity matcher will use the sources/targets from the model training. All input parameters are provided via the request object.
        Parameters:
        requests - input parameters for the predict jobs.
        Returns:
        The entity match results.
        Throws:
        Exception
      • create

        public List<com.cognite.client.dto.EntityMatchModel> create​(Collection<Request> requests)
                                                             throws Exception
        Train a model that predicts matches between entities (for example, time series names to asset names). This is also known as fuzzy joining. If there are no trueMatches (labeled data), you train a static (unsupervised) model, otherwise a machine learned (supervised) model is trained. All input parameters are provided via the request object.
        Parameters:
        requests - Input parameters for the create model job(s).
        Returns:
        The created entity match models.
        Throws:
        Exception
      • delete

        public List<com.cognite.client.dto.Item> delete​(List<com.cognite.client.dto.Item> entityMatchingModels)
                                                 throws Exception
        Deletes a set of entity matching models. The models to delete are identified via their externalId / id by submitting a list of Item.
        Parameters:
        entityMatchingModels - a list of Item representing the entity matching models (externalId / id) to be deleted
        Returns:
        The deleted models via Item
        Throws:
        Exception
      • buildPartitionsList

        protected List<String> buildPartitionsList​(int noPartitions)
        Builds a list of partition specifications for parallel retrieval from the Cognite api. This specification is used as a parameter together with the filter / list endpoints. The number of partitions indicates the number of parallel read streams. Employ one partition specification per read stream.
        Parameters:
        noPartitions - The total number of partitions
        Returns:
        a List of partition specifications
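        A minimal sketch of what such a list of partition specifications could look like. It assumes the "m/n" partition format (partition m of n, 1-based) accepted by the partition parameter of the Cognite list / filter endpoints. The class and method names are hypothetical, and this is not necessarily how the SDK builds the list.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionSpecSketch {
    /** Builds "1/n", "2/n", ..., "n/n" (assumed format, one entry per read stream). */
    static List<String> buildPartitions(int noPartitions) {
        if (noPartitions < 1) {
            throw new IllegalArgumentException("noPartitions must be at least 1");
        }
        List<String> partitions = new ArrayList<>(noPartitions);
        for (int i = 1; i <= noPartitions; i++) {
            partitions.add(i + "/" + noPartitions);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Four parallel read streams yield four partition specifications.
        System.out.println(buildPartitions(4));
    }
}
```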
      • listJson

        protected Iterator<List<String>> listJson​(ResourceType resourceType,
                                                  Request requestParameters,
                                                  String... partitions)
                                           throws Exception
        Returns the results from a list / filter api endpoint. For example, the filter assets endpoint. The results are paged through / iterated over via an Iterator--the entire results set is not buffered in memory, but streamed in "pages" from the Cognite api. If you need to buffer the entire results set, then you have to stream these results into your own data structure. This method supports parallel retrieval via a set of partition specifications. The specified partitions will be collected and merged together before being returned via the Iterator.
        Parameters:
        resourceType - The resource type to query / filter / list. For example, event, asset, or time series.
        requestParameters - The query / filter specification. Follows the Cognite api request parameters.
        partitions - An optional set of partitions to read via.
        Returns:
        an Iterator over the results set.
        Throws:
        Exception
      • listJson

        protected Iterator<List<String>> listJson​(ResourceType resourceType,
                                                  Request requestParameters,
                                                  String partitionKey,
                                                  String... partitions)
                                           throws Exception
        Returns the results from a list / filter api endpoint. For example, the filter assets endpoint. The results are paged through / iterated over via an Iterator--the entire results set is not buffered in memory, but streamed in "pages" from the Cognite api. If you need to buffer the entire results set, then you have to stream these results into your own data structure. This method supports parallel retrieval via a set of partition specifications. The specified partitions will be collected and merged together before being returned via the Iterator.
        Parameters:
        resourceType - The resource type to query / filter / list. For example, event, asset, or time series.
        requestParameters - The query / filter specification. Follows the Cognite api request parameters.
        partitionKey - The key to use for the partitions in the read request. For example partition or cursor.
        partitions - An optional set of partitions to read via.
        Returns:
        an Iterator over the results set.
        Throws:
        Exception
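        Because listJson hands back an Iterator over pages rather than a fully buffered result, a caller that needs everything in memory has to drain the iterator itself. The sketch below shows one way to do that. The drain helper is hypothetical, and the in-memory pages stand in for the pages that would otherwise be streamed from the Cognite api.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class PagedResultsSketch {
    /** Copies every item of every page into a single list, page by page. */
    static List<String> drain(Iterator<List<String>> pages) {
        List<String> buffer = new ArrayList<>();
        while (pages.hasNext()) {
            buffer.addAll(pages.next());
        }
        return buffer;
    }

    public static void main(String[] args) {
        // Stand-in for the Iterator returned by a list / filter call: two pages of Json strings.
        Iterator<List<String>> pages = List.of(
                List.of("{\"id\":1}", "{\"id\":2}"),
                List.of("{\"id\":3}")).iterator();
        List<String> all = drain(pages);
        System.out.println(all.size() + " items buffered");
    }
}
```

        Processing each page inside the loop instead of buffering keeps memory use bounded by the page size, which is the reason the results are streamed in the first place.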
      • retrieveJson

        protected List<String> retrieveJson​(ResourceType resourceType,
                                            Collection<com.cognite.client.dto.Item> items)
                                     throws Exception
        Retrieve items by id.
        Parameters:
        resourceType - The item resource type (Event, Asset, etc.) to retrieve.
        items - The item(s) externalId / id to retrieve.
        Returns:
        The items in Json representation.
        Throws:
        Exception
      • aggregate

        protected com.cognite.client.dto.Aggregate aggregate​(ResourceType resourceType,
                                                             Request requestParameters)
                                                      throws Exception
        Performs an item aggregation request to Cognite Data Fusion. The default aggregation is a total item count based on the (optional) filters in the request. Some resource types, for example Event, support multiple types of aggregation.
        Parameters:
        resourceType - The resource type to perform aggregation of.
        requestParameters - The request containing filters.
        Returns:
        The aggregation result.
        Throws:
        Exception
        See Also:
        Cognite API v1 specification
      • addAuthInfo

        protected Request addAuthInfo​(Request request)
                               throws Exception
        Adds the required authentication information into the request object. If the request object already has complete auth info, nothing will be added. The following authentication schemes are supported: 1) API key. When using an api key, this service will look up the corresponding project/tenant to issue requests to.
        Parameters:
        request - The request to enrich with auth information.
        Returns:
        The request parameters with auth info added to it.
        Throws:
        Exception
      • parseItems

        protected List<com.cognite.client.dto.Item> parseItems​(List<String> input)
                                                        throws Exception
        Parses a list of item objects in json representation to typed objects.
        Parameters:
        input - the item list in Json string representation
        Returns:
        the parsed item objects
        Throws:
        Exception
      • toRequestItems

        protected List<Map<String,​Object>> toRequestItems​(Collection<com.cognite.client.dto.Item> itemList)
        Converts a list of Item to a request object structure (that can later be parsed to Json).
        Parameters:
        itemList - The items to parse.
        Returns:
        The items in request item object form.
      • deDuplicate

        protected List<com.cognite.client.dto.Item> deDuplicate​(Collection<com.cognite.client.dto.Item> itemList)
        De-duplicates a collection of Item.
        Parameters:
        itemList - The items to de-duplicate.
        Returns:
        The de-duplicated items.
      • itemsHaveId

        protected boolean itemsHaveId​(Collection<com.cognite.client.dto.Item> items)
        Returns true if all items contain either an externalId or id.
        Parameters:
        items - The items to check.
        Returns:
        true if all items contain either an externalId or id, false otherwise.
      • mapItemToId

        protected Map<String,​com.cognite.client.dto.Item> mapItemToId​(Collection<com.cognite.client.dto.Item> items)
        Maps all items to their externalId (primary) or id (secondary). If the id function does not return any identity, the item will be mapped to the empty string. Via the identity mapping, this function will also perform deduplication of the input items.
        Parameters:
        items - the items to map to externalId / id.
        Returns:
        the Map with all items mapped to externalId / id.
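        The identity rules described for itemsHaveId and mapItemToId can be sketched as follows. The Item class here is a hypothetical stand-in with nullable externalId and id fields, not the SDK's com.cognite.client.dto.Item. The sketch also assumes that when two items share an identity the later one replaces the earlier one, which the documentation above does not specify.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ItemIdSketch {
    /** Hypothetical stand-in for Item: either field may be null. */
    static final class Item {
        final String externalId;
        final Long id;

        Item(String externalId, Long id) {
            this.externalId = externalId;
            this.id = id;
        }
    }

    /** externalId first (primary), then id (secondary), else the empty string. */
    static String idOf(Item item) {
        if (item.externalId != null) {
            return item.externalId;
        }
        if (item.id != null) {
            return String.valueOf(item.id);
        }
        return "";
    }

    /** True only if every item carries an externalId or an id. */
    static boolean itemsHaveId(Collection<Item> items) {
        return items.stream().allMatch(i -> i.externalId != null || i.id != null);
    }

    /** Maps items by identity; a repeated key replaces the earlier entry (assumption). */
    static Map<String, Item> mapItemToId(Collection<Item> items) {
        Map<String, Item> result = new LinkedHashMap<>();
        for (Item item : items) {
            result.put(idOf(item), item);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Item> items = List.of(
                new Item("pump-1", null),
                new Item(null, 42L),
                new Item("pump-1", 7L)); // same identity as the first item
        System.out.println(itemsHaveId(items));
        System.out.println(mapItemToId(items).keySet());
    }
}
```

        Keying the map on the identity is what makes the de-duplication a side effect of the mapping: two items with the same externalId can only occupy one entry.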
      • parseString

        protected String parseString​(String itemJson,
                                     String fieldName)
        Try parsing the specified Json path as a String.
        Parameters:
        itemJson - The Json string
        fieldName - The Json path to parse
        Returns:
        The Json path as a String.
      • parseName

        protected String parseName​(String json)
        Returns the name attribute value from a json input.
        Parameters:
        json - the json to parse
        Returns:
        The name value