Class EntityMatching

java.lang.Object
com.cognite.client.EntityMatching

public abstract class EntityMatching extends Object
This class represents the Cognite entity matching api endpoint. It provides methods for interacting with the entity matching services.
  • Field Details

    • LOG

      protected static final org.slf4j.Logger LOG
  • Constructor Details

    • EntityMatching

      public EntityMatching()
  • Method Details

    • of

      public static EntityMatching of(CogniteClient client)
      Construct a new EntityMatching object using the provided configuration. This method is intended for internal use--SDK clients should always use CogniteClient as the entry point to this class.
      Parameters:
      client - The CogniteClient to use for configuration settings.
      Returns:
      The entity matching api object.
    • predict

      public List<EntityMatchResult> predict(String modelExternalId, List<com.google.protobuf.Struct> sources, Collection<com.google.protobuf.Struct> targets) throws Exception
      Matches a set of source entities with a set of targets via a given matching model. If either sources or targets are empty lists, the entity matcher will use the sources/targets from the model training. The default number of matches is 1 and the default score threshold used for matching is 0.

      Example:

       
           String modelExternalId = // modelExternalId;
           List<Struct> sources = // sources ;
           List<Struct> targets = // targets;
           List<EntityMatchResult> result = client.contextualization()
                           .entityMatching()
                           .predict(modelExternalId, sources, targets);
       
       
      API Reference - Predict matches
      Parameters:
      modelExternalId - The external id of the matching model to use.
      sources - A list of entities to match from. If the list is empty, the model training sources will be used.
      targets - A list of entities to match to. If the list is empty, the model training targets will be used.
      Returns:
      The entity matching results.
      Throws:
      Exception
    • predict

      public List<EntityMatchResult> predict(String modelExternalId, List<com.google.protobuf.Struct> sources, Collection<com.google.protobuf.Struct> targets, int numMatches) throws Exception
      Matches a set of source entities with a set of targets via a given matching model. If either sources or targets are empty lists, the entity matcher will use the sources/targets from the model training. The default score threshold used for matching is 0.

      Example:

       
           String modelExternalId = // modelExternalId;
           List<Struct> sources = // sources ;
           List<Struct> targets = // targets;
           List<EntityMatchResult> result = client.contextualization()
                           .entityMatching()
                           .predict(modelExternalId, sources, targets, 1);
       
       
      API Reference - Predict matches
      Parameters:
      modelExternalId - The external id of the matching model to use.
      sources - A list of entities to match from. If the list is empty, the model training sources will be used.
      targets - A list of entities to match to. If the list is empty, the model training targets will be used.
      numMatches - The maximum number of match candidates per source.
      Returns:
      The entity matching results.
      Throws:
      Exception
    • predict

      public List<EntityMatchResult> predict(String modelExternalId, List<com.google.protobuf.Struct> sources, Collection<com.google.protobuf.Struct> targets, int numMatches, double scoreThreshold) throws Exception
      Matches a set of source entities with a set of targets via a given matching model. If either sources or targets are empty lists, the entity matcher will use the sources/targets from the model training.

      Example:

       
           String modelExternalId = // modelExternalId;
           List<Struct> sources = // sources ;
           List<Struct> targets = // targets;
           List<EntityMatchResult> result = client.contextualization()
                           .entityMatching()
                           .predict(modelExternalId, sources, targets, 1, 0d);
       
       
      API Reference - Predict matches
      Parameters:
      modelExternalId - The external id of the matching model to use.
      sources - A list of entities to match from. If the list is empty, the model training sources will be used.
      targets - A list of entities to match to. If the list is empty, the model training targets will be used.
      numMatches - The maximum number of match candidates per source.
      scoreThreshold - The minimum score required for a match candidate.
      Returns:
      The entity matching results.
      Throws:
      Exception
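      The following is a minimal local sketch of how numMatches and scoreThreshold could bound the candidates kept for one source. It is an illustration under stated assumptions, not the matching service's implementation: the Candidate type and the select helper are hypothetical, and the threshold is assumed to be inclusive.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: a local stand-in, not part of the Cognite SDK.
final class MatchFilterSketch {

    /** Hypothetical candidate: a target identifier plus its match score. */
    static final class Candidate {
        final String target;
        final double score;

        Candidate(String target, double score) {
            this.target = target;
            this.score = score;
        }
    }

    /**
     * Keeps the candidates scoring at least scoreThreshold (assumed inclusive),
     * orders them best first, and caps the result at numMatches entries.
     */
    static List<Candidate> select(List<Candidate> candidates, int numMatches, double scoreThreshold) {
        List<Candidate> kept = new ArrayList<>();
        for (Candidate c : candidates) {
            if (c.score >= scoreThreshold) {
                kept.add(c);
            }
        }
        kept.sort(Comparator.comparingDouble((Candidate c) -> c.score).reversed());
        int limit = Math.max(0, Math.min(numMatches, kept.size()));
        return new ArrayList<>(kept.subList(0, limit));
    }

    public static void main(String[] args) {
        List<Candidate> candidates = List.of(
                new Candidate("pump-01", 0.91),
                new Candidate("pump-10", 0.64),
                new Candidate("valve-01", 0.12));
        // At most 2 candidates, each with a score of 0.5 or higher.
        for (Candidate c : select(candidates, 2, 0.5)) {
            System.out.println(c.target + " " + c.score);
        }
    }
}
```

      With numMatches = 1 and scoreThreshold = 0, this sketch keeps only the single best candidate, which mirrors the defaults described for the shorter overloads.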
    • predict

      public List<EntityMatchResult> predict(long modelId, List<com.google.protobuf.Struct> sources, Collection<com.google.protobuf.Struct> targets) throws Exception
      Matches a set of source entities with a set of targets via a given matching model. If either sources or targets are empty lists, the entity matcher will use the sources/targets from the model training. The default number of matches is 1 and the default score threshold used for matching is 0.

      Example:

       
           Long modelId = // modelId;
           List<Struct> sources = // sources ;
           List<Struct> targets = // targets;
           List<EntityMatchResult> result = client.contextualization()
                           .entityMatching()
                           .predict(modelId, sources, targets);
       
       
      API Reference - Predict matches
      Parameters:
      modelId - The internal id of the matching model to use.
      sources - A list of entities to match from. If the list is empty, the model training sources will be used.
      targets - A list of entities to match to. If the list is empty, the model training targets will be used.
      Returns:
      The entity matching results.
      Throws:
      Exception
    • predict

      public List<EntityMatchResult> predict(long modelId, List<com.google.protobuf.Struct> sources, Collection<com.google.protobuf.Struct> targets, int numMatches) throws Exception
      Matches a set of source entities with a set of targets via a given matching model. If either sources or targets are empty lists, the entity matcher will use the sources/targets from the model training. The default score threshold used for matching is 0.

      Example:

       
           Long modelId = // modelId;
           List<Struct> sources = // sources ;
           List<Struct> targets = // targets;
           List<EntityMatchResult> result = client.contextualization()
                           .entityMatching()
                           .predict(modelId, sources, targets, 1);
       
       
      API Reference - Predict matches
      Parameters:
      modelId - The internal id of the matching model to use.
      sources - A list of entities to match from. If the list is empty, the model training sources will be used.
      targets - A list of entities to match to. If the list is empty, the model training targets will be used.
      numMatches - The maximum number of match candidates per source.
      Returns:
      The entity matching results.
      Throws:
      Exception
    • predict

      public List<EntityMatchResult> predict(long modelId, List<com.google.protobuf.Struct> sources, Collection<com.google.protobuf.Struct> targets, int numMatches, double scoreThreshold) throws Exception
      Matches a set of source entities with a set of targets via a given matching model. If either sources or targets are empty lists, the entity matcher will use the sources/targets from the model training.

      Example:

       
           Long modelId = // modelId;
           List<Struct> sources = // sources ;
           List<Struct> targets = // targets;
           List<EntityMatchResult> result = client.contextualization()
                           .entityMatching()
                           .predict(modelId, sources, targets, 1, 0d);
       
       
      API Reference - Predict matches
      Parameters:
      modelId - The internal id of the matching model to use.
      sources - A list of entities to match from. If the list is empty, the model training sources will be used.
      targets - A list of entities to match to. If the list is empty, the model training targets will be used.
      numMatches - The maximum number of match candidates per source.
      scoreThreshold - The minimum score required for a match candidate.
      Returns:
      The entity matching results.
      Throws:
      Exception
    • predict

      public List<EntityMatchResult> predict(Collection<Request> requests) throws Exception
      Matches a set of source entities with a set of targets via a given matching model. If either sources or targets are empty lists, the entity matcher will use the sources/targets from the model training.

      Example:

       
           List<Struct> sourceBatch = // List of Struct
           List<Request> requestBatches = new ArrayList<>();
           requestBatches.add(Request.create().withRootParameter("sources", sourceBatch));
           List<EntityMatchResult> result = client.contextualization()
                           .entityMatching()
                           .predict(requestBatches);
       
       
      API Reference - Predict matches
      Parameters:
      requests - Input parameters for the predict jobs.
      Returns:
      The entity match results.
      Throws:
      Exception
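      The example above submits a single source batch. When there are many sources, one option is to split them into several batches and build one Request per batch. The sketch below shows only the splitting step with plain Java collections; the toBatches helper and the batch size are hypothetical and not part of the SDK.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a local helper, not part of the Cognite SDK.
final class SourceBatchSketch {

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> toBatches(List<T> items, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the input list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> sources = List.of("s1", "s2", "s3", "s4", "s5");
        // Each batch would then be wrapped as in the example above, e.g.
        // Request.create().withRootParameter("sources", batch)
        for (List<String> batch : toBatches(sources, 2)) {
            System.out.println(batch);
        }
    }
}
```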
    • create

      public List<EntityMatchModel> create(Collection<Request> requests) throws Exception
      Trains a model that predicts matches between entities (for example, time series names to asset names). This is also known as fuzzy joining. If there are no trueMatches (labeled data), a static (unsupervised) model is trained; otherwise, a machine-learned (supervised) model is trained.

      Example:

       
           List<Struct> sources = // sources ;
           List<Struct> targets = // targets;
           String[] modelTypes = {"simple", "insensitive", "bigram", "frequencyweightedbigram",
                           "bigramextratokenizers", "bigramcombo"};
           Request entityMatchFitRequest = Request.create()
                       .withRootParameter("sources",  sources)
                       .withRootParameter("targets", targets)
                       .withRootParameter("matchFields", Map.of("source", "name", "target", "externalId"))
                       .withRootParameter("featureType", modelTypes[1]);
      
           List<EntityMatchModel> models = client.contextualization().entityMatching()
                       .create(List.of(entityMatchFitRequest));
       
       
      API Reference - Create entity matcher model
      Parameters:
      requests - Input parameters for the create model job(s).
      Returns:
      The created entity match models.
      Throws:
      Exception
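      To give an intuition for fuzzy joining, the toy sketch below scores two names by the overlap of their lower-cased character bigrams and picks the best-scoring target for a source. This is only an illustration of the idea; it is an assumption-laden stand-in and does not reproduce how the service's feature types (such as "bigram" or "insensitive") are actually computed. All class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative only: a toy similarity, not the service's matching model.
final class FuzzyJoinSketch {

    /** Character bigrams of the lower-cased input; strings shorter than 2 yield an empty set. */
    static Set<String> bigrams(String s) {
        String t = s.toLowerCase(Locale.ROOT);
        Set<String> out = new HashSet<>();
        for (int i = 0; i + 2 <= t.length(); i++) {
            out.add(t.substring(i, i + 2));
        }
        return out;
    }

    /** Jaccard overlap of the two bigram sets, in the range [0, 1]. */
    static double similarity(String a, String b) {
        Set<String> x = bigrams(a);
        Set<String> y = bigrams(b);
        if (x.isEmpty() && y.isEmpty()) {
            return a.equalsIgnoreCase(b) ? 1.0 : 0.0;
        }
        Set<String> intersection = new HashSet<>(x);
        intersection.retainAll(y);
        Set<String> union = new HashSet<>(x);
        union.addAll(y);
        return (double) intersection.size() / union.size();
    }

    /** Returns the target with the highest similarity to source, or null if targets is empty. */
    static String bestMatch(String source, List<String> targets) {
        String best = null;
        double bestScore = -1.0;
        for (String target : targets) {
            double score = similarity(source, target);
            if (score > bestScore) {
                bestScore = score;
                best = target;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // A time series name matched against two asset names.
        System.out.println(bestMatch("21PT1019", List.of("21_PT_1019", "13_FV_2003")));
    }
}
```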
    • delete

      public List<Item> delete(List<Item> entityMatchingModels) throws Exception
      Deletes a set of entity matching models. The models to delete are identified via their externalId / id by submitting a list of Item.

      Example:

       
           List<Item> entityMatchingModels = List.of(Item.newBuilder().setExternalId("1").build());
           List<Item> deleteItemsResults = client.contextualization().entityMatching()
                                                    .delete(entityMatchingModels);
       
       
      API Reference - Delete entity matcher model
      Parameters:
      entityMatchingModels - a list of Item representing the entity matching models (externalId / id) to be deleted
      Returns:
      The deleted models via Item
      Throws:
      Exception
    • getClient

      public abstract CogniteClient getClient()
    • buildPartitionsList

      protected List<String> buildPartitionsList(int noPartitions)
      Builds a list of partition specifications for parallel retrieval from the Cognite api. This specification is used as a parameter together with the filter / list endpoints. The number of partitions indicates the number of parallel read streams. Employ one partition specification per read stream.

      Example:

       
            List<String> partitions = buildPartitionsList(getClient().getClientConfig().getNoListPartitions());
       
       
      Parameters:
      noPartitions - The total number of partitions
      Returns:
      a List of partition specifications
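      As a rough sketch of what such a list could look like, the code below builds one specification per read stream. It assumes the "m/n" string form (partition number m out of n total partitions); that format, the helper name, and the argument check are assumptions for illustration rather than a statement of what this method returns.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: assumes partition specifications of the form "m/n".
final class PartitionSketch {

    /** Builds the specifications "1/n", "2/n", ..., "n/n" for n = noPartitions. */
    static List<String> buildPartitions(int noPartitions) {
        if (noPartitions < 1) {
            throw new IllegalArgumentException("noPartitions must be >= 1");
        }
        List<String> partitions = new ArrayList<>(noPartitions);
        for (int i = 1; i <= noPartitions; i++) {
            partitions.add(i + "/" + noPartitions);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // One specification per parallel read stream.
        System.out.println(buildPartitions(4));
    }
}
```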
    • listJson

      protected Iterator<List<String>> listJson(ResourceType resourceType, Request requestParameters, String... partitions) throws Exception
      Returns the results from a list / filter api endpoint, for example the filter assets endpoint. The results are paged through / iterated over via an Iterator--the entire result set is not buffered in memory, but streamed in "pages" from the Cognite api. If you need to buffer the entire result set, you have to stream these results into your own data structure. This method supports parallel retrieval via a set of partition specifications. The specified partitions will be collected and merged together before being returned via the Iterator.

      Example:

       
            Iterator<List<String>> result = listJson(resourceType, requestParameters, partitions);
       
       
      Parameters:
      resourceType - The resource type to query / filter / list. Ex. event, asset, time series.
      requestParameters - The query / filter specification. Follows the Cognite api request parameters.
      partitions - An optional set of partitions to read via.
      Returns:
      an Iterator over the results set.
      Throws:
      Exception
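      If the entire result set is needed in memory, the pages can be drained into a single list. The sketch below does this for any Iterator of pages; the drain helper is hypothetical, and the in-memory iterator in main merely stands in for the one returned by listJson.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative only: buffers a paged result stream into one list.
final class PageDrainSketch {

    /** Consumes the iterator page by page and collects every item in order. */
    static List<String> drain(Iterator<List<String>> pages) {
        List<String> all = new ArrayList<>();
        while (pages.hasNext()) {
            all.addAll(pages.next());
        }
        return all;
    }

    public static void main(String[] args) {
        // Stand-in for the pages streamed from the api.
        Iterator<List<String>> pages = List.of(
                List.of("{\"id\":1}", "{\"id\":2}"),
                List.of("{\"id\":3}")).iterator();
        System.out.println(drain(pages).size());
    }
}
```

      Buffering gives up the memory benefit of streaming, so it is best kept for result sets known to be small.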
    • listJson

      protected Iterator<List<String>> listJson(ResourceType resourceType, Request requestParameters, String partitionKey, String... partitions) throws Exception
      Returns the results from a list / filter api endpoint, for example the filter assets endpoint. The results are paged through / iterated over via an Iterator--the entire result set is not buffered in memory, but streamed in "pages" from the Cognite api. If you need to buffer the entire result set, you have to stream these results into your own data structure. This method supports parallel retrieval via a set of partition specifications. The specified partitions will be collected and merged together before being returned via the Iterator.

      Example:

       
            Iterator<List<String>> result = listJson(resourceType, requestParameters, partitionKey, partitions);
       
       
      Parameters:
      resourceType - The resource type to query / filter / list. Ex. event, asset, time series.
      requestParameters - The query / filter specification. Follows the Cognite api request parameters.
      partitionKey - The key to use for the partitions in the read request. For example partition or cursor.
      partitions - An optional set of partitions to read via.
      Returns:
      an Iterator over the results set.
      Throws:
      Exception
    • retrieveJson

      protected List<String> retrieveJson(ResourceType resourceType, Collection<Item> items) throws Exception
      Retrieve items by id. Will ignore unknown ids by default.

      Example:

       
            Collection<Item> items = //Collection of items with ids;
            List<String> result = retrieveJson(resourceType, items);
       
       
      Parameters:
      resourceType - The item resource type (Event, Asset, etc.) to retrieve.
      items - The item(s) externalId / id to retrieve.
      Returns:
      The items in Json representation.
      Throws:
      Exception
    • retrieveJson

      protected List<String> retrieveJson(ResourceType resourceType, Collection<Item> items, Map<String,Object> parameters) throws Exception
      Retrieve items by id. This version allows you to explicitly set additional parameters for the retrieve request. For example: <"ignoreUnknownIds", true> and <"fetchResources", true>.

      Example:

       
            Collection<Item> items = //Collection of items with ids;
            Map<String, Object> parameters = Map.of("ignoreUnknownIds", true);
            List<String> result = retrieveJson(resourceType, items, parameters);
       
       
      Parameters:
      resourceType - The item resource type (Event, Asset, etc.) to retrieve.
      items - The item(s) externalId / id to retrieve.
      parameters - Additional parameters for the request. For example <"ignoreUnknownIds", true>
      Returns:
      The items in Json representation.
      Throws:
      Exception
    • aggregate

      protected Aggregate aggregate(ResourceType resourceType, Request requestParameters) throws Exception
      Performs an item aggregation request to Cognite Data Fusion. The default aggregation is a total item count based on the (optional) filters in the request. Some resource types, for example Event, support multiple types of aggregation.

      Example:

       
            Aggregate aggregateResult = aggregate(resourceType,requestParameters);
       
       
      Parameters:
      resourceType - The resource type to perform aggregation of.
      requestParameters - The request containing filters.
      Returns:
      The aggregation result.
      Throws:
      Exception
    • addAuthInfo

      protected Request addAuthInfo(Request request) throws Exception
      Adds the required authentication information into the request object. If the request object already has complete auth info, nothing will be added. The following authentication schemes are supported: 1) API key. When using an api key, this service will look up the corresponding project/tenant to issue requests to.

      Example:

       
            Request requestParams = addAuthInfo(request);
       
       
      Parameters:
      request - The request to enrich with auth information.
      Returns:
      The request parameters with auth info added to it.
      Throws:
      Exception
    • getListResponseIterator

      protected Iterator<CompletableFuture<ResponseItems<String>>> getListResponseIterator(ResourceType resourceType, Request requestParameters) throws Exception
      Throws:
      Exception
    • parseItems

      protected List<Item> parseItems(List<String> input) throws Exception
      Parses a list of item objects in json representation to typed objects.

      Example:

       
            List<String> input = //List of json;
            List<Item> resultList = parseItems(input);
       
       
      Parameters:
      input - the item list in Json string representation
      Returns:
      the parsed item objects
      Throws:
      Exception
    • toRequestItems

      protected List<Map<String,Object>> toRequestItems(Collection<Item> itemList)
      Converts a list of Item to a request object structure (that can later be parsed to Json).

      Example:

       
            Collection<Item> itemList = //Collection of items;
            List<Map<String, Object>> result = toRequestItems(itemList);
       
       
      Parameters:
      itemList - The items to parse.
      Returns:
      The items in request item object form.
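      A possible shape of this conversion is sketched below with a hypothetical stand-in type, ItemRef, in place of the SDK's Item. It assumes each request item is a map holding either an "externalId" or an "id" entry, and that an item with neither is rejected; both points are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Illustrative only: ItemRef is a hypothetical stand-in for the SDK's Item.
final class RequestItemsSketch {

    static final class ItemRef {
        final String externalId; // may be null
        final Long id;           // may be null

        ItemRef(String externalId, Long id) {
            this.externalId = externalId;
            this.id = id;
        }
    }

    /** Maps each item to {"externalId": ...} when present, otherwise to {"id": ...}. */
    static List<Map<String, Object>> toRequestItems(Collection<ItemRef> items) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (ItemRef item : items) {
            if (item.externalId != null) {
                out.add(Map.<String, Object>of("externalId", item.externalId));
            } else if (item.id != null) {
                out.add(Map.<String, Object>of("id", item.id));
            } else {
                // Assumption of this sketch: items without any identifier are rejected.
                throw new IllegalArgumentException("Item has neither externalId nor id");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(toRequestItems(List.of(
                new ItemRef("pump-01", null),
                new ItemRef(null, 42L))));
    }
}
```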
    • deDuplicate

      protected List<Item> deDuplicate(Collection<Item> itemList)
      De-duplicates a collection of Item.

      Example:

       
            Collection<Item> itemList = //Collection of items;
            List<Item> result = deDuplicate(itemList);
       
       
      Parameters:
      itemList - The items to de-duplicate.
      Returns:
      The de-duplicated items.
    • itemsHaveId

      protected boolean itemsHaveId(Collection<Item> items)
      Returns true if all items contain either an externalId or id.

      Example:

       
            Collection<Item> items = //Collection of items;
            boolean result = itemsHaveId(items);
       
       
      Parameters:
      items - The items to check.
      Returns:
      true if all items contain either an externalId or id, false otherwise.
    • mapItemToId

      protected Map<String,Item> mapItemToId(Collection<Item> items)
      Maps all items to their externalId (primary) or id (secondary). If the id function does not return any identity, the item will be mapped to the empty string. Via the identity mapping, this function will also perform deduplication of the input items.

      Example:

       
            Collection<Item> items = //Collection of items;
            Map<String, Item> result = mapItemToId(items);
       
       
      Parameters:
      items - the items to map to externalId / id.
      Returns:
      the Map with all items mapped to externalId / id.
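      The identity mapping described above can be sketched with plain Java collections. ItemRef is a hypothetical stand-in for the SDK's Item, and the choice that a later duplicate replaces an earlier one is an assumption of this sketch, not a documented behavior.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: ItemRef is a hypothetical stand-in for the SDK's Item.
final class ItemIdentitySketch {

    static final class ItemRef {
        final String externalId; // may be null
        final Long id;           // may be null

        ItemRef(String externalId, Long id) {
            this.externalId = externalId;
            this.id = id;
        }
    }

    /** externalId first, then id as a string, then the empty string. */
    static String identity(ItemRef item) {
        if (item.externalId != null) {
            return item.externalId;
        }
        if (item.id != null) {
            return String.valueOf(item.id);
        }
        return "";
    }

    /** Maps items by identity; items sharing an identity collapse into one entry. */
    static Map<String, ItemRef> mapToId(Collection<ItemRef> items) {
        Map<String, ItemRef> byId = new LinkedHashMap<>();
        for (ItemRef item : items) {
            byId.put(identity(item), item); // a later duplicate replaces the earlier one
        }
        return byId;
    }

    /** De-duplication falls out of the identity mapping. */
    static List<ItemRef> deDuplicate(Collection<ItemRef> items) {
        return new ArrayList<>(mapToId(items).values());
    }

    public static void main(String[] args) {
        List<ItemRef> items = List.of(
                new ItemRef("pump-01", null),
                new ItemRef(null, 42L),
                new ItemRef("pump-01", null));
        System.out.println(mapToId(items).keySet());
    }
}
```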
    • parseString

      protected String parseString(String itemJson, String fieldName)
      Try parsing the specified Json path as a String.

      Example:

       
            String json = //String of json object
            String result = parseString(json, "name");
       
       
      Parameters:
      itemJson - The Json string
      fieldName - The Json path to parse
      Returns:
      The Json path as a String.
    • parseName

      protected String parseName(String json)
      Returns the name attribute value from a json input.

      Example:

       
            String json = //String of json object
            String result = parseName(json);
       
       
      Parameters:
      json - the json to parse
      Returns:
      The name value