public class DatastoreWordCount extends Object
This example shows how to use DatastoreIO to read from Datastore and write the results to Cloud Storage. Note that this example also writes data to Datastore, which may incur charges for Datastore operations.
To run this example, use gcloud to obtain credentials for Datastore:
$ gcloud auth login
To run this pipeline locally, the following options must be provided:
--project=YOUR_PROJECT_ID
--dataset=YOUR_DATASET_ID
--output=[YOUR_LOCAL_FILE | gs://YOUR_OUTPUT_PATH]
To run this example using the Dataflow service, you must additionally provide either --stagingLocation or --tempLocation, and select one of the Dataflow pipeline runners, e.g. --runner=BlockingDataflowPipelineRunner.
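The flag requirements above can be sketched in plain Java. This is only an illustration of the --name=value convention and of which flags the text lists as required. The class OptionsSketch and its methods are hypothetical and are not part of the Dataflow SDK, which performs its own option parsing and validation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of the Dataflow SDK. */
public class OptionsSketch {
    // Flags the text lists as required for a local run.
    static final List<String> REQUIRED = Arrays.asList("project", "dataset", "output");

    /** Parses arguments of the form --name=value and checks the required flags. */
    static Map<String, String> parse(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            // Reject anything that is not "--" + non-empty name + "=" + value.
            if (!arg.startsWith("--") || eq < 3) {
                throw new IllegalArgumentException("Expected --name=value, got: " + arg);
            }
            options.put(arg.substring(2, eq), arg.substring(eq + 1));
        }
        for (String name : REQUIRED) {
            if (!options.containsKey(name)) {
                throw new IllegalArgumentException("Missing required option --" + name);
            }
        }
        return options;
    }

    /** True when the output value names a Cloud Storage path rather than a local file. */
    static boolean isCloudStoragePath(String output) {
        return output.startsWith("gs://");
    }

    /**
     * Rough check for a service run: a staging or temp location plus an explicit runner.
     * Assumption: this does not verify that the runner is actually a Dataflow runner.
     */
    static boolean readyForService(Map<String, String> options) {
        boolean hasLocation =
            options.containsKey("stagingLocation") || options.containsKey("tempLocation");
        return hasLocation && options.containsKey("runner");
    }

    public static void main(String[] args) {
        // Placeholder values; substitute your own project, dataset, and output path.
        Map<String, String> options = parse(new String[] {
            "--project=YOUR_PROJECT_ID",
            "--dataset=YOUR_DATASET_ID",
            "--output=gs://YOUR_OUTPUT_PATH"
        });
        System.out.println("Parsed options: " + options);
        System.out.println("Writes to Cloud Storage: " + isCloudStoragePath(options.get("output")));
        System.out.println("Ready for service run: " + readyForService(options));
    }
}
```

In this sketch a missing --project, --dataset, or --output fails fast with an IllegalArgumentException, mirroring the "must be provided" wording above.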
Note: this example creates entities with ancestor keys to ensure that all entities created are in the same entity group. Similarly, the query used to read from Cloud Datastore uses an ancestor filter. Ancestors are used to ensure strongly consistent results in Cloud Datastore. For more information, see the Cloud Datastore documentation on Structuring Data for Strong Consistency.
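The relationship between ancestor keys, entity groups, and ancestor filters can be illustrated with a small conceptual model. The sketch below represents a key as a path of "kind:name" elements. All types, kinds, and names in it are hypothetical and do not correspond to the Datastore client API. It only shows why entities created under one root key fall into one entity group and why a filter on that root selects exactly those entities.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Conceptual model only; these types are hypothetical, not the Datastore client API. */
public class AncestorKeySketch {
    /** Builds a child key by appending a "kind:name" element to the ancestor's path. */
    static List<String> childKey(List<String> ancestor, String kind, String name) {
        List<String> path = new ArrayList<>(ancestor);
        path.add(kind + ":" + name);
        return path;
    }

    /** Ancestor filter: true when the key's path starts with the ancestor's path. */
    static boolean hasAncestor(List<String> key, List<String> ancestor) {
        return key.size() >= ancestor.size()
            && key.subList(0, ancestor.size()).equals(ancestor);
    }

    /** In this model the entity group is identified by the first (root) path element. */
    static String entityGroup(List<String> key) {
        return key.get(0);
    }

    public static void main(String[] args) {
        // Hypothetical root key shared by every entity the example would create.
        List<String> root = Arrays.asList("demo:root");
        List<String> first = childKey(root, "line", "1");
        List<String> second = childKey(root, "line", "2");
        // A key under a different root belongs to a different entity group.
        List<String> unrelated = Arrays.asList("other:root", "line:1");

        System.out.println(entityGroup(first).equals(entityGroup(second)));
        System.out.println(hasAncestor(first, root));
        System.out.println(hasAncestor(unrelated, root));
    }
}
```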
| Modifier and Type | Class and Description |
|---|---|
| static interface | DatastoreWordCount.Options: Options supported by DatastoreWordCount. |
| Constructor and Description |
|---|
| DatastoreWordCount() |
| Modifier and Type | Method and Description |
|---|---|
| static void | main(String[] args): An example to demo how to use DatastoreIO. |
| static void | readDataFromDatastore(DatastoreWordCount.Options options): An example that creates a pipeline to do DatastoreIO.Read from Datastore. |
| static void | writeDataToDatastore(DatastoreWordCount.Options options): An example that creates a pipeline to populate DatastoreIO from a text input. |
public static void writeDataToDatastore(DatastoreWordCount.Options options)
public static void readDataFromDatastore(DatastoreWordCount.Options options)
public static void main(String[] args)
An example to demo how to use DatastoreIO. The runner here is customizable, which means users can pass either DirectPipelineRunner or DataflowPipelineRunner in the pipeline options.