public class DatastoreIO
extends java.lang.Object
The DatastoreIO class provides an experimental API to Read and Write a
PCollection of Datastore Entity. Currently the class supports
read operations on both the DirectPipelineRunner and DataflowPipelineRunner,
and write operations on the DirectPipelineRunner. This API is subject to
change, and currently requires an authentication workaround described below.
Datastore is a fully managed NoSQL data storage service. An Entity is an object in Datastore, analogous to the a row in traditional database table. DatastoreIO supports Read/Write from/to Datastore within Dataflow SDK service.
To use DatastoreIO, users must set up the environment and use gcloud to get credential for Datastore:
$ export CLOUDSDK_EXTRA_SCOPES=https://www.googleapis.com/auth/datastore $ gcloud auth login
Note that the environment variable CLOUDSDK_EXTRA_SCOPES must be set to the same value when executing a Datastore pipeline, as the local auth cache is keyed by the requested scopes.
To read a PCollection from a query to Datastore, use
DatastoreIO.Read, specifying DatastoreIO.Read.from(java.lang.String, com.google.api.services.datastore.DatastoreV1.Query) to specify
dataset to read, the query to read from, and optionally
DatastoreIO.Read.named(java.lang.String) and DatastoreIO.Read.withHost(java.lang.String) to specify
the name of the pipeline step and the host of Datastore, respectively.
For example:
// Read a query from Datastore
PipelineOptions options =
PipelineOptionsFactory.fromArgs(args).create();
Pipeline p = Pipeline.create(options);
PCollection<Entity> entities =
p.apply(DatastoreIO.Read
.named("Read Datastore")
.from(datasetId, query)
.withHost(host));
p.run();
To write a PCollection to a datastore, use
DatastoreIO.Write, specifying DatastoreIO.Write.to(java.lang.String) to specify
the datastore to write to, and optionally TextIO.Write.named(java.lang.String) to specify
the name of the pipeline step. For example:
// A simple Write to Datastore with DirectPipelineRunner (writing is not
// yet implemented for other runners):
PCollection<Entity> entities = ...;
lines.apply(DatastoreIO.Write.to("Write entities", datastore));
p.run();
| Modifier and Type | Class and Description |
|---|---|
static class |
DatastoreIO.Read
A PTransform that reads from a Datastore query and returns a
PCollection<Entity> containing each of the rows of the table. |
static class |
DatastoreIO.Write
A
PTransform that writes a PCollection<Entity> containing
entities to a Datastore kind. |
| Constructor and Description |
|---|
DatastoreIO() |