public class Pipeline
extends java.lang.Object
After a Pipeline has been constructed, it can be executed,
using a default or an explicit PipelineRunner.
Multiple Pipelines can be constructed and executed independently
and concurrently.
Each Pipeline is self-contained and isolated from any other
Pipeline. The PValues that are inputs and outputs of each of a
Pipeline's PTransforms are also owned by that Pipeline.
A PValue owned by one Pipeline can be read only by PTransforms
also owned by that Pipeline.
Here's a typical example of use:
// Start by defining the options for the pipeline.
PipelineOptions options = PipelineOptionsFactory.create();
// Then create the pipeline.
Pipeline p = Pipeline.create(options);
// A root PTransform, like TextIO.Read or Create, gets added
// to the Pipeline by being applied:
PCollection<String> lines =
p.apply(TextIO.Read.from("gs://bucket/dir/file*.txt"));
// A Pipeline can have multiple root transforms:
PCollection<String> moreLines =
p.apply(TextIO.Read.from("gs://bucket/other/dir/file*.txt"));
PCollection<String> yetMoreLines =
p.apply(Create.of("yet", "more", "lines")).setCoder(StringUtf8Coder.of());
// Further PTransforms can be applied, in an arbitrary (acyclic) graph.
// Subsequent PTransforms (and intermediate PCollections etc.) are
// implicitly part of the same Pipeline.
PCollection<String> allLines =
PCollectionList.of(lines).and(moreLines).and(yetMoreLines)
.apply(new Flatten<String>());
PCollection<KV<String, Integer>> wordCounts =
allLines
.apply(ParDo.of(new ExtractWords()))
.apply(new Count<String>());
PCollection<String> formattedWordCounts =
wordCounts.apply(ParDo.of(new FormatCounts()));
formattedWordCounts.apply(TextIO.Write.to("gs://bucket/dir/counts.txt"));
// PTransforms aren't executed when they're applied, rather they're
// just added to the Pipeline. Once the whole Pipeline of PTransforms
// is constructed, the Pipeline's PTransforms can be run using a
// PipelineRunner. The default PipelineRunner executes the Pipeline
// directly, sequentially, in this one process, which is useful for
// unit tests and simple experiments:
p.run();
| Modifier and Type | Class and Description |
|---|---|
static interface |
Pipeline.PipelineVisitor
A PipelineVisitor can be passed into
traverseTopologically(com.google.cloud.dataflow.sdk.Pipeline.PipelineVisitor) to be called for each of the
transforms and values in the Pipeline. |
| Modifier | Constructor and Description |
|---|---|
protected |
Pipeline(PipelineRunner<?> runner)
Deprecated.
|
protected |
Pipeline(PipelineRunner<?> runner,
PipelineOptions options) |
| Modifier and Type | Method and Description |
|---|---|
void |
addValueInternal(PValue value)
Adds the given PValue to this Pipeline.
|
<Output extends POutput> |
apply(PTransform<? super PBegin,Output> root)
Starts using this pipeline with a root PTransform such as
TextIO.Read or
Create. |
static <Input extends PInput,Output extends POutput> |
applyTransform(Input input,
PTransform<? super Input,Output> transform)
Applies the given PTransform to the given Input,
and returns its Output.
|
PBegin |
begin()
Returns a
PBegin owned by this Pipeline. |
static Pipeline |
create(PipelineOptions options)
Constructs a pipeline from the provided options.
|
CoderRegistry |
getCoderRegistry()
Returns the
CoderRegistry that this Pipeline uses. |
java.lang.String |
getFullName(PTransform<?,?> transform)
Returns the fully qualified name of a transform.
|
PInput |
getInput(PTransform<?,?> transform)
Returns the input associated with a transform.
|
PipelineOptions |
getOptions()
Returns the configured pipeline options.
|
POutput |
getOutput(PTransform<?,?> transform)
Returns the output associated with a transform.
|
PipelineRunner<?> |
getRunner()
Returns the configured pipeline runner.
|
PipelineResult |
run()
Runs the Pipeline.
|
void |
setCoderRegistry(CoderRegistry coderRegistry)
Sets the
CoderRegistry that this Pipeline uses. |
java.lang.String |
toString() |
void |
traverseTopologically(Pipeline.PipelineVisitor visitor)
Invokes the PipelineVisitor's
Pipeline.PipelineVisitor.visitTransform(com.google.cloud.dataflow.sdk.runners.TransformTreeNode) and
Pipeline.PipelineVisitor.visitValue(com.google.cloud.dataflow.sdk.values.PValue, com.google.cloud.dataflow.sdk.runners.TransformTreeNode) operations on each of this
Pipeline's PTransforms and PValues, in forward
topological order. |
@Deprecated protected Pipeline(PipelineRunner<?> runner)
protected Pipeline(PipelineRunner<?> runner, PipelineOptions options)
public static Pipeline create(PipelineOptions options)
public PBegin begin()
public <Output extends POutput> Output apply(PTransform<? super PBegin,Output> root)
TextIO.Read or
Create.
Alias for begin().apply(root).
public PipelineResult run()
public CoderRegistry getCoderRegistry()
CoderRegistry that this Pipeline uses.public void setCoderRegistry(CoderRegistry coderRegistry)
CoderRegistry that this Pipeline uses.public void traverseTopologically(Pipeline.PipelineVisitor visitor)
Pipeline.PipelineVisitor.visitTransform(com.google.cloud.dataflow.sdk.runners.TransformTreeNode) and
Pipeline.PipelineVisitor.visitValue(com.google.cloud.dataflow.sdk.values.PValue, com.google.cloud.dataflow.sdk.runners.TransformTreeNode) operations on each of this
Pipeline's PTransforms and PValues, in forward
topological order.
Traversal of the pipeline causes PTransform and PValue instances to be marked as finished, at which point they may no longer be modified.
Typically invoked by PipelineRunner subclasses.
public static <Input extends PInput,Output extends POutput> Output applyTransform(Input input, PTransform<? super Input,Output> transform)
Called by PInput subclasses in their apply methods.
public java.lang.String toString()
toString in class java.lang.Objectpublic PipelineRunner<?> getRunner()
public PipelineOptions getOptions()
public POutput getOutput(PTransform<?,?> transform)
java.lang.IllegalStateException - if the transform has not been applied to the pipeline.public PInput getInput(PTransform<?,?> transform)
java.lang.IllegalStateException - if the transform has not been applied to the pipeline.public java.lang.String getFullName(PTransform<?,?> transform)
java.lang.IllegalStateException - if the transform has not been applied to the pipeline.public void addValueInternal(PValue value)
For internal use only.