K - the type of the keys in the input and output PCollections
public class CoGroupByKey<K> extends PTransform<KeyedPCollectionTuple<K>,PCollection<KV<K,CoGbkResult>>>
Example of performing a CoGroupByKey followed by a ParDo that consumes the results:
    PCollection<KV<K, V1>> pt1 = ...;
    PCollection<KV<K, V2>> pt2 = ...;
    final TupleTag<V1> t1 = new TupleTag<>();
    final TupleTag<V2> t2 = new TupleTag<>();
    PCollection<KV<K, CoGbkResult>> coGbkResultCollection =
        KeyedPCollectionTuple.of(t1, pt1)
                             .and(t2, pt2)
                             .apply(CoGroupByKey.<K>create());
    PCollection<T> finalResultCollection =
        coGbkResultCollection.apply(ParDo.of(
            new DoFn<KV<K, CoGbkResult>, T>() {
              @Override
              public void processElement(ProcessContext c) {
                KV<K, CoGbkResult> e = c.element();
                Iterable<V1> pt1Vals = e.getValue().getAll(t1);
                V2 pt2Val = e.getValue().getOnly(t2);
                ... Do Something ....
                c.output(...some T...);
              }
            }));
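To make the grouping semantics concrete, here is a minimal plain-Java sketch that does not use the Dataflow SDK at all. It models what a co-group-by-key over two keyed inputs yields: one entry per distinct key, holding every value that each input contributed for that key. The names `CoGroupSketch`, `Grouped`, `kv`, `coGroup` and this `getOnly` helper are hypothetical and exist only for illustration; in the real API the values are read from a CoGbkResult through TupleTags, as in the example above. The sketch assumes that asking for "the only" value fails when the key does not have exactly one value in that input.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CoGroupSketch {
    /** Hypothetical stand-in for a two-input CoGbkResult (not the SDK class). */
    public static final class Grouped<V1, V2> {
        public final List<V1> first = new ArrayList<>();   // values from input 1
        public final List<V2> second = new ArrayList<>();  // values from input 2
    }

    /** Small helper to build a key/value pair. */
    public static <K, V> Map.Entry<K, V> kv(K key, V value) {
        return new SimpleEntry<>(key, value);
    }

    /** Groups both inputs by key; a key present in either input appears once in the result. */
    public static <K extends Comparable<K>, V1, V2> Map<K, Grouped<V1, V2>> coGroup(
            List<Map.Entry<K, V1>> in1, List<Map.Entry<K, V2>> in2) {
        Map<K, Grouped<V1, V2>> out = new TreeMap<>();
        for (Map.Entry<K, V1> e : in1) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped<>()).first.add(e.getValue());
        }
        for (Map.Entry<K, V2> e : in2) {
            out.computeIfAbsent(e.getKey(), k -> new Grouped<>()).second.add(e.getValue());
        }
        return out;
    }

    /** Assumed behavior for illustration: fails unless exactly one value is present. */
    public static <V> V getOnly(List<V> values) {
        if (values.size() != 1) {
            throw new IllegalArgumentException("expected exactly one value, got " + values.size());
        }
        return values.get(0);
    }

    public static void main(String[] args) {
        Map<String, Grouped<Integer, String>> result = coGroup(
            Arrays.asList(kv("a", 1), kv("a", 2), kv("b", 3)),
            Arrays.asList(kv("a", "x"), kv("c", "y")));
        // Print each key with the values it received from each input.
        for (Map.Entry<String, Grouped<Integer, String>> e : result.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().first + " / " + e.getValue().second);
        }
    }
}
```

Note that a key present in only one input still appears in the result, with an empty list for the other input. Code that reads a single value per key should therefore be prepared for the zero-value and many-value cases.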
| Modifier and Type | Method and Description |
|---|---|
| PCollection<KV<K,CoGbkResult>> | apply(KeyedPCollectionTuple<K> input): Applies this PTransform on the given Input, and returns its Output. |
| static <K> CoGroupByKey<K> | create(): Returns a CoGroupByKey<K> PTransform. |
Methods inherited from class PTransform:
finishSpecifying, getCoderRegistry, getDefaultName, getDefaultOutputCoder, getDefaultOutputCoder, getInput, getKindString, getName, getOutput, getPipeline, setName, setPipeline, toString, withName

public static <K> CoGroupByKey<K> create()
Returns a CoGroupByKey<K> PTransform.
K - the type of the keys in the input and output PCollections

public PCollection<KV<K,CoGbkResult>> apply(KeyedPCollectionTuple<K> input)
Applies this PTransform on the given Input, and returns its Output.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
The default implementation throws an exception. A derived class must either implement apply, or else each runner must supply a custom implementation via PipelineRunner.apply(com.google.cloud.dataflow.sdk.transforms.PTransform<Input, Output>, Input).
Overrides: apply in class PTransform<KeyedPCollectionTuple<K>,PCollection<KV<K,CoGbkResult>>>
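The contract described for apply can be illustrated with a small plain-Java model. This is only a sketch under stated assumptions: `TransformSketch`, `Transform`, `MapTransform`, `Composite` and `Unimplemented` are hypothetical names, not SDK classes, and the model leaves out runners and evaluator registration entirely. It shows two points from the text: a base class whose default apply throws, and a composite transform that returns the output of one of the transforms it is composed of.

```java
import java.util.function.Function;

public class TransformSketch {
    /** Hypothetical base class: the default apply throws, so subclasses must override it. */
    public abstract static class Transform<I, O> {
        public O apply(I input) {
            throw new UnsupportedOperationException(
                getClass().getSimpleName() + " does not implement apply");
        }
    }

    /** A leaf transform that implements apply directly. */
    public static final class MapTransform<I, O> extends Transform<I, O> {
        private final Function<I, O> fn;

        public MapTransform(Function<I, O> fn) {
            this.fn = fn;
        }

        @Override
        public O apply(I input) {
            return fn.apply(input);
        }
    }

    /** A composite transform: defined in terms of two others, returns the last one's output. */
    public static final class Composite<I, M, O> extends Transform<I, O> {
        private final Transform<I, M> first;
        private final Transform<M, O> second;

        public Composite(Transform<I, M> first, Transform<M, O> second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public O apply(I input) {
            return second.apply(first.apply(input));
        }
    }

    /** A transform that does not override apply and therefore inherits the throwing default. */
    public static final class Unimplemented<I, O> extends Transform<I, O> {}

    public static void main(String[] args) {
        Transform<String, Integer> length = new MapTransform<>(String::length);
        Transform<Integer, Integer> doubled = new MapTransform<>(n -> n * 2);
        Transform<String, Integer> composite = new Composite<>(length, doubled);
        System.out.println(composite.apply("abcd"));
    }
}
```

In this model the throwing default is the only fallback. In the documented API, a transform that does not implement apply can still be executed if the runner supplies its own implementation via PipelineRunner.apply.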