public static class Combine.Globally<InputT,OutputT>
extends PTransform<PCollection<InputT>,PCollection<OutputT>>
Type Parameters:
InputT - type of input values
OutputT - type of output values
Combine.Globally<InputT, OutputT> takes a PCollection<InputT>
and returns a PCollection<OutputT> whose elements are the result of
combining all the elements in each window of the input PCollection,
using a specified CombineFn<InputT, AccumT, OutputT>.
It is common for InputT == OutputT, but not required. Common combining
functions include sums, mins, maxes, and averages of numbers,
conjunctions and disjunctions of booleans, statistical
aggregations, etc.
Example of use:
```java
PCollection<Integer> pc = ...;
PCollection<Integer> sum = pc.apply(
    Combine.globally(new Sum.SumIntegerFn()));
```
Combining can happen in parallel, with different subsets of the
input PCollection being combined separately, and their
intermediate results combined further, in an arbitrary tree
reduction pattern, until a single result value is produced.
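The tree reduction described above works only because partial results can be merged. The following standalone Java model, which does not use the Dataflow SDK, sketches that idea under stated assumptions. SimpleCombineFn is a hypothetical interface that mirrors the four CombineFn methods (createAccumulator, addInput, mergeAccumulators, extractOutput). MeanFn is a hypothetical example in which InputT (Integer) differs from OutputT (Double), with a long[] {sum, count} accumulator. The chunking and pairwise merging in combineAsTree are one arbitrary tree shape chosen for illustration; a real runner picks its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Standalone model (no Dataflow SDK) of evaluating a CombineFn-like function as a tree. */
public class TreeCombineModel {

  /** Hypothetical stand-in mirroring the four methods of CombineFn<InputT, AccumT, OutputT>. */
  interface SimpleCombineFn<InputT, AccumT, OutputT> {
    AccumT createAccumulator();
    AccumT addInput(AccumT accumulator, InputT input);
    AccumT mergeAccumulators(Iterable<AccumT> accumulators);
    OutputT extractOutput(AccumT accumulator);
  }

  /** Hypothetical mean: InputT = Integer, AccumT = long[]{sum, count}, OutputT = Double. */
  static final class MeanFn implements SimpleCombineFn<Integer, long[], Double> {
    public long[] createAccumulator() { return new long[] {0L, 0L}; }

    public long[] addInput(long[] acc, Integer x) {
      acc[0] += x;
      acc[1] += 1;
      return acc;
    }

    public long[] mergeAccumulators(Iterable<long[]> accs) {
      long[] merged = createAccumulator();
      for (long[] a : accs) {
        merged[0] += a[0];
        merged[1] += a[1];
      }
      return merged;
    }

    public Double extractOutput(long[] acc) {
      // Mean of empty input is defined as 0.0 here (an assumption of this model).
      return acc[1] == 0 ? 0.0 : (double) acc[0] / acc[1];
    }
  }

  /** Accumulates fixed-size chunks separately, then merges neighbours until one remains. */
  static <I, A, O> O combineAsTree(SimpleCombineFn<I, A, O> fn, List<I> input, int chunkSize) {
    List<A> level = new ArrayList<>();
    for (int start = 0; start < input.size(); start += chunkSize) {
      A acc = fn.createAccumulator();
      for (I x : input.subList(start, Math.min(start + chunkSize, input.size()))) {
        acc = fn.addInput(acc, x);
      }
      level.add(acc);
    }
    if (level.isEmpty()) {
      level.add(fn.createAccumulator()); // empty input: combine nothing
    }
    while (level.size() > 1) {
      List<A> next = new ArrayList<>();
      for (int i = 0; i < level.size(); i += 2) {
        next.add(fn.mergeAccumulators(level.subList(i, Math.min(i + 2, level.size()))));
      }
      level = next;
    }
    return fn.extractOutput(level.get(0));
  }

  public static void main(String[] args) {
    List<Integer> data = Arrays.asList(3, 1, 4, 1, 5, 9, 2, 6);
    MeanFn fn = new MeanFn();
    System.out.println(combineAsTree(fn, data, 1));
    System.out.println(combineAsTree(fn, data, 3));
    System.out.println(combineAsTree(fn, data, 8));
  }
}
```

All three calls in main print 3.875: the result does not depend on how the input was split, as long as merging accumulators is associative and commutative.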
If the input PCollection is windowed into GlobalWindows,
a default value in the GlobalWindow will be output if the input
PCollection is empty. To use this transform with inputs that use any other
windowing, either withoutDefaults() or asSingletonView() must be called.
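The following standalone Java model, which does not use the Dataflow SDK, sketches why a default is only well defined in the global window. sumGlobalWindow, sumFixedWindows and lookupView are hypothetical helpers; the model assumes an integer sum as the combining function, fixed windows of a chosen size, and (timestamp, value) pairs encoded as long[2]. With fixed windows, a window only comes into existence when an element falls into it, so an empty window produces no output and there is nothing to attach a default to; this is one way to read why withoutDefaults() is required there. lookupView models the asSingletonView() behavior for a missing window: it returns the result of combining empty input, which is 0 for a sum.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Standalone model (no Dataflow SDK) of default values under different windowing. */
public class DefaultValueModel {

  /** Global window: empty input still yields the default (sum of nothing = 0). */
  static Optional<Integer> sumGlobalWindow(List<Integer> input, boolean withoutDefaults) {
    if (input.isEmpty()) {
      return withoutDefaults ? Optional.empty() : Optional.of(0);
    }
    int sum = 0;
    for (int x : input) {
      sum += x;
    }
    return Optional.of(sum);
  }

  /** Fixed windows keyed by start time; windows with no elements never appear. */
  static Map<Long, Integer> sumFixedWindows(List<long[]> timestampedValues, long windowSize) {
    Map<Long, Integer> sums = new TreeMap<>();
    for (long[] tv : timestampedValues) {
      long windowStart = tv[0] - Math.floorMod(tv[0], windowSize);
      sums.merge(windowStart, (int) tv[1], Integer::sum);
    }
    return sums;
  }

  /** Hypothetical view lookup: a missing window yields the empty-input result (0). */
  static int lookupView(Map<Long, Integer> view, long windowStart) {
    return view.getOrDefault(windowStart, 0);
  }

  public static void main(String[] args) {
    List<long[]> events =
        Arrays.asList(new long[] {5, 1}, new long[] {12, 2}, new long[] {31, 4});
    Map<Long, Integer> perWindow = sumFixedWindows(events, 10);
    System.out.println(perWindow);                   // window [20, 30) is absent
    System.out.println(lookupView(perWindow, 20));
    System.out.println(sumGlobalWindow(Collections.emptyList(), false));
    System.out.println(sumGlobalWindow(Collections.emptyList(), true));
  }
}
```

Running main prints {0=1, 10=2, 30=4}, then 0, then Optional[0], then Optional.empty.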
By default, the Coder of the output PValue<OutputT>
is inferred from the concrete type of the
CombineFn<InputT, AccumT, OutputT>'s output type OutputT.
See also Combine.perKey(SerializableFunction<Iterable<V>, V>)/Combine.PerKey and
Combine.groupedValues(SerializableFunction<Iterable<V>, V>)/Combine.GroupedValues, which
are useful for combining values associated with each key in
a PCollection of KVs.
| Modifier and Type | Method and Description |
|---|---|
| PCollection<OutputT> | apply(PCollection<InputT> input): Applies this PTransform to the given PCollection<InputT> and returns its output PCollection<OutputT>. |
| Combine.GloballyAsSingletonView<InputT,OutputT> | asSingletonView(): Returns a PTransform that produces a PCollectionView whose elements are the result of combining elements per-window in the input PCollection. |
| Combine.Globally<InputT,OutputT> | named(String name): Returns a new Globally transform that's like this transform but with the specified name. |
| Combine.Globally<InputT,OutputT> | withFanout(int fanout): Returns a PTransform identical to this, but that uses an intermediate node to combine parts of the data to reduce load on the final global combine step. |
| Combine.Globally<InputT,OutputT> | withoutDefaults(): Returns a PTransform identical to this, but that does not attempt to provide a default value in the case of empty input. |
Methods inherited from class PTransform: getDefaultOutputCoder, getKindString, getName, toString, validate

public Combine.Globally<InputT,OutputT> named(String name)
Returns a new Globally transform that's like this transform but with the
specified name. Does not modify this transform.

public Combine.GloballyAsSingletonView<InputT,OutputT> asSingletonView()
Returns a PTransform that produces a PCollectionView
whose elements are the result of combining elements per-window in
the input PCollection. If a value is requested from the view
for a window that is not present, the result of calling the CombineFn
on empty input will be returned.

public Combine.Globally<InputT,OutputT> withoutDefaults()
Returns a PTransform identical to this, but that does not attempt to
provide a default value in the case of empty input.

public Combine.Globally<InputT,OutputT> withFanout(int fanout)
Returns a PTransform identical to this, but that uses an intermediate node
to combine parts of the data to reduce load on the final global combine step.
The fanout parameter determines the number of intermediate keys
that will be used.
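withFanout can be pictured as a two-stage combine. The standalone Java model below, which does not use the Dataflow SDK, spreads elements over fanout intermediate keys, sums each key's elements, and then sums the partial results, so the final step sees at most fanout inputs instead of every element. The round-robin key assignment and the names sumWithFanout and FanoutResult are assumptions made for this sketch, not the SDK's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone model (no Dataflow SDK) of a two-stage combine with intermediate keys. */
public class FanoutModel {

  /** Hypothetical result: the total and how many partial results the final step merged. */
  static final class FanoutResult {
    final long total;
    final int finalStepInputs;

    FanoutResult(long total, int finalStepInputs) {
      this.total = total;
      this.finalStepInputs = finalStepInputs;
    }
  }

  static FanoutResult sumWithFanout(List<Integer> input, int fanout) {
    if (fanout < 1) {
      throw new IllegalArgumentException("fanout must be >= 1");
    }
    long[] partials = new long[fanout];
    boolean[] used = new boolean[fanout];
    // Stage 1: assign elements to intermediate keys round-robin (an assumption of this
    // model) and combine each key's elements separately.
    for (int i = 0; i < input.size(); i++) {
      int key = i % fanout;
      partials[key] += input.get(i);
      used[key] = true;
    }
    // Stage 2: the final global step merges at most `fanout` partial results.
    long total = 0;
    int finalInputs = 0;
    for (int k = 0; k < fanout; k++) {
      if (used[k]) {
        total += partials[k];
        finalInputs++;
      }
    }
    return new FanoutResult(total, finalInputs);
  }

  public static void main(String[] args) {
    List<Integer> data = new ArrayList<>();
    for (int i = 1; i <= 1000; i++) {
      data.add(i);
    }
    FanoutResult r = sumWithFanout(data, 16);
    System.out.println(r.total + " " + r.finalStepInputs);
  }
}
```

For the numbers 1 to 1000 with a fanout of 16, main prints 500500 16: the total is unchanged, while the final step merges 16 partial sums rather than 1000 individual values.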
public PCollection<OutputT> apply(PCollection<InputT> input)
Applies this PTransform to the given PCollection<InputT> and returns its
output PCollection<OutputT>.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
The default implementation in PTransform throws an exception. A derived class must
either implement apply, or else each runner must supply a custom
implementation via
PipelineRunner.apply(PTransform<InputT, OutputT>, InputT).
Overrides: apply in class PTransform<PCollection<InputT>,PCollection<OutputT>>