K - type of input and output keysVI - type of input valuesVO - type of output valuespublic static class Combine.GroupedValues<K,VI,VO> extends PTransform<PCollection<? extends KV<K,? extends java.lang.Iterable<VI>>>,PCollection<KV<K,VO>>>
GroupedValues<K, VI, VO> takes a
PCollection<KV<K, Iterable<VI>>>, such as the result of
GroupByKey, applies a specified
KeyedCombineFn<K, VI, VA, VO>
to each of the input KV<K, Iterable<VI>> elements to
produce a combined output KV<K, VO> element, and returns a
PCollection<KV<K, VO>> containing all the combined output
elements. It is common for VI == VO, but not required.
Common combining functions include sums, mins, maxes, and averages
of numbers, conjunctions and disjunctions of booleans, statistical
aggregations, etc.
Example of use:
PCollection<KV<String, Integer>> pc = ...;
PCollection<KV<String, Iterable<Integer>>> groupedByKey = pc.apply(
new GroupByKey<String, Integer>());
PCollection<KV<String, Integer>> sumByKey = groupedByKey.apply(
Combine.<String, Integer>groupedValues(
new Sum.SumIntegerFn()));
See also Combine.perKey(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>)/Combine.PerKey
which captures the common pattern of "combining by key" in a
single easy-to-use PTransform.
Combining for different keys can happen in parallel. Moreover,
combining of the Iterable<VI> values associated a single
key can happen in parallel, with different subsets of the values
being combined separately, and their intermediate results combined
further, in an arbitrary tree reduction pattern, until a single
result value is produced for each key.
By default, the Coder of the keys of the output
PCollection<KV<K, VO>> is that of the keys of the input
PCollection<KV<K, VI>>, and the Coder of the values
of the output PCollection<KV<K, VO>> is inferred from the
concrete type of the KeyedCombineFn<K, VI, VA, VO>'s output
type VO.
Each output element has the same timestamp and is in the same window
as its corresponding input element, and the output
PCollection has the same
WindowFn
associated with it as the input.
See also Combine.globally(com.google.cloud.dataflow.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>)/Combine.Globally,
which combines all the values in a PCollection into a
single value in a PCollection.
name| Modifier and Type | Method and Description |
|---|---|
PCollection<KV<K,VO>> |
apply(PCollection<? extends KV<K,? extends java.lang.Iterable<VI>>> input)
Applies this
PTransform on the given Input, and returns its
Output. |
Coder<?> |
getAccumulatorCoder() |
Coder<KV<K,VO>> |
getDefaultOutputCoder()
Returns the default
Coder to use for the output of this
single-output PTransform, or null if
none can be inferred. |
Combine.KeyedCombineFn<? super K,? super VI,?,VO> |
getFn()
Returns the KeyedCombineFn used by this Combine operation.
|
finishSpecifying, getCoderRegistry, getDefaultName, getDefaultOutputCoder, getInput, getKindString, getName, getOutput, getPipeline, setName, setPipeline, toString, withNamepublic Combine.KeyedCombineFn<? super K,? super VI,?,VO> getFn()
public PCollection<KV<K,VO>> apply(PCollection<? extends KV<K,? extends java.lang.Iterable<VI>>> input)
PTransformPTransform on the given Input, and returns its
Output.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
The default implementation throws an exception. A derived class must
either implement apply, or else each runner must supply a custom
implementation via
PipelineRunner.apply(com.google.cloud.dataflow.sdk.transforms.PTransform<Input, Output>, Input).
apply in class PTransform<PCollection<? extends KV<K,? extends java.lang.Iterable<VI>>>,PCollection<KV<K,VO>>>public Coder<?> getAccumulatorCoder()
public Coder<KV<K,VO>> getDefaultOutputCoder()
PTransformCoder to use for the output of this
single-output PTransform, or null if
none can be inferred.
By default, returns null.
getDefaultOutputCoder in class PTransform<PCollection<? extends KV<K,? extends java.lang.Iterable<VI>>>,PCollection<KV<K,VO>>>