VI - type of input values
VA - type of mutable accumulator values
VO - type of output values

public abstract static class Combine.CombineFn<VI,VA,VO>
extends java.lang.Object
implements java.io.Serializable
CombineFn<VI, VA, VO> specifies how to combine a
collection of input values of type VI into a single
output value of type VO. It does this via one or more
intermediate mutable accumulator values of type VA.
The overall process to combine a collection of input
VI values into a single output VO value is as
follows:
1. The input VI values are partitioned into one or more batches.
2. For each batch, the createAccumulator() operation is invoked to create a fresh mutable accumulator value of type VA, initialized to represent the combination of zero values.
3. For each input VI value in a batch, the addInput(VA, VI) operation is invoked to add the value to that batch's accumulator VA value. The accumulator may just record the new value (e.g., if VA == List<VI>), or may do work to represent the combination more compactly.
4. The mergeAccumulators(java.lang.Iterable<VA>) operation is invoked to combine a collection of accumulator VA values into a single combined output accumulator VA value, once the merging accumulators have had all the input values in their batches added to them. This operation is invoked repeatedly, until there is only one accumulator value left.
5. The extractOutput(VA) operation is invoked on the final accumulator VA value to get the output VO value.
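The process above can be sketched in plain Java. This is only an illustration under stated assumptions, not the SDK's execution strategy: SketchCombineFn, SumFn, CombineProcessSketch, the fixed-size batching, and the pairwise merge order are all hypothetical names and choices made for this example. A real runner decides batching and merge order on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mirrors the four CombineFn operations.
// It is NOT the SDK class.
interface SketchCombineFn<VI, VA, VO> {
  VA createAccumulator();
  VA addInput(VA accumulator, VI input);
  VA mergeAccumulators(Iterable<VA> accumulators);
  VO extractOutput(VA accumulator);
}

// Hypothetical example function: sums integers using a one-element array
// as the mutable accumulator.
class SumFn implements SketchCombineFn<Integer, int[], Integer> {
  @Override public int[] createAccumulator() { return new int[] {0}; }

  @Override public int[] addInput(int[] accumulator, Integer input) {
    accumulator[0] += input;
    return accumulator;
  }

  @Override public int[] mergeAccumulators(Iterable<int[]> accumulators) {
    int[] merged = createAccumulator();
    for (int[] accumulator : accumulators) {
      merged[0] += accumulator[0];
    }
    return merged;
  }

  @Override public Integer extractOutput(int[] accumulator) { return accumulator[0]; }
}

class CombineProcessSketch {
  // Runs the steps over inputs split into batches of batchSize (an assumption
  // of this sketch; a runner may partition differently).
  static <VI, VA, VO> VO combine(
      SketchCombineFn<VI, VA, VO> fn, List<VI> inputs, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    // Steps 1-3: one fresh accumulator per batch, then add each input to it.
    List<VA> accumulators = new ArrayList<>();
    for (int start = 0; start < inputs.size(); start += batchSize) {
      int end = Math.min(start + batchSize, inputs.size());
      VA accumulator = fn.createAccumulator();
      for (VI input : inputs.subList(start, end)) {
        accumulator = fn.addInput(accumulator, input);
      }
      accumulators.add(accumulator);
    }
    // With no inputs, fall back to the accumulator for zero values.
    if (accumulators.isEmpty()) {
      accumulators.add(fn.createAccumulator());
    }
    // Step 4: merge repeatedly (here in pairs) until one accumulator is left.
    while (accumulators.size() > 1) {
      List<VA> next = new ArrayList<>();
      for (int i = 0; i < accumulators.size(); i += 2) {
        int end = Math.min(i + 2, accumulators.size());
        next.add(fn.mergeAccumulators(accumulators.subList(i, end)));
      }
      accumulators = next;
    }
    // Step 5: extract the output from the final accumulator.
    return fn.extractOutput(accumulators.get(0));
  }
}
```

Calling CombineProcessSketch.combine(new SumFn(), Arrays.asList(1, 2, 3, 4, 5), 2) forms the batches [1, 2], [3, 4], and [5], merges their accumulators, and returns 15.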
For example:
public class AverageFn extends CombineFn<Integer, AverageFn.Accum, Double> {
  // Mutable accumulator: running sum and count of the inputs seen so far.
  public static class Accum {
    int sum = 0;
    int count = 0;
  }

  public Accum createAccumulator() { return new Accum(); }

  // addInput must return the accumulator (VA); here the input
  // accumulator is modified in place and returned.
  public Accum addInput(Accum accum, Integer input) {
    accum.sum += input;
    accum.count++;
    return accum;
  }

  public Accum mergeAccumulators(Iterable<Accum> accums) {
    Accum merged = createAccumulator();
    for (Accum accum : accums) {
      merged.sum += accum.sum;
      merged.count += accum.count;
    }
    return merged;
  }

  public Double extractOutput(Accum accum) {
    return ((double) accum.sum) / accum.count;
  }
}

PCollection<Integer> pc = ...;
PCollection<Double> average = pc.apply(Combine.globally(new AverageFn()));
Combining functions used by Combine.Globally,
Combine.PerKey, Combine.GroupedValues, and
PTransforms derived from them should be
associative and commutative. Associativity is
required because input values are first broken up into subgroups
before being combined, and their intermediate results further
combined, in an arbitrary tree structure. Commutativity is
required because any order of the input values is ignored when
breaking up input values into groups.
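To see why associativity matters, the following sketch combines the same values once as a single group and once as two subgroups whose partial results are then combined. AssociativityCheck, foldAll, and foldSplit are hypothetical helpers written for this illustration and are not part of the SDK. Addition gives the same answer either way; subtraction, which is not associative, does not.

```java
import java.util.Arrays;
import java.util.function.IntBinaryOperator;

// Hypothetical helper for illustration only.
class AssociativityCheck {
  // Combines all values left to right as one group (values must be non-empty).
  static int foldAll(int[] values, IntBinaryOperator op) {
    int result = values[0];
    for (int i = 1; i < values.length; i++) {
      result = op.applyAsInt(result, values[i]);
    }
    return result;
  }

  // Splits the values into two subgroups at index split
  // (0 < split < values.length), combines each subgroup,
  // then combines the two partial results.
  static int foldSplit(int[] values, int split, IntBinaryOperator op) {
    int left = foldAll(Arrays.copyOfRange(values, 0, split), op);
    int right = foldAll(Arrays.copyOfRange(values, split, values.length), op);
    return op.applyAsInt(left, right);
  }
}
```

For the values {10, 3, 2, 1} split at index 2, addition yields 16 both ways, while subtraction yields 4 as a single group (10 - 3 - 2 - 1) but 6 when split ((10 - 3) - (2 - 1)). A combining function with the latter behavior would produce results that depend on how the runner happens to group the inputs.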
| Constructor and Description |
|---|
| Combine.CombineFn() |
| Modifier and Type | Method and Description |
|---|---|
| abstract VA | addInput(VA accumulator, VI input) - Adds the given input value to the given accumulator, returning the new accumulator value. |
| VO | apply(java.lang.Iterable<? extends VI> inputs) - Applies this CombineFn to a collection of input values to produce a combined output value. |
| <K> Combine.KeyedCombineFn<K,VI,VA,VO> | asKeyedFn() - Converts this CombineFn into an equivalent Combine.KeyedCombineFn, which ignores the keys passed to it and combines the values according to this CombineFn. |
| abstract VA | createAccumulator() - Returns a new, mutable accumulator value, representing the accumulation of zero input values. |
| abstract VO | extractOutput(VA accumulator) - Returns the output value that is the result of combining all the input values represented by the given accumulator. |
| Coder<VA> | getAccumulatorCoder(CoderRegistry registry, Coder<VI> inputCoder) - Returns the Coder to use for accumulator VA values, or null if it is not able to be inferred. |
| Coder<VO> | getDefaultOutputCoder(CoderRegistry registry, Coder<VI> inputCoder) - Returns the Coder to use by default for output VO values, or null if it is not able to be inferred. |
| abstract VA | mergeAccumulators(java.lang.Iterable<VA> accumulators) - Returns an accumulator representing the accumulation of all the input values accumulated in the merging accumulators. |
public abstract VA createAccumulator()
Returns a new, mutable accumulator value, representing the accumulation of zero input values.
public abstract VA addInput(VA accumulator, VI input)
Adds the given input value to the given accumulator, returning the new accumulator value.
For efficiency, the input accumulator may be modified and returned.
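public abstract VA mergeAccumulators(java.lang.Iterable<VA> accumulators)
Returns an accumulator representing the accumulation of all the input values accumulated in the merging accumulators.
May modify any of the argument accumulators. May return a fresh accumulator, or may return one of the (modified) argument accumulators.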
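As an illustration of returning one of the argument accumulators, the sketch below folds every accumulator into the first one and returns it, avoiding an extra allocation. InPlaceMergeSketch and mergeInPlace are hypothetical names for this example; the Accum shape (sum and count) is borrowed from the AverageFn example in the class description.

```java
import java.util.Iterator;

// Hypothetical helper for illustration only; not part of the SDK.
class InPlaceMergeSketch {
  static class Accum {
    int sum = 0;
    int count = 0;
  }

  // Merges all accumulators into the first one and returns that (modified)
  // argument accumulator. Returns a fresh accumulator if there are none.
  static Accum mergeInPlace(Iterable<Accum> accumulators) {
    Iterator<Accum> it = accumulators.iterator();
    if (!it.hasNext()) {
      return new Accum();
    }
    Accum merged = it.next();
    while (it.hasNext()) {
      Accum next = it.next();
      merged.sum += next.sum;
      merged.count += next.count;
    }
    return merged;
  }
}
```

Because the first accumulator is mutated, callers in this sketch should not rely on its previous contents after the merge.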
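public abstract VO extractOutput(VA accumulator)
Returns the output value that is the result of combining all the input values represented by the given accumulator.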
public VO apply(java.lang.Iterable<? extends VI> inputs)
Applies this CombineFn to a collection of input values to produce a combined output value.
Useful when testing the behavior of a CombineFn separately from a Combine transform.
public Coder<VA> getAccumulatorCoder(CoderRegistry registry, Coder<VI> inputCoder)
Returns the Coder to use for accumulator VA values, or null if it is not able to be inferred.
By default, uses the knowledge of the Coder being used for VI values and the enclosing Pipeline's CoderRegistry to try to infer the Coder for VA values.
This is the Coder used to send data through a communication-intensive shuffle step, so a compact and efficient representation may have significant performance benefits.
public Coder<VO> getDefaultOutputCoder(CoderRegistry registry, Coder<VI> inputCoder)
Returns the Coder to use by default for output VO values, or null if it is not able to be inferred.
By default, uses the knowledge of the Coder being used for input VI values and the enclosing Pipeline's CoderRegistry to try to infer the Coder for VO values.
public <K> Combine.KeyedCombineFn<K,VI,VA,VO> asKeyedFn()
Converts this CombineFn into an equivalent Combine.KeyedCombineFn, which ignores the keys passed to it and combines the values according to this CombineFn.
K - the type of the (ignored) keys