public static class DirectPipelineRunner.TestCombineDoFn<K,InputT,AccumT,OutputT> extends DoFn<KV<K,Iterable<InputT>>,KV<K,OutputT>>
Combine.KeyedCombineFn into ADD, MERGE and EXTRACT phases (
see com.google.cloud.dataflow.sdk.runners.worker.CombineValuesFn). In order to emulate
this for the DirectPipelineRunner and provide an experience closer to the service, go
through heavy serializability checks for the equivalent of the results of the ADD phase, but
after the GroupByKey shuffle, and the MERGE
phase. Doing these checks ensure that not only is the accumulator coder serializable, but
the accumulator coder can actually serialize the data in question.DoFn.Context, DoFn.ProcessContext, DoFn.RequiresWindowAccess| Constructor and Description |
|---|
TestCombineDoFn(Combine.KeyedCombineFn<? super K,? super InputT,AccumT,OutputT> fn,
Coder<AccumT> accumCoder,
boolean testSerializability,
Random rand) |
| Modifier and Type | Method and Description |
|---|---|
static <K,AccumT,InputT> |
addInputsRandomly(Combine.KeyedCombineFn<? super K,? super InputT,AccumT,?> fn,
K key,
Iterable<InputT> values,
Random random)
Create a random list of accumulators from the given list of values.
|
static <K,InputT,AccumT,OutputT> |
create(Combine.GroupedValues<K,InputT,OutputT> transform,
PCollection<KV<K,Iterable<InputT>>> input,
boolean testSerializability,
Random rand) |
<T> T |
ensureSerializableByCoder(Coder<T> coder,
T value,
String errorContext) |
void |
processElement(DoFn.ProcessContext c)
Processes one input element.
|
createAggregator, createAggregator, finishBundle, getAllowedTimestampSkew, getInputTypeDescriptor, getOutputTypeDescriptor, startBundlepublic static <K,InputT,AccumT,OutputT> DirectPipelineRunner.TestCombineDoFn<K,InputT,AccumT,OutputT> create(Combine.GroupedValues<K,InputT,OutputT> transform, PCollection<KV<K,Iterable<InputT>>> input, boolean testSerializability, Random rand)
public void processElement(DoFn.ProcessContext c) throws Exception
DoFnThe current element of the input PCollection is returned by
c.element(). It should be considered immutable. The Dataflow
runtime will not mutate the element, so it is safe to cache, etc. The element should not be
mutated by any of the DoFn methods, because it may be cached elsewhere, retained by the
Dataflow runtime, or used in other unspecified ways.
A value is added to the main output PCollection by DoFn.Context.output(OutputT).
Once passed to output the element should be considered immutable and not be modified in
any way. It may be cached elsewhere, retained by the Dataflow runtime, or used in other
unspecified ways.
public static <K,AccumT,InputT> List<AccumT> addInputsRandomly(Combine.KeyedCombineFn<? super K,? super InputT,AccumT,?> fn, K key, Iterable<InputT> values, Random random)
Visible for testing purposes only.