public static class DirectPipelineRunner.TestCombineDoFn<K,InputT,AccumT,OutputT> extends DoFn<KV<K,Iterable<InputT>>,KV<K,OutputT>>
The implementation may split the Combine.KeyedCombineFn into ADD, MERGE
and EXTRACT phases (see com.google.cloud.dataflow.sdk.runners.worker.CombineValuesFn).
To emulate this for the DirectPipelineRunner and provide an experience
closer to the service, this DoFn runs heavy serializability checks on
the equivalent of the results of the ADD phase (but after the
GroupByKey shuffle) and on the results of the MERGE phase. These checks
ensure not only that the accumulator coder is serializable, but also that
it can actually serialize the data in question.

Nested classes/interfaces inherited from class DoFn: DoFn.Context, DoFn.ProcessContext, DoFn.RequiresWindowAccess

| Constructor and Description |
|---|
| TestCombineDoFn(Combine.KeyedCombineFn<? super K,? super InputT,AccumT,OutputT> fn, Coder<AccumT> accumCoder, boolean testSerializability) |
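
The round-trip idea behind the serializability checks described above can be sketched without the SDK. This is a minimal sketch under stated assumptions: SketchCoder, MeanAccum, and roundTrip are hypothetical stand-ins introduced for illustration, not the SDK's Coder interface and not the actual body of ensureSerializableByCoder. It encodes an accumulator to bytes and decodes it again, so a coder that cannot handle the data in question fails early with some context attached.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical stand-in for a coder; NOT the SDK's Coder interface. */
interface SketchCoder<T> {
  void encode(T value, DataOutputStream out) throws IOException;
  T decode(DataInputStream in) throws IOException;
}

final class CoderRoundTripSketch {

  /** Hypothetical accumulator for a mean: running sum and element count. */
  static final class MeanAccum {
    final long sum;
    final long count;

    MeanAccum(long sum, long count) {
      this.sum = sum;
      this.count = count;
    }
  }

  /** Coder that writes both fields of MeanAccum and reads them back in order. */
  static final SketchCoder<MeanAccum> MEAN_ACCUM_CODER = new SketchCoder<MeanAccum>() {
    @Override
    public void encode(MeanAccum value, DataOutputStream out) throws IOException {
      out.writeLong(value.sum);
      out.writeLong(value.count);
    }

    @Override
    public MeanAccum decode(DataInputStream in) throws IOException {
      long sum = in.readLong();
      long count = in.readLong();
      return new MeanAccum(sum, count);
    }
  };

  /**
   * Encodes the value to bytes and decodes it again, returning the decoded copy.
   * Any failure is rethrown with the caller-supplied context in the message.
   */
  static <T> T roundTrip(SketchCoder<T> coder, T value, String errorContext) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      coder.encode(value, out);
      out.flush();
      return coder.decode(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    } catch (IOException | RuntimeException e) {
      throw new IllegalArgumentException(
          errorContext + ": unable to round-trip value with the given coder", e);
    }
  }

  public static void main(String[] args) {
    MeanAccum copy = roundTrip(MEAN_ACCUM_CODER, new MeanAccum(10, 4), "after ADD phase");
    System.out.println(copy.sum + "/" + copy.count); // prints "10/4"
  }
}
```

Returning the decoded copy rather than the original value means later phases operate on data that has really passed through the coder, which is closer to what happens when accumulators cross a shuffle boundary.
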
| Modifier and Type | Method and Description |
|---|---|
| static <K,AccumT,InputT> List<AccumT> | addInputsRandomly(Combine.KeyedCombineFn<? super K,? super InputT,AccumT,?> fn, K key, Iterable<InputT> values, Random random) |
| static <K,InputT,AccumT,OutputT> DirectPipelineRunner.TestCombineDoFn<K,InputT,AccumT,OutputT> | create(Combine.GroupedValues<K,InputT,OutputT> transform, PCollection<KV<K,Iterable<InputT>>> input, boolean testSerializability) |
| <T> T | ensureSerializableByCoder(Coder<T> coder, T value, String errorContext) |
| void | processElement(DoFn.ProcessContext c) Processes an input element. |
Methods inherited from class DoFn: createAggregator, createAggregator, finishBundle, getAllowedTimestampSkew, getInputTypeDescriptor, getOutputTypeDescriptor, startBundle

public static <K,InputT,AccumT,OutputT> DirectPipelineRunner.TestCombineDoFn<K,InputT,AccumT,OutputT> create(Combine.GroupedValues<K,InputT,OutputT> transform, PCollection<KV<K,Iterable<InputT>>> input, boolean testSerializability)

public void processElement(DoFn.ProcessContext c) throws Exception

Specified by: processElement in class DoFn

public static <K,AccumT,InputT> List<AccumT> addInputsRandomly(Combine.KeyedCombineFn<? super K,? super InputT,AccumT,?> fn, K key, Iterable<InputT> values, Random random)
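
To illustrate the ADD, MERGE and EXTRACT phases that a method like addInputsRandomly exercises, here is a minimal sketch that does not depend on the SDK. Everything in it is an assumption made for illustration: MeanAccum, the three phase methods, and in particular the rule for when to start a new accumulator are hypothetical and do not reproduce the SDK's actual splitting policy. The point it demonstrates is that a well-behaved combine function yields the same result however the inputs are spread across accumulators.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

final class CombinePhasesSketch {

  /** Hypothetical mutable accumulator for a mean: running sum and element count. */
  static final class MeanAccum {
    long sum;
    long count;
  }

  /** ADD phase: folds one input into an accumulator and returns it. */
  static MeanAccum addInput(MeanAccum accum, int input) {
    accum.sum += input;
    accum.count++;
    return accum;
  }

  /** MERGE phase: combines partial accumulators into a single one. */
  static MeanAccum mergeAccumulators(Iterable<MeanAccum> accums) {
    MeanAccum merged = new MeanAccum();
    for (MeanAccum accum : accums) {
      merged.sum += accum.sum;
      merged.count += accum.count;
    }
    return merged;
  }

  /** EXTRACT phase: turns the final accumulator into the output value. */
  static double extractOutput(MeanAccum accum) {
    return accum.count == 0 ? 0.0 : (double) accum.sum / accum.count;
  }

  /**
   * Spreads the inputs over a random number of accumulators.
   * Assumption: a new accumulator is started with probability 1/4 per input;
   * this policy is invented for the sketch.
   */
  static List<MeanAccum> addInputsRandomly(Iterable<Integer> values, Random random) {
    List<MeanAccum> accums = new ArrayList<>();
    accums.add(new MeanAccum()); // always at least one accumulator
    for (int value : values) {
      if (random.nextInt(4) == 0) {
        accums.add(new MeanAccum());
      }
      int index = random.nextInt(accums.size());
      accums.set(index, addInput(accums.get(index), value));
    }
    return accums;
  }

  public static void main(String[] args) {
    List<Integer> values = Arrays.asList(1, 2, 3, 4, 5, 6);
    List<MeanAccum> partials = addInputsRandomly(values, new Random(42));
    double mean = extractOutput(mergeAccumulators(partials));
    System.out.println(mean); // prints "3.5" regardless of how the inputs were split
  }
}
```

Because the sum and count are merged additively, the extracted mean does not depend on the random partitioning; a combine function whose result did change with the seed would not be safe to run in split phases.
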