public class TopWikipediaSessions extends Object
Concepts: Using Windowing to perform time-based aggregations of data.
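As a rough illustration of time-based aggregation, the sketch below groups event timestamps into sessions in plain Java. It does not use the Dataflow SDK and is not this pipeline's implementation. The class name `SessionWindowSketch`, the method `sessionSizes`, and the one-hour gap are hypothetical choices made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; not part of the Dataflow SDK or of TopWikipediaSessions.
public class SessionWindowSketch {

    /**
     * Groups event timestamps (in milliseconds) into sessions. A new session
     * starts whenever the gap to the previous event is at least gapMillis.
     * Returns the number of events in each session, in time order.
     */
    public static List<Integer> sessionSizes(long[] timestampsMillis, long gapMillis) {
        // Sort a copy so the caller's array is left untouched.
        long[] sorted = timestampsMillis.clone();
        Arrays.sort(sorted);

        List<Integer> sizes = new ArrayList<>();
        int current = 0;
        for (int i = 0; i < sorted.length; i++) {
            // Close the running session when the gap is large enough.
            if (i > 0 && sorted[i] - sorted[i - 1] >= gapMillis) {
                sizes.add(current);
                current = 0;
            }
            current++;
        }
        if (current > 0) {
            sizes.add(current);
        }
        return sizes;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        // Five edits by one user: three close together, then two after a long pause.
        long[] edits = {0, 10 * minute, 25 * minute, 200 * minute, 210 * minute};
        // Assumed session gap of one hour, chosen only for illustration.
        System.out.println(sessionSizes(edits, 60 * minute)); // prints [3, 2]
    }
}
```

The per-session counts are the kind of aggregate a windowed pipeline would compute for each key; a distributed runner would do the same grouping per user across workers rather than over one in-memory array.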
It is not recommended to execute this pipeline locally, given the size of the default input data.
To execute this pipeline using the Dataflow service, specify pipeline configuration:
--project=YOUR_PROJECT_ID
--stagingLocation=gs://YOUR_STAGING_DIRECTORY
--runner=BlockingDataflowPipelineRunner
and an output prefix on GCS:
--output=gs://YOUR_OUTPUT_PREFIX
The default input is gs://dataflow-samples/wikipedia_edits/*.json and can be
overridden with --input.
The input for this example is large enough that it's a good place to enable (experimental) autoscaling:
--autoscalingAlgorithm=BASIC
--maxNumWorkers=20
This will automatically scale the number of workers up over time until the job completes.

| Constructor and Description |
|---|
| TopWikipediaSessions() |

public static void main(String[] args)