public class WordCount extends Object
This class, WordCount, is the second in a series of four successively more detailed
'word count' examples. You may want to look at MinimalWordCount first.
After working through this example, see the DebuggingWordCount
pipeline, which introduces additional concepts.
For a detailed walkthrough of this example, see https://cloud.google.com/dataflow/java-sdk/wordcount-example
Basic concepts, also in the MinimalWordCount example: Reading text files; counting a PCollection; writing to GCS.
New Concepts:
1. Executing a Pipeline both locally and using the Dataflow service
2. Using ParDo with static DoFns defined out-of-line
3. Building a composite transform
4. Defining your own pipeline options
Concept #1: You can execute this pipeline either locally or using the Dataflow service. The execution settings are now command-line options rather than hard-coded values, as they were in the MinimalWordCount example. To execute this pipeline locally, specify general pipeline configuration:
--project=YOUR_PROJECT_ID
and a local output file or output prefix on GCS:
--output=[YOUR_LOCAL_FILE | gs://YOUR_OUTPUT_PREFIX]
To execute this pipeline using the Dataflow service, specify pipeline configuration:
--project=YOUR_PROJECT_ID
--stagingLocation=gs://YOUR_STAGING_DIRECTORY
--runner=BlockingDataflowPipelineRunner
and an output prefix on GCS:
--output=gs://YOUR_OUTPUT_PREFIX
The input file defaults to gs://dataflow-samples/shakespeare/kinglear.txt and can be
overridden with --inputFile.
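The way these `--name=value` arguments and the `--inputFile` default fit together can be sketched in plain Java. This is only an illustration: it does not use the Dataflow SDK's own options machinery, and the names `OptionsSketch`, `parse`, and `isGcsPath` are hypothetical. The only values taken from the text above are the option names and the default input path.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain-Java stand-in for option handling; NOT the Dataflow SDK's implementation. */
final class OptionsSketch {
    // Default input path, as stated in the documentation above.
    static final String DEFAULT_INPUT = "gs://dataflow-samples/shakespeare/kinglear.txt";

    /** Parses --name=value arguments into a map, pre-filled with the inputFile default. */
    static Map<String, String> parse(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("inputFile", DEFAULT_INPUT);
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (!arg.startsWith("--") || eq < 0) {
                throw new IllegalArgumentException("Expected --name=value, got: " + arg);
            }
            // Strip the leading "--" from the name; keep everything after the first '='.
            options.put(arg.substring(2, eq), arg.substring(eq + 1));
        }
        return options;
    }

    /** Hypothetical helper: treats any path starting with gs:// as a GCS location. */
    static boolean isGcsPath(String path) {
        return path.startsWith("gs://");
    }

    public static void main(String[] args) {
        // Local-style invocation: a project plus a local output file.
        Map<String, String> options = parse(new String[] {
            "--project=YOUR_PROJECT_ID",
            "--output=/tmp/wordcounts"
        });
        System.out.println(options);
        System.out.println("Output on GCS? " + isGcsPath(options.get("output")));
    }
}
```

Passing `--inputFile=...` replaces the default entry, which mirrors the override behaviour described above.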
| Modifier and Type | Class and Description |
|---|---|
| static class | WordCount.CountWords: A PTransform that converts a PCollection containing lines of text into a PCollection of formatted word counts. |
| static class | WordCount.FormatAsTextFn: A SimpleFunction that converts a Word and Count into a printable string. |
| static interface | WordCount.WordCountOptions: Options supported by WordCount. |
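The data flow that CountWords and FormatAsTextFn describe (lines of text in, formatted word counts out) can be approximated in plain Java. This is a minimal sketch that runs without the Dataflow SDK: it uses ordinary collections instead of PCollection, and the class `CountWordsSketch`, the tokenizing pattern, and the `word: count` output format are all assumptions made for illustration, not details confirmed by this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java approximation of a lines-to-formatted-counts transform; not SDK code. */
final class CountWordsSketch {

    /** Splits a line into words. The pattern is an assumption: letters and apostrophes form words. */
    static List<String> extractWords(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.split("[^a-zA-Z']+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    /** Counts occurrences of each word across all lines (case-sensitive, sorted by word). */
    static Map<String, Long> countWords(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : extractWords(line)) {
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    /** Formats one word and its count as a printable string (assumed format). */
    static String formatAsText(String word, long count) {
        return word + ": " + count;
    }

    /** Composite step: chains extraction, counting, and formatting. */
    static List<String> apply(List<String> lines) {
        List<String> formatted = new ArrayList<>();
        for (Map.Entry<String, Long> entry : countWords(lines).entrySet()) {
            formatted.add(formatAsText(entry.getKey(), entry.getValue()));
        }
        return formatted;
    }

    public static void main(String[] args) {
        for (String row : apply(List.of("to be or not to be", "To be!"))) {
            System.out.println(row);
        }
    }
}
```

Grouping the three steps behind a single `apply` method mirrors the idea of a composite transform: callers see one operation, while each inner step stays small and separately testable.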
| Constructor and Description |
|---|
| WordCount() |
public static void main(String[] args)