public class FilterExamples extends Object
Concepts: The Mean transform; Options configuration; using pipeline-derived data as a side input; approaches to filtering, selection, and projection.
The example reads public samples of weather data from BigQuery. It projects the data onto the fields it needs, computes the global mean of the temperature readings, filters the readings to a single given month, and then outputs only those readings for that month whose mean temperature is below the derived global mean.
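The selection logic described above can be sketched in plain Java, without the Dataflow SDK. This is only an illustration of the steps (projection, global mean, month filter, below-mean filter), not the pipeline's actual code: the class `BelowGlobalMeanSketch`, the `Reading` type, and the methods `globalMean` and `belowGlobalMean` are hypothetical names. In the real pipeline the mean is computed with the Mean transform and supplied to the filtering step as a side input.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java sketch of the filtering logic; all names here are hypothetical.
final class BelowGlobalMeanSketch {

    /** A projected reading: only the fields the later steps need. */
    static final class Reading {
        public final int month;
        public final double meanTemp;

        Reading(int month, double meanTemp) {
            this.month = month;
            this.meanTemp = meanTemp;
        }
    }

    /** Global mean of the temperature over all readings (NaN if there are none). */
    static double globalMean(List<Reading> readings) {
        return readings.stream()
                .mapToDouble(r -> r.meanTemp)
                .average()
                .orElse(Double.NaN);
    }

    /** Keeps readings from the given month whose temperature is below the global mean. */
    static List<Reading> belowGlobalMean(List<Reading> readings, int monthFilter) {
        // Computed once over ALL months, playing the role of the side input.
        double mean = globalMean(readings);
        return readings.stream()
                .filter(r -> r.month == monthFilter)
                .filter(r -> r.meanTemp < mean)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Reading> readings = Arrays.asList(
                new Reading(7, 10.0),
                new Reading(7, 30.0),
                new Reading(1, 20.0));
        for (Reading r : belowGlobalMean(readings, 7)) {
            System.out.println(r.month + " " + r.meanTemp);
        }
    }
}
```

Note that the mean is taken over readings from every month, while the output is restricted to the filtered month, which matches the order of steps in the description.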
Note: Before running this example, you must create a BigQuery dataset to contain your output table.
To execute this pipeline locally, specify general pipeline configuration:
--project=YOUR_PROJECT_ID
and the BigQuery table for the output:
--output=YOUR_PROJECT_ID:DATASET_ID.TABLE_ID
[--monthFilter=<month_number>]
where the optional parameter --monthFilter is a month number from 1 to 12.
To execute this pipeline using the Dataflow service, specify pipeline configuration:
--project=YOUR_PROJECT_ID
--stagingLocation=gs://YOUR_STAGING_DIRECTORY
--runner=BlockingDataflowPipelineRunner
and the BigQuery table for the output:
--output=YOUR_PROJECT_ID:DATASET_ID.TABLE_ID
[--monthFilter=<month_number>]
where the optional parameter --monthFilter is a month number from 1 to 12.
The BigQuery input table defaults to clouddataflow-readonly:samples.weather_stations
and can be overridden with --input.
| Constructor and Description |
|---|
| FilterExamples() |