public static class PubsubIO.Read extends Object
PTransform that continuously reads from a Pubsub stream and
returns a PCollection<String> containing the items from
the stream.
When running with a runner that only supports bounded PCollections
(such as DirectPipelineRunner or DataflowPipelineRunner without --streaming), only a
bounded portion of the input Pubsub stream can be processed. As such, either
PubsubIO.Read.Bound.maxNumRecords or PubsubIO.Read.Bound.maxReadTime must be set.
| Modifier and Type | Class and Description |
|---|---|
static class |
PubsubIO.Read.Bound<T>
A
PTransform that reads from a PubSub source and returns
a unbounded PCollection containing the items from the stream. |
| Constructor and Description |
|---|
Read() |
| Modifier and Type | Method and Description |
|---|---|
static PubsubIO.Read.Bound<String> |
idLabel(String idLabel)
Creates and returns a PubSubIO.Read PTransform where unique record identifiers are
expected to be provided using the PubSub labeling API.
|
static PubsubIO.Read.Bound<String> |
maxNumRecords(int maxNumRecords)
Sets the maximum number of records that will be read from Pubsub.
|
static PubsubIO.Read.Bound<String> |
maxReadTime(org.joda.time.Duration maxReadTime)
Sets the maximum duration during which records will be read from Pubsub.
|
static PubsubIO.Read.Bound<String> |
named(String name) |
static PubsubIO.Read.Bound<String> |
subscription(String subscription)
Creates and returns a PubsubIO.Read PTransform for reading from
a specific Pubsub subscription.
|
static PubsubIO.Read.Bound<String> |
timestampLabel(String timestampLabel)
Creates and returns a PubsubIO.Read PTransform where record timestamps are expected
to be provided using the PubSub labeling API.
|
static PubsubIO.Read.Bound<String> |
topic(String topic)
Creates and returns a PubsubIO.Read PTransform for reading from
a Pubsub topic with the specified publisher topic.
|
static <T> PubsubIO.Read.Bound<T> |
withCoder(Coder<T> coder)
Creates and returns a PubsubIO.Read PTransform that uses the given
Coder<T> to decode PubSub record into a value of type T. |
public static PubsubIO.Read.Bound<String> named(String name)
public static PubsubIO.Read.Bound<String> topic(String topic)
/topics/<project>/<topic>, where <project> is the name of
the publishing project. The <topic> component must comply with
the below requirements.
Dataflow will start reading data published on this topic from the time the pipeline is started. Any data published on the topic before the pipeline is started will not be read by Dataflow.
public static PubsubIO.Read.Bound<String> subscription(String subscription)
projects/<project>/subscriptions/<subscription>,
where <project> is the name of the project the subscription belongs to.
The <subscription> component must comply with the below requirements.
public static PubsubIO.Read.Bound<String> timestampLabel(String timestampLabel)
<timestampLabel> parameter
specifies the label name. The label value sent to PubsSub is a numerical value representing
the number of milliseconds since the Unix epoch. For example, if using the joda time classes,
org.joda.time.Instant.getMillis() returns the correct value for this label.
If <timestampLabel> is not provided, the system will generate record timestamps
the first time it sees each record. All windowing will be done relative to these timestamps.
By default windows are emitted based on an estimate of when this source is likely
done producing data for a given timestamp (referred to as the Watermark; see
AfterWatermark for more details).
Any late data will be handled by the trigger specified with the windowing strategy -- by
default it will be output immediately.
Note that the system can guarantee that no late data will ever be seen when it assigns
timestamps by arrival time (i.e. timestampLabel is not provided).
public static PubsubIO.Read.Bound<String> idLabel(String idLabel)
<idLabel> parameter
specifies the label name. The label value sent to PubSub can be any string value that
uniquely identifies this record.
If idLabel is not provided, Dataflow cannot guarantee that no duplicate data will be delivered on the PubSub stream. In this case, deduplication of the stream will be stricly best effort.
public static <T> PubsubIO.Read.Bound<T> withCoder(Coder<T> coder)
Coder<T> to decode PubSub record into a value of type T.
By default, uses StringUtf8Coder, which just
returns the text lines as Java strings.
T - the type of the decoded elements, and the elements
of the resulting PCollection.public static PubsubIO.Read.Bound<String> maxNumRecords(int maxNumRecords)
Either this or maxReadTime(org.joda.time.Duration) must be set for use as a bounded
PCollection.
public static PubsubIO.Read.Bound<String> maxReadTime(org.joda.time.Duration maxReadTime)
Either this or maxNumRecords(int) must be set for use as a bounded
PCollection.