T - the type of each of the elements of the input PCollection
public static class AvroIO.Write.Bound<T> extends PTransform<PCollection<T>,PDone>
A PTransform that writes a bounded PCollection to an Avro file (or
multiple Avro files matching a sharding pattern).
Fields inherited from class PTransform: name
| Modifier and Type | Method and Description |
|---|---|
| PDone | apply(PCollection<T> input): Applies this PTransform on the given InputT, and returns its Output. |
| protected Coder<Void> | getDefaultOutputCoder(): Returns the default Coder to use for the output of this single-output PTransform. |
| String | getFilenamePrefix() |
| String | getFilenameSuffix() |
| int | getNumShards() |
| org.apache.avro.Schema | getSchema() |
| String | getShardNameTemplate(): Returns the current shard name template string. |
| String | getShardTemplate() |
| Class<T> | getType() |
| AvroIO.Write.Bound<T> | named(String name): Returns a new AvroIO.Write PTransform that's like this one but with the given step name. |
| boolean | needsValidation() |
| AvroIO.Write.Bound<T> | to(String filenamePrefix): Returns a new AvroIO.Write PTransform that's like this one but that writes to the file(s) with the given filename prefix. |
| AvroIO.Write.Bound<T> | withNumShards(int numShards): Returns a new AvroIO.Write PTransform that's like this one but that uses the provided shard count. |
| AvroIO.Write.Bound<T> | withoutSharding(): Returns a new AvroIO.Write PTransform that's like this one but that forces a single file as output. |
| AvroIO.Write.Bound<T> | withoutValidation(): Returns a new AvroIO.Write PTransform that's like this one but that has GCS output path validation on pipeline creation disabled. |
| <X> AvroIO.Write.Bound<X> | withSchema(Class<X> type): Returns a new AvroIO.Write PTransform that's like this one but that writes to Avro file(s) containing records whose type is the specified Avro-generated class. |
| AvroIO.Write.Bound<org.apache.avro.generic.GenericRecord> | withSchema(org.apache.avro.Schema schema): Returns a new AvroIO.Write PTransform that's like this one but that writes to Avro file(s) containing records of the specified schema. |
| AvroIO.Write.Bound<org.apache.avro.generic.GenericRecord> | withSchema(String schema): Returns a new AvroIO.Write PTransform that's like this one but that writes to Avro file(s) containing records of the specified schema in a JSON-encoded string form. |
| AvroIO.Write.Bound<T> | withShardNameTemplate(String shardTemplate): Returns a new AvroIO.Write PTransform that's like this one but that uses the given shard name template. |
| AvroIO.Write.Bound<T> | withSuffix(String filenameSuffix): Returns a new AvroIO.Write PTransform that's like this one but that writes to the file(s) with the given filename suffix. |
Methods inherited from class PTransform: getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, toString, validate
public AvroIO.Write.Bound<T> named(String name)
public AvroIO.Write.Bound<T> to(String filenamePrefix)
See Write.to(String) for more information.
Does not modify this object.
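The phrase "Does not modify this object" means each configuration method returns a fresh transform and leaves the receiver untouched. The following is a minimal sketch of that pattern only, not the SDK's implementation: `WriteConfigSketch` is a hypothetical stand-in class, and its defaults (no prefix, empty suffix, 0 shards) are assumptions made for illustration.

```java
// Hypothetical stand-in for an immutable write configuration.
// It is NOT part of the Dataflow SDK; it only illustrates the
// "returns a new object, does not modify this one" contract.
final class WriteConfigSketch {
  private final String filenamePrefix; // null until to(...) is called (assumed default)
  private final String filenameSuffix; // empty by default (assumed default)
  private final int numShards;         // 0 means "let the system decide"

  WriteConfigSketch() {
    this(null, "", 0);
  }

  private WriteConfigSketch(String filenamePrefix, String filenameSuffix, int numShards) {
    if (numShards < 0) {
      throw new IllegalArgumentException("numShards must be >= 0, got " + numShards);
    }
    this.filenamePrefix = filenamePrefix;
    this.filenameSuffix = filenameSuffix;
    this.numShards = numShards;
  }

  // Each method copies every field and changes exactly one of them.
  WriteConfigSketch to(String newPrefix) {
    return new WriteConfigSketch(newPrefix, filenameSuffix, numShards);
  }

  WriteConfigSketch withSuffix(String newSuffix) {
    return new WriteConfigSketch(filenamePrefix, newSuffix, numShards);
  }

  WriteConfigSketch withNumShards(int newNumShards) {
    return new WriteConfigSketch(filenamePrefix, filenameSuffix, newNumShards);
  }

  String getFilenamePrefix() { return filenamePrefix; }
  String getFilenameSuffix() { return filenameSuffix; }
  int getNumShards() { return numShards; }

  public static void main(String[] args) {
    WriteConfigSketch base = new WriteConfigSketch();
    WriteConfigSketch configured = base.to("output/records").withSuffix(".avro").withNumShards(4);

    // The original object keeps its defaults; only the returned copy is configured.
    System.out.println(base.getFilenamePrefix() + " / " + base.getNumShards());
    System.out.println(configured.getFilenamePrefix() + configured.getFilenameSuffix()
        + " / " + configured.getNumShards());
  }
}
```

A practical consequence of this contract is that the return value must be used: calling a configuration method and discarding the result leaves the original transform unchanged.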
public AvroIO.Write.Bound<T> withSuffix(String filenameSuffix)
Does not modify this object.
See Also: ShardNameTemplate
public AvroIO.Write.Bound<T> withNumShards(int numShards)
Constraining the number of shards is likely to reduce the performance of a pipeline. Setting this value is not recommended unless you require a specific number of output files.
Does not modify this object.
numShards - the number of shards to use, or 0 to let the system decide.
See Also: ShardNameTemplate
public AvroIO.Write.Bound<T> withShardNameTemplate(String shardTemplate)
Does not modify this object.
See Also: ShardNameTemplate
public AvroIO.Write.Bound<T> withoutSharding()
This is a shortcut for
.withNumShards(1).withShardNameTemplate("")
Does not modify this object.
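To see why an empty shard name template combined with a single shard yields exactly one output file, it helps to expand a template by hand. The sketch below assumes the template convention in which each run of `S` is replaced by the zero-padded shard index and each run of `N` by the zero-padded shard count. `ShardNameSketch`, `expand`, and `fileNames` are hypothetical names, the template `-SSSSS-of-NNNNN` is used only as an example, and none of this is the SDK's own code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of shard name template expansion.
// Assumption: runs of 'S' become the shard index and runs of 'N'
// become the shard count, each zero-padded to the length of the run.
final class ShardNameSketch {

  static String expand(String template, int shard, int numShards) {
    StringBuilder out = new StringBuilder();
    int i = 0;
    while (i < template.length()) {
      char c = template.charAt(i);
      if (c == 'S' || c == 'N') {
        // Measure the run of identical placeholder letters.
        int j = i;
        while (j < template.length() && template.charAt(j) == c) {
          j++;
        }
        int value = (c == 'S') ? shard : numShards;
        out.append(String.format(Locale.ROOT, "%0" + (j - i) + "d", value));
        i = j;
      } else {
        // Any other character is copied through unchanged.
        out.append(c);
        i++;
      }
    }
    return out.toString();
  }

  // Builds prefix + expanded template + suffix for every shard.
  static List<String> fileNames(String prefix, String template, String suffix, int numShards) {
    List<String> names = new ArrayList<>();
    for (int shard = 0; shard < numShards; shard++) {
      names.add(prefix + expand(template, shard, numShards) + suffix);
    }
    return names;
  }

  public static void main(String[] args) {
    // Three shards with an index-of-count template.
    System.out.println(fileNames("out", "-SSSSS-of-NNNNN", ".avro", 3));
    // One shard with an empty template: the name is just prefix + suffix.
    System.out.println(fileNames("out", "", ".avro", 1));
  }
}
```

Under these assumptions the first call produces `out-00000-of-00003.avro` through `out-00002-of-00003.avro`, and the second produces the single name `out.avro`, which mirrors the effect of `.withNumShards(1).withShardNameTemplate("")`.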
public <X> AvroIO.Write.Bound<X> withSchema(Class<X> type)
X - the type of the elements of the input PCollection
public AvroIO.Write.Bound<org.apache.avro.generic.GenericRecord> withSchema(org.apache.avro.Schema schema)
public AvroIO.Write.Bound<org.apache.avro.generic.GenericRecord> withSchema(String schema)
public AvroIO.Write.Bound<T> withoutValidation()
This can be useful in the case where the GCS output location does not exist at the pipeline creation time, but is expected to be available at execution time.
public PDone apply(PCollection<T> input)
Description copied from class: PTransform
Applies this PTransform on the given InputT, and returns its Output.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
The default implementation throws an exception. A derived class must
either implement apply, or else each runner must supply a custom
implementation via
PipelineRunner.apply(com.google.cloud.dataflow.sdk.transforms.PTransform<InputT, OutputT>, InputT).
Overrides: apply in class PTransform<PCollection<T>,PDone>
public String getShardNameTemplate()
protected Coder<Void> getDefaultOutputCoder()
Description copied from class: PTransform
Returns the default Coder to use for the output of this single-output PTransform.
By default, always throws
Overrides: getDefaultOutputCoder in class PTransform<PCollection<T>,PDone>
public String getFilenamePrefix()
public String getShardTemplate()
public int getNumShards()
public String getFilenameSuffix()
public org.apache.avro.Schema getSchema()
public boolean needsValidation()