Write an Apache Crunch PCollection to multiple output files

I have a Crunch DoFn that generates a PCollection. Currently I am writing the PCollection to a single Avro file; I want to write it to multiple files.


// Avros.strings() is the Avro PType for String; Avros.specifics() expects a generated SpecificRecord class.
PCollection<String> generatedResults = results.parallelDo(new AvroGeneratorDofn(count), Avros.strings());
// generatedResults.write(To.avroFile(outputPath));
pipeline.write(generatedResults, new AvroFileTarget(outputPath), Target.WriteMode.APPEND);

Answered by sudeep

The same PCollection can be written to any number of targets:

generatedResults.write(To.avroFile(outputPath));
generatedResults.write(new AvroFileTarget(outputPath), Target.WriteMode.APPEND);

See Apache Crunch - Getting Started:

Just as a single Pipeline instance can read data from multiple Sources, a Pipeline may also write multiple outputs for each PCollection.
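
To make the quoted point concrete, here is a minimal sketch that writes one PCollection to two separate output locations. The class name `MultiTargetWrite`, the method name `writeToTwoTargets`, and the parameters `primaryPath` and `archivePath` are hypothetical names chosen for illustration; only `PCollection.write`, `To.avroFile`, and `Target.WriteMode.APPEND` come from Crunch itself. It assumes the collection uses an Avro PType such as `Avros.strings()` and that the Crunch and Hadoop jars are on the classpath.

```java
import org.apache.crunch.PCollection;
import org.apache.crunch.Target;
import org.apache.crunch.io.To;

public class MultiTargetWrite {

    /**
     * Registers two Avro file targets for the same collection.
     * primaryPath and archivePath are hypothetical output directories.
     */
    static void writeToTwoTargets(PCollection<String> generatedResults,
                                  String primaryPath,
                                  String archivePath) {
        // First target: default write mode.
        generatedResults.write(To.avroFile(primaryPath));

        // Second target: APPEND adds the new files to a directory that may already hold output.
        generatedResults.write(To.avroFile(archivePath), Target.WriteMode.APPEND);
    }
}
```

Each `write` call only registers a target with the pipeline; the data is produced when the pipeline executes, for example on `pipeline.done()`. Using a distinct path per target keeps the outputs from mixing in one directory.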