Iterating over a PTable in Crunch


I have the following PTables:

PTable<String, String> somePTable1 = somePCollection1.parallelDo(new SomeClass(),
    Writables.tableOf(Writables.strings(), Writables.strings()));

PTable<String, Collection<String>> somePTable2 = somePTable1.collectValues();

For somePTable2, I want to create a new file for every record. Is there a way to iterate over somePTable2 so that I can access each record? I know I can apply a DoFn to somePTable2, but is it possible to call pipeline.write() inside a DoFn?


hlagos On

To store each collected list as is, try this:

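// write() needs a Target; outputPath is a placeholder for your output directory
somePTable2.values().write(To.textFile(outputPath));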

If you want to generate one record for each element of the collections in your PTable, you will need to apply a DoFn that emits one record per element before writing the result.
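A minimal sketch of such a DoFn, assuming the `PTable<String, Collection<String>>` from the question. The class name `FlattenValuesFn`, the helper `flatten`, and the tab-separated "key, value" line format are illustrative choices, not part of the original code:

```java
import java.util.Collection;

import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.PTable;
import org.apache.crunch.Pair;
import org.apache.crunch.types.writable.Writables;

/** Emits one "key<TAB>value" line for every element of each collected value list. */
public class FlattenValuesFn extends DoFn<Pair<String, Collection<String>>, String> {

  @Override
  public void process(Pair<String, Collection<String>> input, Emitter<String> emitter) {
    // input.first() is the key, input.second() is the collection built by collectValues()
    for (String value : input.second()) {
      emitter.emit(input.first() + "\t" + value);
    }
  }

  /** Hypothetical helper: applies the DoFn; nothing executes until the pipeline runs. */
  public static PCollection<String> flatten(PTable<String, Collection<String>> table) {
    return table.parallelDo(new FlattenValuesFn(), Writables.strings());
  }
}
```

It could then be used as `FlattenValuesFn.flatten(somePTable2).write(To.textFile(outputPath));` followed by `pipeline.done()`, where `outputPath` is a placeholder. The write is declared on the resulting PCollection in the driver code, while the DoFn itself only emits records.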