I'm using Beam (and Scio, though feel free to answer this question for PCollections too) to read from multiple tables in BigQuery. Because I'm reading multiple datasets from a dynamically generated list (it is itself an SCollection[String], where the String specifies the table name essentially), I wind up with an SCollection[SCollection[MyCoolDataType]].
Is there any way to flatten (union) these SCollection objects into one? I've tried:
doubleCollection.reduce((col1, col2) => col1.union(col2))
and
sc.unionAll(doubleCollection)
but unfortunately an SCollection is not itself an iterable, so I think I may need to get more creative about mapping elements.
Flattening SCollection[SCollection[T]] isn't supported in Scio or the underlying Beam model.
If you were using FileIO, you could use FileIO.matchAll() followed by FileIO.readMatches() to accept a list of file patterns and then read these into the PCollection.
For BigQuery, however, this is not currently supported by Beam (nor Scio). What you can do instead is use Scio's taps to materialize the dynamic list and use the result to construct a subsequent step. See example here
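
As a rough illustration of the tap approach, here is a minimal sketch, assuming Scio 0.8 or later (where materialize returns a ClosedTap and the result is read back through ScioResult.tap). The object name, the table names, and the use of plain TableRow instead of MyCoolDataType are placeholders of mine, not something from the question; swap in your own way of producing the names and your own read.

```scala
import com.spotify.scio.ContextAndArgs
import com.spotify.scio.bigquery._
import com.spotify.scio.values.SCollection

// Hypothetical object name, for illustration only.
object UnionDynamicTables {
  def main(cmdlineArgs: Array[String]): Unit = {
    // Pipeline 1: compute the dynamic list of table names and materialize it.
    val (sc1, _) = ContextAndArgs(cmdlineArgs)
    // Placeholder: stands in for however the table names are really produced.
    val tableNames: SCollection[String] = sc1.parallelize(
      Seq("my-project:my_dataset.table_a", "my-project:my_dataset.table_b")
    )
    val closedTap = tableNames.materialize
    val result = sc1.run().waitUntilDone()

    // Back on the driver: read the materialized names into a plain Scala List.
    val names: List[String] = result.tap(closedTap).value.toList

    // Pipeline 2: the list is now known at graph-construction time, so we can
    // create one BigQuery read per table and union them in a single step.
    val (sc2, _) = ContextAndArgs(cmdlineArgs)
    val perTable: List[SCollection[TableRow]] =
      names.map(name => sc2.bigQueryTable(Table.Spec(name)))
    val allRows: SCollection[TableRow] = sc2.unionAll(perTable)

    // ... continue transforming allRows here ...
    sc2.run().waitUntilDone()
  }
}
```

The point of the design is that sc.unionAll needs an ordinary Iterable of SCollections, which only exists on the driver once the first pipeline has finished; the cost is running two pipelines instead of one.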