In Apache Crunch, how do I find out whether a PCollection or PTable has any elements in it, and if so, how many?


I tried setting a breakpoint and evaluating the following in the watch window: .getSize(), which is supposed to return the size in bytes, and .materialize(), to see whether I could inspect the Java objects.

.getSize() does show a number > 0, but I doubt that it indicates the PTable has elements. .materialize() did not show anything indicating that elements are present.
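To check for elements in code rather than in a watch window, one option is to iterate the Iterable returned by materialize(). The sketch below assumes crunch-core is on the classpath and uses MemPipeline only to build a small in-memory collection for illustration; the class and method names are hypothetical. Iterating a materialized collection forces Crunch to compute it, and hasNext() only needs the first element.

```java
import org.apache.crunch.PCollection;
import org.apache.crunch.impl.mem.MemPipeline;
import org.apache.crunch.types.writable.Writables;

import java.util.Iterator;

public class MaterializeCheckExample {

    // Hypothetical helper: true if the collection contains at least one element.
    // Iterating the materialized view forces the pipeline to compute the collection.
    static <T> boolean hasElements(PCollection<T> collection) {
        Iterator<T> it = collection.materialize().iterator();
        return it.hasNext();
    }

    public static void main(String[] args) {
        // In-memory collection used only to demonstrate the check.
        PCollection<String> words =
            MemPipeline.typedCollectionOf(Writables.strings(), "apple", "banana");
        System.out.println("has elements: " + hasElements(words));
    }
}
```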

Thanks in advance.


There are 2 best solutions below


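Instead of relying on PCollection.getSize() to check whether your collection is empty (it only reports a size in bytes), use PCollection.length(), which does exactly what you need: it returns a PObject<Long> holding the number of elements. Because PTable extends PCollection, the same method works on a PTable.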
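A minimal sketch of using length(), assuming crunch-core is on the classpath; MemPipeline is used only to create small in-memory collections, and countElements/isEmpty are hypothetical helper names. length() is lazy, and calling getValue() on the returned PObject is what triggers the computation.

```java
import org.apache.crunch.PCollection;
import org.apache.crunch.PObject;
import org.apache.crunch.impl.mem.MemPipeline;
import org.apache.crunch.types.writable.Writables;

public class CollectionLengthExample {

    // Hypothetical helper: number of elements in any PCollection (or PTable).
    // getValue() forces the pipeline to run far enough to compute the count.
    static long countElements(PCollection<?> collection) {
        PObject<Long> length = collection.length();
        return length.getValue();
    }

    // Hypothetical helper: emptiness check built on the element count.
    static boolean isEmpty(PCollection<?> collection) {
        return countElements(collection) == 0L;
    }

    public static void main(String[] args) {
        PCollection<String> words =
            MemPipeline.typedCollectionOf(Writables.strings(), "apple", "banana", "cherry");
        PCollection<String> none = MemPipeline.typedCollectionOf(Writables.strings());
        System.out.println("words: " + countElements(words));
        System.out.println("none is empty: " + isEmpty(none));
    }
}
```

Because PTable<K, V> extends PCollection<Pair<K, V>>, a PTable can be passed to countElements directly; the result is the number of key/value pairs.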


I have come across this issue a few times, and API methods like materialize() don't really give a satisfactory result. I would suggest creating a simple DoFn that takes this PCollection as input and uses a logger to report whether it has elements. Note that the PCollection getSize() method returns a size in bytes, not an element count; to know how many elements it has, use length().
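A minimal sketch of the logging DoFn suggested above, assuming crunch-core is on the classpath and using java.util.logging as the logger (any logging framework would do); LogElementsFn and withLogging are hypothetical names. The DoFn passes each element through unchanged, logs it, and logs how many elements its task saw in cleanup(). Because Crunch plans lazily, the returned collection must actually be used (written to a target or materialized) for the DoFn to run, and on a cluster the log lines end up in the task logs rather than the client console.

```java
import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.impl.mem.MemPipeline;
import org.apache.crunch.types.writable.Writables;

import java.util.logging.Logger;

public class LoggingDoFnExample {

    // Hypothetical pass-through DoFn: logs every element and the per-task count.
    static class LogElementsFn<T> extends DoFn<T, T> {
        // Static so the logger is not serialized along with the DoFn.
        private static final Logger LOG = Logger.getLogger(LogElementsFn.class.getName());
        private long seen = 0L;

        @Override
        public void process(T input, Emitter<T> emitter) {
            seen++;
            LOG.info("element: " + input);
            emitter.emit(input); // pass the element through unchanged
        }

        @Override
        public void cleanup(Emitter<T> emitter) {
            // Called once per task after all of its inputs have been processed.
            LOG.info("elements seen by this task: " + seen);
        }
    }

    // Hypothetical helper: computing the returned collection also logs its elements.
    static <T> PCollection<T> withLogging(PCollection<T> collection) {
        return collection.parallelDo("log-elements", new LogElementsFn<T>(), collection.getPType());
    }

    public static void main(String[] args) {
        PCollection<String> words =
            MemPipeline.typedCollectionOf(Writables.strings(), "apple", "banana");
        // Materializing forces the DoFn to run and emit its log lines.
        long n = 0;
        for (String ignored : withLogging(words).materialize()) {
            n++;
        }
        System.out.println("materialized " + n + " elements");
    }
}
```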