happybase crashes when trying to scan a very big HBase column

My code is as follows:

count = 0
for key, data in table.scan(columns=["raw:dataInfo"]):
    count += 1
    ...

The column raw:dataInfo may be as big as 50 MB. When I ran the above code, happybase crashed and threw the following exception:

Traceback (most recent call last):
  File "happybasetestscan.py", line 8, in <module>
    for key,data in table.scan(columns=["raw:sample"],limit=10):
  File "/usr/lib/python2.6/site-packages/happybase/table.py", line 374, in scan
    self.name, scan, {})
.......
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes 

Any ideas, please, on how to count the big column? Thanks!

Best answer:

I guess the Thrift server didn't answer properly. happybase reports (via the Thrift library) that no data could be read from the socket.

Anyway, if you want to do a full table scan in order to count rows (which is inefficient, but OK), use a filter on your scan:

# Scan, get only keys (data will be empty)
scanner = table.scan(
    row_start=b'aaa',
    row_stop=b'bbb',
    filter=b'KeyOnlyFilter() AND FirstKeyOnlyFilter()',
)

for row_key, data in scanner:
    pass  # do something with row_key

See https://github.com/wbolster/happybase/issues/12#issuecomment-12754400 for more information
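
To tie this back to the counting loop in the question, here is a minimal sketch of a counting helper built on the same filter. The name count_rows and its parameters are hypothetical, not part of happybase; the table argument is assumed to be a happybase Table (or anything with a compatible scan() method). Passing columns together with the key-only filter is assumed to restrict the count to rows that have that column while still not transferring the large values, so check the result against a small known table first. The smaller batch_size is only a precaution in case the oversized Thrift responses are what broke the connection.

```python
# Hypothetical helper, not part of happybase: count rows without pulling cell values.
KEY_ONLY_FILTER = b'KeyOnlyFilter() AND FirstKeyOnlyFilter()'


def count_rows(table, columns=None, batch_size=100):
    """Count the rows returned by a key-only scan of `table`.

    `table` is assumed to be a happybase Table or an object with a compatible
    scan() method. `columns` optionally limits the scan to specific columns,
    for example [b'raw:dataInfo'].
    """
    scanner = table.scan(
        columns=columns,
        filter=KEY_ONLY_FILTER,   # server strips the values, only keys come back
        batch_size=batch_size,    # rows fetched per Thrift round trip
    )
    count = 0
    for _row_key, _data in scanner:
        count += 1
    return count
```

With a real connection, this would be called as count_rows(table, columns=[b'raw:dataInfo']), where table comes from happybase.Connection(...).table(...) with your own host and table name.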