I want to scan a Bigtable table for a list of IDs (or ID prefixes) using Python HappyBase.
Is there any way to do it on server side? That is, I'd like to send a list of start/stop rows to be scanned in one API call rather than performing a long series of API calls.
Here's an example. For my_big_tables keys:
2019/1
2019/2
2019/3
...
2020/1
2020/2
2020/3
2020/4
..
In one query, I'd like to get all the records from months 1 and 2 for all years. The results should be:
2019/1
2019/2
2020/1
2020/2
Rather than using the row_start and row_stop arguments in Table.scan(), this may be a better fit for the filter argument with a regular expression. See the API reference for details on the filter argument:
RowFilter is a type provided by Google's Bigtable library. Here are the docs. Assuming that the ID field you're referring to is your row key, we can use RowKeyRegexFilter to filter the IDs by the pattern you've described.
We'll start by coming up with a regular expression to match a list of IDs for the desired months. For example, if you wanted to filter year-based IDs for the months of December and January, you could use this (note that you must list the longer numbers before the shorter ones, e.g. 12 before 1) -- see this link to test the regular expression:
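As a rough illustration of that kind of pattern, here is a minimal sketch that builds one from a list of months. It assumes row keys look like `<year>/<month>` with no zero padding, as in the question; `build_month_regex` is a made-up helper name, not part of any library. Python's `re.fullmatch` is used only as a local stand-in for checking the pattern. Bigtable evaluates RE2 on the server, and I'm assuming the regex has to match the whole row key there.

```python
import re


def build_month_regex(months):
    """Build a pattern matching keys of the assumed form '<year>/<month>'."""
    # De-duplicate, then put longer alternatives first ("12" before "1").
    alternatives = sorted({str(int(m)) for m in months},
                          key=lambda s: (-len(s), s))
    if not alternatives:
        raise ValueError("months must not be empty")
    return r"\d+/(?:" + "|".join(alternatives) + r")"


# Local sanity check against sample keys (not a Bigtable call).
pattern = build_month_regex([1, 12])
keys = ["2019/1", "2019/2", "2019/11", "2019/12", "2020/1", "2020/12"]
matching = [k for k in keys if re.fullmatch(pattern, k)]
print(pattern)
print(matching)
```

If your real keys have a different layout (zero-padded months, extra suffixes after the month), the `\d+/` prefix and the alternation would need adjusting.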
Here's an attempt to write a function that creates a Google Bigtable HappyBase scan call with an appropriate filter, where table is a HappyBase table and months is a list of integers. Please note that I have not tested this code, but hopefully it at least gives you a starting point.
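A minimal sketch of such a function might look like the following. It is untested against a real Bigtable instance and assumes the `<year>/<month>` key layout from the question. `scan_months` and its `row_filter_cls` parameter are hypothetical names introduced here; the parameter exists only so the helper can be tried out without the Google client installed. By default it uses `RowKeyRegexFilter` from `google.cloud.bigtable.row_filters` and passes it through the `filter` keyword of `Table.scan()`.

```python
def scan_months(table, months, row_filter_cls=None):
    """Scan `table` for rows whose key is '<year>/<month>' for the given months.

    `table` is expected to be a Google Cloud Bigtable HappyBase table and
    `months` a list of integers.
    """
    if row_filter_cls is None:
        # Imported lazily so the function can be exercised with a stand-in
        # filter class when the Bigtable client library is not installed.
        from google.cloud.bigtable.row_filters import RowKeyRegexFilter
        row_filter_cls = RowKeyRegexFilter

    # Longer alternatives first ("12" before "1").
    alternatives = sorted({str(int(m)) for m in months},
                          key=lambda s: (-len(s), s))
    if not alternatives:
        raise ValueError("months must not be empty")

    # Assumed key layout: digits for the year, a slash, then the month.
    regex = r"\d+/(?:" + "|".join(alternatives) + r")"

    # The filter is applied server side, so only matching rows come back.
    return table.scan(filter=row_filter_cls(regex.encode("utf-8")))
```

With a real table you would call it as `for key, data in scan_months(table, [1, 2]): ...`. Note that a regex filter still reads the whole table on the server; it only reduces what is sent back to the client.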