How to fetch data in a batch from HBase in GeoMesa?


The GeoTools API is one way to get data that GeoMesa has ingested out of HBase. However, when I use org.geotools.data.simple.SimpleFeatureCollection, it seems the results can only be accessed through the iterator returned by SimpleFeatureCollection.features(). The problem is that when I traverse the results, iterator.hasNext() takes too much time. Can I fetch data from HBase in batches in GeoMesa, rather than only through the iterator?

There is 1 best solution below

Behind the scenes, there is some batching being done, but the batches are fetched lazily (i.e. on a call to hasNext, if there isn't any data buffered locally, it will do a remote fetch). You can control the HBase read-ahead through the system property geomesa.hbase.client.scanner.caching.size, which is described in the GeoMesa HBase documentation. The GeoTools API doesn't provide any batch mechanisms per se, however.
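As a rough illustration of the read-ahead setting, the property can be set like any other JVM system property, either with a -D flag on the command line or programmatically. The sketch below assumes it is set before the data store is created (when exactly GeoMesa reads the value is an assumption here), and the class name and the value 1000 are made up for the example, not recommendations.

```java
// Hypothetical helper: sets the HBase scanner caching size used by GeoMesa.
// Assumption: this runs before the GeoMesa data store is created.
class ScannerCachingConfig {
    // Property name taken from the answer above.
    static final String KEY = "geomesa.hbase.client.scanner.caching.size";

    // Stores the number of rows to read ahead per remote fetch as a system property.
    static void apply(int rows) {
        System.setProperty(KEY, Integer.toString(rows));
    }

    public static void main(String[] args) {
        apply(1000); // arbitrary illustrative value
        System.out.println(KEY + " = " + System.getProperty(KEY));
    }
}
```

The same effect should be achievable without code by starting the JVM with -Dgeomesa.hbase.client.scanner.caching.size=1000.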

For simple use cases, if you just want to fetch everything up front, you can pull the iterator into an ArrayList, then operate on it afterwards. To avoid waiting for the entire result set to be fetched, you could set up producer/consumer threads, so that one thread is continuously pre-fetching data and the second thread is operating on the results that have come back.
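Both approaches can be sketched in plain Java. The example below is only a sketch under some assumptions: it works on a java.util.Iterator as a stand-in for the GeoTools iterator, so with a real SimpleFeatureIterator (which has hasNext(), next() and close() but is not a java.util.Iterator) you would need a small adapter and should close the iterator when done. The names PrefetchDemo, drain and prefetch are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class PrefetchDemo {
    // Marker placed on the queue when the source is exhausted.
    private static final Object DONE = new Object();

    // Wrapper used to hand a producer-side exception over to the consumer.
    private static final class Failure {
        final RuntimeException cause;
        Failure(RuntimeException cause) { this.cause = cause; }
    }

    // Simple case: pull everything into a list up front.
    static <T> List<T> drain(Iterator<T> source) {
        List<T> all = new ArrayList<>();
        source.forEachRemaining(all::add);
        return all;
    }

    // Producer/consumer case: a background thread keeps reading from the
    // source while the caller processes results that have already arrived.
    // At most `capacity` elements are buffered at any time.
    static <T> Iterator<T> prefetch(Iterator<T> source, int capacity) {
        BlockingQueue<Object> queue = new ArrayBlockingQueue<>(capacity);
        Thread producer = new Thread(() -> {
            try {
                try {
                    while (source.hasNext()) {
                        queue.put(source.next()); // blocks while the buffer is full
                    }
                    queue.put(DONE);
                } catch (RuntimeException e) {
                    queue.put(new Failure(e)); // surface the error to the consumer
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true); // do not keep the JVM alive for an abandoned scan
        producer.start();

        return new Iterator<T>() {
            private Object next; // null means "nothing taken from the queue yet"

            @Override
            public boolean hasNext() {
                if (next == null) {
                    try {
                        next = queue.take(); // waits only if the producer is behind
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException(e);
                    }
                }
                if (next instanceof Failure) {
                    throw ((Failure) next).cause;
                }
                return next != DONE;
            }

            @Override
            @SuppressWarnings("unchecked")
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                T value = (T) next;
                next = null;
                return value;
            }
        };
    }

    public static void main(String[] args) {
        Iterator<String> results = prefetch(Arrays.asList("a", "b", "c").iterator(), 2);
        while (results.hasNext()) {
            System.out.println(results.next());
        }
    }
}
```

The bounded queue is a deliberate choice: it lets the producer stay ahead of the consumer without loading the whole result set into memory, which is the main difference from the ArrayList approach.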

For more advanced use cases, you can use Spark (or map/reduce directly) to load an entire result set at once.