SQLAlchemy PyHive limit query result size in MB


I am using PyHive with the SQLAlchemy DB-API (asynchronous). My Hive table has millions of records. If I execute:

SELECT * FROM table

it loads millions of records into memory. Is there a way to limit the size of the query result to a certain size, let's say 5 MB?

I want to discard any results beyond 5 MB. Millions of records could amount to 100 MB, but I do not want to load that much data into my Python code.

For example, if:

SELECT * FROM table

and 1000 rows = 5 MB for table, then SQLAlchemy should return me only 1000 rows.

For another query, for example:

SELECT * FROM table2

where 100,000 rows = 5 MB for table2, SQLAlchemy should return 100,000 rows.

This way I do not run out of memory on my client. The examples are simple, but there might be complex join SQL, and I want to limit the query result size based on MB of data.

Is there a way to achieve this with SQLAlchemy, or do I need to modify the Hive & Presto server to get such behavior?
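Neither SQLAlchemy nor HiveQL has a built-in "LIMIT by megabytes" clause, so one possible client-side approach is to fetch rows in batches from the DB-API cursor and stop once an approximate in-memory size cap is reached. Below is a minimal sketch of that idea; the function name `fetch_up_to_bytes` is hypothetical, and `sys.getsizeof` only approximates the size of each column value (it does not follow nested containers), so the cap is a rough bound, not an exact one:

```python
import sys


def fetch_up_to_bytes(cursor, max_bytes=5 * 1024 * 1024, batch_size=1000):
    """Fetch rows in batches via the DB-API cursor, stopping once the
    accumulated approximate in-memory size exceeds max_bytes.

    Works with any DB-API cursor that supports fetchmany(), including
    PyHive's, after cursor.execute(...) has been called.
    """
    rows = []
    total = 0
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:  # no more rows on the server
            break
        for row in batch:
            # Rough per-row size: sum of the sizes of its column values.
            total += sum(sys.getsizeof(col) for col in row)
            if total > max_bytes:
                return rows  # cap reached: discard the rest
            rows.append(row)
    return rows
```

With a PyHive connection this would be used roughly as `cursor.execute("SELECT * FROM table")` followed by `rows = fetch_up_to_bytes(cursor)`. Note that Hive still scans the full result set on the server; only the client-side memory use is bounded. Stopping the scan itself early would need a server-side mechanism (e.g. a row-count `LIMIT`, which works on rows rather than bytes).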
