Adding stream_results=True (execution_options) to kedro.extras.datasets.pandas.SQLQueryDataSet

Is it possible to add execution_options to kedro.extras.datasets.pandas.SQLQueryDataSet?

For example, I would like to set stream_results=True on the connection, as in plain SQLAlchemy:

engine = create_engine("postgresql://postgres:pass@localhost/example")
conn = engine.connect().execution_options(stream_results=True)

Here is my catalog.yml

table_name:
  type: pandas.SQLQueryDataSet
  credentials: creds
  sql: select * from table
  load_args:
    chunksize: 1000

Any idea how to set execution_options (specifically stream_results=True) when using pandas.SQLQueryDataSet?

There is 1 solution below

You would likely need to create a thin layer over the existing SQLQueryDataSet:

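from kedro.extras.datasets.pandas import SQLQueryDataSet
from sqlalchemy import create_engine

class CustomSQLQueryDataSet(SQLQueryDataSet):
    def _load(self, *args, **kwargs):
        # On the first load, swap the connection string for a streaming connection.
        if isinstance(self._load_args["con"], str):
            self._load_args["con"] = create_engine(self._load_args["con"]).connect().execution_options(stream_results=True)
        return super()._load(*args, **kwargs)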

and then use this class in your catalog.
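
As a rough sketch of that last step, the catalog entry from the question could point at the subclass by its full import path, which Kedro accepts in the type field for custom datasets. The module path my_project.extras.datasets.custom_sql is only an assumption for illustration; substitute wherever the class actually lives in your project.

```yaml
table_name:
  # Hypothetical module path; replace with the real location of CustomSQLQueryDataSet.
  type: my_project.extras.datasets.custom_sql.CustomSQLQueryDataSet
  credentials: creds
  sql: select * from table
  load_args:
    chunksize: 1000
```

With chunksize set, pandas returns an iterator of DataFrames rather than a single DataFrame, so the node consuming table_name would need to loop over the chunks.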