Reading large-volume data from Teradata using a Dask cluster, teradatasql and SQLAlchemy


I need to read a large volume of data (approx. 800M records) from Teradata. My code works fine for a million records, but for larger sets it takes a long time to build the metadata. Could someone please suggest how to make this faster? Below is the code snippet I am using in my application.

import pandas as pd
from dask import delayed
from dask.dataframe import from_delayed
from sqlalchemy import create_engine


def get_partitions(num_partitions):
    # Split the 3240-AMP range into contiguous, inclusive (start, end) chunks,
    # one per partition.
    amp_range = 3240 // num_partitions
    list_range = []
    for i in range(num_partitions):
        start = 0 if i == 0 else i * amp_range + 1
        end = (i + 1) * amp_range
        list_range.append((start, end))
    return list_range


@delayed
def load(query, start, end, connString):
    # Each task creates (and disposes of) its own engine so that workers
    # never try to share a single connection.
    engine = create_engine(connString)
    try:
        df = pd.read_sql(query.format(start, end), engine)
    finally:
        engine.dispose()
    return df


connString = "teradatasql://{user}:{password}@{hostname}/?logmech={logmech}&encryptdata=true"

results = from_delayed(
    [load(query, start, end, connString) for start, end in get_partitions(num_partitions)]
)
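
For reference, query is a format-string template whose two {} placeholders are filled by load() with each partition's (start, end) range. A simplified sketch of the remaining pieces (the table name and partitioning predicate here are placeholders) and one way to trigger the computation:

# Placeholder query template: the real predicate depends on how the
# table is distributed/partitioned across the 3240 AMPs.
query = """
SELECT *
FROM my_database.my_table
WHERE partition_col BETWEEN {} AND {}
"""
num_partitions = 100

# results is lazy; nothing is read from Teradata until the graph is
# executed, e.g. by writing the partitions out as Parquet files.
results.to_parquet("/data/teradata_extract/", write_index=False)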

1 Answer

The build time is probably spent working out the metadata of your table: Dask does this by fetching the whole of the first partition and analysing it.

You would be better off either specifying it explicitly, if you know the dtypes up front, e.g. {col: dtype, ...} for all the columns, or generating it from a separate query limited to just as many rows as it takes to be sure you have the right types:

import dask

# Sample a small range just to learn the column dtypes; dask.compute
# returns a tuple, so unpack the single result.
meta, = dask.compute(load(query, 0, 10, connString))

results = from_delayed(
    [
        load(query, start, end, connString)
        for start, end in get_partitions(num_partitions)
    ],
    meta=meta.iloc[:0],  # zero-length version of the table
)
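
If you do know the dtypes up front, the first option avoids running any query at graph-construction time. A minimal sketch, with placeholder column names and dtypes standing in for your table's real schema:

import pandas as pd

# Placeholder schema: replace with the table's actual columns and dtypes.
meta = pd.DataFrame({
    "customer_id": pd.Series(dtype="int64"),
    "txn_amount": pd.Series(dtype="float64"),
    "txn_date": pd.Series(dtype="datetime64[ns]"),
})

results = from_delayed(
    [
        load(query, start, end, connString)
        for start, end in get_partitions(num_partitions)
    ],
    meta=meta,  # already zero-length, so Dask never samples the data
)

Either way, the key point is the same: give from_delayed a zero-length frame with the right dtypes so it does not have to execute one of the delayed loads just to infer them.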