I searched the internet for this but couldn't find any code. What comes to mind is to first load the data into pandas (RAM) and then load it into cuDF (GPU memory).
import cudf
import pandas as pd
from sqlalchemy import create_engine

# Connect to PostgreSQL and pull the table into host memory with pandas
db_url = "postgresql://username:password@localhost:5432/database_name"
engine = create_engine(db_url)
query = "SELECT * FROM your_table_name"
pandas_df = pd.read_sql(query, engine)

# Copy the pandas DataFrame from RAM into GPU memory as a cuDF DataFrame
cudf_df = cudf.DataFrame.from_pandas(pandas_df)
print(cudf_df)
However, with this approach (in a WSL2 environment) loading the data takes longer, and after the conversion the data is still sitting in RAM as a pandas DataFrame, which we then need to drop.
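Something like this is what I mean by dropping the pandas copy once the data is on the GPU (a minimal sketch; pandas_df is the DataFrame from the snippet above):

import gc

# Release the host-memory copy now that the data lives on the GPU
del pandas_df
gc.collect()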
Is there a more efficient way to achieve this?
Thanks for the question. cuDF doesn't have a way to pull data from external SQL databases directly onto the GPU, so your solution with pandas.read_sql is a reasonable choice.

If you are able to export the database to a Parquet file, you could use cuDF's GPU-accelerated Parquet reader to load the data very quickly. That would probably give nearly optimal performance. Also be aware of features like GPUDirect Storage, which can boost performance on supported GPUs.
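For example, a minimal sketch of that Parquet path (the file path and table name are placeholders, and the export step assumes you can write the table out from pandas or from the database itself):

import cudf

# One-time export of the table to Parquet, e.g. from the pandas DataFrame
# you already have (or with any export tool the database supports):
# pandas_df.to_parquet("your_table_name.parquet")

# cuDF's GPU-accelerated reader then loads the file straight into GPU memory,
# skipping the pandas round-trip on subsequent loads.
cudf_df = cudf.read_parquet("your_table_name.parquet")
print(cudf_df.head())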
You can also use dask-sql (https://dask-sql.readthedocs.io/en/latest/) to run SQL queries directly on the GPU; dask-sql supports cuDF for GPU acceleration. You would load the data some other way (such as via pandas.read_sql or cudf.read_parquet) and then use dask-sql to execute queries on the cuDF DataFrame in memory.
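A rough sketch of that flow (the Parquet path and table name are placeholders, and this assumes dask-sql is installed alongside cuDF):

import cudf
from dask_sql import Context

# Get the data onto the GPU first, e.g. via the Parquet reader
cudf_df = cudf.read_parquet("your_table_name.parquet")

# Register the in-memory cuDF DataFrame as a SQL table
ctx = Context()
ctx.create_table("your_table_name", cudf_df)

# Run SQL against it; .compute() materializes the lazy result
result = ctx.sql("SELECT COUNT(*) FROM your_table_name").compute()
print(result)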