Read CSV into DataFusion DataFrame with Python


How can I read a CSV into a DataFusion DataFrame with datafusion-python?

Here's what I have so far:

import datafusion

ctx = datafusion.SessionContext()

I couldn't find any instructions in the docs.

I am using DataFusion v0.6.0.

Best answer

Some documentation is available here: https://github.com/apache/arrow-datafusion/blob/master/docs/source/python/index.rst

Here is one of the examples:

import datafusion
from datafusion import functions as f
from datafusion import col
import pyarrow

# create a context
ctx = datafusion.SessionContext()

# register a CSV
ctx.register_csv('example', 'example.csv')

# create a new statement via SQL
df = ctx.sql("SELECT a+b, a-b FROM example")

# execute and collect the first (and only) batch
result = df.collect()[0]

assert result.column(0) == pyarrow.array([5, 7, 9])
assert result.column(1) == pyarrow.array([-3, -3, -3])
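
The asserts above only hold if example.csv contains matching data, which the example does not show. Here is a minimal sketch of one way to make it reproducible, assuming columns a = 1, 2, 3 and b = 4, 5, 6 (values chosen only so that a+b and a-b match the asserts). The helper names write_example_csv and query_example are hypothetical, not part of datafusion-python; the only DataFusion calls used are the ones from the example above.

```python
import csv


def write_example_csv(path):
    """Write a small CSV with integer columns a and b (assumed sample data)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["a", "b"])
        writer.writerows([[1, 4], [2, 5], [3, 6]])


def query_example(path):
    """Register the CSV as table 'example' and run the query from the answer."""
    # Imported here so that write_example_csv works even without datafusion installed.
    import datafusion

    ctx = datafusion.SessionContext()
    ctx.register_csv("example", path)
    # ctx.sql returns a DataFusion DataFrame; collect() materializes record batches.
    df = ctx.sql("SELECT a+b, a-b FROM example")
    return df.collect()[0]
```

Calling write_example_csv("example.csv") and then query_example("example.csv") should give a batch whose two columns correspond to the asserts above, provided the column types are inferred as integers.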

Work is under way to move the documentation to the datafusion-python repo (see https://github.com/apache/arrow-datafusion/issues/2866).