I am using the following bit of code to read the iris dataset from an S3 bucket.
import pandas as pd
import s3fs
s3_path = 's3://h2o-public-test-data/smalldata/iris/iris.csv'
s3 = s3fs.S3FileSystem(anon=True)
with s3.open(s3_path, 'rb') as f:
    df = pd.read_csv(f, header = True)
However, the column names are just the contents of the first row of the dataset. How do I fix that?
The following changes are required:
1. s3://
2. iris.csv is a file without a header. If you need a file with a header, use iris_wheader.csv instead.
3. The header argument of read_csv does not accept a boolean value. Pass a row number such as header=0, or header=None if the file has no header row.
Your final code should look something like this:
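A minimal sketch of the corrected code, assuming the header is the first line of iris_wheader.csv. The helper names load_csv and load_iris_from_s3 are illustrative and not from the original post. The path keeps the s3:// prefix from the question, which s3fs accepts. The parsing step is split out so it can be tried on an in-memory sample without network access; the sample's column names are placeholders.

```python
import io

import pandas as pd


def load_csv(f, has_header=True):
    # header=0 uses the first row as column names;
    # header=None treats every row as data and numbers the columns.
    # A boolean such as header=True is not a valid value for read_csv.
    return pd.read_csv(f, header=0 if has_header else None)


def load_iris_from_s3(path='s3://h2o-public-test-data/smalldata/iris/iris_wheader.csv'):
    # Imported here so load_csv above works even without s3fs installed.
    import s3fs

    s3 = s3fs.S3FileSystem(anon=True)  # anonymous access to the public bucket
    with s3.open(path, 'rb') as f:
        return load_csv(f)


# Made-up sample; the column names are placeholders for illustration only.
sample = (
    b"sepal_len,sepal_wid,petal_len,petal_wid,class\n"
    b"5.1,3.5,1.4,0.2,Iris-setosa\n"
)
with_header = load_csv(io.BytesIO(sample))
without_header = load_csv(io.BytesIO(sample), has_header=False)
```

With header=0 the sample gives one data row and named columns. With header=None the same bytes give two rows, and the names end up as data in the first row, which is the symptom described in the question.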
Edit: You can directly read the file in pandas as follows:
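A rough sketch of the direct approach, assuming pandas 1.2 or newer (where read_csv takes storage_options) and that s3fs is installed. The function name read_iris_direct is hypothetical. The anon setting mirrors the anon=True used in the question.

```python
import pandas as pd


def read_iris_direct(url='s3://h2o-public-test-data/smalldata/iris/iris_wheader.csv'):
    # pandas passes s3:// URLs to s3fs behind the scenes;
    # storage_options is forwarded to the S3 filesystem (anonymous access here).
    return pd.read_csv(url, header=0, storage_options={'anon': True})
```

Calling read_iris_direct() should return the DataFrame without any explicit s3.open call.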
You still need to have s3fs installed; you just no longer need to open the file yourself before reading it.