Why do I get an error when I remove Chunksize?


I'm encountering an error while running my Python code and need assistance in resolving it. Below are the details:

import pandas as pd

df_list = []
file_path = 'houses.txt'

for chunk in pd.read_csv(file_path, chunksize=1000000, names=['Size()sqft', 'No of bedrooms', 'No of floors', 'Age of home', 'Price(1000s dollar)']):
    df_list.append(chunk)

df = pd.concat(df_list)

print(df_list)

Output :

0        952.0             2.0           1.0         65.0                271.5
1       1244.0             3.0           1.0         64.0                300.0
2       1947.0             3.0           2.0         17.0                509.8
3       1725.0             3.0           2.0         42.0                394.0
4       1959.0             3.0           2.0         15.0                540.0
..         ...             ...           ...          ...                  ...
95      1224.0             2.0           2.0         12.0                329.0
96      1432.0             2.0           1.0         43.0                388.0
97      1660.0             3.0           2.0         19.0                390.0
98      1212.0             3.0           1.0         20.0                356.0
99      1050.0             2.0           1.0         65.0                257.8

[100 rows x 5 columns]]

After removing 'chunksize', I get this error:

TypeError: cannot concatenate object of type '<class 'str'>'; only Series and DataFrame objs are valid

Kindly explain what the issue is.

1 Answer

Answered by sytech:

Passing chunksize is what changes the return type of read_csv: it makes the function return a TextFileReader object, which is what you are iterating over. From the pandas docs:

chunksize : int, optional
Number of lines to read from the file per chunk. Passing a value will cause the function to return a TextFileReader object for iteration. See the IO Tools docs for more information on iterator and chunksize.

When chunksize is not specified, it returns a DataFrame instead.
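You can see the difference in return type directly. This sketch uses a small in-memory CSV via io.StringIO as a stand-in for your houses.txt (the sample rows are hypothetical):

```python
import io
import pandas as pd

# Hypothetical sample data standing in for houses.txt
csv_data = "952.0,2.0,1.0,65.0,271.5\n1244.0,3.0,1.0,64.0,300.0\n"
cols = ['Size()sqft', 'No of bedrooms', 'No of floors',
        'Age of home', 'Price(1000s dollar)']

# With chunksize: read_csv returns a TextFileReader for iteration
reader = pd.read_csv(io.StringIO(csv_data), chunksize=1, names=cols)
print(type(reader).__name__)  # TextFileReader

# Without chunksize: read_csv returns a DataFrame
df = pd.read_csv(io.StringIO(csv_data), names=cols)
print(type(df).__name__)  # DataFrame
```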

So, if you remove chunksize, the object you end up iterating over in your for loop is no longer a TextFileReader but a DataFrame. Iterating over a DataFrame yields its column labels as strings, so the items collected in df_list are strings rather than DataFrames, which is ultimately what causes the error when you call pd.concat on that list.
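A minimal sketch of that failure mode, using a small made-up DataFrame rather than your file:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Iterating a DataFrame yields its column labels, not row chunks
items = [x for x in df]
print(items)  # ['a', 'b']

# pd.concat then receives a list of strings and raises the same TypeError
try:
    pd.concat(items)
except TypeError as e:
    print(e)  # cannot concatenate object of type '<class 'str'>'; ...
```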

Chunking the file is useful when memory or responsiveness is a concern, or when you don't need the full DataFrame at once. If you do want the whole file as a single DataFrame, you can skip the iteration and subsequent concatenation and read it in one step:

df = pd.read_csv(file_path, names=['Size()sqft', 'No of bedrooms', 'No of floors', 'Age of home', 'Price(1000s dollar)'])