I'm encountering an error while running my Python code and need assistance in resolving it. Below are the details of the code:
import pandas as pd
df_list = []
file_path = 'houses.txt'
for chunk in pd.read_csv(file_path, chunksize=1000000, names=['Size()sqft', 'No of bedrooms', 'No of floors', 'Age of home', 'Price(1000s dollar)']):
    df_list.append(chunk)
df = pd.concat(df_list)
print(df_list)
Output:
0 952.0 2.0 1.0 65.0 271.5
1 1244.0 3.0 1.0 64.0 300.0
2 1947.0 3.0 2.0 17.0 509.8
3 1725.0 3.0 2.0 42.0 394.0
4 1959.0 3.0 2.0 15.0 540.0
.. ... ... ... ... ...
95 1224.0 2.0 2.0 12.0 329.0
96 1432.0 2.0 1.0 43.0 388.0
97 1660.0 3.0 2.0 19.0 390.0
98 1212.0 3.0 1.0 20.0 356.0
99 1050.0 2.0 1.0 65.0 257.8
[100 rows x 5 columns]]
After removing 'chunksize', I get this error:
TypeError: cannot concatenate object of type '<class 'str'>'; only Series and DataFrame objs are valid
Kindly explain what the issue is.
chunksize implies iterable, and it is what changes the return type of read_csv to a TextFileReader object, which is what you're iterating over.
When chunksize is not specified, read_csv returns a DataFrame instead.
So, if you remove chunksize, iterable is no longer implied, and the object you iterate over in your for loop is no longer a TextFileReader but a DataFrame. Iterating over a DataFrame yields its column labels, which are strings, so df_list ends up holding strings rather than DataFrames, and that causes the error when you call pd.concat on that list.
Chunking the file is useful where responsiveness or memory is a concern, or when the full DataFrame is not needed. If you don't need to chunk the file and want the full DataFrame anyway, you can skip the iteration and the subsequent concatenation and read the whole file into a DataFrame in one step by calling read_csv without chunksize.
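A minimal sketch of both behaviours follows. To keep it self-contained it reads from an in-memory string instead of houses.txt, and it assumes the file is comma-separated with no header row; the three sample rows are taken from the output in the question, and SAMPLE and COLUMNS are names introduced here for illustration only.

```python
import io

import pandas as pd

# Assumption: houses.txt is comma-separated with no header row.
# Three rows from the question's output stand in for the real file.
SAMPLE = (
    "952.0,2.0,1.0,65.0,271.5\n"
    "1244.0,3.0,1.0,64.0,300.0\n"
    "1947.0,3.0,2.0,17.0,509.8\n"
)
COLUMNS = ['Size()sqft', 'No of bedrooms', 'No of floors',
           'Age of home', 'Price(1000s dollar)']

# Without chunksize, read_csv returns a DataFrame in one step.
df = pd.read_csv(io.StringIO(SAMPLE), names=COLUMNS)

# Iterating over a DataFrame yields its column labels (strings),
# which is what ended up in df_list once chunksize was removed.
labels = [item for item in df]

# With chunksize, read_csv returns an iterator of DataFrames,
# so collecting the chunks and concatenating them works.
chunks = list(pd.read_csv(io.StringIO(SAMPLE), chunksize=2, names=COLUMNS))
df_from_chunks = pd.concat(chunks)
```

With the real file, the one-step version would be pd.read_csv(file_path, names=...) with the same column names as in the question, followed by print(df).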