I have two big csv files with different number of rows which I am importing as follows:
tdata = pd.read_csv(tfilepath, sep=',', parse_dates=['date_1'])
print(tdata.iloc[:, [0,3]])
TBA date_1
0 0 2010-01-04
1 9 2010-01-05
2 0 2010-01-06
3 8 2010-01-07
4 0 2010-01-08
5 0 2010-01-09
pdata = pd.read_csv(pfilepath, sep=',', parse_dates=['date_2'])
print(pdata.iloc[:, [0,3]])
TBA date_2
0 3 2011-01-04
1 5 2010-01-09
2 0 2012-02-03
3 9 2010-03-17
4 1 2010-11-08
5 2 2010-01-05
Now I want to replace TBA in the first dataframe with the corresponding TBA from the second dataframe wherever the dates match; the default value should be 0. So I am iterating through the rows as follows:
for i, row1 in tdata.iterrows():
    for j, row2 in pdata.iterrows():
        if row1['date_1'] == row2['date_2']:
            tdata.loc[i, 'TBA'] = row2['TBA']
            break
    else:
        tdata.loc[i, 'TBA'] = 0
The problem is that this takes very long (around 11 minutes). I want to compare one csv with 160 other csvs and then run some tree-based models. I am a newbie with little coding background! Pardon me if this is a 'dirty' way. Any help would be appreciated. Thanks!
If you call set_index on pdata with date_2, you can pass the resulting TBA Series as the param to map, call this on the tdata['date_1'] column, and then fillna:
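A minimal sketch of the approach above, using small dataframes built to match the sample data in the question:

```python
import pandas as pd

# Reconstructed from the sample output in the question
tdata = pd.DataFrame({
    'TBA': [0, 9, 0, 8, 0, 0],
    'date_1': pd.to_datetime(['2010-01-04', '2010-01-05', '2010-01-06',
                              '2010-01-07', '2010-01-08', '2010-01-09'])
})
pdata = pd.DataFrame({
    'TBA': [3, 5, 0, 9, 1, 2],
    'date_2': pd.to_datetime(['2011-01-04', '2010-01-09', '2012-02-03',
                              '2010-03-17', '2010-11-08', '2010-01-05'])
})

# Build a TBA Series indexed by date_2, look up each date_1 in it via map,
# then fill the unmatched dates with the default of 0
lookup = pdata.set_index('date_2')['TBA']
tdata['TBA'] = tdata['date_1'].map(lookup).fillna(0).astype(int)
print(tdata)
```

This is vectorised, so it avoids the nested iterrows loops entirely and should scale to your 160 csvs. Note that fillna produces floats (the misses are NaN), hence the astype(int) at the end if you want integer TBA values back.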