After going through some previous answers, I found that I could use the code below to fill in the missing values of df1[0], which range from 340 to 515:
import numpy as np
import pandas as pd

# Read the file as a list of whitespace-separated fields (all strings)
with open('contactasortedtest.dat', 'r') as f:
    text = [line.split() for line in f]

def replace_missing(df1, Ids):
    # Ids that do not appear in column 1 of this group
    missing = np.setdiff1d(Ids, df1[1])
    print(missing)
    if len(missing) > 0:
        # One zero-filled row per missing Id
        missing_df = pd.DataFrame(data=np.zeros((len(missing), 4)))
        missing_df[1] = missing
        # Copy the group's value of column 2 into the new rows
        missing_df[2].replace(0, df1[2].iloc[1], inplace=True)
        df1 = pd.concat([df1, missing_df])
    return df1

Ids = np.arange(340.0, 515.0)
final_df = df1.groupby(df1[2], as_index=True).apply(replace_missing, Ids).reset_index(drop=True)
final_df
Through troubleshooting, I found that missing = np.setdiff1d(Ids,df1[1]) does not work as expected: instead of returning only the missing Ids, it returns the whole Ids array. I found many answers on this, but I couldn't work it out. Any help would be appreciated.
Sample data I used:
12 340.0 1.0 0.0
2 491.0 1.0 35.8
13 492.0 1.0 81.4
4 493.0 1.0 0.0
7 495.0 1.0 0.2
0 496.0 1.0 90.3
11 509.0 1.0 2.3
6 513.0 1.0 4.3
8 515.0 1.0 0.1
Thank you!
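You can use
df['x'].fillna(0)
to fill the missing (NaN) values in column 'x' with 0. Note that fillna returns a new Series, so assign the result back to the column if you want to keep it.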
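As a rough sketch of how this could apply to the data in the question: it assumes df1 was built directly from the text list (that step is not shown in the question), in which case every cell is a string. A string such as '340.0' never equals the float 340.0, which would explain why np.setdiff1d returns the whole array. The names sample, df_str, ids and filled are illustrative only, and the left merge is just one possible way to add the missing rows, not necessarily the approach intended in the original code. Also note that np.arange excludes its stop value, so the sketch uses 516.0 to include 515.0.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question, parsed the same way as the file.
sample = """\
12 340.0 1.0 0.0
2 491.0 1.0 35.8
13 492.0 1.0 81.4
4 493.0 1.0 0.0
7 495.0 1.0 0.2
0 496.0 1.0 90.3
11 509.0 1.0 2.3
6 513.0 1.0 4.3
8 515.0 1.0 0.1
"""
text = [line.split() for line in sample.splitlines()]

# Assumption: the frame is built straight from `text`, so all cells are strings.
df_str = pd.DataFrame(text)

# np.arange excludes the stop value, so 516.0 is needed to include 515.0.
ids = np.arange(340.0, 516.0)

# Floats compared against strings: nothing matches, so every id is "missing".
missing_str = np.setdiff1d(ids, df_str[1])

# After converting to float, only the ids that are really absent remain.
df = df_str.astype(float)
missing = np.setdiff1d(ids, df[1])

# One way to add the absent ids: left-merge the full id range with the data.
# Ids that have no match get NaN in the other columns.
filled = pd.DataFrame({1: ids}).merge(df, on=1, how="left")

# Fill the NaN values: the group label in column 2, and 0 in column 3.
filled[2] = filled[2].fillna(df[2].iloc[0])
filled[3] = filled[3].fillna(0)

print(len(missing_str), len(missing), len(filled))
```

With several groups in column 2, the same merge and fillna steps could be applied per group inside groupby(...).apply, as in the question's replace_missing function.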