Join two pandas data frames with different indexes - getting NaN

# combine yield changes and normalised surprises 
eco_var_yields = pd.merge(norm_eco_surprise, yield__var, how = 'inner', on = 'date')
eco_var_yields_dates = pd.merge(eco_var_yields, eco_var_release_dates, how = 'inner', on = 'date')
eco_var_yields = eco_var_yields.fillna(0)

The index values are dates (datetimes for the above data frame).

# Vectorized check: does each date in eco_var_yields' index appear in each column of eco_var_release_dates?
results_df = (eco_var_release_dates.apply(lambda col: eco_var_yields.index.isin(col)).astype(int))

For context, results_df has one row per datetime value: for each one I check whether it matches a date in a column of eco_var_release_dates, giving True if it does and False if not.

The index of this data frame consists simply of default index values, not dates, so the indexes do not match.

I then convert the true/false to integers 1 or 0.
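
A minimal sketch of one way to keep the date index on the indicator frame, so that a later index-based join has matching labels. The small frames below are hypothetical stand-ins for eco_var_yields and eco_var_release_dates (the real data is not shown in the question), and the sketch assumes the release dates are stored as datetime columns.

```python
import pandas as pd

# Hypothetical stand-ins for the frames in the question (assumed shapes).
dates = pd.date_range("2024-01-01", periods=5, freq="D", name="date")
eco_var_yields = pd.DataFrame(
    {"Retail_Sales": [0.5, 0.5, -1.0, -1.0, 2.0]}, index=dates
)
eco_var_release_dates = pd.DataFrame(
    {"Retail_Sales": pd.to_datetime(["2024-01-02", "2024-01-05"])}
)

# Build one 0/1 column per release-date column, and pass the date index
# explicitly so results_df is labelled by date rather than by 0, 1, 2, ...
results_df = pd.DataFrame(
    {
        col: eco_var_yields.index.isin(eco_var_release_dates[col])
        for col in eco_var_release_dates.columns
    },
    index=eco_var_yields.index,
).astype(int)

# Both frames now share the same index, so join aligns row by row.
reactivity_df = eco_var_yields.add_suffix("_surprises").join(
    results_df.add_suffix("_dates")
)
print(reactivity_df)
```

The only change from the apply-based version is that the index is supplied when the frame is built; the membership test itself is the same isin call.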

# join data into a single data frame
renamed_eco_var_yields_df = eco_var_yields.add_suffix('_surprises')
renamed_results_df = results_df.add_suffix('_dates')
reactivity_df = renamed_eco_var_yields_df.join(renamed_results_df)

reactivity_df shows NaN for all the values from renamed_results_df; the rest of the data is fine.

reactivity_df

            Retail_Sales_surprises  ...  Average_Hourly_Earnings_dates
date                                ...                               
2012-12-31                0.666667  ...                            NaN
2013-01-01                0.666667  ...                            NaN
2013-01-02                0.666667  ...                            NaN
2013-01-03                0.666667  ...                            NaN
2013-01-04                0.666667  ...                            NaN
                           ...  ...                            ...
2024-02-15               -2.333333  ...                            NaN
2024-02-16               -2.333333  ...                            NaN
2024-02-19               -2.333333  ...                            NaN
2024-02-20               -2.333333  ...                            NaN
2024-02-21               -2.333333  ...                            NaN

Can anybody help?

I have tried converting to floats and using concatenation, but neither works.

reactivity_df = pd.concat([renamed_eco_var_yields_df, renamed_results_df], axis=1)
reactivity_df = reactivity_df.reindex(renamed_eco_var_yields_df.index)
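
A small sketch of why the concat-plus-reindex attempt probably gives the same NaN values, and one possible positional fix. The frames left and right are hypothetical toy data. The fix assumes the two frames have the same number of rows in the same order, which needs checking on the real data.

```python
import pandas as pd

# Hypothetical toy frames: one indexed by date, one with a default 0..n-1 index.
dates = pd.date_range("2024-01-01", periods=3, freq="D", name="date")
left = pd.DataFrame({"a_surprises": [0.1, 0.2, 0.3]}, index=dates)
right = pd.DataFrame({"a_dates": [1, 0, 1]})

# concat(axis=1) aligns on index labels. Dates and integers never coincide,
# so the values of right sit on separate rows from the values of left.
combined = pd.concat([left, right], axis=1)

# Reindexing to the date index keeps only the date rows, where a_dates is NaN.
after_reindex = combined.reindex(left.index)

# Positional fix (assumption: same length, same row order): relabel right
# with the date index before combining.
right_aligned = right.set_axis(left.index, axis=0)
fixed = pd.concat([left, right_aligned], axis=1)
print(fixed)
```

Comparing left.index and right.index before joining is a quick way to confirm whether the labels can align at all.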