I am trying to find the cleanest, most pandastic way to create a new column that holds the daily minimum of one column in the same row as the daily maximum of another column. The remaining values can be NaN, as I will be interpolating.
import datetime
import numpy as np
import pandas as pd
rng = pd.date_range(start=datetime.date(2020,8,1), end=datetime.date(2020,8,3), freq='H')
df = pd.DataFrame(rng, columns=['date'])
df.index=pd.to_datetime(df['date'])
df.drop(['date'],axis=1,inplace=True)
df['val0']=np.random.randint(0,50,49)
df['val1']=np.random.randint(0,50,49)
One realization of df (copy and paste for reproducibility):
val0 val1
date
2020-08-01 00:00:00 17 4
2020-08-01 01:00:00 89 0
2020-08-01 02:00:00 85 48
2020-08-01 03:00:00 83 13
2020-08-01 04:00:00 56 65
2020-08-01 05:00:00 48 31
2020-08-01 06:00:00 55 11
2020-08-01 07:00:00 15 87
2020-08-01 08:00:00 92 70
2020-08-01 09:00:00 95 57
2020-08-01 10:00:00 68 79
2020-08-01 11:00:00 87 7
2020-08-01 12:00:00 43 15
2020-08-01 13:00:00 23 4
2020-08-01 14:00:00 68 13
2020-08-01 15:00:00 68 63
2020-08-01 16:00:00 28 86
2020-08-01 17:00:00 12 40
2020-08-01 18:00:00 51 20
2020-08-01 19:00:00 20 48
2020-08-01 20:00:00 79 78
2020-08-01 21:00:00 67 89
2020-08-01 22:00:00 46 52
2020-08-01 23:00:00 7 47
2020-08-02 00:00:00 14 73
2020-08-02 01:00:00 70 30
2020-08-02 02:00:00 2 39
2020-08-02 03:00:00 65 81
2020-08-02 04:00:00 65 8
2020-08-02 05:00:00 83 60
2020-08-02 06:00:00 1 64
2020-08-02 07:00:00 13 63
2020-08-02 08:00:00 45 78
2020-08-02 09:00:00 83 7
2020-08-02 10:00:00 75 0
2020-08-02 11:00:00 52 3
2020-08-02 12:00:00 59 34
2020-08-02 13:00:00 54 57
2020-08-02 14:00:00 90 66
2020-08-02 15:00:00 82 56
2020-08-02 16:00:00 9 2
2020-08-02 17:00:00 5 51
2020-08-02 18:00:00 67 96
2020-08-02 19:00:00 18 77
2020-08-02 20:00:00 28 89
2020-08-02 21:00:00 96 53
2020-08-02 22:00:00 28 46
2020-08-02 23:00:00 41 87
2020-08-03 00:00:00 26 47
Now I find idxmin and idxmax for each day:
minidx=df.groupby(pd.Grouper(freq='D')).idxmin()
maxidx=df.groupby(pd.Grouper(freq='D')).idxmax()
minidx:
val0 val1
date
2020-08-01 2020-08-01 23:00:00 2020-08-01 01:00:00
2020-08-02 2020-08-02 06:00:00 2020-08-02 10:00:00
2020-08-03 2020-08-03 00:00:00 2020-08-03 00:00:00
maxidx:
val0 val1
date
2020-08-01 2020-08-01 09:00:00 2020-08-01 21:00:00
2020-08-02 2020-08-02 21:00:00 2020-08-02 18:00:00
2020-08-03 2020-08-03 00:00:00 2020-08-03 00:00:00
In this case, I would like to put the minimum daily value (7) located at 2020-08-01 23:00:00 into a new column at 2020-08-01 21:00:00 (i.e. adjacent to 89, the daily max of val1), and do the same for all other dates so the 'new' value on 2020-08-02 18:00:00 will be 1 (i.e. the minimum daily value occurring on 2020-08-02 06:00:00).
I tried the following, but I just get a bunch of nans:
df.loc[maxidx['val1'].values,'new']=df.loc[minidx['val0'].values,'val0']
If I set it to an int instead (df.loc[maxidx['val1'].values,'new']=6), I get the int in the places where I want the new values. The values I want are given by df.loc[minidx['val0'].values,'val0'], but I can't seem to get them into the dataframe.
minidx['val0'].values and maxidx['val1'].values are arrays of the same size with elements of type numpy.datetime64, and both are generated from the same dataframe, so their values should all exist in df.index (df.index.values).
Is there an obvious reason this isn't working? Thanks
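A likely explanation is index alignment: when the right-hand side of a .loc assignment is a Series, pandas matches it to the target rows by index label, not by position. Here the right-hand Series is labelled with the idxmin timestamps while the target rows are the idxmax timestamps, so nothing matches and every target gets NaN. The sketch below reproduces this on a small made-up frame (the name demo, the column aligned, and the data are illustrative, not from the post) and shows that stripping the index with .to_numpy() makes the assignment positional.

```python
import pandas as pd

# Illustrative data only: two days, three hourly rows each.
idx = pd.to_datetime([
    "2020-08-01 00:00", "2020-08-01 01:00", "2020-08-01 02:00",
    "2020-08-02 00:00", "2020-08-02 01:00", "2020-08-02 02:00",
])
demo = pd.DataFrame(
    {"val0": [17, 7, 85, 14, 70, 2], "val1": [4, 0, 48, 73, 30, 39]},
    index=idx,
)

minidx = demo.groupby(pd.Grouper(freq="D")).idxmin()
maxidx = demo.groupby(pd.Grouper(freq="D")).idxmax()

# Series on the right: aligned by label. Its labels are the idxmin
# timestamps, which do not occur among the target (idxmax) rows.
demo.loc[maxidx["val1"].to_numpy(), "aligned"] = demo.loc[
    minidx["val0"].to_numpy(), "val0"
]

# Plain array on the right: no labels, so values are assigned in order.
demo.loc[maxidx["val1"].to_numpy(), "new"] = demo.loc[
    minidx["val0"].to_numpy(), "val0"
].to_numpy()

print(demo)
```

The aligned column should come out entirely NaN, while new holds each day's val0 minimum next to that day's val1 maximum.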
The simplest solution I have found is to loop through the idxmin and idxmax:
This gives me what I want, but doesn't seem very pandastic, so any other suggestions to accomplish the same thing would be great.
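One loop-free alternative, offered as a sketch and not as the definitive answer: broadcast each day's minimum of val0 to every row of that day with groupby(...).transform('min'), then keep it only on the row where val1 has its daily maximum. The seed, the 0 to 100 value range, and the names day, daily_min, and max_rows are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Hourly index covering the same span as in the question (49 rows).
rng = pd.date_range("2020-08-01", periods=49, freq=pd.Timedelta(hours=1))
gen = np.random.default_rng(0)  # fixed seed, only for repeatability
df = pd.DataFrame(
    {"val0": gen.integers(0, 100, 49), "val1": gen.integers(0, 100, 49)},
    index=rng,
)

day = pd.Grouper(freq="D")

# Daily minimum of val0, repeated on every row of the same day.
daily_min = df.groupby(day)["val0"].transform("min")

# Timestamp of the daily maximum of val1 (first occurrence on ties).
max_rows = df.groupby(day)["val1"].idxmax()

# Keep the daily minimum only on those rows; everything else becomes NaN.
df["new"] = daily_min.where(df.index.isin(max_rows))

print(df.dropna(subset=["new"]))
```

Because the mask and the values share df's index, this avoids the alignment issue entirely, and the NaN rows are ready for interpolation.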