Using idxmax and idxmin to change values in different rows


I am trying to find the cleanest, most pandastic way to create a new column that holds the daily minimum of one column in the same row as the daily maximum of another column. The rest of the values can be NaN, as I will be interpolating.

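import datetime
import numpy as np
import pandas as pd

# hourly timestamps from 2020-08-01 00:00 to 2020-08-03 00:00 (49 rows)
rng = pd.date_range(start=datetime.date(2020,8,1), end=datetime.date(2020,8,3), freq='H')
df = pd.DataFrame(rng, columns=['date'])
df.index=pd.to_datetime(df['date'])
df.drop(['date'],axis=1,inplace=True)
# two columns of random integers, one value per timestamp
df['val0']=np.random.randint(0,50,49)
df['val1']=np.random.randint(0,50,49)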

One realization of df (cut and paste for reproducibility):

                     val0  val1
date                           
2020-08-01 00:00:00    17     4
2020-08-01 01:00:00    89     0
2020-08-01 02:00:00    85    48
2020-08-01 03:00:00    83    13
2020-08-01 04:00:00    56    65
2020-08-01 05:00:00    48    31
2020-08-01 06:00:00    55    11
2020-08-01 07:00:00    15    87
2020-08-01 08:00:00    92    70
2020-08-01 09:00:00    95    57
2020-08-01 10:00:00    68    79
2020-08-01 11:00:00    87     7
2020-08-01 12:00:00    43    15
2020-08-01 13:00:00    23     4
2020-08-01 14:00:00    68    13
2020-08-01 15:00:00    68    63
2020-08-01 16:00:00    28    86
2020-08-01 17:00:00    12    40
2020-08-01 18:00:00    51    20
2020-08-01 19:00:00    20    48
2020-08-01 20:00:00    79    78
2020-08-01 21:00:00    67    89
2020-08-01 22:00:00    46    52
2020-08-01 23:00:00     7    47
2020-08-02 00:00:00    14    73
2020-08-02 01:00:00    70    30
2020-08-02 02:00:00     2    39
2020-08-02 03:00:00    65    81
2020-08-02 04:00:00    65     8
2020-08-02 05:00:00    83    60
2020-08-02 06:00:00     1    64
2020-08-02 07:00:00    13    63
2020-08-02 08:00:00    45    78
2020-08-02 09:00:00    83     7
2020-08-02 10:00:00    75     0
2020-08-02 11:00:00    52     3
2020-08-02 12:00:00    59    34
2020-08-02 13:00:00    54    57
2020-08-02 14:00:00    90    66
2020-08-02 15:00:00    82    56
2020-08-02 16:00:00     9     2
2020-08-02 17:00:00     5    51
2020-08-02 18:00:00    67    96
2020-08-02 19:00:00    18    77
2020-08-02 20:00:00    28    89
2020-08-02 21:00:00    96    53
2020-08-02 22:00:00    28    46
2020-08-02 23:00:00    41    87
2020-08-03 00:00:00    26    47

Now I find idxmin and idxmax for each day:

minidx=df.groupby(pd.Grouper(freq='D')).idxmin()
maxidx=df.groupby(pd.Grouper(freq='D')).idxmax()

minidx:

                          val0                val1
date                                              
2020-08-01 2020-08-01 23:00:00 2020-08-01 01:00:00
2020-08-02 2020-08-02 06:00:00 2020-08-02 10:00:00
2020-08-03 2020-08-03 00:00:00 2020-08-03 00:00:00

maxidx:

                          val0                val1
date                                              
2020-08-01 2020-08-01 09:00:00 2020-08-01 21:00:00
2020-08-02 2020-08-02 21:00:00 2020-08-02 18:00:00
2020-08-03 2020-08-03 00:00:00 2020-08-03 00:00:00

In this case, I would like to put the minimum daily value of val0 (7), located at 2020-08-01 23:00:00, into a new column at 2020-08-01 21:00:00 (i.e. adjacent to 89, the daily max of val1). The same goes for all other dates, so the 'new' value at 2020-08-02 18:00:00 will be 1 (the daily minimum of val0, which occurs at 2020-08-02 06:00:00).

I tried the following, but I just get a bunch of nans:

df.loc[maxidx['val1'].values,'new']=df.loc[minidx['val0'].values,'val0']

If I just set it to an int (df.loc[maxidx['val1'].values,'new']=6), I get the int in the places I want the new values. The values I want are given by df.loc[minidx['val0'].values,'val0'], but I can't seem to get them into the dataframe.

minidx['val0'].values and maxidx['val1'].values are arrays of the same size with elements of type numpy.datetime64, and they are all generated from the same dataframe so maxidx and minidx should exist in df.index (df.index.values).

Is there an obvious reason this isn't working? Thanks

There are 2 best solutions below

The simplest solution I have found is to loop through the idxmin and idxmax:

for v0,v1 in zip(minidx['val0'].values,maxidx['val1'].values):
    df.loc[v1,'new']=df.loc[v0,'val0']

This gives me what I want, but doesn't seem very pandastic, so any other suggestions to accomplish the same thing would be great.
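
For what it's worth, the loop can probably be avoided. The original one-liner most likely produces NaN because the right-hand side is a Series indexed by the idxmin timestamps, and pandas aligns on index labels when assigning through .loc. Those labels do not match the idxmax rows being written to, so nothing lines up. Stripping the index from the right-hand side with to_numpy() makes the assignment positional. A minimal sketch follows, assuming a seeded random frame in place of the one above (the seed and the 0-100 value range are my own choices for illustration):

```python
import numpy as np
import pandas as pd

# Illustrative data: 49 hourly rows; the seed is arbitrary, chosen only for this sketch.
rng = np.random.default_rng(0)
idx = pd.date_range("2020-08-01", periods=49, freq=pd.Timedelta(hours=1))
df = pd.DataFrame(
    {"val0": rng.integers(0, 100, len(idx)),
     "val1": rng.integers(0, 100, len(idx))},
    index=idx,
)

# Timestamp of the daily minimum of val0 and of the daily maximum of val1.
grouper = pd.Grouper(freq="D")
minidx = df.groupby(grouper)["val0"].idxmin()
maxidx = df.groupby(grouper)["val1"].idxmax()

# Create the column first, then assign a plain array so that no index
# alignment happens: the i-th minimum lands on the i-th maximum row.
df["new"] = np.nan
df.loc[maxidx.to_numpy(), "new"] = df.loc[minidx.to_numpy(), "val0"].to_numpy()

print(df.dropna(subset=["new"]))
```

This relies on minidx and maxidx being in the same (daily) order, which holds here because both come from the same groupby.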

IIUC, you can do this using NamedAgg:

df.groupby(pd.Grouper(freq='D')).agg(val0_min_time=('val0','idxmin'),
                                     val0_min_value=('val0','min'),
                                     val0_max_time=('val0','idxmax'),
                                     val0_max_value=('val0','max'),
                                     val1_min_time=('val1','idxmin'),
                                     val1_min_value=('val1','min'),
                                     val1_max_time=('val1','idxmax'),
                                     val1_max_value=('val1','max'),)

Output:

                 val0_min_time  val0_min_value       val0_max_time  val0_max_value       val1_min_time  val1_min_value       val1_max_time  val1_max_value
date                                                                                                                                                      
2020-08-01 2020-08-01 23:00:00               7 2020-08-01 09:00:00              95 2020-08-01 01:00:00               0 2020-08-01 21:00:00              89
2020-08-02 2020-08-02 06:00:00               1 2020-08-02 21:00:00              96 2020-08-02 10:00:00               0 2020-08-02 18:00:00              96
2020-08-03 2020-08-03 00:00:00              26 2020-08-03 00:00:00              26 2020-08-03 00:00:00              47 2020-08-03 00:00:00              47
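
To get from this summary table to the 'new' column asked for in the question, one option is to keep only the two aggregates that matter and let index alignment do the placement. This is a sketch under the same assumptions as the question (a DatetimeIndex and columns val0 and val1); the seeded random data is only a stand-in for the real frame:

```python
import numpy as np
import pandas as pd

# Stand-in data: 49 hourly rows with an arbitrary seed (assumption for this sketch).
rng = np.random.default_rng(1)
idx = pd.date_range("2020-08-01", periods=49, freq=pd.Timedelta(hours=1))
df = pd.DataFrame(
    {"val0": rng.integers(0, 100, len(idx)),
     "val1": rng.integers(0, 100, len(idx))},
    index=idx,
)

# One row per day: the daily minimum of val0 and the time of the daily maximum of val1.
daily = df.groupby(pd.Grouper(freq="D")).agg(
    val0_min_value=("val0", "min"),
    val1_max_time=("val1", "idxmax"),
)

# Re-index the minima by the time of the val1 maximum. Assigning a Series to a
# column aligns on the index, so every other row of 'new' becomes NaN.
df["new"] = daily.set_index("val1_max_time")["val0_min_value"]

print(df.dropna(subset=["new"]))
```

From here, something like df['new'].interpolate(method='time') should fill the gaps between the placed values, if time-weighted interpolation is what is wanted.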