Reindex pivot table to add missing days and times

462 Views Asked by At

I try to make a heatmap out of an activitylog using pandas pivottable. To add missing times and dates I use the reindex function but I fail to reindex by columns.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'Date':['2015-11-10 18:39:00','2015-11-14 11:30:00','2015-11-14 13:48:00','2015-11-15 16:06:00',
    '2015-11-16 21:08:00','2020-08-31 18:06:00','2020-09-18 11:17:00','2020-09-19 15:41:00',
    '2020-09-25 16:28:00','2020-09-25 19:45:00']})
df['Date'] = pd.to_datetime(df['Date'])

    
hm = df.pivot_table(columns=df['Date'].dt.hour, index=df['Date'].dt.day_name(), aggfunc='count', fill_value=0, dropna=False)
hm = hm.reindex(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'], axis='index')
hm = hm.reindex(range(0,24), axis='columns', fill_value=0)

figg, ax2 = plt.subplots(figsize=(20,3))
im = ax2.imshow(hm)

I get the following error message

ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long'

The expected output should be

           0   1   2   3   4   5   6   7   8   9       20  21  22  23
Monday     0   0   0   0   0   0   0   0   0   0  ...   0   0   0   0   
Tuesday    0   0   0   0   0   0   0   0   0   0  ...   0   0   0   0  
Wednesday  0   0   0   0   0   0   0   0   0   0  ...   0   0   0   0  
Thursday   0   0   0   0   0   0   0   0   0   0  ...   0   0   0   0  
Friday     0   0   0   0   0   0   0   0   0   0  ...   0   0   1   0   
Saturday   0   0   0   0   0   0   0   0   0   0  ...   0   1   0   0   
Sunday     0   0   0   0   0   0   0   0   0   0  ...   0   0   1   0   
2

There are 2 best solutions below

0
On BEST ANSWER

The output you're getting is a MultiIndex column index (the first level is "Date" then the second level are the hours). The reason this is happening is because you did not supply a values argument to your call to pivot_table

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'Date':['2015-11-10 18:39:00','2015-11-14 11:30:00','2015-11-14 13:48:00','2015-11-15 16:06:00',
    '2015-11-16 21:08:00','2020-08-31 18:06:00','2020-09-18 11:17:00','2020-09-19 15:41:00',
    '2020-09-25 16:28:00','2020-09-25 19:45:00']})
df['Date'] = pd.to_datetime(df['Date'])

# Note the addition of the `values` argument
hm = df.pivot_table(columns=df['Date'].dt.hour, index=df['Date'].dt.day_name(), values="Date", aggfunc='count', fill_value=0, dropna=False)

# you can also reindex both the index and columns in 1 call
hm = hm.reindex(
    index=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'],
    columns=range(24),
    fill_value=0
)

print(hm)
Date       0   1   2   3   4   5   6   7   8   9   ...  14  15  16  17  18  19  20  21  22  23
Date                                               ...                                        
Monday      0   0   0   0   0   0   0   0   0   0  ...   0   0   0   0   1   0   0   1   0   0
Tuesday     0   0   0   0   0   0   0   0   0   0  ...   0   0   0   0   1   0   0   0   0   0
Wednesday   0   0   0   0   0   0   0   0   0   0  ...   0   0   0   0   0   0   0   0   0   0
Thursday    0   0   0   0   0   0   0   0   0   0  ...   0   0   0   0   0   0   0   0   0   0
Friday      0   0   0   0   0   0   0   0   0   0  ...   0   0   1   0   0   1   0   0   0   0
Saturday    0   0   0   0   0   0   0   0   0   0  ...   0   1   0   0   0   0   0   0   0   0
Sunday      0   0   0   0   0   0   0   0   0   0  ...   0   0   1   0   0   0   0   0   0   0
0
On

Welcome to StackOverflow.

Your index here is a MultiIndex, with the first level being "Date." Pandas thinks you're trying to change this part of the index, which doesn't work because it's a string. (I think. Not sure about this.)

If you drop the first level of the index using droplevel(), you'll have a regular integer index, and it will work. I also ended up with NaNs in the result, so I added a fillna() call too.

hm = df.pivot_table(columns=df['Date'].dt.hour, index=df['Date'].dt.day_name(), aggfunc='count', fill_value=0, dropna=False)
hm = hm.reindex(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'], axis='index')
hm.columns = hm.columns.droplevel(0)
hm = hm.reindex(range(0,24), axis='columns', fill_value=0)
hm = hm.fillna(0)

I get the following plot as a result.

rendered plot