Pandas: Calculate total timedelta of intermediate time entries

Question

Asked by Brunch1012 on 26 March 2024 at 13:29 (49 views)

I have a DataFrame that looks like this:

Date	Name	In/Out	Time
2024-01-01	Homer	IN	07:10
2024-01-01	Homer	OUT	09:30
2024-01-01	Homer	IN	10:00
2024-01-01	Homer	OUT	16:00
2024-01-01	Marge	In	07:15
2024-01-01	Marge	Out	16:10
2024-01-01	Bart	In	07:14
2024-01-01	Bart	Out	10:00
2024-01-01	Bart	In	10:15
2024-01-01	Bart	Out	12:00
2024-01-01	Bart	In	12:30
2024-01-01	Bart	Out	17:00

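My end goal is to calculate the total timedelta (total_seconds()) of the intermediate IN/OUT times only, that is, the gaps between an OUT and the next IN on the same day, ignoring the first IN and the last OUT. For Homer that is 09:30 to 10:00 (1800 seconds); for Bart it is 10:00 to 10:15 plus 12:00 to 12:30 (2700 seconds). Expected output below.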

Date	Name	TimeDelta
2024-01-01	Homer	1800
2024-01-01	Bart	2700

I haven't been too sure where to start with this one, and I cannot find any examples elsewhere.

Example DataFrame:

import pandas as pd

example_df = pd.DataFrame([
    ['2024-01-01', 'Homer', 'in', '07:30'],
    ['2024-01-01', 'Homer', 'out' ,'09:00'],
    ['2024-01-01', 'Homer', 'in' ,'09:30'],
    ['2024-01-01', 'Homer', 'out' ,'16:00'],
    ['2024-01-01', 'Marge', 'in' , '06:20'],
    ['2024-01-01', 'Marge', 'out' ,'16:00'],
    ['2024-01-01', 'Bart', 'in' ,'07:10'],
    ['2024-01-01', 'Bart', 'out' ,'08:00'],
    ['2024-01-01', 'Bart', 'in' ,'08:20'],
    ['2024-01-01', 'Bart', 'out' ,'17:00'],
    ['2024-01-01', 'Barney', 'in' ,'08:10'],
    ['2024-01-01', 'Lisa', 'in' ,'08:05'],
    ['2024-01-01', 'Lisa', 'out' ,'14:00'],
    ['2024-01-01', 'Lisa', 'in' ,'14:15'],
    ['2024-01-01', 'Lisa', 'out' ,'18:10'],
    ['2024-01-01', 'Millhouse', 'out' ,'19:10'],
    ['2024-02-01', 'Homer', 'in', '07:30'],
    ['2024-02-01', 'Homer', 'out' ,'09:00'],
    ['2024-02-01', 'Marge', 'in' , '06:30'],
    ['2024-02-01', 'Marge', 'out' ,'09:10'],
    ['2024-02-01', 'Marge', 'in' ,'10:10'],
    ['2024-02-01', 'Marge', 'out' ,'16:10'],
    ['2024-02-01', 'Bart', 'in' ,'07:10'],
    ['2024-02-01', 'Bart', 'out' ,'15:00'],
    ['2024-02-01', 'Barney', 'in' ,'08:10'],
    ['2024-02-01', 'Lisa', 'in' ,'08:05'],
    ['2024-02-01', 'Lisa', 'out' ,'16:00'],
    ['2024-02-01', 'Millhouse', 'in' ,'08:10'],
    ['2024-02-01', 'Millhouse', 'in' ,'08:10'],
    ['2024-02-01', 'Millhouse', 'in' ,'16:15']],
    columns=['Date', 'Name', 'In/Out', 'Time'])

Original Q&A

There are 2 best solutions below

Jamie On 26 March 2024 at 14:09

I would reshape your DataFrame so that each record is one person on one day with an in time and an out time. Then you can calculate the delta from those two times. Here is how I would do it.

import pandas as pd
import random as rd

# Build a small example dataset: one row per person per day,
# with a randomly chosen in time and out time.
names = ['John', 'Jeff']
dates = ['01/01/2024', '01/02/2024']
intimes = ['08:00', '09:00']
outtimes = ['17:00', '17:30']

namelist = []
datelist = []
inlist = []
outlist = []

for i in names:
    for j in dates:
        namelist.append(i)
        datelist.append(j)
        inlist.append(rd.choice(intimes))
        outlist.append(rd.choice(outtimes))

df = pd.DataFrame()
df['Name'] = namelist
df['Date'] = datelist
df['In'] = inlist
df['Out'] = outlist

# Combine the date and time strings, then parse them as datetimes.
df['InDateandTime'] = pd.to_datetime(df['Date'] + ' ' + df['In'])
df['OutDateandTime'] = pd.to_datetime(df['Date'] + ' ' + df['Out'])

# Time between the in and out timestamps of each record.
df['TimeDelta'] = df['OutDateandTime'] - df['InDateandTime']

print(df)  # or display(df) in a notebook

You just need to make a new DataFrame that converts your records to a single record per person per day and then apply this method to that DataFrame. You will need to combine the dates and times as strings first and then use pd.to_datetime to convert them to datetimes. I have shown how to do this in the code. You can also drop the original 'Date', 'In' and 'Out' columns, but I left them in to show you exactly what I did. Much of the code is just creating a dataset that I could use as an example.
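The reshaping step itself is not shown above, so here is one possible way to do it, as a rough sketch rather than the only approach. It assumes the rows are already sorted by time and strictly alternate IN, OUT for each person and day, and it needs pandas 1.1 or later for the list-valued `index` in `pivot`. The column names `When`, `Visit` and `GapSeconds` are made up for illustration.

```python
import pandas as pd

# A few rows in the question's long format (one row per IN or OUT event).
events = pd.DataFrame(
    [
        ["2024-01-01", "Homer", "IN", "07:10"],
        ["2024-01-01", "Homer", "OUT", "09:30"],
        ["2024-01-01", "Homer", "IN", "10:00"],
        ["2024-01-01", "Homer", "OUT", "16:00"],
        ["2024-01-01", "Marge", "In", "07:15"],
        ["2024-01-01", "Marge", "Out", "16:10"],
    ],
    columns=["Date", "Name", "In/Out", "Time"],
)

# Normalise the labels so they can be used as column names after the pivot.
events["In/Out"] = events["In/Out"].str.upper()
events["When"] = pd.to_datetime(events["Date"] + " " + events["Time"])

# Assumption: rows alternate IN, OUT within each (Date, Name) group,
# so consecutive pairs of rows belong to the same visit: 0, 0, 1, 1, ...
events["Visit"] = events.groupby(["Date", "Name"]).cumcount() // 2

# One row per visit, with separate IN and OUT timestamp columns.
visits = events.pivot(
    index=["Date", "Name", "Visit"], columns="In/Out", values="When"
).reset_index()
visits.columns.name = None

# Time spent inside during each visit.
visits["TimeDelta"] = visits["OUT"] - visits["IN"]

# Gap between the previous OUT and this IN (NaN for the first visit of a day).
prev_out = visits.groupby(["Date", "Name"])["OUT"].shift()
visits["GapSeconds"] = (visits["IN"] - prev_out).dt.total_seconds()

# Total intermediate gap per person per day, in seconds.
gap_seconds = visits.groupby(["Date", "Name"])["GapSeconds"].sum()
print(gap_seconds)
```

Once each visit is its own row, the intermediate gaps the question asks about are just the difference between one visit's IN and the previous visit's OUT.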

**mozway** · Accepted Answer · 2024-03-26T13:40:16.457000

This assumes that Time is sorted within each group, that the first In/Out entry is always IN, and that IN and OUT always alternate.

You could convert the times with to_datetime, then use groupby.apply to compute the diff between consecutive rows, drop the first and last values (with iloc), and sum the timedeltas on the "IN" rows (each one is the gap since the previous OUT) before converting with total_seconds:

# cleanup IN/OUT format
df['In/Out'] = df['In/Out'].str.upper()

out = (df
  .assign(dt=pd.to_datetime(df['Time'], format='%H:%M'))
  .groupby(['Date', 'Name'])
  .apply(lambda g: 
         g['dt'].diff().iloc[1:-1]
         [g['In/Out'].eq('IN')]
         .sum().total_seconds())
  .reset_index(name='TimeDelta')
  .query('TimeDelta>0') # optional: remove rows with a zero TimeDelta
)

Output:

         Date       Name  TimeDelta
1  2024-01-01   J Bloggs     1800.0
2  2024-01-01  M Simpson     2700.0
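
To see why this picks out only the intermediate gaps, here is a minimal sketch that walks through the lambda for a single (Date, Name) group, using Homer's rows from the table in the question. The names `g`, `steps`, `inner` and `gaps` are only for illustration, and the mask is aligned explicitly with `.loc` instead of relying on index alignment as the one-liner does.

```python
import pandas as pd

# One (Date, Name) group: Homer on 2024-01-01, already sorted and alternating.
g = pd.DataFrame(
    {
        "In/Out": ["IN", "OUT", "IN", "OUT"],
        "Time": ["07:10", "09:30", "10:00", "16:00"],
    }
)
dt = pd.to_datetime(g["Time"], format="%H:%M")

# Difference between each row and the previous one; the first row has no
# predecessor, so its value is NaT.
steps = dt.diff()

# Drop the first and last rows: the first is NaT, and the last is the final
# OUT, which closes the day rather than an intermediate gap.
inner = steps.iloc[1:-1]

# Keep only the rows labelled IN: their diff is "this IN minus the previous
# OUT", which is the time spent outside.
is_in = g["In/Out"].eq("IN")
gaps = inner[is_in.loc[inner.index]]

total = gaps.sum().total_seconds()
print(total)
```

The diff on an OUT row is the time spent inside, so filtering on IN is what separates the gaps from the working periods.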

NB. If any of the initial assumptions is incorrect, you just need to pre-process the data to sort it and remove invalid rows.

Example:

# cleanup IN/OUT format
df['In/Out'] = df['In/Out'].str.upper()

m1 = df.sort_values(by='Time').groupby(['Date', 'Name'])['In/Out'].shift(-1).ne(df['In/Out'])

out = (df[m1]
  .assign(dt=pd.to_datetime(df.loc[m1, 'Time'], format='%H:%M'))
  .groupby(['Date', 'Name'])
  .apply(lambda g: 
         g['dt'].diff().iloc[1:-1]
         [g['In/Out'].eq('IN')]
         .sum().total_seconds())
  .reset_index(name='TimeDelta')
  .query('TimeDelta>0') # optional: remove rows with a zero TimeDelta
)

Output:

          Date   Name  TimeDelta
1   2024-01-01   Bart     1200.0
2   2024-01-01  Homer     1800.0
3   2024-01-01   Lisa      900.0
10  2024-02-01  Marge     3600.0
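
The mask m1 keeps a row only when the next row in the same (Date, Name) group has a different In/Out value, so a run of repeated IN or OUT entries collapses to its last row. A small sketch of that step on its own, using a few 2024-02-01 rows for Millhouse and Marge from the question's example DataFrame with In/Out already upper-cased; `next_state` and `kept` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["2024-02-01", "Millhouse", "IN", "08:10"],
        ["2024-02-01", "Millhouse", "IN", "08:10"],
        ["2024-02-01", "Millhouse", "IN", "16:15"],
        ["2024-02-01", "Marge", "IN", "06:30"],
        ["2024-02-01", "Marge", "OUT", "09:10"],
    ],
    columns=["Date", "Name", "In/Out", "Time"],
)

# For each row, look up the In/Out value of the next row (by time) within
# the same (Date, Name) group; the last row of a group gets NaN.
next_state = (
    df.sort_values(by="Time")
    .groupby(["Date", "Name"])["In/Out"]
    .shift(-1)
)

# True where the next entry differs from the current one (or does not exist).
m1 = next_state.ne(df["In/Out"])

kept = df[m1]
print(kept)
```

Here Millhouse's three consecutive IN rows reduce to the last one, while Marge's alternating IN/OUT rows are all kept.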
