Pandas: How to extract and calculate the number of hours per row in a DataFrame


I have a data frame representing the schedule of some restaurants in a week.

  • What I want to do is add a column week_hours to my initial DataFrame df that represents the total number of hours each restaurant is open per week.

Note: the value 0 simply means that the restaurant is closed that day.

import pandas as pd

data = {
        'restaurant_id':  ['1', '2','3'],
        'Monday':  ['11:0-20:0', '11:30-22:0','11:30-21:0'],
        'Tuesday':  ['11:0-20:0', '11:30-22:0','11:30-22:0'],
        'Wednesday':  ['11:0-20:0', '11:30-22:0','11:30-21:0'],
        'Thursday':  ['11:0-20:0', '11:30-22:0','11:30-21:0'],
        'Friday':  ['11:0-22:0', '11:30-22:0','11:30-21:0'],
        'Saturday':  ['11:0-22:0', '12:0-22:0','0'],
        'Sunday':  ['11:0-17:0', '16:30-21:30','11:30-21:0'],
        }

df = pd.DataFrame(data, columns=['restaurant_id','Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday'])
df.head()

What could be a simple syntax to achieve this?


There are 2 best solutions below

2
On BEST ANSWER

One way is to transpose first and then apply a custom function that extracts the hour and minute values before doing the calculation (note that this works only if the closing time does not run past midnight; otherwise you would need Timedelta):

def find_diff(col):
    data = col.str.extract(r"(\d*):(\d*)-(\d*):(\d*)").astype(float)
    hours = (data[2] - data[0]).sum()
    minutes = (data[3] - data[1]).sum()/60
    return hours+minutes

print(df.set_index("restaurant_id").T.apply(find_diff))

restaurant_id
1    64.0
2    67.5
3    58.0
dtype: float64
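
The result above is a Series indexed by restaurant_id, while the question asks for a column on df. One possible way to attach it, sketched here with the question's data, is to map the totals back by restaurant_id. The name totals is only illustrative; week_hours is the column name requested in the question.

```python
import pandas as pd

# Data copied from the question.
data = {
    "restaurant_id": ["1", "2", "3"],
    "Monday": ["11:0-20:0", "11:30-22:0", "11:30-21:0"],
    "Tuesday": ["11:0-20:0", "11:30-22:0", "11:30-22:0"],
    "Wednesday": ["11:0-20:0", "11:30-22:0", "11:30-21:0"],
    "Thursday": ["11:0-20:0", "11:30-22:0", "11:30-21:0"],
    "Friday": ["11:0-22:0", "11:30-22:0", "11:30-21:0"],
    "Saturday": ["11:0-22:0", "12:0-22:0", "0"],
    "Sunday": ["11:0-17:0", "16:30-21:30", "11:30-21:0"],
}
df = pd.DataFrame(data)


def find_diff(col):
    # Each row of `parts` holds start hour, start minute, end hour, end minute.
    # A closed day ("0") does not match the pattern and becomes a row of NaN.
    parts = col.str.extract(r"(\d*):(\d*)-(\d*):(\d*)").astype(float)
    hours = (parts[2] - parts[0]).sum()  # sum() skips the NaN rows
    minutes = (parts[3] - parts[1]).sum() / 60
    return hours + minutes


# After the transpose, each column is one restaurant and each row is one day.
totals = df.set_index("restaurant_id").T.apply(find_diff)

# Look up each restaurant's total by its id and store it as a new column.
df["week_hours"] = df["restaurant_id"].map(totals)
print(df[["restaurant_id", "week_hours"]])
```

Mapping by restaurant_id rather than assigning the values positionally keeps the totals aligned even if df has been reordered in the meantime.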
0
On

You can use apply; see this answer for more info: https://stackoverflow.com/a/40045819/14273548.

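def calc_hours(ranges):
    # '0' means closed; otherwise convert 'H:M-H:M' to the hours open that day
    if ranges == '0':
        return 0.0
    start, end = (int(h) + int(m) / 60 for h, m in (t.split(':') for t in ranges.split('-')))
    return end - start

df['Weekly Hours'] = df.apply(lambda x: sum(calc_hours(x[day]) for day in ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']), axis=1)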
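
If closing times can run past midnight (the case the other answer says needs Timedelta), the per-cell helper could be written with pd.Timedelta instead. This is a minimal sketch, not a tested solution from either answer: the helper name open_hours, the restaurant "4" row, and the rule that a closing time at or before the opening time belongs to the next day are all assumptions made for illustration.

```python
import pandas as pd

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def open_hours(cell):
    """Hours open for one 'H:M-H:M' cell; '0' (closed) counts as 0 hours."""
    if cell == "0":
        return 0.0
    start, end = (
        pd.Timedelta(hours=int(h), minutes=int(m))
        for h, m in (part.split(":") for part in cell.split("-"))
    )
    if end <= start:
        # Assumption: closing at or before the opening time means the next day.
        end += pd.Timedelta(days=1)
    # Dividing two Timedeltas yields a plain float number of hours.
    return (end - start) / pd.Timedelta(hours=1)


# Restaurant "3" is taken from the question; restaurant "4" is a hypothetical
# row that closes after midnight on Friday and Saturday.
df = pd.DataFrame({
    "restaurant_id": ["3", "4"],
    "Monday": ["11:30-21:0", "0"],
    "Tuesday": ["11:30-22:0", "0"],
    "Wednesday": ["11:30-21:0", "0"],
    "Thursday": ["11:30-21:0", "0"],
    "Friday": ["11:30-21:0", "18:0-2:0"],
    "Saturday": ["0", "18:0-2:30"],
    "Sunday": ["11:30-21:0", "0"],
})

# Row-wise apply: sum the hours of the seven day cells for each restaurant.
df["Weekly Hours"] = df[DAYS].apply(
    lambda row: sum(open_hours(cell) for cell in row), axis=1
)
print(df[["restaurant_id", "Weekly Hours"]])
```

The next-day rule cannot tell a 24-hour opening apart from a zero-length one, so data that contains either would need an explicit convention.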