How can I find the elapsed time from a specific date to now when using Pandas Timestamp in Python?

I have a dataframe and the following code:

company_name founded_on
a 2004-01-01 00:00:00
b 2013-01-01 00:00:00
c 2008-01-01 00:00:00
d 1997-01-01 00:00:00
today = pd.Timestamp.now()
df["founded_on"]=pd.to_datetime(df["founded_on"])
df["Time_Since_founded_on"] = (today - df["founded_on"]).dt.days // 30

OverflowError: Overflow in int64 addition

I want to know how many months each company has been in operation.

Answer by mozway

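Use monthly period operations: convert both the founding dates and today's date with to_period('M'), then subtract to get the number of calendar months between them: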

df['Time_Since_founded_on'] = (pd.to_datetime(df['founded_on']).dt.to_period('M')
                                 .rsub(pd.Timestamp('today').to_period('M'))
                                 .apply(lambda x: x.n)
                              )

Output:

  company_name           founded_on  Time_Since_founded_on
0            a  2004-01-01 00:00:00                    242
1            b  2013-01-01 00:00:00                    134
2            c  2008-01-01 00:00:00                    194
3            d  1997-01-01 00:00:00                    326
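
As a cross-check of the numbers above, here is a minimal, self-contained sketch that counts calendar months with plain year/month arithmetic instead of periods. The fixed reference date 2024-03-18 is an assumption (it is consistent with the outputs shown in this answer); the names ref and months_elapsed are illustrative only.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "company_name": ["a", "b", "c", "d"],
    "founded_on": ["2004-01-01 00:00:00", "2013-01-01 00:00:00",
                   "2008-01-01 00:00:00", "1997-01-01 00:00:00"],
})

# Assumed fixed reference date so the result is reproducible;
# in real use this would be pd.Timestamp("today").
ref = pd.Timestamp("2024-03-18")

founded = pd.to_datetime(df["founded_on"])

# Calendar-month difference: whole years times 12, plus the month offset.
# The day of the month is ignored, as it is with monthly periods.
months_elapsed = (ref.year - founded.dt.year) * 12 + (ref.month - founded.dt.month)

df["Time_Since_founded_on"] = months_elapsed
print(df)
```

With that reference date the column comes out as 242, 134, 194 and 326, matching the table above.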

Note that your original approach (dividing the number of days by 30) overestimates long spans, because an average month is about 30.44 days rather than 30, so the error grows with the length of the span:

df['Time_Since_founded_on'] = (pd.to_datetime(df['founded_on'])
                                 .rsub(pd.Timestamp('today'))
                                 .dt.days.div(30)
                              )

  company_name           founded_on  Time_Since_founded_on
0            a  2004-01-01 00:00:00             246.066667  # +4M
1            b  2013-01-01 00:00:00             136.466667  # +2M
2            c  2008-01-01 00:00:00             197.366667  # +3M
3            d  1997-01-01 00:00:00             331.266667  # +5M

A better (but not perfect) approximation would be to divide by 365.25/12.
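
A short sketch comparing the two divisors, again assuming a fixed reference date of 2024-03-18 (an assumption chosen to be consistent with the outputs above). The variable names are illustrative.

```python
import pandas as pd

founded = pd.to_datetime(pd.Series(
    ["2004-01-01", "2013-01-01", "2008-01-01", "1997-01-01"]
))

# Assumed fixed reference date; replace with pd.Timestamp("today") in real use.
ref = pd.Timestamp("2024-03-18")

# Whole days elapsed since each founding date.
days = (ref - founded).dt.days

# Dividing by 30 treats every month as 30 days and drifts upward over time.
approx_30 = days / 30

# Dividing by the average month length (365.25 / 12 = 30.4375 days)
# stays much closer to the calendar-month count.
approx_avg = days / (365.25 / 12)

print(pd.DataFrame({"days": days, "div_30": approx_30, "div_avg": approx_avg}))
```

For this data the average-length version lands within a fraction of a month of the period-based result, but it is still an approximation and can be off by one near a month boundary.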

Handling invalid/missing dates

If some dates are invalid or missing, pass errors='coerce' to pd.to_datetime so they become NaT, and call dropna before extracting the number of months:

df['Time_Since_founded_on'] = (pd.to_datetime(df['founded_on'], errors='coerce')
                                 .dt.to_period('M')
                                 .rsub(pd.Timestamp('today').to_period('M'))
                                 .dropna().apply(lambda x: x.n)
                              )

Output:

  company_name           founded_on  Time_Since_founded_on
0            a  2004-01-01 00:00:00                  242.0
1            b  2013-01-01 00:00:00                  134.0
2            c  2008-01-01 00:00:00                  194.0
3            d  1997-01-01 00:00:00                  326.0
4            e                  NaT                    NaN
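
The result above is float because NaN cannot be stored in a plain integer column. If integer months are preferred, one option is pandas' nullable Int64 dtype. The sketch below is a variation, not the code from this answer: it uses year/month arithmetic rather than to_period, and the sample values, the reference date 2024-03-18, and the variable names are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical input with one unparseable value and one missing value.
raw = pd.Series(["2004-01-01 00:00:00", "2013-01-01 00:00:00", "not a date", None])

# errors="coerce" turns anything unparseable into NaT instead of raising.
founded = pd.to_datetime(raw, errors="coerce")

# Assumed fixed reference date; replace with pd.Timestamp("today") in real use.
ref = pd.Timestamp("2024-03-18")

# .dt.year and .dt.month yield NaN for NaT rows, so NaN propagates
# through the arithmetic and no dropna is needed here.
months = (ref.year - founded.dt.year) * 12 + (ref.month - founded.dt.month)

# Convert to the nullable integer dtype: valid rows become integers,
# missing rows become <NA>.
months = months.astype("Int64")
print(months)
```

The valid rows keep integer values while the invalid and missing rows show up as <NA>.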