I have the following dataset:
df = pd.DataFrame([
['B2', 'G2',[1291593600000000000, 1394755200000000000, 1397347200000000000,
1506816000000000000, 1509494400000000000, None]],
['B10', 'G10',[1291593600000000000, 1394755200000000000, 1460505600000000000,
1506816000000000000]],
['B14', 'G14',[1291593600000000000, 1394755200000000000, 1460505600000000000,
1506816000000000000]]],
columns= ['Baum2', 'Baum7', 'value_pair'])
The values in value_pair are dates in Unix time.
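A quick check of the unit: the 19-digit values look like nanoseconds since 1970-01-01 UTC rather than seconds, because 1291593600 seconds is 2010-12-06 and the stored value is that number followed by nine zeros. This sketch assumes nanosecond timestamps and converts the first entry both ways:

```python
import pandas as pd

# First entry of the first row, assumed to be nanoseconds since the epoch
raw = 1291593600000000000

# Read the raw value directly as nanoseconds (pandas' default unit for integers)
ts = pd.to_datetime(raw, unit="ns")
print(ts)  # 2010-12-06 00:00:00

# Dropping the nine extra zeros gives Unix seconds, which unit="s" expects
print(pd.to_datetime(raw // 10**9, unit="s"))  # 2010-12-06 00:00:00
```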
What I want to do: I want to check whether the difference between two specific dates in each row (say, the third minus the second entry of each list) is more than 70 days. If it is, I want to delete the row.
I want to apply this same check to every row of the value_pair column (the rows were grouped beforehand).
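A minimal sketch of the check itself, assuming nanosecond timestamps: the comparison can stay in plain integers if 70 days is also expressed in nanoseconds, and `.str[i]` picks the i-th element (0-based) of every list in the column. The constant name `SEVENTY_DAYS_NS` is illustrative only.

```python
import pandas as pd

df = pd.DataFrame([
    ['B2', 'G2', [1291593600000000000, 1394755200000000000, 1397347200000000000,
                  1506816000000000000, 1509494400000000000, None]],
    ['B10', 'G10', [1291593600000000000, 1394755200000000000, 1460505600000000000,
                    1506816000000000000]],
    ['B14', 'G14', [1291593600000000000, 1394755200000000000, 1460505600000000000,
                    1506816000000000000]],
], columns=['Baum2', 'Baum7', 'value_pair'])

# 70 days in nanoseconds, the assumed unit of the raw values
SEVENTY_DAYS_NS = 70 * 24 * 60 * 60 * 10**9

# Third entry minus second entry of every list, still in nanoseconds
diff_ns = df['value_pair'].str[2] - df['value_pair'].str[1]

# Keep rows whose gap is at most 70 days, i.e. delete the others
filtered = df[diff_ns <= SEVENTY_DAYS_NS]
print(filtered[['Baum2', 'Baum7']])
```

With the sample data the gap is 30 days for B2 and 761 days for B10 and B14, so only the B2 row is kept. A list with fewer than three entries would give NaN here and be dropped by the mask, which may or may not be what you want.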
The Problem:
As far as I know, I can't do the arithmetic on the raw Unix timestamps and then convert the result to my desired format with pd.to_datetime(). Subtracting works, but converting fails with:
<class 'numpy.ndarray'> is not convertible to datetime
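A difference between two timestamps is a duration, not a point in time, so `pd.to_datetime()` is a poor fit for it even when it accepts the input. A sketch, assuming the raw values are nanoseconds, that converts the difference with `pd.to_timedelta()` instead:

```python
import pandas as pd

second = 1394755200000000000  # second entry of the first row
third = 1397347200000000000   # third entry of the first row

# Plain integer subtraction: nanoseconds between the two dates
diff = third - second

# Interpret the difference as a duration rather than as a date
delta = pd.to_timedelta(diff, unit="ns")
print(delta)                          # 30 days 00:00:00
print(delta.days)                     # 30
print(delta > pd.Timedelta(days=70))  # False
```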
Second approach:
Before subtracting the dates, I convert them to my desired format:
#df['value_pair'] = pd.to_datetime(df['value_pair'])
#df['value_pair'] = df['value_pair'].dt.strftime('%Y-%m-%d')
The Problem:
Now the following line raises this error: TypeError: unsupported operand type(s) for -: 'numpy.str_' and 'numpy.str_'
erg1 = df['value_pair'][0][2]-df['value_pair'][0][1]
That makes sense, because strings can't be subtracted from each other.
At this point, I'm out of ideas. Does anyone know a different approach to this problem?
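One way around the string problem, sketched under the same nanosecond assumption: do the arithmetic on datetimes and call `strftime` only when the formatted text is actually needed.

```python
import pandas as pd

values = [1291593600000000000, 1394755200000000000, 1460505600000000000]

# Convert once to datetimes (assuming nanoseconds since the epoch) and keep them that way
dates = pd.to_datetime(values, unit="ns")

# Timestamp minus Timestamp gives a Timedelta, so subtraction works
gap = dates[2] - dates[1]
print(gap.days)  # 761

# Format as strings only for display, after the arithmetic is done
print(dates.strftime("%Y-%m-%d").tolist())  # ['2010-12-06', '2014-03-14', '2016-04-13']

# If you already have strings, parse them back before subtracting
print((pd.Timestamp("2016-04-13") - pd.Timestamp("2014-03-14")).days)  # 761
```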
My code:
import pandas as pd
df = pd.DataFrame([
['B2', 'G2',[1291593600000000000, 1394755200000000000, 1397347200000000000,
1506816000000000000, 1509494400000000000, None]],
['B10', 'G10',[1291593600000000000, 1394755200000000000, 1460505600000000000,
1506816000000000000]],
['B14', 'G14',[1291593600000000000, 1394755200000000000, 1460505600000000000,
1506816000000000000]]], columns= ['Baum2', 'Baum7', 'value_pair'])
df['value_pair'] = pd.to_datetime(df['value_pair'])
df['value_pair'] = df['value_pair'].dt.strftime('%Y-%m-%d')
erg1 = df['value_pair'][0][2]-df['value_pair'][0][1]
print(df)
First, have a look at "Convert unix time to readable date in pandas dataframe". The solution used there is:
pd.to_datetime(df['date'], unit='s')
You can do the same using datetime:
Output:
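datetime.datetime(1974, 2, 3, 16, 36)
But this will not work for your value_pair values, because they have a lot of extra trailing zeros. For example, your first value_pair entry is 1291593600000000000, and depending on how many of the trailing zeros you keep, you get different dates:
Output:
Any more trailing zeros than that will give you an error. So you'll have to clean your data first, and then you can convert the timestamp and get the time difference (delta). Note that 1291593600 seconds is 2010-12-06, and your values are that number followed by exactly nine zeros, i.e. nanosecond timestamps: integer-dividing by 10**9 gives Unix seconds, and pd.to_datetime(..., unit='ns') can also read the raw values directly.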
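Following that advice, here is a minimal sketch with the datetime module. It assumes the values are nanoseconds, so "cleaning" means integer division by 10**9; `gap_days` is a hypothetical helper, and passing `tz=timezone.utc` keeps the result independent of the machine's local time zone.

```python
from datetime import datetime, timezone

import pandas as pd

df = pd.DataFrame([
    ['B2', 'G2', [1291593600000000000, 1394755200000000000, 1397347200000000000,
                  1506816000000000000, 1509494400000000000, None]],
    ['B10', 'G10', [1291593600000000000, 1394755200000000000, 1460505600000000000,
                    1506816000000000000]],
    ['B14', 'G14', [1291593600000000000, 1394755200000000000, 1460505600000000000,
                    1506816000000000000]],
], columns=['Baum2', 'Baum7', 'value_pair'])


def gap_days(values, first_idx=1, second_idx=2):
    """Hypothetical helper: whole days from values[first_idx] to values[second_idx].

    Assumes nanosecond timestamps; dividing by 10**9 yields the seconds
    that datetime.fromtimestamp expects.
    """
    start = datetime.fromtimestamp(values[first_idx] // 10**9, tz=timezone.utc)
    end = datetime.fromtimestamp(values[second_idx] // 10**9, tz=timezone.utc)
    return (end - start).days


# Gap per row, then drop every row where it exceeds 70 days
gaps = df['value_pair'].apply(gap_days)
kept = df[gaps <= 70]
print(kept[['Baum2', 'Baum7']])
```

With the sample data this keeps only the B2 row (a gap of 30 days); B10 and B14 have a gap of 761 days and are dropped.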