I am using Featuretools (version 1.1x). I have read the docs and also searched here, but I still struggle to find how to do simple things like `SELECT MIN(datetime_field_1)`.
I also checked `list_primitives()`, but the time-related primitives don't seem to be what I need.
I can do this for numeric fields, but it seems I can't do it on Datetime fields.
https://featuretools.alteryx.com/en/stable/
I simply want to get `min(timestamp)` and `max(timestamp)` grouped by `customer_id`, but the `max`/`min` primitives only work on numeric columns.
```python
import featuretools as ft
import pandas as pd
import numpy as np

# make some random data
n = 100
events_df = pd.DataFrame({
    "id": range(n),
    "customer_id": np.random.choice(["a", "b", "c"], n),
    "timestamp": pd.date_range("Jan 1, 2019", freq="1h", periods=n),
    "amount": np.random.rand(n) * 100
})

def to_part_of_day(x):
    # Map an hour of the day (0-23) to a coarse label.
    if x < 12:
        return "morning"
    elif x < 18:
        return "afternoon"
    else:
        return "evening"

es = ft.EntitySet(id='my_set')
# Single dataframe; the existing unique "id" column is used as the index.
es = es.add_dataframe(dataframe=events_df, dataframe_name='events', time_index='timestamp', index='id')

feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name='events',
    agg_primitives=['min', 'max'],
    trans_primitives=[],  # the keyword is plural; "trans_primitive" is rejected by dfs
    primitive_options={
        'max': {
            "include_groupby_columns": {"events": ["customer_id"]}
        }
    }
)
```
How should I get `max(amount)` and `max(timestamp)` for each `customer_id`? Thanks! It feels silly to ask such a basic thing after reading featuretools.alteryx.com and their GitHub examples.
I think you have a few issues here. First of all, the `Max` and `Min` primitives only operate on numeric columns, as you mention. If your data is sorted by the datetime value, you could use the `First` and `Last` aggregation primitives to get the first and last values, respectively, which will correspond to `Min` and `Max` if sorted in ascending order. If these primitives aren't sufficient, you will need to define your own custom primitive.

Second, by default, Featuretools will not return datetime values as features. In order to get those values returned, you will need to change the `return_types` in your call to DFS to include datetime values.

Finally, your example above only uses a single dataframe in the `EntitySet`. Featuretools aggregations are only applied when multiple dataframes are present in an `EntitySet`. Aggregations are performed across the defined relationship. To define features for a customer, you would target the customer table and then aggregate values from other tables (like purchases, for example) to get features like `MAX(purchases.amount)` or `FIRST(purchases.date)`.

Here is a complete example building off your starting data:
If you want to return only certain column types, you can pass a list of types to `return_types` instead of `"all"`. Also, if you only want `First` and `Last` to apply to the datetime column, you can do that by passing appropriate values to `primitive_options` in the call to DFS. The documentation contains information on how to do that.