I receive the following error when trying to display the training dataframe created from my training_set.
SparkRuntimeException: [UDF_USER_CODE_ERROR.GENERIC] Execution of function mycatalog.mydatabase.product_difference_ratio_on_demand_feature(left_MaxProductAmount#6091, left_Amount#6087) failed.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 217.0 failed 4 times, most recent failure: Lost task 0.3 in stage 217.0 (TID 823) (ip-10-0-32-203.us-west-2.compute.internal executor driver): org.apache.spark.SparkRuntimeException: [UDF_USER_CODE_ERROR.GENERIC] Execution of function mycatalog.mydatabase.product_difference_ratio_on_demand_feature(left_MaxProductAmount#6091, left_Amount#6087) failed.
== Error ==
TypeError: unsupported operand type(s) for -: 'NoneType' and 'float'
== Stacktrace ==
File "<udfbody>", line 5, in main
return calc_ratio_difference(max_price, transaction_amount)
File "<udfbody>", line 3, in calc_ratio_difference
return round(((n1 - n2)/n1),2) SQLSTATE: 39000
== SQL (line 1, position 1) ==
mycatalog.mydatabase.product_difference_ratio_on_demand_feature(`MaxProductAmount`, `Amount`)
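For reference, the TypeError at the bottom of the trace is reproducible in plain Python using the function body shown in the stack trace, which suggests the first input (max_price / MaxProductAmount) is reaching the UDF as NULL (None):

```python
def calc_ratio_difference(n1, n2):
    # Same body as line 3 of <udfbody> in the stack trace
    return round(((n1 - n2) / n1), 2)

print(calc_ratio_difference(110.0, 100.0))  # 0.09 -- floats work fine

try:
    # A NULL MaxProductAmount arrives in the Python UDF body as None
    calc_ratio_difference(None, 100.0)
except TypeError as e:
    print(e)  # unsupported operand type(s) for -: 'NoneType' and 'float'
```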
Here is my training_set (note: exclude_columns expects a list, so I pass ["_rescued_data"]):

from databricks.feature_engineering import FeatureEngineeringClient, FeatureFunction, FeatureLookup

fe = FeatureEngineeringClient()

training_feature_lookups = [
    FeatureLookup(
        table_name="transaction_count_history",
        rename_outputs={
            "eventTimestamp": "TransactionTimestamp"
        },
        lookup_key=["CustomerID"],
        feature_names=["transactionCount", "isTimeout"],
        timestamp_lookup_key="TransactionTimestamp"
    ),
    FeatureLookup(
        table_name="product_3minute_max_price_ft",
        rename_outputs={
            "LookupTimestamp": "TransactionTimestamp"
        },
        lookup_key=["Product"],
        timestamp_lookup_key="TransactionTimestamp"
    ),
    FeatureFunction(
        udf_name="product_difference_ratio_on_demand_feature",
        input_bindings={"max_price": "MaxProductAmount", "transaction_amount": "Amount"},
        output_name="MaxDifferenceRatio"
    )
]

raw_transactions_df = spark.table("raw_transactions")

training_set = fe.create_training_set(
    df=raw_transactions_df,
    feature_lookups=training_feature_lookups,
    label="Label",
    exclude_columns=["_rescued_data"]
)

training_df = training_set.load_df()
What stands out to me is the TypeError: unsupported operand type(s) for -: 'NoneType' and 'float'. However, every column involved is a float: floats go in and a float comes out, and the function itself works fine in testing.
Nulls could have been created when the lookups occurred, which would make sense given the NoneType error. However, I put a minimum timestamp on the base dataframe, which should have ensured no nulls were being imputed by the lookups.
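As a stopgap while tracking down where the nulls come from, the Python body of the UDF can be made null-safe. This is a sketch of the body only (the surrounding CREATE FUNCTION DDL is assumed to stay as it is):

```python
def calc_ratio_difference(n1, n2):
    # Return None instead of raising when either input arrives as NULL,
    # or when n1 is zero (which would otherwise raise ZeroDivisionError)
    if n1 is None or n2 is None or n1 == 0:
        return None
    return round(((n1 - n2) / n1), 2)

print(calc_ratio_difference(110.0, 100.0))  # 0.09
print(calc_ratio_difference(None, 100.0))   # None instead of a TypeError
```

The output column then contains NULL for the affected rows, which makes them easy to inspect, instead of failing the whole stage.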