I've been trying to run Market Basket Analysis with the FP-Growth algorithm, using the fpgrowth function from the mlxtend library inside Snowpark.
It works on smaller datasets but fails on the full dataset (over 4 million rows and 6,000 one-hot-encoded columns), exiting with this error:
Function available memory exhausted. Please visit https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-designing.html#memory for help.
The log shows that the Python sandbox's maximum memory usage was 80 GB. I tried Snowpark-optimized XL and 2XL warehouses, but both failed at the same maximum memory usage.
Is it possible to pass a Snowpark DataFrame directly to mlxtend's fpgrowth function, without first converting it to a pandas DataFrame? That conversion is what fails because of the data size.
Snowflake does not natively support the fpgrowth algorithm. Consider this answer, which may fix your out-of-memory errors with fpgrowth.