Is it possible to use mlxtend's fpgrowth() in Snowpark without transforming data to Pandas DF?


I've been trying to perform Market Basket Analysis with the FP-Growth algorithm in Snowpark, using the fpgrowth function from the mlxtend library.

It works on smaller datasets but fails on the full dataset (over 4 million rows and 6,000 one-hot-encoded columns). It exits with this error:

Function available memory exhausted. Please visit https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-designing.html#memory for help.

The log says the Python sandbox's maximum memory usage was 80 GB. I tried Snowpark-optimized XL and 2XL warehouses, but both failed at the same memory ceiling.

Is it possible to pass a Snowpark DataFrame directly to mlxtend's fpgrowth function, without first converting it to a pandas DataFrame? The conversion is what fails because of the data size.

1 Answer

Answered by Teej:

Snowflake does not natively support the fpgrowth algorithm.

Consider this answer, which may fix your out-of-memory errors with fpgrowth.