Is it possible to use mlxtend's fpgrowth() in Snowpark without transforming data to Pandas DF?


I've been trying to perform Market Basket Analysis with the FP-Growth algorithm in Snowpark, using the fpgrowth function from the mlxtend library.

It works on smaller datasets but fails on the full dataset (over 4 million rows and 6,000 one-hot-encoded columns). It exits with this error:

Function available memory exhausted. Please visit https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-designing.html#memory for help.

The log says the Python sandbox's maximum memory usage was 80 GB. I tried Snowpark-optimized XL and 2XL warehouses, but both failed at the same memory ceiling.

Is it possible to pass a Snowpark DataFrame directly to mlxtend's fpgrowth function, without first converting it to a pandas DataFrame? The conversion is what fails because of the data size.

1 Answer

Answered by Teej:

Snowflake does not natively support the fpgrowth algorithm.

Consider this answer, which may fix your out-of-memory errors with fpgrowth.