I've been trying to set up CatBoost to work with PySpark in a Colab notebook (specifically a Kaggle integrated notebook).

As a starting point, I pip-installed pyspark 3.1 and copied the "Binary Classification" quickstart code from the (impressively detailed) CatBoost documentation.

When I run it, several things come up. First, a warning that may or may not be relevant:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/conda/lib/python3.10/site-packages/pyspark/jars/spark-unsafe_2.12-3.1.2.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

Then a bunch of what seem to be success printouts, followed by another warning that might be relevant:

23/11/16 08:27:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

And then the error I mention in the title of this post:

TypeError                                 Traceback (most recent call last)
Cell In[1], line 42
     39 classifier = catboost_spark.CatBoostClassifier()
     41 # train a model
---> 42 model = classifier.fit(trainPool, eval_set=[evalPool])
     44 # apply the model
     45 predictions = model.transform(evalPool.data)

TypeError: CatBoostClassifier.fit() got an unexpected keyword argument 'eval_set'
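
Since the error says `fit()` does not accept `eval_set`, one way to narrow this down is to list the keyword arguments the installed method actually accepts. Below is a minimal sketch using `inspect.signature` from the standard library. `StandInClassifier` is a hypothetical stand-in so the snippet runs without Spark, and its parameter names (including `evalDatasets`) are an assumption for illustration, not a confirmed signature of `catboost_spark.CatBoostClassifier.fit`.

```python
import inspect


class StandInClassifier:
    """Hypothetical stand-in for catboost_spark.CatBoostClassifier (NOT the real class)."""

    # Assumed parameter names, for illustration only.
    def fit(self, dataset, params=None, evalDatasets=None):
        return "model"


def accepted_keywords(func):
    """Return the parameter names that `func` accepts, excluding `self`."""
    signature = inspect.signature(func)
    return [name for name in signature.parameters if name != "self"]


names = accepted_keywords(StandInClassifier.fit)
print(names)
print("eval_set" in names)
```

In the notebook, the same helper could be pointed at the real method, for example `accepted_keywords(catboost_spark.CatBoostClassifier.fit)`, to see whether the quickstart code and the installed version agree on the argument name.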

I'm posting here based on guidance from the CatBoost GitHub. Thanks in advance for any help you can give!
