I've been trying to set up CatBoost to work with PySpark in a Colab notebook (specifically a Kaggle integrated notebook).
As a starting point, I've pip-installed PySpark 3.1 and copied the "Binary Classification" quickstart code from the (impressively detailed) CatBoost documentation.
When I run it, several things come up. First, a warning that may or may not be relevant:
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/conda/lib/python3.10/site-packages/pyspark/jars/spark-unsafe_2.12-3.1.2.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Then a bunch of what seem to be success printouts, followed by another warning that might be relevant:
23/11/16 08:27:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
And then the error I mention in the title of this post:
TypeError Traceback (most recent call last)
Cell In[1], line 42
39 classifier = catboost_spark.CatBoostClassifier()
41 # train a model
---> 42 model = classifier.fit(trainPool, eval_set=[evalPool])
44 # apply the model
45 predictions = model.transform(evalPool.data)
TypeError: CatBoostClassifier.fit() got an unexpected keyword argument 'eval_set'
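In case it helps anyone reproduce or diagnose this, here is a minimal sketch of how the keyword arguments a `fit` method accepts could be listed with the standard-library `inspect` module. The `StandInClassifier` class and its parameter names are hypothetical and only keep the snippet runnable without Spark; they are not the real catboost_spark API. In the notebook, the same helpers would be pointed at `classifier.fit` instead.

```python
import inspect


class StandInClassifier:
    """Hypothetical stand-in, NOT the real catboost_spark.CatBoostClassifier."""

    def fit(self, dataset, params=None, eval_datasets=None):
        # These parameter names are made up purely for illustration.
        return self


def accepted_parameters(method):
    """Return the parameter names a callable declares, in order."""
    return list(inspect.signature(method).parameters)


def accepts_keyword(method, name):
    """Return True if `name` can be passed to `method` as a keyword argument."""
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    for param in inspect.signature(method).parameters.values():
        # A **kwargs parameter accepts any keyword name.
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            return True
        if param.name == name and param.kind in keyword_kinds:
            return True
    return False


classifier = StandInClassifier()
print(accepted_parameters(classifier.fit))
print(accepts_keyword(classifier.fit, "eval_set"))
```

If `eval_set` is not among the listed names for the installed version, that would at least confirm the quickstart code and the installed package disagree on the keyword.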
I'm posting here based on guidance from the CatBoost GitHub. Thanks in advance for any help you can give!