mleap AttributeError: 'Pipeline' object has no attribute 'serializeToBundle'


I am having problems executing the example code from the MLeap repository. I want to run the code in a script instead of a Jupyter notebook (which is how the example is presented). My script is as follows:

##################################################################################
# start a local spark session
# https://spark.apache.org/docs/0.9.0/python-programming-guide.html
##################################################################################
from pyspark import SparkContext, SparkConf
conf = SparkConf()

# set the application name
conf.set("spark.app.name", "train classifier")
# run Spark locally with as many worker threads as logical cores on the machine (cores x threads)
conf.set("spark.master", "local[*]")
# number of cores to use for the driver process (only in cluster mode)
conf.set("spark.driver.cores", "1")
# limit on the total size of serialized results of all partitions for each Spark action (e.g. collect)
conf.set("spark.driver.maxResultSize", "1g")
# amount of memory to use for the driver process
conf.set("spark.driver.memory", "1g")
# amount of memory to use per executor process (e.g. 2g, 8g)
conf.set("spark.executor.memory", "2g")

# pass the configuration to the SparkContext object along with code dependencies
sc = SparkContext(conf=conf)
from pyspark.sql.session import SparkSession
spark = SparkSession(sc)
##################################################################################


import mleap.pyspark

# Import MLeap serialization functionality for PySpark
from mleap.pyspark.spark_support import SimpleSparkSerializer

# Import standard PySpark Transformers and packages
from pyspark.ml.feature import VectorAssembler, StandardScaler, OneHotEncoder, StringIndexer
from pyspark.ml import Pipeline, PipelineModel
from pyspark.sql import Row

# Create a test data frame
l = [('Alice', 1), ('Bob', 2)]
rdd = sc.parallelize(l)
Person = Row('name', 'age')
person = rdd.map(lambda r: Person(*r))
df2 = spark.createDataFrame(person)
df2.collect()

# Build a very simple pipeline using two transformers
string_indexer = StringIndexer(inputCol='name', outputCol='name_string_index')

feature_assembler = VectorAssembler(
    inputCols=[string_indexer.getOutputCol()], outputCol="features")

feature_pipeline = [string_indexer, feature_assembler]

featurePipeline = Pipeline(stages=feature_pipeline)

featurePipeline.fit(df2)

featurePipeline.serializeToBundle("jar:file:/tmp/pyspark.example.zip")

When I execute spark-submit script.py, I get the following error:

17/09/18 13:26:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Traceback (most recent call last):
  File "/Users/opringle/Documents/Repos/finn/Magellan/src/no_spark_predict.py", line 58, in <module>
    featurePipeline.serializeToBundle("jar:file:/tmp/pyspark.example.zip")
AttributeError: 'Pipeline' object has no attribute 'serializeToBundle'

Any help would be much appreciated! I have installed mleap from PyPI.

There are 3 answers below.

Answer from cappaberra (accepted):

See here.

It seems MLeap isn't ready for Spark 2.3 yet. If you happen to be running Spark 2.3, try downgrading to 2.2 and retry. Hopefully, that helps!
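
As a quick check before downgrading, sc.version (standard PySpark) reports which Spark version the driver is actually running against:

# print the Spark version the driver is using
print(sc.version)   # e.g. '2.3.0'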

Answer from MaFF:

It seems you didn't follow the steps correctly. The getting-started guide at http://mleap-docs.combust.ml/getting-started/py-spark.html says:

Note: the import of mleap.pyspark needs to happen before any other PySpark libraries are imported.

Hence, try creating your SparkContext only after importing mleap, as sketched below.
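
A minimal sketch of the reordered top of the script (same settings as in the question, trimmed to the essentials):

# import mleap.pyspark before any other PySpark modules so that
# serializeToBundle gets patched onto the Spark ML classes
import mleap.pyspark
from mleap.pyspark.spark_support import SimpleSparkSerializer

# only now bring in Spark itself
from pyspark import SparkContext, SparkConf
from pyspark.sql.session import SparkSession

conf = SparkConf()
conf.set("spark.app.name", "train classifier")
conf.set("spark.master", "local[*]")
sc = SparkContext(conf=conf)
spark = SparkSession(sc)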

Answer from Ollie:

I solved the issue by attaching the MLeap Spark jar when submitting the script:

spark-submit --packages ml.combust.mleap:mleap-spark_2.11:0.8.1 script.py
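
For completeness: per the MLeap getting-started guide, serializeToBundle is patched onto fitted transformers rather than onto the Pipeline estimator, so the call belongs on the PipelineModel returned by fit(), and (in this MLeap version) it takes a sample DataFrame as a second argument. A sketch based on the question's code:

# fit() returns a PipelineModel; the Pipeline estimator itself
# never gains a serializeToBundle method
fittedPipeline = featurePipeline.fit(df2)

# pass a transformed DataFrame so MLeap can record the schema
fittedPipeline.serializeToBundle("jar:file:/tmp/pyspark.example.zip",
                                 fittedPipeline.transform(df2))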