I have successfully trained a scikit-learn model on ML Engine. I can get the model.joblib file from my Cloud Storage bucket and load it, and I can also get local predictions using gcloud. However, I can't create a model version.
I'm using the sklearn_crfsuite estimator:
import sklearn_crfsuite

crf = sklearn_crfsuite.CRF(
    algorithm='lbfgs',
    c1=0.1,
    c2=0.1,
    max_iterations=2,
    all_possible_transitions=True
)
I'm saving the model as described below:
model = 'model.joblib'
joblib.dump(crf, model)
My setup.py for training is:
'''Cloud ML Engine package configuration.'''
from setuptools import setup, find_packages

REQUIRED_PACKAGES = [
    'joblib==0.13.0',
    'sklearn-crfsuite==0.3.6',
    'sklearn==0.0',
]

setup(name='trainer',
      version='1.0',
      packages=find_packages(),
      include_package_data=True,
      install_requires=REQUIRED_PACKAGES)
I submit the training package:
gcloud ml-engine jobs submit training train_$JOB_NAME \
    --runtime-version 1.8 \
    --python-version 2.7 \
    --job-dir=gs://$BUCKET_NAME/jobs/$JOB_NAME/ \
    --package-path=trainer \
    --module-name trainer.model \
    --region $REGION \
    --scale-tier BASIC \
    -- \
    --train-data-dir=gs://$BUCKET_NAME/dataset \
    --job-dir=gs://$BUCKET_NAME/jobs/$JOB_NAME
The model is trained and exported to job-dir, but when I try to deploy it:
gcloud alpha ml-engine versions create v1 --model teste --origin \
$ORI --python-version 2.7 --runtime-version 1.8 --framework scikit-learn
it reports this error:
ERROR: (gcloud.alpha.ml-engine.versions.create) Bad model detected with error: "Failed to load model: Could not load the model: /tmp/model/0001/model.joblib. No module named sklearn_crfsuite.estimator. (Error code: 0)"
Could you verify that you have the directory structure correct?
You do not need to include sklearn in your setup.py, since it is provided by the framework. To avoid confusion, please remove it from REQUIRED_PACKAGES.
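As background on the error message itself: joblib serializes the estimator with pickle, and pickle records only the import path of the object's class, not its code. Loading therefore fails with "No module named ..." in any environment where that module cannot be imported. The sketch below reproduces this in isolation. It is only an illustration written for Python 3: fake_crfsuite is a hypothetical throwaway module standing in for sklearn_crfsuite, and removing it from sys.path stands in for an environment where the package is not installed.

```python
import importlib
import os
import pickle
import shutil
import sys
import tempfile

# Hypothetical stand-in for sklearn_crfsuite: a throwaway module with one class.
MODULE_NAME = "fake_crfsuite"
MODULE_SOURCE = (
    "class CRF(object):\n"
    "    def __init__(self, c1):\n"
    "        self.c1 = c1\n"
)


def pickle_then_load_without_module():
    """Pickle an object, make its module unimportable, then try to load it.

    Returns the ImportError message, or None if loading unexpectedly worked.
    """
    tmp_dir = tempfile.mkdtemp()
    try:
        with open(os.path.join(tmp_dir, MODULE_NAME + ".py"), "w") as f:
            f.write(MODULE_SOURCE)

        # "Training" side: the module is importable, so dumping works.
        sys.path.insert(0, tmp_dir)
        importlib.invalidate_caches()
        module = importlib.import_module(MODULE_NAME)
        payload = pickle.dumps(module.CRF(c1=0.1))

        # "Serving" side: same bytes, but the module can no longer be imported.
        sys.path.remove(tmp_dir)
        del sys.modules[MODULE_NAME]
        try:
            pickle.loads(payload)
        except ImportError as err:  # ModuleNotFoundError is a subclass
            return str(err)
        return None
    finally:
        shutil.rmtree(tmp_dir, ignore_errors=True)


if __name__ == "__main__":
    print(pickle_then_load_without_module())
```

The returned message names the missing module, which mirrors the shape of the error reported by versions create.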
You can verify that your setup.py is correct by checking whether moving
import joblib
to come before the import of sklearn-crfsuite works.
Make sure setup.py is parallel to trainer (i.e. one directory up from model.py). See this GitHub repo for an example:
https://github.com/GoogleCloudPlatform/training-data-analyst/tree/master/blogs/sklearn/babyweight
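If it helps, the layout check can be scripted. The following is a minimal sketch, not part of any Google tooling: check_trainer_layout is a hypothetical helper, and the trainer and model.py names are taken from the --package-path and --module-name values in the question. The __init__.py requirement is an assumption based on find_packages() only picking up directories that contain one.

```python
import os


def check_trainer_layout(project_root, package="trainer", module="model.py"):
    """Return a list of problems with the expected training package layout.

    Assumed layout:
        project_root/
            setup.py
            trainer/
                __init__.py
                model.py
    """
    problems = []
    package_dir = os.path.join(project_root, package)
    expected_files = [
        os.path.join(project_root, "setup.py"),
        os.path.join(package_dir, "__init__.py"),
        os.path.join(package_dir, module),
    ]
    for path in expected_files:
        if not os.path.isfile(path):
            problems.append("missing: " + os.path.relpath(path, project_root))
    # A setup.py inside the package directory is not parallel to trainer.
    if os.path.isfile(os.path.join(package_dir, "setup.py")):
        problems.append("setup.py is inside " + package + "/; move it one level up")
    return problems


if __name__ == "__main__":
    # Run from the directory that should contain setup.py and trainer/.
    for problem in check_trainer_layout(os.getcwd()):
        print(problem)
```

An empty result only means the files are where the packaging step expects them; it does not by itself confirm that the serving environment can import sklearn_crfsuite.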