I'm trying to use Google Cloud Platform to deploy a model to serve predictions.

I train the model (locally) with the following command:

    ~/$ gcloud ml-engine local train --module-name trainer.task --package-path trainer

and everything works fine (...):

    INFO:tensorflow:Restoring parameters from gs://my-bucket1/test2/model.ckpt-45000
    INFO:tensorflow:Saving checkpoints for 45001 into gs://my-bucket1/test2/model.ckpt.
    INFO:tensorflow:loss = 17471.6, step = 45001
    [...]
    Loss: 144278.046875
    average_loss: 1453.68
    global_step: 50000
    loss: 144278.0
    INFO:tensorflow:Restoring parameters from gs://my-bucket1/test2/model.ckpt-50000
    Mean Square Error of Test Set =  593.1018482

But when I run the following command to create a version,

    gcloud ml-engine versions create Mo1 --model mod1 --origin gs://my-bucket1/test2/ --runtime-version 1.3

I get the following error:

    ERROR: (gcloud.ml-engine.versions.create) FAILED_PRECONDITION: Field: version.deployment_uri
    Error: SavedModel directory gs://my-bucket1/test2/ is expected to contain exactly one
    of: [saved_model.pb, saved_model.pbtxt].
    - '@type': type.googleapis.com/google.rpc.BadRequest
      fieldViolations:
      - description: 'SavedModel directory gs://my-bucket1/test2/ is expected
          to contain exactly one of: [saved_model.pb, saved_model.pbtxt].'
        field: version.deployment_uri

Here is a screenshot of my bucket; I have a saved model in 'pbtxt' format:

[screenshot of the bucket contents, showing graph.pbtxt alongside the model.ckpt-* checkpoint files]

Finally, here is the piece of code where I save the model to the bucket:

    regressor = tf.estimator.DNNRegressor(feature_columns=feature_columns,
                                          hidden_units=[40, 30, 20],
                                          model_dir="gs://my-bucket1/test2",
                                          optimizer='RMSProp')

There are 2 answers below.

Answer 1

You'll notice that the file in your screenshot is graph.pbtxt, whereas a saved_model.pb (or saved_model.pbtxt) is needed.

Note that just renaming the file will generally not be sufficient. The training process writes checkpoints periodically so that it can recover if a restart happens, but those checkpoints (and the corresponding graph.pbtxt) describe the training graph. Training graphs tend to contain things like file readers, input queues, dropout layers, etc., which are not appropriate for serving.
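
For a concrete picture, a model_dir after a training run like yours typically holds only these training artifacts (this listing is illustrative, not taken from your bucket; exact step numbers and shard names will vary):

    gs://my-bucket1/test2/
        checkpoint                            <- pointer to the latest checkpoint
        graph.pbtxt                           <- the *training* graph
        model.ckpt-50000.data-00000-of-00001  <- checkpointed weights
        model.ckpt-50000.index
        model.ckpt-50000.meta
        events.out.tfevents.*                 <- TensorBoard logs

None of these is the saved_model.pb{txt} that the deployment service is looking for.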

Instead, TensorFlow requires you to explicitly export a separate graph for serving. You can do this in one of two ways:

  1. During training (typically, after training is complete)
  2. As a separate process after training.

During/After Training

For this, I'll refer you to the Census sample (in the GoogleCloudPlatform/cloudml-samples repository).

First, you'll need a "serving input function", such as:

    import tensorflow as tf

    def serving_input_fn():
      """Build the serving inputs."""
      # INPUT_COLUMNS is the list of feature columns defined for training
      # (as in the Census sample).
      inputs = {}
      for feat in INPUT_COLUMNS:
        inputs[feat.name] = tf.placeholder(shape=[None], dtype=feat.dtype)

      # Each placeholder receives a flat batch; add a dimension so the
      # feature columns see tensors of shape [batch_size, 1].
      features = {
          key: tf.expand_dims(tensor, -1)
          for key, tensor in inputs.items()  # iteritems() on Python 2
      }
      return tf.contrib.learn.InputFnOps(features, None, inputs)

Then you can simply call:

    regressor.export_savedmodel("path/to/model", serving_input_fn)
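
Note that export_savedmodel writes the serving graph into a new timestamped subdirectory beneath the path you give it, so the output looks roughly like this (the timestamp below is just an example):

    path/to/model/
        1505261438/
            saved_model.pb
            variables/
                variables.data-00000-of-00001
                variables.index

When deploying, --origin must point at the timestamped directory that directly contains saved_model.pb, not at the training model_dir.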

Or, if you're using learn_runner/Experiment, you'll need to pass an ExportStrategy like the following to the constructor of Experiment:

    export_strategies=[saved_model_export_utils.make_export_strategy(
        serving_input_fn,
        exports_to_keep=1,
        default_output_alternative_key=None,
    )]
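
In case it helps, here is a minimal sketch of how that fragment plugs into an Experiment; train_input_fn and eval_input_fn stand in for your own input functions and are not part of the original answer:

    import tensorflow as tf
    from tensorflow.contrib.learn import Experiment
    from tensorflow.contrib.learn.python.learn import learn_runner
    from tensorflow.contrib.learn.python.learn.utils import saved_model_export_utils

    def experiment_fn(output_dir):
      # learn_runner.run supplies output_dir when it builds the Experiment.
      return Experiment(
          estimator=tf.contrib.learn.DNNRegressor(
              feature_columns=feature_columns,
              hidden_units=[40, 30, 20],
              model_dir=output_dir,
              optimizer='RMSProp'),
          train_input_fn=train_input_fn,  # your training input function
          eval_input_fn=eval_input_fn,    # your evaluation input function
          export_strategies=[saved_model_export_utils.make_export_strategy(
              serving_input_fn,
              exports_to_keep=1,
              default_output_alternative_key=None,
          )])

    learn_runner.run(experiment_fn, output_dir="gs://my-bucket1/test2")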

After Training

Almost exactly the same steps as above, but just in a separate Python script you can run after training is over (in your case, this is beneficial because you won't have to retrain). The basic idea is to construct the Estimator with the same model_dir used in training, then to call export as above, something like:

    import tensorflow as tf

    def serving_input_fn():
      """Build the serving inputs."""
      inputs = {}
      for feat in INPUT_COLUMNS:
        inputs[feat.name] = tf.placeholder(shape=[None], dtype=feat.dtype)

      features = {
          key: tf.expand_dims(tensor, -1)
          for key, tensor in inputs.items()  # iteritems() on Python 2
      }
      return tf.contrib.learn.InputFnOps(features, None, inputs)

    # Reconstruct the estimator with the same model_dir used for training so
    # it picks up the latest checkpoint, then export the serving graph.
    regressor = tf.contrib.learn.DNNRegressor(
        feature_columns=feature_columns,
        hidden_units=[40, 30, 20],
        model_dir="gs://my-bucket1/test2",
        optimizer='RMSProp'
    )
    regressor.export_savedmodel("my_model", serving_input_fn)
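
Once the export exists, point --origin at the directory that directly contains saved_model.pb. Here is a sketch using the names from the question; the <timestamp> segment is a placeholder for whatever directory name export_savedmodel produced, and --runtime-version 1.2 matches the TensorFlow version discussed in the edit below:

    gsutil cp -r my_model/<timestamp> gs://my-bucket1/test2/export/
    gcloud ml-engine versions create Mo1 --model mod1 \
        --origin gs://my-bucket1/test2/export/<timestamp>/ --runtime-version 1.2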

EDIT 09/12/2017

One slight change is needed to your training code. You are using tf.estimator.DNNRegressor, but that was introduced in TensorFlow 1.3; CloudML Engine only officially supports TensorFlow 1.2, so you'll need to use tf.contrib.learn.DNNRegressor instead. They are very similar, but one notable difference is that you'll need to use the fit method instead of train.
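
Concretely, the training call changes along these lines (train_input_fn and the step count are placeholders standing in for your own training setup):

    # TF 1.3 / tf.estimator:
    #   regressor.train(input_fn=train_input_fn, steps=50000)
    # TF 1.2 / tf.contrib.learn:
    regressor = tf.contrib.learn.DNNRegressor(
        feature_columns=feature_columns,
        hidden_units=[40, 30, 20],
        model_dir="gs://my-bucket1/test2",
        optimizer='RMSProp')
    regressor.fit(input_fn=train_input_fn, steps=50000)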

Answer 2

I had the same error message; in my case there were two problems:

  1. The path to the bucket was misspelled.
  2. A wrong saved_model.pbtxt: after the first error message, I had put another renamed .pbtxt file (containing my model classes) into the same bucket, which kept the problem alive even after the path was corrected.

The command worked after I deleted the wrong file and corrected the path. I hope this helps too.
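
As a quick sanity check before (re)deploying, you can list exactly what the --origin directory contains; it should hold exactly one of saved_model.pb or saved_model.pbtxt, plus the variables/ subdirectory:

    gsutil ls -r gs://my-bucket1/test2/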