I'm building a deep neural network regressor with TensorFlow, based on this tutorial. When I try to save the model with tf.estimator's export_savedmodel, I get the following error:
raise ValueError('Feature {} is not in features dictionary.'.format(key))
ValueError: Feature ad_provider is not in features dictionary.
I need to export it in order to deploy a model to support prediction in Google Cloud Platform.
Here is where I define the columns:
CSV_COLUMNS = [
    "ad_provider", "device", "split_group", "gold", "secret_areas",
    "scored_enemies", "tutorial_sec", "video_success"
]
FEATURES = ["ad_provider", "device", "split_group", "gold", "secret_areas",
            "scored_enemies", "tutorial_sec"]
LABEL = "video_success"
ad_provider = tf.feature_column.categorical_column_with_vocabulary_list(
    "ad_provider", ["Organic", "Apple Search Ads", "googleadwords_int",
                    "Facebook Ads", "website"])
split_group = tf.feature_column.categorical_column_with_vocabulary_list(
    "split_group", [1, 2, 3, 4])
device = tf.feature_column.categorical_column_with_hash_bucket(
    "device", hash_bucket_size=100)
secret_areas = tf.feature_column.numeric_column("secret_areas")
gold = tf.feature_column.numeric_column("gold")
scored_enemies = tf.feature_column.numeric_column("scored_enemies")
finish_tutorial_sec = tf.feature_column.numeric_column("tutorial_sec")
video_success = tf.feature_column.numeric_column("video_success")
feature_columns = [
    tf.feature_column.indicator_column(ad_provider),
    tf.feature_column.embedding_column(device, dimension=8),
    tf.feature_column.indicator_column(split_group),
    tf.feature_column.numeric_column(key="gold"),
    tf.feature_column.numeric_column(key="secret_areas"),
    tf.feature_column.numeric_column(key="scored_enemies"),
    tf.feature_column.numeric_column(key="tutorial_sec"),
]
Next, I create a serving input function so that the exported model accepts JSON dictionaries. I'm not sure whether I've written the serving function correctly.
def json_serving_input_fn():
    """Build the serving inputs."""
    inputs = {}
    for feat in feature_columns:
        inputs[feat.name] = tf.placeholder(
            shape=[None],
            dtype=feat.dtype if hasattr(feat, 'dtype') else tf.string)
    features = {
        key: tf.expand_dims(tensor, -1)
        for key, tensor in inputs.items()
    }
    return tf.contrib.learn.InputFnOps(features, None, inputs)
Here is the rest of my code:
def main(unused_argv):
    # Normalize columns 'gold', 'tutorial_sec' and 'scored_enemies' for the training set
    train_n = training_set
    train_n['gold'] = (train_n['gold'] - train_n['gold'].mean()) / (train_n['gold'].max() - train_n['gold'].min())
    train_n['tutorial_sec'] = (train_n['tutorial_sec'] - train_n['tutorial_sec'].mean()) / (train_n['tutorial_sec'].max() - train_n['tutorial_sec'].min())
    train_n['scored_enemies'] = (train_n['scored_enemies'] - train_n['scored_enemies'].mean()) / (train_n['scored_enemies'].max() - train_n['scored_enemies'].min())
    test_n = test_set
    test_n['gold'] = (test_n['gold'] - test_n['gold'].mean()) / (test_n['gold'].max() - test_n['gold'].min())
    test_n['tutorial_sec'] = (test_n['tutorial_sec'] - test_n['tutorial_sec'].mean()) / (test_n['tutorial_sec'].max() - test_n['tutorial_sec'].min())
    test_n['scored_enemies'] = (test_n['scored_enemies'] - test_n['scored_enemies'].mean()) / (test_n['scored_enemies'].max() - test_n['scored_enemies'].min())
    train_input_fn = tf.estimator.inputs.pandas_input_fn(
        x=train_n,
        y=pd.Series(train_n[LABEL].values),
        batch_size=100,
        num_epochs=None,
        shuffle=True)
    test_input_fn = tf.estimator.inputs.pandas_input_fn(
        x=test_n,
        y=pd.Series(test_n[LABEL].values),
        batch_size=100,
        num_epochs=1,
        shuffle=False)
    regressor = tf.estimator.DNNRegressor(feature_columns=feature_columns,
                                          hidden_units=[40, 30, 20],
                                          model_dir="model1",
                                          optimizer='RMSProp'
                                          )
    # Train
    regressor.train(input_fn=train_input_fn, steps=5)
    regressor.export_savedmodel("test", json_serving_input_fn)
    # Evaluate loss over one epoch of test_set.
    # For each step, calls `input_fn`, which returns one batch of data.
    ev = regressor.evaluate(
        input_fn=test_input_fn)
    loss_score = ev["loss"]
    print("Loss: {0:f}".format(loss_score))
    for key in sorted(ev):
        print("%s: %s" % (key, ev[key]))
    # Print out predictions over a slice of prediction_set.
    y = regressor.predict(
        input_fn=test_input_fn)
    # Array with prediction list!
    predictions = list(p["predictions"] for p in y)
    #real = list(p["real"] for p in pd.Series(training_set[LABEL].values))
    real = test_set[LABEL].values
    diff = np.subtract(real, predictions)
    diff = np.absolute(diff)
    diff = np.mean(diff)
    print("Mean Square Error of Test Set = ", diff*diff)
Besides the issue you mentioned, there are actually several additional issues I foresee you running into. You are using tf.estimator.DNNRegressor, which was introduced in TensorFlow 1.3, while CloudML Engine only officially supports TF 1.2. So let's start by using tf.contrib.learn.DNNRegressor, which only requires minor changes. Note that it uses fit instead of train. (NB: your json_serving_input_fn is actually already written for TF 1.2 and is incompatible with TF 1.3, which is good for now.)

Now, the root cause of the error you are seeing is that the column/feature ad_provider is not in the list of inputs and features (but you do have ad_provider_indicator). This is because you are iterating through feature_columns and not through the original input column list. The way to address that is by iterating over the actual inputs instead of the feature columns; however, we'll need to know the types, too, so each input name has to be paired with its dtype.

Finally, to normalize your data, you'll probably want to do that in the graph. You could try using tf.transform, or, alternatively, write a custom estimator that does the transformation, delegating the actual model implementation to DNNRegressor.
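Returning to the serving input function: the following is a minimal sketch of iterating over the raw inputs rather than the feature columns, simplified to just two columns. It assumes TF 1.2 with tf.contrib.learn, as in your original function. FEATURE_DTYPES and placeholder_specs are hypothetical names introduced here for illustration, and the dtypes are my guess at what your data contains, so adjust them to match.

```python
# Sketch only: FEATURE_DTYPES is a hypothetical table, reduced to two columns.
# It is keyed by the raw input names the client sends, not by feature-column
# names such as "ad_provider_indicator".
FEATURE_DTYPES = {
    "ad_provider": "string",
    "gold": "float32",
}


def placeholder_specs(feature_dtypes=FEATURE_DTYPES):
    """Return (input name, dtype name) pairs, one per raw input, sorted by name."""
    return sorted(feature_dtypes.items())


def json_serving_input_fn():
    """Build the serving inputs (assumes TF 1.2 and tf.contrib.learn)."""
    # Imported here so the table above can be inspected without TensorFlow.
    import tensorflow as tf

    inputs = {}
    for name, dtype_name in placeholder_specs():
        # One 1-D placeholder per raw input column, typed explicitly.
        inputs[name] = tf.placeholder(shape=[None],
                                      dtype=tf.as_dtype(dtype_name))

    # Add a trailing dimension, as the original function does.
    features = {
        key: tf.expand_dims(tensor, -1)
        for key, tensor in inputs.items()
    }
    return tf.contrib.learn.InputFnOps(features, None, inputs)
```

The export call itself should not need to change: it is still regressor.export_savedmodel("test", json_serving_input_fn). The remaining columns (device, split_group, secret_areas, scored_enemies, tutorial_sec) would be added to the table in the same way.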