Case 1
Framework: Tensorflow 2.5.0, Intel-Tensorflow 2.5.0
Environment: Google Colab
I have a model that was successfully quantized by LPOT, and I want to run it for inference without using the LPOT API, so I wrote the following inference code:
import tensorflow as tf  # model, x, y, input_tensor_name and output_tensor_name are defined earlier in the notebook

with tf.compat.v1.Session() as sess:
    # load the LPOT-quantized SavedModel into the session
    tf.compat.v1.saved_model.loader.load(sess, ['serve'], model)
    output = sess.graph.get_tensor_by_name(output_tensor_name)
    predictions = sess.run(output, {input_tensor_name: x})
    mse = tf.reduce_mean(tf.keras.losses.mean_squared_error(y, predictions))
    print(mse.eval())
When running the line predictions = sess.run(output, {input_tensor_name: x}), I get the following error:
---------------------------------------------------------------------------
InternalError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1374 try:
-> 1375 return fn(*args)
1376 except errors.OpError as e:
7 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
1359 return self._call_tf_sessionrun(options, feed_dict, fetch_list,
-> 1360 target_list, run_metadata)
1361
/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
1452 fetch_list, target_list,
-> 1453 run_metadata)
1454
InternalError: Missing 0-th output from {{node model/layer_1/Conv2D_eightbit_requantize}}
During handling of the above exception, another exception occurred:
InternalError Traceback (most recent call last)
<ipython-input-6-2bddd853d111> in <module>()
2 tf.compat.v1.saved_model.loader.load(sess, ['serve'], model)
3 output = sess.graph.get_tensor_by_name(output_tensor_name)
----> 4 predictions = sess.run(output, {input_tensor_name: x[:64]}) # 64, 257, 60, 1
5 mse = tf.reduce_mean(tf.keras.losses.mean_squared_error(y[:64], predictions))
6 print(mse.eval())
/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
966 try:
967 result = self._run(None, fetches, feed_dict, options_ptr,
--> 968 run_metadata_ptr)
969 if run_metadata:
970 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1189 if final_fetches or final_targets or (handle and feed_dict_tensor):
1190 results = self._do_run(handle, final_targets, final_fetches,
-> 1191 feed_dict_tensor, options, run_metadata)
1192 else:
1193 results = []
/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1367 if handle is None:
1368 return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1369 run_metadata)
1370 else:
1371 return self._do_call(_prun_fn, handle, feeds, fetches)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1392 '\nsession_config.graph_options.rewrite_options.'
1393 'disable_meta_optimizer = True')
-> 1394 raise type(e)(node_def, op, message)
1395
1396 def _extend_graph(self):
InternalError: Missing 0-th output from node model/layer_1/Conv2D_eightbit_requantize (defined at <ipython-input-6-2bddd853d111>:2)
This error occurs whether or not Intel-Tensorflow==2.5.0 is installed, and it is not resolved by setting os.environ['TF_ENABLE_ONEDNN_OPTS'] = '1' explicitly.
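For reference, this is roughly how I set the flag (a minimal sketch; I am assuming the variable has to be set before tensorflow is imported for it to take effect):

import os
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '1'  # request oneDNN-optimized kernels; assumed to be read at import time
import tensorflow as tf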
On the other hand, when I run the same code in VS Code with the Python 3.6.8 64-bit ('base': conda) interpreter, it returns the same error message as in Case 2.
Case 2
Framework: Tensorflow 2.4.0, Intel-Tensorflow 2.4.0
Environment: Google Colab
This case works well and prints out the MSE loss of the predictions. However, when I uninstall Intel-Tensorflow 2.4.0 and run with official Tensorflow only, the same line as in Case 1 (predictions = sess.run(output, {input_tensor_name: x})) produces:
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1374 try:
-> 1375 return fn(*args)
1376 except errors.OpError as e:
7 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
1357 # Ensure any changes to the graph are reflected in the runtime.
-> 1358 self._extend_graph()
1359 return self._call_tf_sessionrun(options, feed_dict, fetch_list,
/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _extend_graph(self)
1397 with self._graph._session_run_lock(): # pylint: disable=protected-access
-> 1398 tf_session.ExtendSession(self._session)
1399
InvalidArgumentError: No OpKernel was registered to support Op 'QuantizedMatMulWithBiasAndDequantize' used by {{node model/dense/Tensordot/MatMul_eightbit_requantize}} with these attrs: [input_quant_mode="MIN_FIRST", T1=DT_QUINT8, Toutput=DT_FLOAT, T2=DT_QINT8, Tbias=DT_QINT32, transpose_a=false, transpose_b=false]
Registered devices: [CPU]
Registered kernels:
<no registered kernels>
[[model/dense/Tensordot/MatMul_eightbit_requantize]]
During handling of the above exception, another exception occurred:
InvalidArgumentError Traceback (most recent call last)
<ipython-input-6-2bddd853d111> in <module>()
2 tf.compat.v1.saved_model.loader.load(sess, ['serve'], model)
3 output = sess.graph.get_tensor_by_name(output_tensor_name)
----> 4 predictions = sess.run(output, {input_tensor_name: x[:64]}) # 64, 257, 60, 1
5 mse = tf.reduce_mean(tf.keras.losses.mean_squared_error(y[:64], predictions))
6 print(mse.eval())
/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
966 try:
967 result = self._run(None, fetches, feed_dict, options_ptr,
--> 968 run_metadata_ptr)
969 if run_metadata:
970 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1189 if final_fetches or final_targets or (handle and feed_dict_tensor):
1190 results = self._do_run(handle, final_targets, final_fetches,
-> 1191 feed_dict_tensor, options, run_metadata)
1192 else:
1193 results = []
/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1367 if handle is None:
1368 return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1369 run_metadata)
1370 else:
1371 return self._do_call(_prun_fn, handle, feeds, fetches)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1392 '\nsession_config.graph_options.rewrite_options.'
1393 'disable_meta_optimizer = True')
-> 1394 raise type(e)(node_def, op, message)
1395
1396 def _extend_graph(self):
InvalidArgumentError: No OpKernel was registered to support Op 'QuantizedMatMulWithBiasAndDequantize' used by node model/dense/Tensordot/MatMul_eightbit_requantize (defined at <ipython-input-6-2bddd853d111>:2) with these attrs: [input_quant_mode="MIN_FIRST", T1=DT_QUINT8, Toutput=DT_FLOAT, T2=DT_QINT8, Tbias=DT_QINT32, transpose_a=false, transpose_b=false]
Registered devices: [CPU]
Registered kernels:
<no registered kernels>
[[model/dense/Tensordot/MatMul_eightbit_requantize]]
The error persists even with os.environ['TF_ENABLE_ONEDNN_OPTS'] = '1' set explicitly.
Conclusion
I believe both cases are caused by the same type of error, namely the No OpKernel was registered to support Op ... failure.
I was given to understand that with official Tensorflow v2.5 installed and the environment variable TF_ENABLE_ONEDNN_OPTS=1 set (reference), the quantized model is supposed to run with oneDNN support. But that does not seem to be the case in either v2.4 or v2.5.
My question is: how do I get an official Tensorflow 2.5 environment with oneDNN support without having to install Intel-Tensorflow? Or why does Intel-Tensorflow 2.5 not work? Thanks.
LPOT is released in the Intel® AI Analytics Toolkit and works with Intel Optimization for TensorFlow. LPOT can run on any Intel CPU to quantize the AI model. Intel Optimized TensorFlow 2.5.0 requires setting the environment variable TF_ENABLE_MKL_NATIVE_FORMAT=0 before running LPOT quantization or deploying the quantized model. Please refer to this for more information.
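For example, a minimal sketch of setting this from Python (assuming the variable is set before tensorflow is imported and before LPOT runs; exporting it in the shell works equally well):

import os
os.environ['TF_ENABLE_MKL_NATIVE_FORMAT'] = '0'  # required by Intel Optimized TensorFlow 2.5.0 when quantizing with LPOT or deploying the quantized model
import tensorflow as tf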
Could you please check whether you quantized the model with Tensorflow 2.4 and are running inference on Tensorflow 2.5? A plausible explanation for the model running in Tensorflow 2.4 but not in Tensorflow 2.5 is that the operators supported in Tensorflow 2.5 may not support a model created in Tensorflow 2.4.
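A quick way to check is to compare the Tensorflow version of the runtime with the version recorded in the quantized SavedModel (a minimal sketch; model_dir is a placeholder for the path to your quantized model):

import tensorflow as tf

print('runtime TF version:', tf.__version__)

# loader.load returns the MetaGraphDef, whose meta_info_def records the TF version the model was saved with
with tf.compat.v1.Session(graph=tf.Graph()) as sess:
    meta_graph = tf.compat.v1.saved_model.loader.load(sess, ['serve'], model_dir)
    print('model built with TF version:', meta_graph.meta_info_def.tensorflow_version)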