How to convert TFLite model to quantized TFLite model?


I have a tflite file and I want to quantize it.



There are 2 answers below.


Whether you used Keras or TF Hub, you can simply do the following (this assumes the TF model was not quantization-aware):

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(non_qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic-range quantization of weights
quantized_tflite_model = converter.convert()

You can refer to the docs for further detail: https://www.tensorflow.org/lite/performance/post_training_quantization
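The page above also covers full-integer quantization (weights and activations), which needs a representative dataset so the converter can calibrate activation ranges. A minimal sketch, assuming a SavedModel at `saved_model_dir` and a single 224×224×3 float input (both placeholders, substitute your own model and real calibration samples):

```python
import numpy as np
import tensorflow as tf

# Sketch only: saved_model_dir and the input shape are assumptions.
def representative_dataset():
    for _ in range(100):
        # Yield a list with one array per model input; use real samples in practice.
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Optionally force int8-only kernels and int8 input/output tensors:
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_quant_model = converter.convert()
```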


Please note you'll need the source model to quantise it: quantising a tflite file directly is not supported because of the limitations of the format.

Your source model could be a TF saved_model, a Keras model instance, or ONNX. You can find all the supported source model formats HERE, e.g. converter.from_keras_model(nn_path). For ONNX conversion, please check this tool.

There are various ways to quantise a model. The easiest is post-training quantisation of the weights only, in which the TF engine applies min-max quantisation to derive the quantisation parameters, i.e. the scale and zero point.
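To make the min-max scheme concrete, here is a small sketch of affine (asymmetric) quantisation in plain Python, no TF required. The function names and the int8 range are illustrative choices, not the TF internals:

```python
def minmax_quant_params(values, qmin=-128, qmax=127):
    """Compute scale and zero point mapping [min, max] onto [qmin, qmax]."""
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # the range must include 0.0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    # Round to the nearest integer and clamp to the int8 range.
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]

def dequantize(q, scale, zero_point):
    return [(x - zero_point) * scale for x in q]

weights = [-0.9, -0.2, 0.0, 0.4, 1.1]
scale, zp = minmax_quant_params(weights)
q = quantize(weights, scale, zp)
approx = dequantize(q, scale, zp)  # close to the original floats
```

The round-trip error per value is bounded by the scale, which is why narrow weight ranges quantise well.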

I'd suggest checking the documentation on the TFLite site: https://www.tensorflow.org/lite/performance/post_training_quantization

The direct code is:

import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT] # applies PTQ on weights, if possible
tflite_quant_model = converter.convert()

A few extra things: you can also check out quantisation-aware training (link) and mixed-precision training (link) in TF, which train with quantised weights directly; the quantised model then comes from a plain TFLite conversion.
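For the quantisation-aware route, a minimal sketch using the `tensorflow-model-optimization` package; the toy model here is purely illustrative, substitute your own:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder model for illustration only.
base_model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1),
])

# Wrap the model with fake-quantisation ops so training simulates quantised inference.
qat_model = tfmot.quantization.keras.quantize_model(base_model)
qat_model.compile(optimizer="adam", loss="mse")
# qat_model.fit(x_train, y_train, ...)  # train as usual

# After training, conversion yields an already-quantised model.
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()
```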