I am working with TensorFlow/Keras and want to quantize model parameters and then implement the model with NumPy. I built a 1D CNN model, trained it, and quantized its parameters to UINT8 using TensorFlow post-training quantization. I then extracted the weights and biases and exported them to .npy files.

After building the same 1D CNN in NumPy (dtype UINT8) with the extracted weights and biases, I checked the results layer by layer and got different results compared to the quantized model's outputs. When I compare my NumPy implementation against the floating-point model (without quantization to UINT8), I do get the same outputs as the Keras model, so I guess my NumPy model is working well. :)
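Roughly, my extraction looks like this (a minimal sketch; `model_uint8.tflite` and the output file names are placeholders):

```python
import numpy as np
import tensorflow as tf

# Load the quantized TFLite model (file name is a placeholder).
interpreter = tf.lite.Interpreter(model_path="model_uint8.tflite")
interpreter.allocate_tensors()

# Every tensor carries its own (scale, zero_point) pair; the raw UINT8
# values are meaningless without them.
for detail in interpreter.get_tensor_details():
    scale, zero_point = detail["quantization"]
    try:
        tensor = interpreter.get_tensor(detail["index"])  # constant tensors only
    except ValueError:
        continue  # activation tensors hold no data before invoke()
    np.save(f"tensor_{detail['index']}.npy", tensor)
    print(detail["name"], tensor.dtype, "scale:", scale, "zero_point:", zero_point)
```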
As far as I understand, interpreter.get_input_details() includes the quantization scale and zero-point parameters of the input tensor, which are required if I want to convert the UINT8 values back to float. Am I right?
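For example, I would expect something like this to convert between the two representations, using real_value = scale * (quantized_value - zero_point) (a sketch; `input_uint8.npy` is a placeholder):

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_uint8.tflite")
interpreter.allocate_tensors()

# Scale and zero point of the input tensor.
input_details = interpreter.get_input_details()[0]
scale, zero_point = input_details["quantization"]

# Dequantize: real_value = scale * (quantized_value - zero_point)
x_uint8 = np.load("input_uint8.npy")  # placeholder input file
x_float = scale * (x_uint8.astype(np.float32) - zero_point)

# Quantize in the other direction, clamping to the UINT8 range.
x_q = np.clip(np.round(x_float / scale) + zero_point, 0, 255).astype(np.uint8)
```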
I would be very happy for any suggestions on how to get the same results as the quantized Keras model.