I have a PoseNet TensorFlow SavedModel that takes in an image and outputs heatmap and offset tensors. PoseNet is a pre-trained model from Google, and I have very little control over it. The model works fine, but I want to add a layer to it that performs the post-processing.
Currently, I'm extracting the final keypoints in Python code. How can I add a post-processing layer to the model itself so that it outputs the final keypoints?
There are models, such as MoveNet, that output the final keypoints, and I want to do the same thing for PoseNet.
This image illustrates what I'm trying to accomplish:
I've looked at the following posts about adding a post-processing layer, but I don't know how to apply them to my problem:
- How to add post-processing into a Tensorflow Model?
- Cannot add layers to saved Keras Model. 'Model' object has no attribute 'add'
- How to add another layer on a pre-loaded network?
- Add layer between two layers in saved model tensorflow
I fully understand the post-processing algorithm and have implemented it in Python. Now, I want to integrate this functionality directly into the model itself:
import numpy as np


def parse_output(heatmap, offset):
    """Decode PoseNet outputs into keypoints.

    heatmap shape [9, 9, 17]
    offset shape [9, 9, 34]
    """
    # Get the number of joints - value is 17 for PoseNet
    joint_num = heatmap.shape[-1]
    # Initialize an array to store the keypoints
    # (note: uint32 truncates the fractional offsets and the score)
    pose_kps = np.zeros((joint_num, 3), np.uint32)
    # Iterate over each joint
    for i in range(joint_num):
        # Select the heatmap for the i-th joint
        joint_heatmap = heatmap[..., i]
        # Find the maximum probability and its position
        max_prob = np.max(joint_heatmap)
        # Get the (row, column) grid position of max_prob, e.g. [4, 7]
        max_val_pos = np.squeeze(np.argwhere(joint_heatmap == max_prob))
        # Scale the grid position to the model input coordinates
        remap = np.array(max_val_pos / 8 * 257, dtype=np.int32)
        # Add the offsets and store the keypoint
        pose_kps[i, 0] = remap[0] + offset[max_val_pos[0], max_val_pos[1], i]
        pose_kps[i, 1] = remap[1] + offset[max_val_pos[0], max_val_pos[1], i + joint_num]
        pose_kps[i, 2] = max_prob
    return pose_kps
The parse_output function above makes sense to me, and similar implementations exist in the following projects:
I have created a sample PosetNetDemo project to show my current implementation.
I would appreciate it if you could point me to a resource or help me solve this. Thank you!
Your decoding function can be vectorized fairly easily. There is a lot of casting because TensorFlow has stricter type requirements, and because I wanted the results of your decoding function and my TensorFlow version to stay comparable. I also took the liberty of adding a batch dimension to the function, since that works better if you want to incorporate it into a Keras model.
Vectorized function:
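The answer's original listing is not reproduced here, so what follows is only a minimal sketch of what such a vectorized decoder could look like. The name decode_keypoints, the input_size parameter, and the use of height - 1 as the grid divisor (8 for a 9x9 grid, assumed square) are my assumptions, and the result is left as float32 rather than cast to uint32.

```python
import numpy as np
import tensorflow as tf


def decode_keypoints(heatmap, offset, input_size=257):
    """Hedged sketch of a batched, vectorized parse_output.

    heatmap: [batch, H, W, J] scores.
    offset:  [batch, H, W, 2 * J]; channels [0, J) are row offsets and
             channels [J, 2 * J) are column offsets, as in parse_output.
    Returns: [batch, J, 3] float32 tensor of (row, col, score).
    """
    heatmap = tf.cast(heatmap, tf.float32)
    offset = tf.cast(offset, tf.float32)
    shape = tf.shape(heatmap)
    batch, height, width, joints = shape[0], shape[1], shape[2], shape[3]

    # Put joints ahead of the grid and flatten it: [batch, J, H * W].
    flat = tf.reshape(tf.transpose(heatmap, [0, 3, 1, 2]),
                      [batch, joints, height * width])
    # Best score and its flat grid index per joint, both [batch, J, 1].
    scores, flat_idx = tf.math.top_k(flat, k=1)
    rows = flat_idx // width
    cols = flat_idx % width

    # Same layout for the offsets, then pick the value at each maximum.
    flat_off = tf.reshape(tf.transpose(offset, [0, 3, 1, 2]),
                          [batch, 2 * joints, height * width])
    row_off = tf.gather(flat_off[:, :joints], flat_idx, batch_dims=2)
    col_off = tf.gather(flat_off[:, joints:], flat_idx, batch_dims=2)

    # Grid cell to input pixels; floor mirrors the int32 cast in parse_output.
    # Assumption: a square grid, so height - 1 is used for both axes.
    scale = tf.cast(input_size, tf.float32) / tf.cast(height - 1, tf.float32)
    y = tf.floor(tf.cast(rows, tf.float32) * scale) + row_off
    x = tf.floor(tf.cast(cols, tf.float32) * scale) + col_off
    return tf.concat([y, x, scores], axis=-1)


# Tiny synthetic example: every joint peaks at grid cell (4, 7).
hm = np.zeros((1, 9, 9, 17), np.float32)
hm[0, 4, 7, :] = 0.9
off = np.zeros((1, 9, 9, 34), np.float32)
off[0, 4, 7, :17] = 1.5
off[0, 4, 7, 17:] = -2.0
kps = decode_keypoints(hm, off)
```

Unlike parse_output, this sketch keeps the fractional offsets and the score instead of truncating them to integers, so add a final cast if you need identical values.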
You can check that the outputs of the two functions are similar. Because of casting and precision issues, the results might differ slightly, so I would encourage you to test with a floating-point type instead, i.e. remove the cast/floor operations and do the same in your NumPy implementation:
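For the floating-point comparison suggested above, a NumPy reference without any integer casts might look like the sketch below. The name parse_output_float and the input_size parameter are hypothetical, and the grid divisor is taken as size - 1 (8 for a 9x9 grid) to match the constant in the question.

```python
import numpy as np


def parse_output_float(heatmap, offset, input_size=257):
    """Float-only variant of parse_output for comparison purposes.

    heatmap: [H, W, J], offset: [H, W, 2 * J].
    Returns a [J, 3] float32 array of (row, col, score).
    """
    height, width, joint_num = heatmap.shape
    # Index of the maximum of each joint's heatmap in the flattened grid.
    flat_idx = heatmap.reshape(-1, joint_num).argmax(axis=0)
    rows, cols = np.unravel_index(flat_idx, (height, width))
    joints = np.arange(joint_num)
    scores = heatmap[rows, cols, joints]
    # No flooring or integer casts: keep everything in floating point.
    y = rows / (height - 1) * input_size + offset[rows, cols, joints]
    x = cols / (width - 1) * input_size + offset[rows, cols, joints + joint_num]
    return np.stack([y, x, scores], axis=-1).astype(np.float32)


# Synthetic check: every joint peaks at grid cell (4, 7).
heatmap = np.zeros((9, 9, 17), np.float32)
heatmap[4, 7, :] = 0.9
offset = np.zeros((9, 9, 34), np.float32)
offset[4, 7, :17] = 1.5
offset[4, 7, 17:] = -2.0
reference = parse_output_float(heatmap, offset)
```

Comparing this against a float version of the TensorFlow function with np.allclose avoids chasing differences that come only from truncation.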
Using it in a Keras model:
You can simply use a Lambda Layer:
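As an illustration of the Lambda approach, here is a hedged sketch. The small convolutional base_model is a hypothetical stand-in for PoseNet (it only reproduces the 9x9x17 and 9x9x34 output shapes), and it assumes your SavedModel can be called like a Keras model; decode_keypoints is my own sketch of the decoder, not the original listing.

```python
import tensorflow as tf


def decode_keypoints(tensors, input_size=257):
    """Sketch: [heatmap, offset] -> [batch, J, 3] of (row, col, score)."""
    heatmap, offset = tensors
    shape = tf.shape(heatmap)
    batch, height, width, joints = shape[0], shape[1], shape[2], shape[3]
    # Flatten the grid per joint and take the best cell: [batch, J, 1].
    flat = tf.reshape(tf.transpose(heatmap, [0, 3, 1, 2]),
                      [batch, joints, height * width])
    scores, flat_idx = tf.math.top_k(flat, k=1)
    # Gather the row and column offsets at the best cell of each joint.
    flat_off = tf.reshape(tf.transpose(offset, [0, 3, 1, 2]),
                          [batch, 2 * joints, height * width])
    row_off = tf.gather(flat_off[:, :joints], flat_idx, batch_dims=2)
    col_off = tf.gather(flat_off[:, joints:], flat_idx, batch_dims=2)
    # Assumption: square grid, divisor height - 1 (8 for a 9x9 grid).
    scale = tf.cast(input_size, tf.float32) / tf.cast(height - 1, tf.float32)
    y = tf.floor(tf.cast(flat_idx // width, tf.float32) * scale) + row_off
    x = tf.floor(tf.cast(flat_idx % width, tf.float32) * scale) + col_off
    return tf.concat([y, x, scores], axis=-1)


# Hypothetical stand-in for PoseNet: 257x257x3 image -> two 9x9 outputs.
inp = tf.keras.Input(shape=(257, 257, 3))
hm = tf.keras.layers.Conv2D(17, 1, strides=32, padding="same",
                            activation="sigmoid")(inp)
off = tf.keras.layers.Conv2D(34, 1, strides=32, padding="same")(inp)
base_model = tf.keras.Model(inp, [hm, off])

# Wrap the base model and append the decoder as a Lambda layer.
image = tf.keras.Input(shape=(257, 257, 3))
heatmap, offset = base_model(image)
keypoints = tf.keras.layers.Lambda(
    decode_keypoints, name="decode_keypoints")([heatmap, offset])
model = tf.keras.Model(image, keypoints)

kps = model(tf.random.uniform([2, 257, 257, 3]))
```

The combined model then returns one [batch, 17, 3] tensor instead of the raw heatmap and offset tensors.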
If you want to run the model on accelerated hardware, you might run into issues due to the use of tf.shape, which is not always handled well by static folding optimizations. If that's the case, you might want to either create an actual Keras layer where you pre-compute those values during the build phase, or simply hardcode the values in your function.
Implementation Details:
I chose to use tf.math.top_k to get the maximum score for each keypoint and tf.gather to do the indexing, but other methods would work too. Using top_k has the advantage of scaling to multi-joint detection if needed.
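For the accelerated-hardware caveat, the "hardcode the values" option could be sketched as follows. The constants are assumptions taken from the shapes in the question (9x9 grid, 17 joints, 257-pixel input), the function name is hypothetical, and this variant stays in floating point with no flooring.

```python
import numpy as np
import tensorflow as tf

# Assumed constants for the 257x257 PoseNet variant from the question.
HEIGHT, WIDTH, JOINTS, INPUT_SIZE = 9, 9, 17, 257


@tf.function(input_signature=[
    tf.TensorSpec([None, HEIGHT, WIDTH, JOINTS], tf.float32),
    tf.TensorSpec([None, HEIGHT, WIDTH, 2 * JOINTS], tf.float32),
])
def decode_keypoints_static(heatmap, offset):
    """Decoder with hardcoded sizes, so no tf.shape call is needed."""
    # Only the batch axis is left dynamic (-1); the rest are Python ints.
    flat = tf.reshape(tf.transpose(heatmap, [0, 3, 1, 2]),
                      [-1, JOINTS, HEIGHT * WIDTH])
    scores, flat_idx = tf.math.top_k(flat, k=1)        # [batch, J, 1]
    rows = tf.cast(flat_idx // WIDTH, tf.float32)
    cols = tf.cast(flat_idx % WIDTH, tf.float32)
    flat_off = tf.reshape(tf.transpose(offset, [0, 3, 1, 2]),
                          [-1, 2 * JOINTS, HEIGHT * WIDTH])
    row_off = tf.gather(flat_off[:, :JOINTS], flat_idx, batch_dims=2)
    col_off = tf.gather(flat_off[:, JOINTS:], flat_idx, batch_dims=2)
    # Float version: grid cell scaled to input pixels plus the offset.
    y = rows / (HEIGHT - 1) * INPUT_SIZE + row_off
    x = cols / (WIDTH - 1) * INPUT_SIZE + col_off
    return tf.concat([y, x, scores], axis=-1)          # [batch, J, 3]


# Synthetic example: every joint peaks at grid cell (2, 5), zero offsets.
hm = np.zeros((1, HEIGHT, WIDTH, JOINTS), np.float32)
hm[0, 2, 5, :] = 1.0
off = np.zeros((1, HEIGHT, WIDTH, 2 * JOINTS), np.float32)
static_kps = decode_keypoints_static(hm, off)
```

Whether this is enough for a given accelerator or converter depends on the toolchain, so treat it as a starting point and not as a guaranteed fix.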