MoveNet Pose Lightning TensorFlow inaccurate keypoints


I'm currently working on a project using TensorFlow's MoveNet for pose estimation on a video. The model is detecting keypoints quite well, but there's an issue with the keypoint positioning.

I'm experiencing misalignment between the detected keypoints and the actual body parts in the video frames. The keypoints are off by a significant margin. I'm wondering if this misalignment is a common issue or if there are any specific adjustments that need to be made.

Here's a summary of the relevant information:

  • I'm using TensorFlow and the MoveNet model for single-pose estimation.
  • The input video has a resolution of 1280x720 pixels.
  • I'm resizing the frames to 192x192 pixels with tf.image.resize_with_pad, which preserves the aspect ratio by letterboxing, before passing them to the model.
  • The code for rendering keypoints and connections is based on standard practices for pose estimation.

The specific problem is that keypoints are not correctly aligned with the body parts they represent, e.g., the elbow keypoint is not in the correct position. I'm looking for guidance on how to address this issue.

[Screenshot: how the misaligned keypoints look]

Here's my code:

import tensorflow as tf
import tensorflow_hub as hub
import cv2
from matplotlib import pyplot as plt
import numpy as np

model = hub.load("https://tfhub.dev/google/movenet/singlepose/lightning/4")
movenet = model.signatures['serving_default']

# COCO keypoint skeleton: pairs of keypoint indices to connect
# (the edge colors are unused below; lines are drawn in a fixed color)
EDGES = {
    (0, 1): 'm',
    (0, 2): 'c',
    (1, 3): 'm',
    (2, 4): 'c',
    (0, 5): 'm',
    (0, 6): 'c',
    (5, 7): 'm',
    (7, 9): 'm',
    (6, 8): 'c',
    (8, 10): 'c',
    (5, 6): 'y',
    (5, 11): 'm',
    (6, 12): 'c',
    (11, 12): 'y',
    (11, 13): 'm',
    (13, 15): 'm',
    (12, 14): 'c',
    (14, 16): 'c'
}

# Loop through each detected person and render keypoints and connections
def loop_through_people(frame, keypoints_with_scores, edges, confidence_threshold):
    for person in keypoints_with_scores:
        draw_connections(frame, person, edges, confidence_threshold)
        draw_keypoints(frame, person, confidence_threshold)

def draw_keypoints(frame, keypoints, confidence_threshold):
    y, x, c = frame.shape
    # Scale normalized (y, x) coordinates up to frame pixels
    shaped = np.squeeze(np.multiply(keypoints, [y, x, 1]))

    for kp in shaped:
        ky, kx, kp_conf = kp
        if kp_conf > confidence_threshold:
            cv2.circle(frame, (int(kx), int(ky)), 6, (0, 255, 0), -1)

def draw_connections(frame, keypoints, edges, confidence_threshold):
    y, x, c = frame.shape
    shaped = np.squeeze(np.multiply(keypoints, [y, x, 1]))

    for edge, color in edges.items():
        p1, p2 = edge
        y1, x1, c1 = shaped[p1]
        y2, x2, c2 = shaped[p2]

        # Draw an edge only when both endpoints are confident enough
        if (c1 > confidence_threshold) and (c2 > confidence_threshold):
            cv2.line(frame, (int(x1), int(y1)), (int(x2), int(y2)), (0, 0, 255), 4)

cap = cv2.VideoCapture('federer.mp4')
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:  # end of video
        break

    # Resize image (pads with zeros to preserve the aspect ratio)
    img = frame.copy()
    img = tf.image.resize_with_pad(tf.expand_dims(img, axis=0), 192, 192)
    input_img = tf.cast(img, dtype=tf.int32)

    # Detection section
    results = movenet(input_img)
    # Singlepose output is already (1, 1, 17, 3), so the multipose-style
    # slice and reshape are effectively no-ops here
    keypoints_with_scores = results['output_0'].numpy()[:, :, :51].reshape((1, 1, 17, 3))

    # Render keypoints
    loop_through_people(frame, keypoints_with_scores, EDGES, 0.1)

    cv2.imshow('MoveNet Lightning', frame)

    if cv2.waitKey(10) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()




2 Answers

Answer 1:

I have used the same-sized model created by PINTO0309 on GitHub, translated for use with OpenVINO, and it did not perform well in terms of the positional accuracy of the keypoints. However, when I used the model with input sizes of 256x256 or 192x256, the accuracy increased and was actually great. From my experience, 192x192 is not enough to get the accuracy you would expect when processing a 720p video.
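If you want a larger input while staying in the question's TF Hub setup, one option (my suggestion, not part of this answer) is MoveNet Thunder, the official 256x256 single-pose variant. A minimal sketch of swapping it in; note that the padding caveat discussed in the next answer still applies:

import tensorflow as tf
import tensorflow_hub as hub

# MoveNet Thunder: same output format as Lightning, but a 256x256 input
model = hub.load("https://tfhub.dev/google/movenet/singlepose/thunder/4")
movenet = model.signatures['serving_default']

def detect(frame):
    # Thunder expects a 256x256 int32 batch instead of Lightning's 192x192
    img = tf.image.resize_with_pad(tf.expand_dims(frame, axis=0), 256, 256)
    input_img = tf.cast(img, dtype=tf.int32)
    results = movenet(input_img)
    return results['output_0'].numpy()  # shape (1, 1, 17, 3): (y, x, score)

The rest of the rendering pipeline from the question works unchanged, since the output format is identical.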

Answer 2:

I found out it had something to do with the input image resizing, because for me the inaccuracy was worse at the bottom of the image, while it looked OK in the center.

When resizing with tf.image.resize_with_pad, according to the TensorFlow documentation, if the input image's aspect ratio does not match the target one, the image is padded with zeros to make it match. This causes the keypoint coordinates to be displaced, on the y-axis in my case (input image 480x640, resized to 192x192).

What fixed it for me was changing this line in the while loop:

img = tf.image.resize_with_pad(tf.expand_dims(img, axis=0), 192,192)

With this line:

img = tf.image.resize(tf.expand_dims(img, axis=0), (192,192))

tf.image.resize also resizes the image, but instead of adding zeros it stretches the image to the new aspect ratio, so the keypoint coordinates map back to the original frame correctly via the existing line shaped = np.squeeze(np.multiply(keypoints, [y,x,1])) in draw_keypoints.
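Alternatively, if you would rather keep resize_with_pad so the image is not distorted, you can undo the letterboxing when mapping keypoints back to the frame. A rough sketch (the helper unpad_keypoints is my own; it assumes a square model input and the centered, uniform-scale padding that tf.image.resize_with_pad applies):

import numpy as np

def unpad_keypoints(keypoints_17x3, frame_h, frame_w, input_size=192):
    # Map (y, x, score) keypoints normalized to the padded square model
    # input back to pixel coordinates in the original frame.
    scale = input_size / max(frame_h, frame_w)        # uniform resize factor
    pad_y = (input_size - frame_h * scale) / 2.0      # vertical letterbox
    pad_x = (input_size - frame_w * scale) / 2.0      # horizontal letterbox

    kps = keypoints_17x3.copy()
    kps[:, 0] = (kps[:, 0] * input_size - pad_y) / scale  # y -> frame pixels
    kps[:, 1] = (kps[:, 1] * input_size - pad_x) / scale  # x -> frame pixels
    return kps

Note that the returned coordinates are already in pixels, so the drawing functions would use them directly instead of multiplying by [y, x, 1].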