Mediapipe - Differentiated extraction of hand landmarks based on handedness (right and left) in a C++ framework


I am writing a C++ calculator, "HandLandmarkWriteToFileCalculator", that writes the normalized position (x, y, z) of each hand's landmarks to a file, separated according to the hand's handedness.

The inputs to my calculator are:

  • input_stream: "HANDEDNESS:handedness" Handedness of each detected hand (i.e. whether the hand is left or right). (std::vector<ClassificationList>)

  • input_stream: "LANDMARKS:landmarks" Collection of detected/predicted hands, each represented as a list of landmarks. (std::vector<NormalizedLandmarkList>)

I added my calculator to the "hand_tracking_desktop_live.pbtxt" graph from the MediaPipe examples, so that both inputs come from the outputs of the "HandLandmarkTrackingCpu" node.

Graph

# CPU image. (ImageFrame)
input_stream: "input_video"

# CPU image. (ImageFrame)
output_stream: "output_video"

# Generates side packet containing max number of hands to detect/track.
node {
  calculator: "ConstantSidePacketCalculator"
  output_side_packet: "PACKET:num_hands"
  node_options: {
    [type.googleapis.com/mediapipe.ConstantSidePacketCalculatorOptions]: {
      packet { int_value: 2 }
    }
  }
}

# Detects/tracks hand landmarks.
node {
  calculator: "HandLandmarkTrackingCpu"
  input_stream: "IMAGE:input_video"
  input_side_packet: "NUM_HANDS:num_hands"
  output_stream: "LANDMARKS:landmarks"
  output_stream: "HANDEDNESS:handedness"
  output_stream: "PALM_DETECTIONS:multi_palm_detections"
  output_stream: "HAND_ROIS_FROM_LANDMARKS:multi_hand_rects"
  output_stream: "HAND_ROIS_FROM_PALM_DETECTIONS:multi_palm_rects"
}

node {
  calculator: "HandLandmarkWriteToFileCalculator"
  input_stream: "LANDMARKS:landmarks"
  input_stream: "HANDEDNESS:handedness"
}

# Subgraph that renders annotations and overlays them on top of the input
# images (see hand_renderer_cpu.pbtxt).
node {
  calculator: "HandRendererSubgraph"
  input_stream: "IMAGE:input_video"
  input_stream: "DETECTIONS:multi_palm_detections"
  input_stream: "LANDMARKS:landmarks"
  input_stream: "HANDEDNESS:handedness"
  input_stream: "NORM_RECTS:0:multi_palm_rects"
  input_stream: "NORM_RECTS:1:multi_hand_rects"
  output_stream: "IMAGE:output_video"
}

For now, I am only trying to print the position of each hand to standard output, labeled according to its handedness.

Calculator code:

#include <iostream>
#include <string>
#include <vector>

#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/classification.pb.h"
#include "mediapipe/framework/formats/landmark.pb.h"
#include "mediapipe/framework/port/canonical_errors.h"

namespace mediapipe {

class HandLandmarkWriteToFileCalculator : public CalculatorBase {
 public:
  static absl::Status GetContract(CalculatorContract* cc) {
    cc->Inputs().Tag("LANDMARKS").Set<std::vector<NormalizedLandmarkList>>();
    cc->Inputs().Tag("HANDEDNESS").Set<std::vector<ClassificationList>>();
    return absl::OkStatus();
  }

  absl::Status Open(CalculatorContext* cc) final { return absl::OkStatus(); }

  absl::Status Process(CalculatorContext* cc) final {
    // Skip timestamps where either stream carries no packet (no hand detected).
    if (cc->Inputs().Tag("LANDMARKS").IsEmpty() ||
        cc->Inputs().Tag("HANDEDNESS").IsEmpty()) {
      return absl::OkStatus();
    }

    const auto& input_landmarks =
        cc->Inputs().Tag("LANDMARKS").Get<std::vector<NormalizedLandmarkList>>();
    const auto& classifications =
        cc->Inputs().Tag("HANDEDNESS").Get<std::vector<ClassificationList>>();

    for (size_t i = 0; i < classifications.size(); ++i) {
      // Top-scoring handedness label of hand i: "Left" or "Right".
      const std::string& label = classifications[i].classification(0).label();
      // Prints the x of landmark 0 (the wrist) of the first hand in the vector.
      if (label == "Right") {
        std::cout << "Right: " << input_landmarks[0].landmark(0).x() << std::endl;
      } else {
        std::cout << "Left : " << input_landmarks[0].landmark(0).x() << std::endl;
      }
    }
    return absl::OkStatus();
  }
};
REGISTER_CALCULATOR(HandLandmarkWriteToFileCalculator);

}  // namespace mediapipe

If only one hand appears on the webcam, this code displays the x-coordinate with the correct hand label. However, when both hands are visible, index 0 of the input_landmarks vector refers to whichever hand was detected first. If my right hand is detected before my left hand, the lines printed for "Right: " and "Left: " both show the same x-coordinate, that of my right hand. Conversely, if my left hand is detected first, index 0 refers to my left hand and both lines show the x-coordinate of my left hand.

How can I match each handedness label to the coordinates of the corresponding hand when two hands are detected?

Answer by Myosin:

I have solved my problem.

The issue was with the graph hand_tracking_desktop_live.pbtxt, where the output stream LANDMARKS:landmarks is of type std::vector<NormalizedLandmarkList>. This vector contains the landmarks of every detected hand, but which hand sits at each index changes depending on which hand is detected first.

To solve this problem, I moved the HandLandmarkWriteToFileCalculator node into the HandLandmarkTrackingCpu subgraph (hand_landmark_tracking_cpu.pbtxt).

In this subgraph, each hand is first processed separately, and then the handedness and landmarks of all hands are collected into vectors.

So I placed the HandLandmarkWriteToFileCalculator node after the landmarks and handedness of each individual hand are detected, and before this data is collected into vectors, as follows:

Graph hand_landmark_tracking_cpu.pbtxt:

# Detect hand landmarks for the specific hand rect.
node {
  calculator: "HandLandmarkCpu"
  input_side_packet: "MODEL_COMPLEXITY:model_complexity"
  input_stream: "IMAGE:image_for_landmarks"
  input_stream: "ROI:single_hand_rect"
  output_stream: "LANDMARKS:single_hand_landmarks"
  output_stream: "WORLD_LANDMARKS:single_hand_world_landmarks"
  output_stream: "HANDEDNESS:single_handedness"
}

# [MY ADDED CALCULATOR] Write Handlandmarks to file.
node {
  calculator: "HandLandmarkWriteToFileCalculator"
  input_stream: "LANDMARKS:single_hand_landmarks"
  input_stream: "HANDEDNESS:single_handedness"
}

# Collects the handedness for each single hand into a vector. Upon receiving the
# BATCH_END timestamp, outputs a vector of ClassificationList at the BATCH_END
# timestamp.
node {
  calculator: "EndLoopClassificationListCalculator"
  input_stream: "ITEM:single_handedness"
  input_stream: "BATCH_END:hand_rects_timestamp"
  output_stream: "ITERABLE:multi_handedness"
}

# Calculate region of interest (ROI) based on detected hand landmarks to reuse
# on the subsequent runs of the graph.
node {
  calculator: "HandLandmarkLandmarksToRoi"
  input_stream: "IMAGE_SIZE:image_size_for_landmarks"
  input_stream: "LANDMARKS:single_hand_landmarks"
  output_stream: "ROI:single_hand_rect_from_landmarks"
}

I have adapted my C++ code like this:

#include <iostream>
#include <string>

#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/classification.pb.h"
#include "mediapipe/framework/formats/landmark.pb.h"
#include "mediapipe/framework/port/canonical_errors.h"

namespace mediapipe {

class HandLandmarkWriteToFileCalculator : public CalculatorBase {
 public:
  static absl::Status GetContract(CalculatorContract* cc) {
    // Inside the per-hand loop, each packet describes a single hand.
    cc->Inputs().Tag("LANDMARKS").Set<NormalizedLandmarkList>();
    cc->Inputs().Tag("HANDEDNESS").Set<ClassificationList>();
    return absl::OkStatus();
  }

  absl::Status Open(CalculatorContext* cc) final { return absl::OkStatus(); }

  absl::Status Process(CalculatorContext* cc) final {
    // Skip iterations where either packet is missing.
    if (cc->Inputs().Tag("LANDMARKS").IsEmpty() ||
        cc->Inputs().Tag("HANDEDNESS").IsEmpty()) {
      return absl::OkStatus();
    }

    const auto& input_landmarks =
        cc->Inputs().Tag("LANDMARKS").Get<NormalizedLandmarkList>();
    const auto& input_handedness =
        cc->Inputs().Tag("HANDEDNESS").Get<ClassificationList>();

    // Both packets come from the same hand, so the label matches the landmarks.
    const std::string& label = input_handedness.classification(0).label();
    if (label == "Right") {
      std::cout << "Right: " << input_landmarks.landmark(0).x() << std::endl;
    } else {
      std::cout << "Left : " << input_landmarks.landmark(0).x() << std::endl;
    }
    return absl::OkStatus();
  }
};
REGISTER_CALCULATOR(HandLandmarkWriteToFileCalculator);

}  // namespace mediapipe

I hope this answer will help others.