Inputs: A frame of video or an image, represented as an int32 tensor of shape 192x192x3. Channel order: RGB with values in [0, 255].
Outputs: A float32 tensor of shape [1, 1, 17, 3].
The first two channels of the last dimension represent the yx coordinates (normalized to the image frame, i.e. in the range [0.0, 1.0]) of the 17 keypoints, in the order: [nose, left eye, right eye, left ear, right ear, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle, right ankle].
The third channel of the last dimension represents the prediction confidence scores of each keypoint, also in the range [0.0, 1.0].
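To make that layout concrete, here is a minimal sketch of decoding a [1, 1, 17, 3] single-pose result into named keypoints. The decodeSinglePose helper, the minScore threshold, and the outputArray variable are my own for illustration (outputArray being the nested array you get from calling .array() on the output tensor); the keypoint order comes from the documentation above.

const KEYPOINT_NAMES = [
  'nose', 'left eye', 'right eye', 'left ear', 'right ear',
  'left shoulder', 'right shoulder', 'left elbow', 'right elbow',
  'left wrist', 'right wrist', 'left hip', 'right hip',
  'left knee', 'right knee', 'left ankle', 'right ankle',
];

// Hypothetical helper: outputArray is the nested array from the single pose
// model, shape [1, 1, 17, 3], i.e. 17 entries of [y, x, score].
function decodeSinglePose(outputArray, minScore = 0.3) {
  const keypoints = outputArray[0][0];
  return keypoints.map(([y, x, score], i) => ({
    name: KEYPOINT_NAMES[i],
    y, // normalized to [0.0, 1.0] relative to the input frame
    x,
    score,
    visible: score >= minScore,
  }));
}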
const MODEL_PATH = './model';
const EXAMPLE_IMG = document.getElementById('exampleImg');
let movenet = undefined;
async function loadAndRunModel() {
  // Load the MoveNet graph model (TF Hub format).
  movenet = await tf.loadGraphModel(MODEL_PATH, { fromTFHub: true });
  // let exampleInputTensor = tf.zeros([1, 192, 192, 3], 'int32');

  // Read the example image into an int32 tensor of shape [height, width, 3].
  let imageTensor = tf.browser.fromPixels(EXAMPLE_IMG);
  console.log(imageTensor.shape);

  // Crop a square region of interest and resize it to the 192x192 input the model expects.
  let cropStartPoint = [15, 170, 0];
  let cropSize = [345, 345, 3];
  let croppedTensor = tf.slice(imageTensor, cropStartPoint, cropSize);
  let resizedTensor = tf.image
    .resizeBilinear(croppedTensor, [192, 192], true)
    .toInt();
  console.log(resizedTensor.shape);

  // Add a batch dimension, run inference, and read the result back as an array.
  let tensorOutput = movenet.predict(tf.expandDims(resizedTensor));
  let arrayOutput = await tensorOutput.array();
  console.log(arrayOutput);
}
loadAndRunModel();
I get an output tensor of shape [1, 6, 56]. According to the documentation it should return a tensor of shape [1, 1, 17, 3]. Why does it return a different output?
MoveNet comes in three "flavors": single pose Lightning, single pose Thunder, and multipose Lightning.
The single pose versions return a tensor of shape [1, 1, 17, 3], as described in your question, but the multipose version returns a tensor of shape [1, 6, 56], which is described in its model card. You probably grabbed the multipose version. If you want the single pose versions, they're available here:
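If you decide to keep the multipose model instead, its [1, 6, 56] output packs up to 6 detected people per frame. If I recall the multipose model card correctly, each 56-value row is 17 keypoints as [y, x, score] triples followed by a [ymin, xmin, ymax, xmax, score] bounding box. A rough sketch of unpacking it, where decodeMultiPose and the minPersonScore threshold are my own for illustration:

// Hypothetical helper: multiArray is the nested array from the multipose
// model, shape [1, 6, 56], obtained via tensorOutput.array().
function decodeMultiPose(multiArray, minPersonScore = 0.2) {
  return multiArray[0]
    .map((row) => {
      const keypoints = [];
      for (let i = 0; i < 17; i++) {
        const [y, x, score] = row.slice(i * 3, i * 3 + 3);
        keypoints.push({ y, x, score });
      }
      // Assumed layout: the last 5 values are the person bounding box and score.
      const [ymin, xmin, ymax, xmax, personScore] = row.slice(51, 56);
      return { keypoints, box: { ymin, xmin, ymax, xmax }, score: personScore };
    })
    .filter((person) => person.score >= minPersonScore);
}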