I want to get 3D body points from a video session using VNDetectHumanBodyPose3DRequest and VNImageRequestHandler. I have succeeded with VNImageRequestHandler.init(cmSampleBuffer:options:) without depth data, and it works well. But since Apple says accuracy should improve with depth data, I switched to the new iOS 17 initializer VNImageRequestHandler.init(cmSampleBuffer:depthData:orientation:options:). However, the results are worse, not better, and I cannot find any Apple sample code for it. Does anyone know how to use both CMSampleBuffer and AVDepthData with VNDetectHumanBodyPose3DRequest in a video session (video, not photo) on iOS 17?
I have searched Apple's documentation extensively but found no sample code.
Here is the code that creates the VNImageRequestHandler from only the CMSampleBuffer; it works well with VNDetectHumanBodyPose3DRequest:
extension TempVideoViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        let rotationAngle = connection.videoRotationAngle
        let isMirrored = connection.isVideoMirrored
        let orientation = cgImageOrientation(from: Float(rotationAngle), isMirrored: isMirrored)
        DispatchQueue.global().async {
            let requestHandler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer, orientation: orientation)
            do {
                let request = VNDetectHumanBodyPose3DRequest()
                try requestHandler.perform([request])
                if let observation = request.results?.first {
                    // The resulting 3D key points are good
                }
            } catch {
                print("Vision request failed: \(error)")
            }
        }
    }
}
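For context, cgImageOrientation(from:isMirrored:) is my own helper, not an Apple API. Roughly, it maps the connection's rotation angle to a CGImagePropertyOrientation like this (a sketch; the exact angle-to-orientation mapping is my assumption for the back camera and may need adjusting for your setup):

```swift
import ImageIO

// Sketch of the helper used above. Assumes the convention that a
// videoRotationAngle of 90 corresponds to portrait on the back camera;
// the mirrored cases are my best guess for the front camera.
func cgImageOrientation(from rotationAngle: Float, isMirrored: Bool) -> CGImagePropertyOrientation {
    switch rotationAngle {
    case 90:  return isMirrored ? .leftMirrored  : .right  // portrait
    case 180: return isMirrored ? .downMirrored  : .down
    case 270: return isMirrored ? .rightMirrored : .left   // portrait upside down
    default:  return isMirrored ? .upMirrored    : .up     // landscape
    }
}
```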
When I use both the CMSampleBuffer and the AVDepthData, the results are noticeably worse. Here is that code:
extension TempVideoViewController: AVCaptureDataOutputSynchronizerDelegate {
    func dataOutputSynchronizer(_ synchronizer: AVCaptureDataOutputSynchronizer, didOutput synchronizedDataCollection: AVCaptureSynchronizedDataCollection) {
        guard started else { return }
        guard synchronizedDataCollection.count == 2,
              let videoData = synchronizedDataCollection.synchronizedData(for: videoOutput) as? AVCaptureSynchronizedSampleBufferData,
              !videoData.sampleBufferWasDropped,
              let depthData = synchronizedDataCollection.synchronizedData(for: depthDataOutput) as? AVCaptureSynchronizedDepthData,
              !depthData.depthDataWasDropped else {
            return
        }
        let connection = videoOutput.connection(with: .video)!
        let rotationAngle = connection.videoRotationAngle
        let isMirrored = connection.isVideoMirrored
        let orientation = cgImageOrientation(from: Float(rotationAngle), isMirrored: isMirrored)
        DispatchQueue.global().async {
            let requestHandler = VNImageRequestHandler(cmSampleBuffer: videoData.sampleBuffer,
                                                       depthData: depthData.depthData,
                                                       orientation: orientation)
            do {
                let request = VNDetectHumanBodyPose3DRequest()
                try requestHandler.perform([request])
                if let observation = request.results?.first {
                    // The resulting 3D key points are bad
                }
            } catch {
                print("Vision request failed: \(error)")
            }
        }
    }
}
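One variant I have considered (a sketch only, my own guess, not from any Apple sample) is normalizing the depth data before creating the handler, so the depth map's pixel format and orientation match the video buffer. AVDepthData does provide converting(toDepthDataType:) and applyingExifOrientation(_:) for this, though I am not sure whether the handler already applies the orientation to the depth map itself:

```swift
// Sketch: normalize depth before handing it to Vision. It is unclear from
// the docs whether this is required or double-applies the orientation.
var depth = depthData.depthData
if depth.depthDataType != kCVPixelFormatType_DepthFloat32 {
    // Ensure Float32 disparity/depth, the format the request is said to prefer.
    depth = depth.converting(toDepthDataType: kCVPixelFormatType_DepthFloat32)
}
// Rotate the depth map into the same orientation as the video buffer.
depth = depth.applyingExifOrientation(orientation)
let requestHandler = VNImageRequestHandler(cmSampleBuffer: videoData.sampleBuffer,
                                           depthData: depth,
                                           orientation: orientation)
```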
The depth-data output setup is copied from Apple's docs:
if session.canAddOutput(depthDataOutput) {
    session.addOutput(depthDataOutput)
}
depthDataOutput.isFilteringEnabled = false
if let connection = depthDataOutput.connection(with: .depthData) {
    connection.isEnabled = true
}
// Pick the highest-resolution Float32 depth format the device supports.
let depthFormats = videoDevice!.activeFormat.supportedDepthDataFormats
let filtered = depthFormats.filter {
    CMFormatDescriptionGetMediaSubType($0.formatDescription) == kCVPixelFormatType_DepthFloat32
}
let selectedFormat = filtered.max { first, second in
    CMVideoFormatDescriptionGetDimensions(first.formatDescription).width <
        CMVideoFormatDescriptionGetDimensions(second.formatDescription).width
}
do {
    try videoDevice!.lockForConfiguration()
    videoDevice!.activeDepthDataFormat = selectedFormat
    videoDevice!.unlockForConfiguration()
} catch {
    print("Could not lock device for configuration: \(error)")
}
synchronizer = AVCaptureDataOutputSynchronizer(dataOutputs: [videoOutput, depthDataOutput])
synchronizer?.setDelegate(self, queue: DispatchQueue.main)
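To rule out a bad depth stream, it may also help to log what the device actually delivers. This is a diagnostic sketch using standard AVDepthData properties (the per-frame part would go inside dataOutputSynchronizer(_:didOutput:)):

```swift
// Once, after configuration: confirm which depth format was selected.
if let format = videoDevice?.activeDepthDataFormat {
    let dims = CMVideoFormatDescriptionGetDimensions(format.formatDescription)
    print("Active depth format: \(dims.width)x\(dims.height)")
}

// Per frame: check the quality of the incoming depth data.
let depth = depthData.depthData
print("accuracy:", depth.depthDataAccuracy == .absolute ? "absolute" : "relative")
print("quality:", depth.depthDataQuality == .high ? "high" : "low")
print("calibration present:", depth.cameraCalibrationData != nil)
```

If the accuracy is only .relative or the calibration data is missing, that might explain why adding depth degrades the 3D result rather than improving it, but this is speculation on my part.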
So I don't know what is wrong. Apple's docs say VNDetectHumanBodyPose3DRequest should perform better with both video and AVDepthData, but in my case it performs worse. Any ideas? Thanks.