How to get 3D points with both CMSampleBuffer and AVDepthData in iOS 17 in a video session?

I want to get 3D points by running VNDetectHumanBodyPose3DRequest through a VNImageRequestHandler in a live video session. I have already succeeded with VNImageRequestHandler.init(cmSampleBuffer:options:), i.e. without depth data, and it works well. But Apple's documentation says results can be improved by supplying depth data, so I switched to the new iOS 17 initializer VNImageRequestHandler.init(cmSampleBuffer:depthData:orientation:options:). However, the results got worse, not better, and I cannot find any sample code from Apple. Does anyone know how to combine CMSampleBuffer and AVDepthData in iOS 17 so that VNDetectHumanBodyPose3DRequest works well in a video session? (Video, not photo.)

I have searched Apple's documentation extensively for sample code but found nothing.

Here is the code that creates the VNImageRequestHandler from only the CMSampleBuffer; it works well with VNDetectHumanBodyPose3DRequest:

extension TempVideoViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        // Map the connection's rotation angle and mirroring to a CGImagePropertyOrientation.
        let rotationAngle = connection.videoRotationAngle
        let isMirrored = connection.isVideoMirrored
        let orientation = cgImageOrientation(from: Float(rotationAngle), isMirrored: isMirrored)
        DispatchQueue.global().async {
            // Handler built from the sample buffer alone (no depth data).
            let requestHandler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer, orientation: orientation)
            do {
                let request = VNDetectHumanBodyPose3DRequest()
                try requestHandler.perform([request])
                if let observation = request.results?.first {
                    // The resulting 3D key points are good here.
                }
            } catch {
                print("Body pose request failed: \(error)")
            }
        }
    }
}
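
For completeness, cgImageOrientation(from:isMirrored:) is a small helper of my own, not part of Vision or AVFoundation. A minimal sketch of what it does, in case the angle-to-orientation mapping itself is part of the problem (I'm not certain the mirrored cases are mapped correctly for every camera position):

import ImageIO

func cgImageOrientation(from rotationAngle: Float, isMirrored: Bool) -> CGImagePropertyOrientation {
    // videoRotationAngle is in degrees; assumes only the four canonical angles occur.
    switch (Int(rotationAngle) % 360, isMirrored) {
    case (0, false):   return .up
    case (0, true):    return .upMirrored
    case (90, false):  return .right
    case (90, true):   return .leftMirrored
    case (180, false): return .down
    case (180, true):  return .downMirrored
    case (270, false): return .left
    case (270, true):  return .rightMirrored
    default:           return .up
    }
}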

But when I use both the CMSampleBuffer and the AVDepthData, the results are noticeably worse.

Here is the code:

extension TempVideoViewController: AVCaptureDataOutputSynchronizerDelegate {
    func dataOutputSynchronizer(_ synchronizer: AVCaptureDataOutputSynchronizer, didOutput synchronizedDataCollection: AVCaptureSynchronizedDataCollection) {
        if !startted { return }
        // Proceed only when both the video frame and its matching depth data arrived and neither was dropped.
        guard synchronizedDataCollection.count == 2,
              let videoData = synchronizedDataCollection.synchronizedData(for: videoOutput) as? AVCaptureSynchronizedSampleBufferData, !videoData.sampleBufferWasDropped,
              let depthData = synchronizedDataCollection.synchronizedData(for: depthDataOutput) as? AVCaptureSynchronizedDepthData, !depthData.depthDataWasDropped else {
            return
        }
        let connection = videoOutput.connection(with: .video)!
        let rotationAngle = connection.videoRotationAngle
        let isMirrored = connection.isVideoMirrored
        let orientation = cgImageOrientation(from: Float(rotationAngle), isMirrored: isMirrored)
        DispatchQueue.global().async {
            // New iOS 17 initializer that also takes the synchronized depth data.
            let requestHandler = VNImageRequestHandler(cmSampleBuffer: videoData.sampleBuffer, depthData: depthData.depthData, orientation: orientation)
            do {
                let request = VNDetectHumanBodyPose3DRequest()
                try requestHandler.perform([request])
                if let observation = request.results?.first {
                    // The resulting 3D key points are noticeably worse here.
                }
            } catch {
                print("Body pose request failed: \(error)")
            }
        }
    }
}
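
One thing I am unsure about, and have not found documented, is whether the depth map must be rotated/mirrored to match the video frame before being handed to Vision, or whether the orientation parameter is applied to both inputs. If it does need to match, I assume it would look something like this (applyingExifOrientation(_:) is the existing AVDepthData API; whether it is actually needed here is my guess):

// Guess: orient the depth map to match the video frame before creating the
// handler. The orientation parameter may already cover both inputs, in which
// case this step would be redundant or even harmful.
let orientedDepth = depthData.depthData.applyingExifOrientation(orientation)
let requestHandler = VNImageRequestHandler(
    cmSampleBuffer: videoData.sampleBuffer,
    depthData: orientedDepth,
    orientation: orientation
)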

The depth-data setup is adapted from Apple's docs:

if session.canAddOutput(depthDataOutput) {
    session.addOutput(depthDataOutput)
}

depthDataOutput.isFilteringEnabled = false
if let connection = depthDataOutput.connection(with: .depthData) {
    connection.isEnabled = true
}

// Pick the highest-resolution 32-bit float depth format the device supports.
let depthFormats = videoDevice!.activeFormat.supportedDepthDataFormats
let filtered = depthFormats.filter {
    CMFormatDescriptionGetMediaSubType($0.formatDescription) == kCVPixelFormatType_DepthFloat32
}
let selectedFormat = filtered.max { first, second in
    CMVideoFormatDescriptionGetDimensions(first.formatDescription).width < CMVideoFormatDescriptionGetDimensions(second.formatDescription).width
}
do {
    try videoDevice!.lockForConfiguration()
    videoDevice!.activeDepthDataFormat = selectedFormat
    videoDevice!.unlockForConfiguration()
} catch {
    print("Could not lock device for configuration: \(error)")
}

// Deliver video frames and depth data together, matched by timestamp.
synchronizer = AVCaptureDataOutputSynchronizer(dataOutputs: [videoOutput, depthDataOutput])
synchronizer?.setDelegate(self, queue: DispatchQueue.main)
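
To confirm what is actually being delivered, one can log the depth format inside the dataOutputSynchronizer callback; a small debugging sketch I would use (all of these are existing AVDepthData/CVPixelBuffer APIs):

// Debugging aid: print the depth type and resolution actually delivered.
// kCVPixelFormatType_DepthFloat32 is 'fdep'; a disparity type like
// kCVPixelFormatType_DisparityFloat32 ('fdis') would indicate a mismatch.
let depth = depthData.depthData
let map = depth.depthDataMap
print(String(format: "depth type: %08x", depth.depthDataType),
      "size:", CVPixelBufferGetWidth(map), "x", CVPixelBufferGetHeight(map),
      "filtered:", depth.isDepthDataFiltered)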

So I can't see where this goes wrong. Apple's docs say VNDetectHumanBodyPose3DRequest should perform better with both the video frame and AVDepthData, but in my case it performs worse. Any ideas? Thanks.
