How to get corresponding depth pixel from Vision object detection


I'm building an iOS app that detects cars via Vision and then retrieves the distance to each detected car from a synchronized depthDataMap produced by the LiDAR sensor.

However, I'm having trouble finding the corresponding pixel in that depthDataMap. While the CGRect of the object observation ranges from 0–300 (x) and 0–600 (y), the depthDataMap is only 320 × 180 pixels, so the coordinates don't line up and I can't get the right corresponding pixel. Any idea how to solve this?

This is my function in my AVCaptureDataOutputSynchronizerDelegate:

func dataOutputSynchronizer(_ synchronizer: AVCaptureDataOutputSynchronizer,
                            didOutput synchronizedDataCollection: AVCaptureSynchronizedDataCollection) {
    // Retrieve the synchronized depth and sample buffer container objects.
    guard let syncedDepthData = synchronizedDataCollection.synchronizedData(for: depthDataOutput) as? AVCaptureSynchronizedDepthData,
          let syncedVideoData = synchronizedDataCollection.synchronizedData(for: videoOutput) as? AVCaptureSynchronizedSampleBufferData else { return }

    guard let pixelBuffer = syncedVideoData.sampleBuffer.imageBuffer else { return }

    let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up, options: [:])

    do {
        try imageRequestHandler.perform(self.requests)
    } catch {
        print(error)
    }

    depthData = syncedDepthData.depthData.converting(toDepthDataType: kCVPixelFormatType_DepthFloat16).depthDataMap
}
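One thing I'm unsure about: I later call CVPixelBufferGetBaseAddress without locking the buffer first, and my understanding is that the base address is only valid while the buffer is locked. I experimented with a small helper (`depthValue` is my own name, not an API) that locks the map, respects the row stride, and bounds-checks before reading a single Float16:

```swift
import CoreVideo

// Hypothetical helper: read one Float16 depth value at pixel (x, y),
// locking the buffer for read access first. Returns nil when the
// coordinates fall outside the map.
func depthValue(at x: Int, y: Int, in depthMap: CVPixelBuffer) -> Float16? {
    CVPixelBufferLockBaseAddress(depthMap, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(depthMap, .readOnly) }

    guard let base = CVPixelBufferGetBaseAddress(depthMap) else { return nil }
    let width = CVPixelBufferGetWidth(depthMap)
    let height = CVPixelBufferGetHeight(depthMap)
    guard x >= 0, x < width, y >= 0, y < height else { return nil }

    // Rows can be padded, so advance by bytesPerRow instead of width * 2.
    let bytesPerRow = CVPixelBufferGetBytesPerRow(depthMap)
    let rowPtr = base.advanced(by: y * bytesPerRow)
    return rowPtr.assumingMemoryBound(to: Float16.self)[x]
}
```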

and this is the code in which I find the middle of the detected object and try to retrieve its depth pixel:

for observation in results where observation is VNRecognizedObjectObservation {
    guard let objectObservation = observation as? VNRecognizedObjectObservation else { continue }
    let topLabelObservation = objectObservation.labels[0]

    if topLabelObservation.identifier != "car" { return }

    let objectBounds = VNImageRectForNormalizedRect(objectObservation.boundingBox, Int(screenRect.size.width), Int(screenRect.size.height))

    let transformBounds = CGRect(x: objectBounds.minX,
                                 y: screenRect.size.height - objectBounds.maxY,
                                 width: objectBounds.maxX - objectBounds.minX,
                                 height: objectBounds.maxY - objectBounds.minY)

    let depthMapWidth = CVPixelBufferGetWidthOfPlane(depthData, 0)   // always 180
    let depthMapHeight = CVPixelBufferGetHeightOfPlane(depthData, 0) // always 320
    let objMiddlePointX = Int(objectBounds.minX + (objectBounds.maxX - objectBounds.minX) / 2)
    let objMiddlePointY = Int(screenRect.size.height - objectBounds.maxY + (objectBounds.maxY - objectBounds.minY) / 2)

    let rowData = CVPixelBufferGetBaseAddress(depthData)?.assumingMemoryBound(to: Float16.self)
    let depthPoint = rowData?[objMiddlePointX * depthMapWidth + objMiddlePointY]

    let boxLayer = self.drawBoundingBox(transformBounds)
    detectionLayer.addSublayer(boxLayer)
}
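My current thinking: since `objectObservation.boundingBox` is normalized to [0, 1], maybe I should scale its center by the depth map's dimensions instead of the screen size, flip the y axis (Vision's origin is bottom-left, pixel buffers are top-left), and index row-major as y * width + x. A minimal sketch of that idea (`depthMapIndex` is my own helper name, not a Vision API):

```swift
import CoreGraphics

// Map the center of a normalized Vision bounding box (origin bottom-left)
// to a row-major index into a depth map of the given dimensions.
func depthMapIndex(for boundingBox: CGRect,
                   depthMapWidth: Int,
                   depthMapHeight: Int) -> Int {
    // Center of the box in normalized coordinates.
    let cx = boundingBox.midX
    // Flip y: Vision uses a bottom-left origin, pixel buffers a top-left one.
    let cy = 1.0 - boundingBox.midY
    // Scale into depth-map pixel coordinates and clamp to the valid range.
    let px = min(max(Int(cx * CGFloat(depthMapWidth)), 0), depthMapWidth - 1)
    let py = min(max(Int(cy * CGFloat(depthMapHeight)), 0), depthMapHeight - 1)
    // Row-major index: y * width + x (not x * width + y).
    return py * depthMapWidth + px
}
```

I'm not sure whether this is enough, or whether the depth map's 180 × 320 orientation versus the 320 × 180 buffer size means I also need to account for rotation.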

I'd be glad for any suggestion pointing me in the right direction.
