How to get corresponding depth pixel from Vision object detection


I'm building an iOS app that detects cars via Vision and then retrieves the distance to each detected car from a synchronized depthDataMap produced by the LiDAR sensor.

However, I'm having trouble finding the corresponding pixel in that depthDataMap. While the CGRect of the object observation ranges from 0–300 (x) and 0–600 (y), the depthDataMap is only 320 × 180 pixels, so the coordinates don't line up and I can't get the right corresponding pixel. Any idea how to solve this?

This is my function in my AVCaptureDataOutputSynchronizerDelegate:

func dataOutputSynchronizer(_ synchronizer: AVCaptureDataOutputSynchronizer,
                            didOutput synchronizedDataCollection: AVCaptureSynchronizedDataCollection) {
    // Retrieve the synchronized depth and sample buffer container objects.
    guard let syncedDepthData = synchronizedDataCollection.synchronizedData(for: depthDataOutput) as? AVCaptureSynchronizedDepthData,
          let syncedVideoData = synchronizedDataCollection.synchronizedData(for: videoOutput) as? AVCaptureSynchronizedSampleBufferData else { return }

    guard let pixelBuffer = syncedVideoData.sampleBuffer.imageBuffer else { return }

    let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up, options: [:])

    do {
        try imageRequestHandler.perform(self.requests)
    } catch {
        print(error)
    }

    depthData = syncedDepthData.depthData.converting(toDepthDataType: kCVPixelFormatType_DepthFloat16).depthDataMap
}
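One thing I'm unsure about: I later call CVPixelBufferGetBaseAddress without locking the buffer first, and my understanding is that the base address is only valid while the buffer is locked. I experimented with a small helper (`depthValue` is my own name, not an API) that locks the map, respects the row stride, and bounds-checks before reading a single Float16:

```swift
import CoreVideo

// Hypothetical helper: read one Float16 depth value at pixel (x, y),
// locking the buffer for read access first. Returns nil when the
// coordinates fall outside the map.
func depthValue(at x: Int, y: Int, in depthMap: CVPixelBuffer) -> Float16? {
    CVPixelBufferLockBaseAddress(depthMap, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(depthMap, .readOnly) }

    guard let base = CVPixelBufferGetBaseAddress(depthMap) else { return nil }
    let width = CVPixelBufferGetWidth(depthMap)
    let height = CVPixelBufferGetHeight(depthMap)
    guard x >= 0, x < width, y >= 0, y < height else { return nil }

    // Rows can be padded, so advance by bytesPerRow instead of width * 2.
    let bytesPerRow = CVPixelBufferGetBytesPerRow(depthMap)
    let rowPtr = base.advanced(by: y * bytesPerRow)
    return rowPtr.assumingMemoryBound(to: Float16.self)[x]
}
```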

and this is the code in which I find the middle of the detected object and try to retrieve its depth pixel:

for observation in results where observation is VNRecognizedObjectObservation {
    guard let objectObservation = observation as? VNRecognizedObjectObservation else { continue }
    let topLabelObservation = objectObservation.labels[0]

    if topLabelObservation.identifier != "car" { return }

    let objectBounds = VNImageRectForNormalizedRect(objectObservation.boundingBox, Int(screenRect.size.width), Int(screenRect.size.height))

    let transformBounds = CGRect(x: objectBounds.minX,
                                 y: screenRect.size.height - objectBounds.maxY,
                                 width: objectBounds.maxX - objectBounds.minX,
                                 height: objectBounds.maxY - objectBounds.minY)

    let depthMapWidth = CVPixelBufferGetWidthOfPlane(depthData, 0)   // always 180
    let depthMapHeight = CVPixelBufferGetHeightOfPlane(depthData, 0) // always 320
    let objMiddlePointX = Int(objectBounds.minX + (objectBounds.maxX - objectBounds.minX) / 2)
    let objMiddlePointY = Int(screenRect.size.height - objectBounds.maxY + (objectBounds.maxY - objectBounds.minY) / 2)

    let rowData = CVPixelBufferGetBaseAddress(depthData)?.assumingMemoryBound(to: Float16.self)
    let depthPoint = rowData?[objMiddlePointX * depthMapWidth + objMiddlePointY]

    let boxLayer = self.drawBoundingBox(transformBounds)
    detectionLayer.addSublayer(boxLayer)
}
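My current thinking: since `objectObservation.boundingBox` is normalized to [0, 1], maybe I should scale its center by the depth map's dimensions instead of the screen size, flip the y axis (Vision's origin is bottom-left, pixel buffers are top-left), and index row-major as y * width + x. A minimal sketch of that idea (`depthMapIndex` is my own helper name, not a Vision API):

```swift
import CoreGraphics

// Map the center of a normalized Vision bounding box (origin bottom-left)
// to a row-major index into a depth map of the given dimensions.
func depthMapIndex(for boundingBox: CGRect,
                   depthMapWidth: Int,
                   depthMapHeight: Int) -> Int {
    // Center of the box in normalized coordinates.
    let cx = boundingBox.midX
    // Flip y: Vision uses a bottom-left origin, pixel buffers a top-left one.
    let cy = 1.0 - boundingBox.midY
    // Scale into depth-map pixel coordinates and clamp to the valid range.
    let px = min(max(Int(cx * CGFloat(depthMapWidth)), 0), depthMapWidth - 1)
    let py = min(max(Int(cy * CGFloat(depthMapHeight)), 0), depthMapHeight - 1)
    // Row-major index: y * width + x (not x * width + y).
    return py * depthMapWidth + px
}
```

I'm not sure whether this is enough, or whether the depth map's 180 × 320 orientation versus the 320 × 180 buffer size means I also need to account for rotation.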

I'd be glad for any suggestion pointing me in the right direction.
