I am attempting to learn object detection in iOS, and then mark the place of the detected object. I have the model trained and installed in the project. My next step is to show an AR view on screen. That is working. When I turn my Vision processing code on via a button, the image on screen ends up rotated and distorted (most likely just stretched due to an inverted axis).
I found a partial tutorial that I was using to help guide me, and the author seems to have run into this issue and solved it, but did not show the solution, and I have no way of contacting them. The author's comment was: "one slightly tricky aspect to this was that the coordinate system returned from Vision was different than SwiftUI’s coordinate system (normalized and the y-axis was flipped), but some simple transformations did the trick."
I have no idea which simple transformations they were, but I suspect they were simd related. If anyone has insight into this, I would appreciate solving the rotation and distortion issue.
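For what it's worth, the "simple transformations" Vision usually needs are just two steps: scale the normalized rect up to the view's size, then flip the y-axis (Vision's origin is bottom-left, UIKit/SwiftUI's is top-left). A minimal sketch, where the helper name and signature are my own, not from the tutorial:

```swift
import CoreGraphics

// Hypothetical helper: convert a Vision boundingBox (normalized 0...1,
// origin at bottom-left) into view coordinates (points, origin at top-left).
func convertToViewRect(_ boundingBox: CGRect, viewSize: CGSize) -> CGRect {
    // Step 1: scale the normalized rect up to the view's size.
    let scaled = CGRect(
        x: boundingBox.minX * viewSize.width,
        y: boundingBox.minY * viewSize.height,
        width: boundingBox.width * viewSize.width,
        height: boundingBox.height * viewSize.height
    )
    // Step 2: flip the y-axis so the origin moves to the top-left.
    return CGRect(
        x: scaled.minX,
        y: viewSize.height - scaled.maxY,
        width: scaled.width,
        height: scaled.height
    )
}
```

Vision also ships `VNImageRectForNormalizedRect(_:_:_:)` for the scaling step, but it does not flip the y-axis, so you still need the second step either way.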
I also get errors in the console as soon as Vision starts, similar to these:
2022-05-12 21:14:39.142550-0400 Find My Apple Remote[66143:9990936] [Assets] Resolving material name 'engine:BuiltinRenderGraphResources/AR/arInPlacePostProcessCombinedPermute7.rematerial' as an asset path -- this usage is deprecated; instead provide a valid bundle
2022-05-12 21:14:39.270684-0400 Find My Apple Remote[66143:9991089] [Session] ARSession <0x111743970>: ARSessionDelegate is retaining 11 ARFrames. This can lead to future camera frames being dropped.
2022-05-12 21:14:40.121810-0400 Find My Apple Remote[66143:9991117] [CAMetalLayer nextDrawable] returning nil because allocation failed.
The one that concerns me the most is the last one.
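As an aside, the "retaining 11 ARFrames" warning usually means the delegate holds on to frames longer than the capture pipeline expects. One common mitigation (a sketch only; the `ThrottledDelegate` class and `isProcessing` flag are names I made up, not part of my project) is to drop frames while a request is still in flight and to copy out only the pixel buffer rather than capturing the whole `ARFrame`:

```swift
import ARKit

final class ThrottledDelegate: NSObject, ARSessionDelegate {
    // Hypothetical flag: true while a Vision request is in flight.
    private var isProcessing = false

    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        guard !isProcessing else { return }  // drop frames instead of queuing them
        isProcessing = true
        // Copy out only what is needed; avoid capturing the ARFrame itself.
        let pixelBuffer = frame.capturedImage
        DispatchQueue.global(qos: .userInitiated).async {
            defer { DispatchQueue.main.async { self.isProcessing = false } }
            // ... run the Vision request on pixelBuffer here ...
            _ = pixelBuffer
        }
    }
}
```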
My code, so far, is:
struct ContentView: View {
    @State private var isDetecting = false
    @State private var success = false

    var body: some View {
        VStack {
            RealityKitView(isDetecting: $isDetecting, success: $success)
                .overlay(alignment: .top) {
                    Image(systemName: success ? "checkmark.circle" : "slash.circle")
                        .foregroundColor(success ? .green : .red)
                }
            Button {
                isDetecting.toggle()
            } label: {
                Text(isDetecting ? "Stop Detecting" : "Start Detecting")
                    .frame(width: 150, height: 50)
                    .background(
                        Capsule()
                            .fill(isDetecting ? Color.red.opacity(0.5) : Color.green.opacity(0.5))
                    )
            }
        }
    }
}
import SwiftUI
import ARKit
import RealityKit
import Vision
struct RealityKitView: UIViewRepresentable {
    let arView = ARView()
    let scale = SIMD3<Float>(repeating: 0.1)
    let model: VNCoreMLModel? = RealityKitView.returnMLModel()
    @Binding var isDetecting: Bool
    @Binding var success: Bool
    @State var boundingBox: CGRect?

    func makeUIView(context: Context) -> some UIView {
        // Start the AR session
        let session = configureSession()
        // Handle ARSession events via the delegate
        session.delegate = context.coordinator
        return arView
    }

    func configureSession() -> ARSession {
        let session = arView.session
        let config = ARWorldTrackingConfiguration()
        config.planeDetection = [.horizontal, .vertical]
        config.environmentTexturing = .automatic
        session.run(config)
        return session
    }

    static func returnMLModel() -> VNCoreMLModel? {
        do {
            let detector = try AppleRemoteDetector()
            let model = try VNCoreMLModel(for: detector.model)
            return model
        } catch {
            print("RealityKitView:returnMLModel failed with error: \(error)")
        }
        return nil
    }

    func updateUIView(_ uiView: UIViewType, context: Context) {}

    func makeCoordinator() -> Coordinator {
        Coordinator(self)
    }

    class Coordinator: NSObject, ARSessionDelegate {
        var parent: RealityKitView

        init(_ parent: RealityKitView) {
            self.parent = parent
        }

        func session(_ session: ARSession, didUpdate frame: ARFrame) {
            // Start vision processing
            if parent.isDetecting {
                guard let model = parent.model else {
                    return
                }
                // I suspect the problem is here, where the image is captured into a buffer
                // and then turned into an input for the Core ML model.
                let pixelBuffer = frame.capturedImage
                let input = AppleRemoteDetectorInput(image: pixelBuffer)
                do {
                    let request = VNCoreMLRequest(model: model) { request, error in
                        guard
                            let results = request.results,
                            !results.isEmpty,
                            let recognizedObjectObservation = results as? [VNRecognizedObjectObservation],
                            let first = recognizedObjectObservation.first
                        else {
                            self.parent.boundingBox = nil
                            self.parent.success = false
                            return
                        }
                        self.parent.success = true
                        print("\(first.boundingBox)")
                        self.parent.boundingBox = first.boundingBox
                    }
                    model.featureProvider = input
                    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .right, options: [:])
                    try handler.perform([request])
                } catch {
                    print(error)
                }
            }
        }
    }
}
After days of trying to figure this out, with research and more research, I came across a question and answer that provides the solution. Please note that both answers there are valid; it just depends on the structure of your app.
The crux of the issue is that causing a state change in RealityKitView causes the ARView to be re-instantiated. However, this time it is instantiated with a size of zero, and that is what produces the error [CAMetalLayer nextDrawable] returning nil because allocation failed, since a zero-sized layer cannot allocate a drawable. Initializing the view with a nonzero size resolves that issue.
For the sake of those who are attempting this in the future, here is the current working UIViewRepresentable: