I am making an iOS app where a user takes a picture, and then I want to use Google's MLKit from Firebase to detect text in the picture. I have set up a custom camera `UIViewController` that we'll call `CameraViewController`. There is a simple button that the user presses to take a picture. I have followed Firebase's documentation, here, but MLKit is not working for me. Here is the code I have for your reference, and then we'll talk about what the problem is.
1. Here are my imports, class delegates, and outlets:

```swift
import UIKit
import AVFoundation
import Firebase

class CameraViewController: UIViewController, AVCapturePhotoCaptureDelegate {

    var captureSession: AVCaptureSession?
    var videoPreviewLayer: AVCaptureVideoPreviewLayer?
    var capturePhotoOutput: AVCapturePhotoOutput?

    @IBOutlet var previewView: UIView!
    @IBOutlet var captureButton: UIButton!
}
```
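In case it's relevant: per Firebase's docs, the on-device text recognizer needs the text-model pod in addition to the base vision pod. A sketch of the relevant Podfile lines (the `'MyApp'` target name is a placeholder; pod names are from Firebase's ML Kit setup guide, so check them against your SDK version):

```ruby
# Podfile fragment (assumed CocoaPods setup)
target 'MyApp' do
  use_frameworks!

  pod 'Firebase/Core'
  pod 'Firebase/MLVision'          # base ML Kit vision APIs
  pod 'Firebase/MLVisionTextModel' # needed by the on-device text recognizer
end
```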
2. In `viewDidLoad`, I set up the `previewView` so that the user has a "view finder":

```swift
override func viewDidLoad() {
    super.viewDidLoad()

    let captureDevice = AVCaptureDevice.default(for: .video)!
    do {
        let input = try AVCaptureDeviceInput(device: captureDevice)
        captureSession = AVCaptureSession()
        captureSession?.addInput(input)

        // Configure the photo output and add it before starting the session
        capturePhotoOutput = AVCapturePhotoOutput()
        capturePhotoOutput?.isHighResolutionCaptureEnabled = true
        captureSession?.addOutput(capturePhotoOutput!)

        videoPreviewLayer = AVCaptureVideoPreviewLayer(session: captureSession!)
        videoPreviewLayer?.videoGravity = .resizeAspectFill
        videoPreviewLayer?.frame = view.layer.bounds
        previewView.layer.addSublayer(videoPreviewLayer!)

        captureSession?.startRunning()
    } catch {
        print(error)
    }
}
```
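For completeness, camera access also needs the usage-description key in Info.plist (required since iOS 10, or the app crashes on first camera access). Mine looks roughly like this (the description string is just an example):

```xml
<!-- Info.plist entry required for camera access on iOS 10+ -->
<key>NSCameraUsageDescription</key>
<string>This app uses the camera to scan text.</string>
```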
3. Here is my action for the button that takes the image:

```swift
@IBAction func captureButtonTapped(_ sender: Any) {
    guard let capturePhotoOutput = self.capturePhotoOutput else { return }

    let photoSettings = AVCapturePhotoSettings()
    photoSettings.isAutoStillImageStabilizationEnabled = true
    photoSettings.isHighResolutionPhotoEnabled = true
    photoSettings.flashMode = .off

    capturePhotoOutput.capturePhoto(with: photoSettings, delegate: self)
}
```
4. This is where I receive the picture taken, using the `didFinishProcessingPhoto` delegate method, and start using MLKit:

```swift
func photoOutput(_ captureOutput: AVCapturePhotoOutput,
                 didFinishProcessingPhoto photoSampleBuffer: CMSampleBuffer?,
                 previewPhoto previewPhotoSampleBuffer: CMSampleBuffer?,
                 resolvedSettings: AVCaptureResolvedPhotoSettings,
                 bracketSettings: AVCaptureBracketedStillImageSettings?,
                 error: Error?) {
    guard error == nil, let photoSampleBuffer = photoSampleBuffer else {
        print("Error capturing photo: \(String(describing: error))")
        return
    }

    guard let imageData = AVCapturePhotoOutput.jpegPhotoDataRepresentation(
        forJPEGSampleBuffer: photoSampleBuffer,
        previewPhotoSampleBuffer: previewPhotoSampleBuffer) else {
        return
    }

    let capturedImage = UIImage(data: imageData, scale: 1.0)
    captureNormal()

    DispatchQueue.main.asyncAfter(deadline: .now() + 0.1) {
        self.captureSession?.stopRunning()
        // Here is where I call the function processText where MLKit is run
        self.processText(with: capturedImage!)
    }
}
```
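As an aside, the callback above is the pre-iOS-11 variant of the delegate method. If the JPEG conversion there is suspect, a sketch of the newer iOS 11+ callback, which hands over an `AVCapturePhoto` directly, would look roughly like this (untested in my project):

```swift
// iOS 11+ variant: AVCapturePhoto replaces the sample-buffer pair
func photoOutput(_ output: AVCapturePhotoOutput,
                 didFinishProcessingPhoto photo: AVCapturePhoto,
                 error: Error?) {
    guard error == nil,
          let imageData = photo.fileDataRepresentation(),
          let capturedImage = UIImage(data: imageData) else {
        print("Error capturing photo: \(String(describing: error))")
        return
    }
    captureSession?.stopRunning()
    processText(with: capturedImage)
}
```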
5. Lastly, here is my function `processText(with:)` that uses MLKit:

```swift
func processText(with image: UIImage) {
    let vision = Vision.vision()
    let textRecognizer = vision.onDeviceTextRecognizer()
    let visionImage = VisionImage(image: image)

    textRecognizer.process(visionImage) { result, error in
        if let error = error {
            print("MLKIT ERROR - \(error)")
        } else {
            let resultText = result?.text
            print("MLKIT RESULT - \(String(describing: resultText))")
        }
    }
}
```
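One thing I've seen in Firebase's docs but haven't wired up yet: `VisionImage` can carry orientation metadata via `VisionImageMetadata`, and the recognizer reportedly struggles when the `UIImage` is rotated (camera captures usually come back as `.right`). A sketch of what I think that would look like; the mapping follows the table in Firebase's docs, so treat it as untested:

```swift
// Map UIImage.Orientation to ML Kit's VisionDetectorImageOrientation
// (mapping taken from Firebase's ML Kit documentation)
func visionOrientation(from orientation: UIImage.Orientation) -> VisionDetectorImageOrientation {
    switch orientation {
    case .up:            return .topLeft
    case .down:          return .bottomRight
    case .left:          return .leftBottom
    case .right:         return .rightTop
    case .upMirrored:    return .topRight
    case .downMirrored:  return .bottomLeft
    case .leftMirrored:  return .leftTop
    case .rightMirrored: return .rightBottom
    default:             return .topLeft
    }
}

// Inside processText(with:), before calling textRecognizer.process(visionImage):
let metadata = VisionImageMetadata()
metadata.orientation = visionOrientation(from: image.imageOrientation)
visionImage.metadata = metadata
```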
OK, that was a lot, so thank you for reading all of it. Alright, the problem is that this does not work. I do get a proper `UIImage` in step 4, so that's not the issue. Here's a screenshot of an example of what I am trying to scan...

MLKit should be able to easily detect this text. But every time I try, `result?.text` is always printed as `nil`. I'm out of ideas. Does anyone have any ideas on how to fix this? If so, thanks a lot!