Improving the accuracy of text recognition when using iOS Vision Framework to scan a document

Question

1.9k Views Asked by Pranet At 28 July 2025 at 08:12

I am trying to build a document scanner that can read text off any document or card. However, it sometimes has trouble identifying text correctly on a credit card. The accuracy is decent, but there is definitely room for improvement. I used the Vision framework's text recognition with the standard recommended settings.

This is how I set up the text recognition request:

textRecognitionRequest = VNRecognizeTextRequest(completionHandler: { (request, error) in
    // Vision returns one observation per detected line of text.
    if let requestResults = request.results as? [VNRecognizedTextObservation], !requestResults.isEmpty {
        var foundText = ""
        for observation in requestResults {
            // Keep only the highest-confidence candidate for each line.
            guard let candidate = observation.topCandidates(1).first else { continue }
            foundText.append(candidate.string + "\n")
        }
    }
})
textRecognitionRequest.recognitionLevel = .accurate
textRecognitionRequest.usesLanguageCorrection = true

Does anyone have any suggestions for improving the identification programmatically by either pre-processing or post-processing the scan at some point?

**Ethan Allen** · Answer 1

UPDATE: I've made a fully open source project that may help you do exactly what you need. Check it out: https://github.com/ethanwa/credit-card-scanner-and-validator

You can't do much to improve accuracy beyond adding some preset values to specifically look for, which doesn't make sense with CC numbers so I won't even bother showing that code. You'll need to rely on Apple to improve their text recognition model as iOS iterates for it to truly improve.

What I suggest in the meantime are these two things you can do:

1. Do validation on the credit card numbers that you think you're receiving. For example, Visa starts with 4, MasterCard starts with 5, Discover with 6, Amex with 3, etc. They have specific lengths and so on. See here: https://www.freeformatter.com/credit-card-number-generator-validator.html
2. Keep iterating over a camera feed until you get a number that validates. I'm not sure if you are currently just taking a picture of the card and processing that single image (which it sounds like you are doing), but you should be processing many images per second until you get a valid CC number. This is most likely how Apple does it when adding a card via Apple Pay on your phone, or how banking apps do it when depositing checks digitally (finding valid routing and account numbers).
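For the first point, one common validation step is the Luhn checksum, which payment card numbers are designed to satisfy. The sketch below is a minimal illustration, not code from the project linked above: the function name `passesLuhn` is made up for this example, and the 12 to 19 digit length range is an assumption you may want to tighten per card brand.

```swift
import Foundation

/// Returns true if `candidate` passes the Luhn checksum.
/// Spaces and hyphens are ignored; any other non-digit makes the check fail.
/// Note: the name `passesLuhn` and the 12...19 length range are assumptions
/// made for this sketch.
func passesLuhn(_ candidate: String) -> Bool {
    // OCR output often keeps the printed grouping, e.g. "4111 1111 1111 1111".
    let cleaned = candidate.filter { $0 != " " && $0 != "-" }
    guard (12...19).contains(cleaned.count) else { return false }

    var sum = 0
    // Walk from the rightmost digit, doubling every second digit.
    for (index, character) in cleaned.reversed().enumerated() {
        guard character.isASCII, let digit = character.wholeNumberValue else {
            return false
        }
        if index % 2 == 1 {
            let doubled = digit * 2
            // A two-digit result contributes the sum of its digits.
            sum += doubled > 9 ? doubled - 9 : doubled
        } else {
            sum += digit
        }
    }
    return sum % 10 == 0
}
```

A frame's recognized text would only be accepted once a candidate string passes this check (plus any prefix and length rules you apply); otherwise the next camera frame is processed. A single misread digit always changes the checksum, so most OCR mistakes get rejected this way.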

Here's an example of what I mean...

I wrote this code that can pick out and validate ISBN numbers (basically 10 and 13 digit numbers that catalog books, which have a check digit for validation) in any given text and will keep looking until it finds all the numbers and then validates. It works extremely well and is very fast. Check out this Swift 5.3 code:

import UIKit
import Vision
import Photos
import AVFoundation

class ViewController: UIViewController, AVCaptureVideoDataOutputSampleBufferDelegate {
    
    var recognizedText = ""
    var finalText = ""
    var image: UIImage?
    var processing = false
    
    @IBOutlet weak var nameLabel: UILabel!
    @IBOutlet weak var setLabel: UILabel!
    @IBOutlet weak var numberLabel: UILabel!
    
    lazy var textDetectionRequest: VNRecognizeTextRequest = {
        let request = VNRecognizeTextRequest(completionHandler: self.handleDetectedText)
        request.recognitionLevel = .accurate
        request.usesLanguageCorrection = false
        return request
    }()
    
    private let videoOutput = AVCaptureVideoDataOutput()
    private let captureSession = AVCaptureSession()
    private lazy var previewLayer: AVCaptureVideoPreviewLayer = {
        let preview = AVCaptureVideoPreviewLayer(session: self.captureSession)
        preview.videoGravity = .resizeAspect
        return preview
    }()

    // MARK: AV
    
    override func viewDidLoad() {
        super.viewDidLoad()
        self.addCameraInput()
        self.addVideoOutput()
    }
    
    private func addCameraInput() {
        let device = AVCaptureDevice.default(for: .video)!
        let cameraInput = try! AVCaptureDeviceInput(device: device)
        self.captureSession.addInput(cameraInput)
    }
    
    override func viewDidLayoutSubviews() {
        super.viewDidLayoutSubviews()
        self.previewLayer.frame = self.view.bounds
    }
    
    private func addVideoOutput() {
        self.videoOutput.videoSettings = [(kCVPixelBufferPixelFormatTypeKey as NSString) : NSNumber(value: kCVPixelFormatType_32BGRA)] as [String : Any]
        self.videoOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "my.image.handling.queue"))
        self.captureSession.addOutput(self.videoOutput)
    }
    
    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection)
    {
        if !processing
        {
            guard let frame = CMSampleBufferGetImageBuffer(sampleBuffer) else {
                debugPrint("unable to get image from sample buffer")
                return
            }
            print("did receive image frame")
            // process image here
        
            self.processing = true
            
            let ciimage : CIImage = CIImage(cvPixelBuffer: frame)
            let theimage : UIImage = self.convert(cmage: ciimage)
            
            self.image = theimage
            processImage()
        }
    }

    // Convert CIImage to UIImage (via an intermediate CGImage)
    func convert(cmage:CIImage) -> UIImage
    {
         let context:CIContext = CIContext.init(options: nil)
         let cgImage:CGImage = context.createCGImage(cmage, from: cmage.extent)!
         let image:UIImage = UIImage.init(cgImage: cgImage)
         return image
    }
    
    // AV
    
    func processImage()
    {
        DispatchQueue.main.async {
            self.nameLabel.text = ""
            self.setLabel.text = ""
            self.numberLabel.text = ""
        }
        
        guard let image = image, let cgImage = image.cgImage else { return }
        
        let requests = [textDetectionRequest]
        let imageRequestHandler = VNImageRequestHandler(cgImage: cgImage, orientation: .right, options: [:])
        DispatchQueue.global(qos: .userInitiated).async {
            do {
                try imageRequestHandler.perform(requests)
            } catch let error {
                print("Error: \(error)")
            }
        }
    }
    
    fileprivate func handleDetectedText(request: VNRequest?, error: Error?)
    {
        self.finalText = ""
        
        if let error = error {
            print(error.localizedDescription)
            self.processing = false
            return
        }
        guard let results = request?.results, results.count > 0 else {
            print("No text was found.")
            self.processing = false
            return
        }

        if let requestResults = request?.results as? [VNRecognizedTextObservation] {
            self.recognizedText = ""
            for observation in requestResults {
                // Skip observations without a candidate; returning here would leave `processing` stuck at true.
                guard let candidate = observation.topCandidates(1).first else { continue }
                self.recognizedText += candidate.string
                self.recognizedText += " "
            }
            
            var replaced = self.recognizedText.replacingOccurrences(of: "-", with: "")
            replaced = String(replaced.filter { !"\n\t\r".contains($0) })
            let replacedArr = replaced.components(separatedBy: " ")
            
            for here in replacedArr
            {
                let final = here.trimmingCharacters(in: CharacterSet.whitespacesAndNewlines)

                if (final.count == 10 || final.count == 13) && final.containsISBNnums && Validate.isbn(final) // validate barcode
                {
                    self.finalText += final
                    print(final)
                    self.captureSession.stopRunning()
                    DispatchQueue.main.async {
                        self.previewLayer.removeFromSuperlayer()
                    }
                    break
                }
            }

            DispatchQueue.main.async {
                self.numberLabel.text = self.finalText
            }
        }
        
        self.processing = false
    }
    
    // MARK: Buttons

    // This is a live camera view that will start a capture session
    @IBAction func takePhoto(_ sender: Any) {
        self.view.layer.addSublayer(self.previewLayer)
        self.captureSession.startRunning()
    }
    
    @IBAction func choosePhoto(_ sender: Any) {
        presentPhotoPicker(type: .photoLibrary)
    }
    
    fileprivate func presentPhotoPicker(type: UIImagePickerController.SourceType) {
        let controller = UIImagePickerController()
        controller.sourceType = type
        controller.delegate = self
        present(controller, animated: true, completion: nil)
    }
}

extension ViewController: UIImagePickerControllerDelegate, UINavigationControllerDelegate {
    
    func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {
        dismiss(animated: true, completion: nil)
    }
    
    func imagePickerController(_ picker: UIImagePickerController, didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey : Any]) {
        
        dismiss(animated: true, completion: nil)
        image = info[.originalImage] as? UIImage
        processImage()
    }
}

extension String {
    var containsISBNnums: Bool {
        guard self.count > 0 else { return false }
        let nums: Set<Character> = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "X"]
        return Set(self).isSubset(of: nums)
    }
}
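
The code above calls `Validate.isbn(_:)`, but the listing does not include that helper. The following is a hypothetical stand-in, assuming it is meant to verify the standard ISBN-10 and ISBN-13 check digits; the original implementation may differ.

```swift
import Foundation

/// Hypothetical stand-in for the `Validate` helper used above (not the
/// answer author's code). Checks ISBN-10 and ISBN-13 check digits.
enum Validate {
    static func isbn(_ text: String) -> Bool {
        let characters = Array(text)
        switch characters.count {
        case 10: return isbn10(characters)
        case 13: return isbn13(characters)
        default: return false
        }
    }

    /// ISBN-10: weights 10 down to 1, total must be divisible by 11.
    /// "X" stands for the value 10 and is only allowed as the last character.
    private static func isbn10(_ characters: [Character]) -> Bool {
        var sum = 0
        for (index, character) in characters.enumerated() {
            let value: Int
            if character == "X" && index == 9 {
                value = 10
            } else if character.isASCII, let digit = character.wholeNumberValue {
                value = digit
            } else {
                return false
            }
            sum += value * (10 - index)
        }
        return sum % 11 == 0
    }

    /// ISBN-13: alternating weights 1 and 3, total must be divisible by 10.
    private static func isbn13(_ characters: [Character]) -> Bool {
        var sum = 0
        for (index, character) in characters.enumerated() {
            guard character.isASCII, let digit = character.wholeNumberValue else {
                return false
            }
            sum += digit * (index % 2 == 0 ? 1 : 3)
        }
        return sum % 10 == 0
    }
}
```

Because the check digit catches any single misread digit, a string that passes is very likely a correctly recognized ISBN, which is what makes the "keep scanning until something validates" loop reliable.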
