Apple VisionKit: scanning receipt structure accurately

209 Views Asked by At

I'm using Apple's VisionKit framework OCR scan receipt images. My goal is to extract each item with its respective price from the image. VisionKit's extraction works great -- the text is all pulled off of the image, however, the text structure always seems incorrect. For example using this receipt: receipt image

I get the following text output from VNRecognizeTextRequest:

"44\nBierhaus\nBierhaus NYC\n712 Third Avenue\nNew York, NY 10017\nServer: Tiffany R\nCheck #44\nGuest Count: 3\nOrdered:\n1 Lt Delicator\n1L Hofbräu Dunkel\n.5L Hofbräu Original\nBierhaus Burger\nWell Done\nBratwurst\nSpicy Mustard\nChicken Bratwurst\nSpicy Mustard\nSubtotal\nTax\nTotal\nTable 304\n12/31/23 7:27 PM\n$25.00\n$19.00\n$10.00\n$19.00\n$17.00\n$17.00\n$107.00\n$9.48\n$116.48\nSuggested Tip:\n22%: (Tip $23.54\nTotal $140.02)\n20%: (Tip $21.40 Total $137.88)\n18%: (Tip $19.26 Total $135.74)\nTip percentages are based on the check\nprice before taxes.\nOktoberfest All Year Round!\nPowered by Hofbräu Bier"

OCR results text

Are there config settings that I can change with VNRecognizeTextRequest to recognize the items inline with their respective prices? The resulting data seems super unstructured. How would you recommend I parse this data to achieve my goal (digitizing a list of the items with their prices). I'm looking for format (Total, 116.48)

My code:

import Foundation
import Vision
import VisionKit

final class TextRecognizer {
    let cameraScan: VNDocumentCameraScan?
    let uploadedImage: UIImage?

    init(cameraScan: VNDocumentCameraScan?, image: UIImage?) {
        self.cameraScan = cameraScan
        self.uploadedImage = image
    }

    private let queue = DispatchQueue(label: "scan-codes", qos: .default, attributes: [], autoreleaseFrequency: .workItem)

    func recognizeText(withCompletionHandler completionHandler: @escaping ([String]) -> Void) {
        queue.async {
            let minimumTextHeight: Float = 1

            if let cameraScan = self.cameraScan {
                let images = (0..<cameraScan.pageCount).compactMap({
                    cameraScan.imageOfPage(at: $0).cgImage
                })
                let imagesAndRequests = images.map({ (image: $0, request: VNRecognizeTextRequest()) })

                let textPerPage = imagesAndRequests.map { image, request -> String in
                    // Configure the request for receipt recognition
                    request.recognitionLevel = .accurate
                    request.usesLanguageCorrection = true
                    request.minimumTextHeight = minimumTextHeight

                    let handler = VNImageRequestHandler(cgImage: image, options: [:])
                    do {
                        try handler.perform([request])
                        guard let observations = request.results else { return "" }
                        return observations.compactMap({ $0.topCandidates(1).first?.string }).joined(separator: "\n")
                    } catch {
                        print(error)
                        return ""
                    }
                }

                DispatchQueue.main.async {
                    completionHandler(textPerPage)
                }
            } else if let image = self.uploadedImage {
                let images = (0...0).compactMap({ _ in
                    image.cgImage
                })
                let imagesAndRequests = images.map({ (image: $0, request: VNRecognizeTextRequest()) })

                let textPerPage = imagesAndRequests.map { image, request -> String in
                    // Configure the request for receipt recognition
                    request.recognitionLevel = .accurate
                    request.usesLanguageCorrection = false
                    request.minimumTextHeight = minimumTextHeight

                    let handler = VNImageRequestHandler(cgImage: image, options: [:])
                    do {
                        try handler.perform([request])
                        guard let observations = request.results else { return "" }
                        return observations.compactMap({ $0.topCandidates(1).first?.string }).joined(separator: "\n")
                    } catch {
                        print(error)
                        return ""
                    }
                }

                DispatchQueue.main.async {
                    completionHandler(textPerPage)
                }
            }
        }
    }
}
0

There are 0 best solutions below