For example, I'd like to detect a handwritten coded string like "A5b1x". I'd either split it up manually so that I have an image of each character, or have Vision recognize the whole string at once. Neither is working so far, because I'm not sure how to tell the API that the input is not a natural language (or that it consists of individual characters). This is what I typed on a Google Compute Engine instance:
gcloud ml vision detect-document "weblink to image"
No result for an image of "g".
No result for an image of "e".
Result for an image of "fxb3": fxb3

The full JSON response for one of my handwriting images:
{
  "responses": [
    {
      "fullTextAnnotation": {
        "pages": [
          {
            "blocks": [
              {
                "blockType": "TEXT",
                "boundingBox": {
                  "vertices": [
                    { "x": 2433, "y": 1289 },
                    { "x": 1498, "y": 1336 },
                    { "x": 1468, "y": 737 },
                    { "x": 2403, "y": 691 }
                  ]
                },
                "confidence": 0.56,
                "paragraphs": [
                  {
                    "boundingBox": {
                      "vertices": [
                        { "x": 2433, "y": 1289 },
                        { "x": 1498, "y": 1336 },
                        { "x": 1468, "y": 737 },
                        { "x": 2403, "y": 691 }
                      ]
                    },
                    "confidence": 0.56,
                    "words": [
                      {
                        "boundingBox": {
                          "vertices": [
                            { "x": 2433, "y": 1289 },
                            { "x": 1498, "y": 1336 },
                            { "x": 1468, "y": 737 },
                            { "x": 2403, "y": 691 }
                          ]
                        },
                        "confidence": 0.56,
                        "symbols": [
                          {
                            "boundingBox": {
                              "vertices": [
                                { "x": 2433, "y": 1289 },
                                { "x": 2135, "y": 1304 },
                                { "x": 2105, "y": 706 },
                                { "x": 2403, "y": 691 }
                              ]
                            },
                            "confidence": 0.4,
                            "text": "\u0967"
                          },
                          {
                            "boundingBox": {
                              "vertices": [
                                { "x": 2063, "y": 1308 },
                                { "x": 1788, "y": 1322 },
                                { "x": 1758, "y": 723 },
                                { "x": 2033, "y": 710 }
                              ]
                            },
                            "confidence": 0.62,
                            "text": "\u0967"
                          },
                          {
                            "boundingBox": {
                              "vertices": [
                                { "x": 1750, "y": 1323 },
                                { "x": 1498, "y": 1336 },
                                { "x": 1468, "y": 737 },
                                { "x": 1720, "y": 725 }
                              ]
                            },
                            "confidence": 0.67,
                            "property": {
                              "detectedBreak": { "type": "LINE_BREAK" }
                            },
                            "text": "X"
                          }
                        ]
                      }
                    ]
                  }
                ]
              }
            ],
            "height": 2112,
            "width": 4608
          }
        ],
        "text": "\u0967\u0967X\n"
      },
      "textAnnotations": [
        {
          "boundingPoly": {
            "vertices": [
              { "x": 1467, "y": 690 },
              { "x": 2432, "y": 690 },
              { "x": 2432, "y": 1335 },
              { "x": 1467, "y": 1335 }
            ]
          },
          "description": "\u0967\u0967X\n",
          "locale": "und"
        },
        {
          "boundingPoly": {
            "vertices": [
              { "x": 2433, "y": 1289 },
              { "x": 1498, "y": 1336 },
              { "x": 1468, "y": 737 },
              { "x": 2403, "y": 691 }
            ]
          },
          "description": "\u0967\u0967X"
        }
      ]
    }
  ]
}
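In case it matters, this is roughly the equivalent request I would make through the Python client library. It is only a sketch: the google-cloud-vision package usage, the gs:// URI, and the "en" language hint are my own placeholders, and as far as I can tell there is no hint value that means "individual characters" rather than a language.

from google.cloud import vision

# Sketch only: DOCUMENT_TEXT_DETECTION with a language hint.
# The image URI and the "en" hint are placeholders; there is no
# documented hint that tells the API "isolated characters, not words".
client = vision.ImageAnnotatorClient()
image = vision.Image(source=vision.ImageSource(image_uri="gs://my-bucket/code.jpg"))

response = client.document_text_detection(
    image=image,
    image_context={"language_hints": ["en"]},
)
print(response.full_text_annotation.text)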
The Google Cloud Vision API is not able to recognise single characters at this point. There is a feature request for character recognition here; please star it so that you receive updates, and do not hesitate to add comments describing your desired implementation.
With respect to your question about recognising "coded" strings, the Vision API is able to do that. I have successfully passed an image containing fxb3 to the API and the results were good (here are image1 and image2). The response you are getting from the API is two consecutive Unicode characters (\u0967, the Devanagari digit one) followed by "X". The quality of the writing is what is causing the poor response. The OCR model is constantly being improved, but at this point it cannot reliably read handwriting this unclear.
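If it helps, you can inspect the per-symbol confidences programmatically and flag weak detections yourself, since the response breaks every word down into symbols as shown in your JSON. A minimal sketch with the Python client library, assuming a local file name of your own and an arbitrary 0.6 cutoff (my choice, not an API recommendation):

import io
from google.cloud import vision

# Sketch: print every detected symbol with its confidence.
# "code.jpg" is a placeholder; 0.6 is an arbitrary example threshold.
client = vision.ImageAnnotatorClient()
with io.open("code.jpg", "rb") as f:
    image = vision.Image(content=f.read())

response = client.document_text_detection(image=image)
for page in response.full_text_annotation.pages:
    for block in page.blocks:
        for paragraph in block.paragraphs:
            for word in paragraph.words:
                for symbol in word.symbols:
                    note = "" if symbol.confidence >= 0.6 else "  <- low confidence"
                    print(f"{symbol.text!r} ({symbol.confidence:.2f}){note}")

Run against your image, this would print each recognised character (including the \u0967 ones) alongside its confidence, which makes it easier to see which strokes the model is unsure about.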