Google Cloud Vision OCR returns the following output for a bounding box object:
vertices {
x: 786
y: 967
}
Desired Output Format for Bounding Box
I want to convert these co-ordinates to bounding box co-ordinates so that I can write them to my hOCR file, which uses the following format:
<span class='ocr_line' title="bbox 348 797 1482 838; baseline -0.009 -6">
Questions?
- How can I convert these x and y co-ordinates into bbox (bounding box) co-ordinates?
- What are these x and y co-ordinates: are they (x_min, y_max) or (x_max, y_min)? In general, I also want to know what these x and y represent.
Working on Image
I am working on the following image as my test.
As pointed out by @Christoph Rackwitz in the comments, this value is just a single point. Each letter is indicated by a set of four of these points, which together form a bbox, like the following:
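For instance, a set of four such vertices can be reduced to an hOCR-style bbox by taking the minimum and maximum of the x and y values. The following is only a minimal sketch under some assumptions: the vertices are available as plain dictionaries (as in the JSON response), a missing x or y key is treated as 0, and the helper names `vertices_to_bbox` and `hocr_title` as well as the sample points are hypothetical.

```python
from typing import Iterable, Mapping, Tuple

BBox = Tuple[int, int, int, int]


def vertices_to_bbox(vertices: Iterable[Mapping[str, int]]) -> BBox:
    """Reduce corner points to (x_min, y_min, x_max, y_max)."""
    points = list(vertices)
    # Assumption: a coordinate that is absent from the response means 0.
    xs = [p.get("x", 0) for p in points]
    ys = [p.get("y", 0) for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def hocr_title(bbox: BBox) -> str:
    """Format a bbox the way it appears in an hOCR title attribute."""
    return "bbox {} {} {} {}".format(*bbox)


# Hypothetical corner points, chosen to match the ocr_line example above.
example_vertices = [
    {"x": 348, "y": 797},
    {"x": 1482, "y": 797},
    {"x": 1482, "y": 838},
    {"x": 348, "y": 838},
]

print(hocr_title(vertices_to_bbox(example_vertices)))
```

Taking the min and max, rather than picking specific corners, should also cope with text that is slightly rotated, since the four points are then not axis-aligned.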
And the entire page is described by the first object, as follows:
The image source is the URL of the upload on Stack Overflow (i.e. "imageUri": "https://i.stack.imgur.com/9MXec.jpg").
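To make the role of "imageUri" concrete, here is a rough sketch of how a request body for the Vision REST endpoint (images:annotate) could be assembled. The helper name `build_annotate_request` is hypothetical, and the choice of the DOCUMENT_TEXT_DETECTION feature is an assumption; the request actually used may differ.

```python
import json


def build_annotate_request(image_uri: str,
                           feature_type: str = "DOCUMENT_TEXT_DETECTION") -> dict:
    """Build a JSON-serialisable body that points the API at a remote image."""
    return {
        "requests": [
            {
                "image": {"source": {"imageUri": image_uri}},
                # Assumption: document OCR is the feature being requested.
                "features": [{"type": feature_type}],
            }
        ]
    }


body = build_annotate_request("https://i.stack.imgur.com/9MXec.jpg")
print(json.dumps(body, indent=2))
```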