Converting Google Cloud Vision OCR X and Y Co-ordinates to bbox Co-ordinates

Question

478 Views Asked by Muneeb Ahmad Khurram At 26 June 2025 at 07:04

Google Cloud Vision OCR returns the following output for a bounding box object:

vertices {
  x: 786
  y: 967
}

Desired Output Format for Bounding Box

I want to convert these co-ordinates to bounding box co-ordinates so that I can write them to my hOCR file, which uses the following format:

  <span class='ocr_line' title="bbox 348 797 1482 838; baseline -0.009 -6">

Questions?

How can I convert these x and y co-ordinates to bbox (bounding box) co-ordinates?
Are these x and y co-ordinates (x_min, y_max) or (x_max, y_min)? More generally, what do x and y represent?

Working on Image

I am working on the following image as my test.

**ewertonvsilva** · Accepted Answer

As @Christoph Rackwitz pointed out in the comments, this value is just a single point. Each detected text element (a word, in the example below) is described by a set of 4 such points, which together form its bbox, like the following:

        {
          "description": "وأما",
          "boundingPoly": {
            "vertices": [
              {
                "x": 1088,
                "y": 230
              },
              {
                "x": 1145,
                "y": 230
              },
              {
                "x": 1145,
                "y": 289
              },
              {
                "x": 1088,
                "y": 289
              }
            ]
          }
        },
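To make the link to hOCR concrete: a `bbox x0 y0 x1 y1` entry gives the left, top, right and bottom edges in pixels, with the origin at the top-left corner of the image, so the conversion amounts to taking the minimum and maximum of the four points. A minimal sketch follows, assuming the response has been loaded as a Python dict shaped like the JSON above. The helper name `vertices_to_bbox` is hypothetical, and defaulting a missing `x` or `y` to 0 is an assumption for responses that omit zero-valued fields.

```python
def vertices_to_bbox(vertices):
    """Return (x_min, y_min, x_max, y_max) for a list of {"x": ..., "y": ...} dicts.

    Hypothetical helper, not part of any Google library.
    """
    # Assumption: a coordinate missing from the JSON is treated as 0.
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return min(xs), min(ys), max(xs), max(ys)


# The word annotation shown above, as a Python dict.
word = {
    "description": "وأما",
    "boundingPoly": {
        "vertices": [
            {"x": 1088, "y": 230},
            {"x": 1145, "y": 230},
            {"x": 1145, "y": 289},
            {"x": 1088, "y": 289},
        ]
    },
}

print(vertices_to_bbox(word["boundingPoly"]["vertices"]))
# (1088, 230, 1145, 289)
```

Taking min and max, rather than picking fixed vertices, also gives an enclosing axis-aligned box when the text is rotated and the four points do not form an upright rectangle.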

The entire page is described by the first object, as follows:

        {
          "locale": "ar",
          "description": "وأما ثانيا : فلأنه يخرج منه من زنی مثلا ثم جب ذكرة فإنه\nلا يتأتی\nمنه غير الندم على ما مضی ، وأما العزم على عدم\nقال : إن الندم\nيكفي في حد التوبة ، وليس كما قال ؛ لانه لو ندم ولم يقلع\nوعزم على العود لم يكن تائبا اتفاقا ، قال : وقال بعض المحق قين :\nاختيار ترك ذنب سبق حقيقة. أو تقديرا لأجل ال له قال :\nالعود فلا يتصور منه ، قال : وبهذا اغتر من\nهي\nوهذا أسد العبارات وأجمعها لأن التائب لا يكون تار کا\nل لذنب الذي فرغ لأنه غير متمكن من عينه لا تركا ولا فعلا ،\nمثله حقيقة ، وكذا من لم يقع منه ذنب\nمتمكن\nوإنما هو\nمن\nإنما يصح منه اتقاء ما يمكن أن يقع لا ترك مثل ما وقع فيكون\nمتقيا لا تائبا ، قال : والباعث على هذا تنبيه إلهي لمن أراد\nمهلك يفوث على\nلأنه\nسم\nسعادته لقبح الذنب وضر ره ؛\nالإنسان سعادة الدنيا والآخرة ويحجبة عن معرفة ال له. تعالي في\nالدنيا ، وعن تقريبه في الآخرة\nقال : ومن تفقد نفسه وجدها مشحونة بهذا السم فإذا وفق\nانبعث منه خوف هجوم الهلاك عليه ، فيبادر بطلب ما يدفع\n",
          "boundingPoly": {
            "vertices": [
              {
                "x": 122,
                "y": 223
              },
              {
                "x": 1197,
                "y": 223
              },
              {
                "x": 1197,
                "y": 1688
              },
              {
                "x": 122,
                "y": 1688
              }
            ]
          }
        },

You can process the JSON, using the 4 points of each element, to generate the object you need.
Check this page, where you can try the API. I used the URL of the image uploaded to Stack Overflow as the image source (i.e. "imageUri": "https://i.stack.imgur.com/9MXec.jpg").
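As a rough illustration of that processing step, the sketch below turns a list of annotations into hOCR word spans. It assumes the annotations are already parsed into Python dicts with the `description` and `boundingPoly` fields shown above, and that the first entry covers the whole page and is therefore skipped. The function names `bbox_of` and `hocr_word_spans` are hypothetical, and the sample page description is shortened for the example.

```python
from html import escape


def bbox_of(annotation):
    """Return (x0, y0, x1, y1) enclosing the annotation's four vertices."""
    vertices = annotation["boundingPoly"]["vertices"]
    # Assumption: a coordinate missing from the JSON is treated as 0.
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def hocr_word_spans(text_annotations):
    """Build one hOCR 'ocrx_word' span per annotation, skipping the first
    entry, which describes the whole page."""
    spans = []
    for annotation in text_annotations[1:]:
        x0, y0, x1, y1 = bbox_of(annotation)
        text = escape(annotation["description"])
        spans.append(
            f"<span class='ocrx_word' title='bbox {x0} {y0} {x1} {y1}'>{text}</span>"
        )
    return spans


# Sample data taken from the two objects above (page text shortened).
annotations = [
    {
        "description": "(full page text)",
        "boundingPoly": {"vertices": [
            {"x": 122, "y": 223}, {"x": 1197, "y": 223},
            {"x": 1197, "y": 1688}, {"x": 122, "y": 1688},
        ]},
    },
    {
        "description": "وأما",
        "boundingPoly": {"vertices": [
            {"x": 1088, "y": 230}, {"x": 1145, "y": 230},
            {"x": 1145, "y": 289}, {"x": 1088, "y": 289},
        ]},
    },
]

for span in hocr_word_spans(annotations):
    print(span)
# <span class='ocrx_word' title='bbox 1088 230 1145 289'>وأما</span>
```

This produces word-level spans only. An `ocr_line` element like the one in the question would additionally require grouping words into lines, for example by comparing their vertical positions, which is not shown here.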
