Clustering word boxes into text blocks

I have a large collection of images that I run through Google OCR to retrieve the locations of the text (my main goal is to detect the text, not to recognize it). Google does a pretty good job of detecting individual words or even letters. However, the results are not as good for text blocks or paragraphs.

The results for words:

words

The results for paragraphs:

paragraph

And text blocks:

text blocks

I want to separate each text block (at least the columns, for now) so that I can do some further processing, which is outside the scope of this question. How can I merge these individual word boxes into one large block of text? I tried filling each box with a solid rectangle so that it is easier to process with OpenCV (as in the following example, which uses a different picture from the ones above):

filled rectangles example

Any suggestions?
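
One direction that might work, shown here only as a minimal sketch: treat two word boxes as belonging to the same block when the gap between them is below a tolerance, then take the connected groups. This is roughly what dilating the filled rectangles and extracting connected components would do, but it works directly on the coordinates. The sketch assumes each word's bounding polygon has already been reduced to an axis-aligned `(x_min, y_min, x_max, y_max)` box; the names `cluster_boxes`, `boxes_are_close`, `gap_x` and `gap_y` are hypothetical, and the tolerance values are guesses that would need tuning per image set.

```python
from typing import Dict, List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def boxes_are_close(a: Box, b: Box, gap_x: int, gap_y: int) -> bool:
    """True if the gap between a and b is at most gap_x horizontally
    and at most gap_y vertically (overlapping boxes count as close)."""
    return (a[0] - gap_x <= b[2] and b[0] - gap_x <= a[2]
            and a[1] - gap_y <= b[3] and b[1] - gap_y <= a[3])


def cluster_boxes(boxes: List[Box], gap_x: int = 15, gap_y: int = 10) -> List[Box]:
    """Group nearby word boxes with union-find and return one
    enclosing box per group, sorted by coordinates."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison is O(n^2); fine for a few thousand boxes per image.
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if boxes_are_close(boxes[i], boxes[j], gap_x, gap_y):
                parent[find(i)] = find(j)

    # Grow one enclosing rectangle per group.
    merged: Dict[int, Box] = {}
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        root = find(i)
        if root in merged:
            mx0, my0, mx1, my1 = merged[root]
            merged[root] = (min(mx0, x0), min(my0, y0), max(mx1, x1), max(my1, y1))
        else:
            merged[root] = (x0, y0, x1, y1)
    return sorted(merged.values())


# Made-up word boxes forming two columns of two lines each.
words = [
    (10, 10, 50, 25), (55, 10, 100, 25), (10, 30, 70, 45),
    (200, 10, 250, 25), (255, 10, 300, 25), (200, 30, 280, 45),
]
print(cluster_boxes(words))
# [(10, 10, 100, 45), (200, 10, 300, 45)]
```

Keeping `gap_x` smaller than the gutter between columns is what stops adjacent columns from merging, while `gap_y` controls whether consecutive lines or paragraphs end up in the same block.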
