I have some text in a pdf that has been OCR'ed.
The OCR returns the bounding boxes of the words to me.
I'm able to draw the bounding boxes (wordRect
) on the pdf and everything seems correct.
But when i tell my fontsize to be the height of these bounding boxes, it all goes wrong. The text appears way smaller than it should be and doesn't match the height.
There's some conversion i am missing. How can i make sure the text is as high as the bounding boxes?
pdftron.PDF.Font font = pdftron.PDF.Font.Create(convertedPdf.GetSDFDoc(), pdftron.PDF.Font.StandardType1Font.e_helvetica);
for (int j = 0; j < ocrStream.pr_WoordList.Count; j++)
{
wordRect = (Rectangle) ocrStream.pr_Rectangles[j];
Element textBegin = elementBuilder.CreateTextBegin();
gStateTextRun = textBegin.GetGState();
gStateTextRun.SetTextRenderMode(GState.TextRenderingMode.e_stroke_text);
elementWriter.WriteElement(textBegin);
fontSize = wordRect.Height;
double descent;
if (hasColorImg)
{
descent = (-1 * font.GetDescent() / 1000d) * fontSize;
textRun = elementBuilder.CreateTextRun((string)ocrStream.pr_WoordList[j], font, fontSize);
//translate the word to its correct position on the pdf
//the bottom line of the wordrectangle is the baseline for the font, that's why we need the descender
textRun.SetTextMatrix(1, 0, 0, 1, wordRect.Left, wordRect.Bottom + descent );
The font_size is just a scaling factor, which in most cases does map to 1/72 inch (pt), but not always.
The transformations are:
GlyphSpace
->TextSpace
->UserSpace
(whereUserSpace
is essentially the page space, and is 1/72 inch)The
glyphs
in thefont
are defined inGlyphSpace
, and there is a font matrix that maps toTextSpace
. Typically, 1000 units maps to 1 unit in test space, but not always.Then the
text matrix
(element.SetTextMatrix
), thefont size
(variable in question here) and some additional parameters, transformTextSpace
coordinates toUserSpace
.In the end though, the exact height, depends on the glyph also.
This forum post shows how to go from the glyph data, to UserSpace. See
ProcessElements
https://groups.google.com/d/msg/pdfnet-sdk/eOATUHGFyqU/6tsUF0BHukkJ