How to find the advance and displacement values of a glyph and supply them in a PDF content stream


I have to write multilingual text to a PDF using C++. For the input string I have the Unicode values as well as the glyph IDs, together with their advances and displacements.

But I need to know how to position a dependent glyph relative to its independent base glyph. Suppose I have the advance and displacement values from FreeType / HarfBuzz: how should I write these values into the PDF content stream along with the glyph IDs?

I have tried the output values of FreeType and HarfBuzz. The individual glyphs print properly, but the dependent glyphs are still not positioned correctly against their base glyphs, even when I use the advance and displacement values from those outputs.

I just need the logic of how to use the output values in the content stream to deliver a proper readable word/letter.


Example: Text = Tamil letter + Hindi letter.

I need to print this output (image: proper output).

But currently I am only able to print this (image: improper output).


Tamil combined letter:

வ = U+0BB5 TAMIL LETTER VA = base glyph

ா = U+0BBE TAMIL VOWEL SIGN AA = dependent glyph

HarfBuzz run:

hb-shape.exe -O json -u u+0bb5,u+0bbe --no-glyph-names "C:\\Windows\\Fonts\\Nirmala.ttf"

gid output:

[{"g":2953,"cl":0,"dx":0,"dy":0,"ax":2111,"ay":0},{"g":2959,"cl":0,"dx":0,"dy":0,"ax":1453,"ay":0}]

Hindi combined letter:

म = U+092E DEVANAGARI LETTER MA = base glyph

ि = U+093F DEVANAGARI VOWEL SIGN I = dependent glyph

HarfBuzz run:

hb-shape.exe -O json -u u+092e,u+093f --no-glyph-names "C:\\Windows\\Fonts\\Nirmala.ttf"

gid output:

[{"g":302,"cl":0,"dx":0,"dy":0,"ax":532,"ay":0},{"g":273,"cl":0,"dx":0,"dy":0,"ax":1379,"ay":0}]
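For reference, hb-shape reports `dx`, `dy`, `ax` and `ay` in font design units by default, whereas PDF glyph widths and `TJ` numbers use thousandths of a text space unit. A minimal sketch of that scaling follows. The struct and function names are made up for illustration, and the units-per-em value has to be read from the font (for example `FT_Face::units_per_EM` in FreeType); 2048 below is only an assumed value.

```cpp
#include <cassert>
#include <vector>

// One entry of the hb-shape JSON output, values in font design units.
struct ShapedGlyph { unsigned gid; int dx, dy, ax, ay; };

// The same glyph with every metric scaled to 1/1000 of a text space unit.
struct PdfGlyph { unsigned gid; double dx, dy, ax, ay; };

// Scale one value from font design units to PDF glyph space (1000 per em).
double toPdfUnits(int fontUnits, unsigned unitsPerEm) {
    assert(unitsPerEm > 0);
    return fontUnits * 1000.0 / unitsPerEm;
}

// Scale a whole shaped run, keeping the order returned by the shaper.
std::vector<PdfGlyph> scaleRun(const std::vector<ShapedGlyph>& run,
                               unsigned unitsPerEm) {
    std::vector<PdfGlyph> out;
    out.reserve(run.size());
    for (const ShapedGlyph& g : run) {
        out.push_back({g.gid,
                       toPdfUnits(g.dx, unitsPerEm), toPdfUnits(g.dy, unitsPerEm),
                       toPdfUnits(g.ax, unitsPerEm), toPdfUnits(g.ay, unitsPerEm)});
    }
    return out;
}
```

With an assumed units-per-em of 2048, the advance 2111 reported for glyph 2953 would scale to roughly 1030.8.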

I substituted these output values into the formula from the PDF documentation (image: PDF doc formula).

I assumed unity for all variables except the width and the advance, obtained the width values using FreeType, and computed the results.

Glyph Advance values for four glyphs in order:

tx = 1769
tx = 1132
tx = 1586
tx = 1448
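For readers without the image, the horizontal displacement formula in the PDF specification (ISO 32000-1, section 9.4.4) is tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th. A small sketch of it as a function is given below; the function name is hypothetical, and the parameter names simply follow the symbols in the specification.

```cpp
// Horizontal displacement applied after showing one glyph, in text space units.
//   w0  : glyph width from the font dictionary, in text space units (width / 1000)
//   Tj  : the number that follows the glyph in a TJ array (0 if there is none)
//   Tfs : font size from Tf
//   Tc  : character spacing
//   Tw  : word spacing (the caller passes 0 unless the code is single-byte 32)
//   Th  : horizontal scaling as a fraction (the Tz operand divided by 100)
double textDisplacement(double w0, double Tj, double Tfs,
                        double Tc, double Tw, double Th) {
    return ((w0 - Tj / 1000.0) * Tfs + Tc + Tw) * Th;
}
```

Note that `w0` is always applied, so the `Tj` term is a correction on top of the glyph's own width rather than a replacement for it.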

I provide these values in the content stream in the following order:

<glyph id 1> tx 1 <glyph id 2> tx 2 <glyph id 3> tx 3 <glyph id 4> tx 4

Content stream:

/OC /oc2 BDC q BT /FXF1 1 Tf 70.866142 0.000000 0.000000 70.866142 28.346457 141.732285 Tm[<0B89>-1769<0B8F>-1132<0111>-1586<012E>-1448]TJ  ET Q EMC

The PDF documentation says that a positive adjustment value moves the text towards the left. Is it the other way round? Or should the difference between the advances be supplied instead?
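One way to read the specification here: after each glyph the viewer already advances by the glyph width stored in the font dictionary, and a number in the `TJ` array is subtracted on top of that. Under that reading, the number to write is the difference between the stored width and the advance the shaper wants, not the advance itself. The sketch below assumes 2-byte codes equal to the glyph IDs (as with an Identity-H encoding) and a hypothetical `pdfWidth` field holding the width written to the font's /W array; glyphs stay in the order the shaper returned them, which is visual order.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>
#include <vector>

struct PlacedGlyph {
    unsigned gid;     // glyph ID, written as a 2-byte hex code
    double advance;   // x advance wanted by the shaper, in font design units
    double pdfWidth;  // width stored in the PDF font, in 1/1000 text space units
};

// Build a TJ operator whose numbers only correct the difference between the
// width the viewer applies by itself and the advance the shaper asked for.
std::string buildTJ(const std::vector<PlacedGlyph>& glyphs, unsigned unitsPerEm) {
    assert(unitsPerEm > 0);
    std::string out = "[";
    for (const PlacedGlyph& g : glyphs) {
        char code[8];
        std::snprintf(code, sizeof code, "<%04X>", g.gid);
        out += code;
        const double wanted = g.advance * 1000.0 / unitsPerEm;
        // Positive values pull the next glyph left, negative values push it right.
        // Rounded to whole units here for brevity; decimals are also allowed.
        const long adjust = std::lround(g.pdfWidth - wanted);
        if (adjust != 0) out += std::to_string(adjust);
    }
    out += "] TJ";
    return out;
}
```

If the widths in the PDF font match the shaper's advances exactly, every correction is zero and the array contains only glyph codes.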


Additional PDF objects:

Font descriptor object, base font object, font object.

I have also tried using only the advance values, and only the computed values. The remaining problem is the horizontal and vertical spacing within combined glyphs, which also affects the spacing between subsequent glyphs.

None of these renders the glyphs legibly, at least not in a generalised, programmatic manner.


From reading @mkl's answers on various Stack Overflow questions, I suspect that each glyph needs its own transformation matrix or Td. But is it really that complex? I would have thought this could be rendered easily.

If an individual transformation matrix or Td is needed, how do I compute the values to supply for it?
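As a rough sketch of the per-glyph `Td` idea: numbers in a `TJ` array only shift along the writing direction, so a vertical offset needs `Td`, `Tm` or the text rise `Ts`. `Td` moves relative to the start of the current line, not to where the previous glyph ended, so each operand pair below is the difference between two consecutive glyph origins. The names and the `fontSize` parameter are illustrative assumptions; with `1 Tf` and the scaling in `Tm`, as in the content stream above, `fontSize` would be 1.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Shaper output for one glyph, all values in font design units.
struct ShapedGlyph { unsigned gid; double dx, dy, ax, ay; };

// Emit one "Td <gid> Tj" pair per glyph. The pen follows the advances; each
// glyph is drawn at pen + offset, expressed relative to the previous origin.
std::string positionWithTd(const std::vector<ShapedGlyph>& glyphs,
                           unsigned unitsPerEm, double fontSize) {
    assert(unitsPerEm > 0);
    const double scale = fontSize / unitsPerEm;  // design units -> text space
    double penX = 0.0, penY = 0.0;    // accumulated advances, design units
    double lineX = 0.0, lineY = 0.0;  // origin set by the previous Td, text space
    std::string out;
    for (const ShapedGlyph& g : glyphs) {
        const double x = (penX + g.dx) * scale;
        const double y = (penY + g.dy) * scale;
        char buf[96];
        std::snprintf(buf, sizeof buf, "%.4f %.4f Td <%04X> Tj\n",
                      x - lineX, y - lineY, g.gid);
        out += buf;
        lineX = x;
        lineY = y;
        penX += g.ax;
        penY += g.ay;
    }
    return out;
}
```

The returned text would go between `BT ... Tf ... Tm` and `ET`. It is more verbose than a single `TJ`, but it handles both horizontal and vertical offsets in the same way.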


Any help & guidance is welcome and much appreciated. Thank you.

There is 1 best solution below

It helps to work out the PDF as plain text, which you can write and save in Notepad.

Here I am altering a batch.cmd (work in progress :-) to test that my compiler handles the changes as text, but you can edit the raw PDF in an editor too. Beware: after cutting and pasting, a value or two may need changing. I also do not yet know how to reference non-Latin fonts easily (the next hurdle after images, which are almost done), so I used the "Symbol" font to illustrate the positioning modifications.

Note that for specific queries @mkl is the expert; I simply program by example, going by what functions rather than by the book.


%PDF-1.0
%µ¶µ¶

1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj

2 0 obj<</Type/Pages/Count 1/Kids[3 0 R]>>endobj

3 0 obj<</Type/Page/Parent 2 0 R/MediaBox [0 0 594 792]/Resources<</Font<< /F1 4 0 R /F2 5 0 R>>>>/Contents 6 0 R>>endobj

4 0 obj<</Type/Font/Subtype/Type1/BaseFont/Helvetica>>endobj

5 0 obj<</Type/Font/Subtype/Type1/BaseFont/Symbol>>endobj

%Comment: the /Length value below is a dummy; it should be altered to equal the stream length in decimal bytes, but most readers will ignore or work around an invalid value
6 0 obj<</Length 1326>>
stream
q
BT /F1 20 Tf 072 740 Td (20 units (default units usually = pts) high Headline) Tj ET
BT /F1 16 Tf 036 700 Td (All text is "Body" text. (no heads or tails)) Tj ET
BT /F1 10 Tf 004 780 Td (Text can be any order see "Body" text above. (Printed by Filename="C:\Users\K\Downloads\Programming\CMDaPDF\MAKE2PDF.cmd") spot the escape errors) Tj ET
BT /F1 12 Tf 036 675 Td (Here @ 12 units high you must include just enough text for parts of a line. PDF has no page feeds no wrapping,) Tj 0 -20 Td (nor \\new line feed, no ¶aragraphs) Tj 86 -15 Td (nor carriage \r\\return. \n\r  ) Tj 100 5 Td (       It is not \007\010\011\012\\tabular, each page is one row of multiple pages,) Tj 50 -15 Td (each   page   is   one   text   column   wide .[ ×] no yes check) Tj 0 -10 Td (each   row     is   one   text   column   wide .[x] no is yes) Tj 0 -10 Td (each   row     is   one   text   column   wide .  · bullet point OK) Tj ET
BT +0.50 Tc -1.4 Tw 999 TL /F1 1 Tf 15 001 10. 30 200.000 440.000 Tm [(Jane A)600(usten)] TJ ET
BT +0.50 Tc 0.00 Tw 000 TL /F2 1 Tf 15 000 000 15 200.000 430.000 Tm [(Ja)-1000(ne Austen)] TJ ET
BT -1.20 Tc 0.00 Tw 999 TL /F2 1 Tf 15 000 000 15 200.000 420.000 Tm [(J)-1200(a)800(ne Austen)] TJ ET
BT +0.00 Tc 0.00 Tw 000 TL /F2 1 Tf 15 000 000 15 200.000 410.000 Tm [(Jane A)100(us)-500(ten)] TJ ET
Q

endstream

xref
0 7 
0000000000 65535 f
0000000019 00000 n
0000000065 00000 n
0000000117 00000 n
0000000242 00000 n
0000000306 00000 n
0000000527 00000 n

trailer<</Size 7/Root 1 0 R>>
startxref
1903
%%EOF
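Since the /Length value and the xref offsets in a hand-written file like this are easy to get wrong, here is a minimal sketch of computing them instead of typing them. It assumes object 1 is the catalog and that each object body is already a complete dictionary or stream; both function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Wrap content stream data in a stream object body with a matching /Length.
// The end-of-line before "endstream" is not counted in the length.
std::string makeStream(const std::string& content) {
    return "<</Length " + std::to_string(content.size()) + ">>\nstream\n" +
           content + "\nendstream";
}

// Number the bodies 1..n, record each byte offset, then write xref and trailer.
std::string assemblePdf(const std::vector<std::string>& bodies) {
    std::string pdf = "%PDF-1.4\n";
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i < bodies.size(); ++i) {
        offsets.push_back(pdf.size());  // byte offset of "N 0 obj"
        pdf += std::to_string(i + 1) + " 0 obj\n" + bodies[i] + "\nendobj\n";
    }
    const std::size_t xrefPos = pdf.size();
    pdf += "xref\n0 " + std::to_string(bodies.size() + 1) + "\n";
    pdf += "0000000000 65535 f \n";  // every xref entry is exactly 20 bytes
    for (std::size_t off : offsets) {
        char entry[32];
        std::snprintf(entry, sizeof entry, "%010zu 00000 n \n", off);
        pdf += entry;
    }
    pdf += "trailer\n<</Size " + std::to_string(bodies.size() + 1) +
           "/Root 1 0 R>>\n";
    pdf += "startxref\n" + std::to_string(xrefPos) + "\n%%EOF\n";
    return pdf;
}
```

The result should be written to disk in binary mode so that the recorded offsets are not shifted by newline translation.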