How are connected letters in Devanagari rendered?


Consider the letters in the picture below.

The first row shows the letters themselves, the second row numbers them, and the third row shows each letter's Unicode code point encoded as three UTF-8 bytes in hex. For example, letter 2 is DEVANAGARI LETTER MA with code point U+092E (0x92E = 2350 decimal), which is encoded in UTF-8 as the three bytes e0, a4, ae.
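The arithmetic in the example can be checked with a short Python sketch (standard library only). It encodes U+092E with the built-in codec and also assembles the bytes by hand from the three-byte UTF-8 pattern 1110xxxx 10xxxxxx 10xxxxxx, which covers code points U+0800 to U+FFFF (surrogates excluded). The helper name `utf8_three_bytes` is only illustrative.

```python
def utf8_three_bytes(cp: int) -> bytes:
    """Encode a code point in U+0800..U+FFFF (non-surrogate) as three UTF-8 bytes."""
    if not 0x0800 <= cp <= 0xFFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("code point does not use the three-byte UTF-8 form")
    return bytes([
        0xE0 | (cp >> 12),          # 1110xxxx: top 4 bits
        0x80 | ((cp >> 6) & 0x3F),  # 10xxxxxx: middle 6 bits
        0x80 | (cp & 0x3F),         # 10xxxxxx: low 6 bits
    ])

ma = 0x092E  # DEVANAGARI LETTER MA
print(ma)                                # 2350
print(chr(ma).encode("utf-8").hex(" "))  # e0 a4 ae
print(utf8_three_bytes(ma).hex(" "))     # e0 a4 ae
```

The hand-built version and the codec agree for every letter in the Devanagari block, since all of them fall in the three-byte range.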

My question is about the rendering of a connected letter (a conjunct) such as (1). How does the rendering system handle it? We typically type this connected letter by entering letter 2, then letter 4 (indicating our intent to join this letter with the next one), and then letter 3. The rendering system then respects the joining action by erasing the vertical line in letter 2 and overlaying letter 4 right there. It is not clear to me whether the chosen font contains glyphs for both the complete letter 2 and its half form with the vertical line erased (shown with the faint red oval).

Can someone explain how this works?
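To see that the half form is not stored in the text, it helps to list the code points of the typed sequence. The sketch below is an illustration under two assumptions: letter 4 is taken to be U+094D DEVANAGARI SIGN VIRAMA, the sign normally typed to request a conjunct, and, because letter 3 is only visible in the missing picture, U+092F DEVANAGARI LETTER YA is used as a hypothetical stand-in for it.

```python
import unicodedata

MA = "\u092e"      # letter 2: DEVANAGARI LETTER MA
VIRAMA = "\u094d"  # assumed to be letter 4
YA = "\u092f"      # hypothetical stand-in for letter 3

text = MA + VIRAMA + YA  # what the keyboard actually stores
for ch in text:
    # code point, UTF-8 bytes in hex, official character name
    print(f"U+{ord(ch):04X}  {ch.encode('utf-8').hex(' ')}  {unicodedata.name(ch)}")
print(len(text))  # 3 code points, even though the result renders as one conjunct
```

Whatever the font draws, the stored text stays MA, VIRAMA, YA; choosing a half-form glyph for MA is left entirely to the font and the rendering engine.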

There are 2 answers below.

Answer 1

Font files are more than a bunch of shapes for each letter. They contain various tables that dictate how glyphs behave.

There are:

  • Tables for positioning glyphs (GPOS)
  • Tables for substituting glyphs (GSUB)
  • Tables for classifying glyphs and providing a ligature caret list (GDEF)
  • Tables for baseline placement (BASE)
  • ...

See also: https://fontforge.github.io/gposgsub.html

Which font features are needed depends on the writing system (Latin, Cyrillic, Arabic, Devanagari) and on how its glyphs ought to behave. Which tables are used depends on the font designer and on the font file format (what was designed and what the format can store). Which features are actually displayed depends on the font renderer (some renderers ignore certain font instructions).

Back to your question: it is a glyph substitution (in an OpenType font, a lookup in the GSUB table). Exactly what happens is described by the tables in the font file itself. If you want to know the details, open the font in an editor and inspect its tables; I suggest FontForge (free and open source).
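The substitution described above can be modelled with a toy shaper. This is a hedged sketch, not how any real engine or font is implemented: the glyph names (`ma`, `ma.half`, ...) and the dictionaries are hypothetical, standing in for a font's character-to-glyph map (the `cmap` table) and a ligature-style rule of the kind an OpenType font can store in GSUB for the Devanagari `half` feature. Real shaping engines such as HarfBuzz also reorder characters and apply many more features.

```python
# Hypothetical glyph names; a real font chooses its own.
CMAP = {  # character -> default glyph, the job of a font's cmap table
    "\u092e": "ma",      # DEVANAGARI LETTER MA
    "\u092f": "ya",      # DEVANAGARI LETTER YA
    "\u094d": "virama",  # DEVANAGARI SIGN VIRAMA
}
CONSONANTS = {"ma", "ya"}

# Ligature-style rule: (consonant, virama) -> one half-form glyph.
# "ma.half" has no code point; it is reachable only through this rule.
HALF_FORMS = {
    ("ma", "virama"): "ma.half",
    ("ya", "virama"): "ya.half",
}

def shape(text: str) -> list[str]:
    """Map characters to glyphs, then apply half-form rules left to right."""
    glyphs = [CMAP[ch] for ch in text]
    out: list[str] = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        followed_by_consonant = i + 2 < len(glyphs) and glyphs[i + 2] in CONSONANTS
        if pair in HALF_FORMS and followed_by_consonant:
            out.append(HALF_FORMS[pair])  # two glyphs become one
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out

print(shape("\u092e\u094d\u092f"))  # ['ma.half', 'ya']
print(shape("\u092e\u094d"))        # ['ma', 'virama']
```

The second call shows why the rule checks the following glyph: with no consonant after the virama, the full letter and a visible virama are kept. In a real font the rule, its conditions and the half-form outline all live in the font's own tables, which is why both the full and the half form can be present in one font.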

The moral of the story is that font files are not only aesthetic letter shapes but pieces of software.

Answer 2

Read about decomposition and normalization in Unicode® Standard Annex #15, Unicode Normalization Forms. Canonical and compatibility equivalence are explained in more detail in Chapter 2, General Structure, and Chapter 3, Conformance, of The Unicode Standard. Chapter 2 also describes how characters relate to glyphs:

A font and its associated rendering process define an arbitrary mapping from Unicode characters to glyphs. Some of the glyphs in a font may be independent forms for individual characters; others may be rendering forms that do not directly correspond to any single character.

Text rendering requires that characters in memory be mapped to glyphs. The final appearance of rendered text may depend on context (neighboring characters in the memory representation), variations in typographic design of the fonts used, and formatting information (point size, superscript, subscript, and so on). The results on screen or paper can differ considerably from the prototypical shape of a letter or character, as shown in Figure 2-3.

Figure 2-3

For the Latin script, this relationship between character code sequence and glyph is relatively simple and well known; for several other scripts, it is documented in this standard. However, in all cases, fine typography requires a more elaborate set of rules than given here. The Unicode Standard documents the default relationship between character sequences and glyphic appearance for the purpose of ensuring that the same text content can be stored with the same, and therefore interchangeable, sequence of character codes.
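To make the difference between normalization (Annex #15) and the character-to-glyph mapping quoted above concrete, here is a short sketch using Python's standard `unicodedata` module. Normalization does change some Devanagari character sequences: U+0929 DEVANAGARI LETTER NNNA canonically decomposes into NA plus NUKTA. It leaves a consonant + virama + consonant sequence untouched, however, because the half form exists only at the glyph level. YA (U+092F) is used purely as an illustrative second consonant.

```python
import unicodedata

nnna = "\u0929"             # DEVANAGARI LETTER NNNA
na_nukta = "\u0928\u093c"  # DEVANAGARI LETTER NA + DEVANAGARI SIGN NUKTA

# Canonical equivalence: NFD decomposes NNNA, NFC composes it back.
print(unicodedata.normalize("NFD", nnna) == na_nukta)  # True
print(unicodedata.normalize("NFC", na_nukta) == nnna)  # True

# MA + VIRAMA + YA has no precomposed character, so every normalization
# form returns it unchanged; the conjunct is purely a glyph matter.
conjunct = "\u092e\u094d\u092f"
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, unicodedata.normalize(form, conjunct) == conjunct)
```

Normalization therefore cannot explain the half form in the question; it only guarantees that equivalent character sequences compare equal, while the visible conjunct comes from the font and renderer.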