I am trying to represent devanagari characters on a screen, but in the dev environment where I'm programming I don't have unicode support. Then, to write characters I use binary matrices to color the related screen's pixels. I sorted these matrices according to the unicode order. For the languages that uses the latin alphabet I had no issues, I only needed to write the characters one after the other to represent a string, but for the devanagari characters it's different.
In the devanagari script some characters, when placed next to others can completely change the appearance of the word itself, both in the order and in the appearance of the characters. The resulting characters are considered as a single character, but when read as unicode they actually return 2 distinct characters.
This merging sometimes occurs in a simple way:
क + ् = क्
ग + ् = ग्
फ + ि = फि
But other times you get completely different characters:
क + ् + क = क्क
ग + ् + घ = ग्घ
क + ् + ष = क्ष
I found several papers describing the complex grammatical rules that determine how these characters merges (https://www.unicode.org/versions/Unicode8.0.0/UnicodeStandard-8.0.pdf), but the more I look into it the more I realize that I need to learn Hindi for understand that rules and then create an algorithm.
I would like to understand the principles behind these characters combinations but without necessarily having to learn the Hindi language. I wonder if anyone before me has already solved this problem or found an alternative solution and would like to share it with me.
Whether Devanagari text is encoded using Unicode or ISCII, display of the text requires a combination of shaping engine and font data that maps a string of characters into an appropriate sequence of positioned glyphs. The set of glyphs needed for Devanagari will be a fair bit larger than the initial set of characters.
The shaping steps involves an analysis of clusters, re-ordering of certain elements within clusters, substitution of glyphs, and finally positioning adjustments to the glyphs. Consider this example:
क + ् + क + ि = क्कि
The cluster analysis is needed to recognize elements against a general cluster pattern — e.g., which comprise the "base" consonant within the cluster, which are additional consonants that will conjoin to it, which are vowels and what the type of vowel with regard to visual positioning. In that sequence, the <ka, virama, ka> sequence will form a base that vowel or other marks are positioned relative to. The second ka is the "base" consonant and the inital <ka, virama> sequence will conjoin as a "half" form. And the short-i vowel is one that needs to be re-positioned to the left of the conjoined-consonant combination.
The Devanagari section in the Unicode Standard describes in a general way some of the actions that will be needed in display, but it's not a specific implementation guide.
The OpenType font specification supports display of scripts like Devanagari through a combination of "OpenType Layout" data in the font plus shaping implementations that interact with that data. You can find documentation specifically for Devanagari font implementations here:
https://learn.microsoft.com/en-us/typography/script-development/devanagari
You might also find helpful the specification for the "Universal Shaping Engine" that several implementations use (in combination with OpenType fonts) for shaping many different scripts:
https://learn.microsoft.com/en-us/typography/script-development/use
You don't necessarily need to use OpenType, but you will want some implementation with the functionality I've described. If you're running in a specific embedded OS environment that isn't, say, Windows IOT, you evidently can't take advantage of the OpenType shaping support built into Windows or other major OS platforms. But perhaps you could take advantage of Harfbuzz, which is an open-source OpenType shaping library:
https://github.com/harfbuzz/harfbuzz
This would need to be combined with Devanagari fonts that have appropriate OpenType Layout data, and there are plenty of those, including OSS options (e.g., Noto Sans Devanagari).