Is there a way to split a Kannada word into its syllabic clusters using JavaScript?
For example, I want to split the word ಕನ್ನಡ into the syllabic clusters ["ಕ", "ನ್ನ", "ಡ"]. But when I split it with split, the array I actually get is ["ಕ", "ನ", "್", "ನ", "ಡ"].
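For reference, the behaviour can be reproduced with a short snippet. This is only a sketch: it assumes split is called with an empty-string separator, which is not stated explicitly above, and the variable names are illustrative.

```javascript
// Assumption: the word is split with an empty-string separator.
const word = "ಕನ್ನಡ";

// split("") breaks the string into UTF-16 code units, so the virama (U+0CCD)
// that joins the two ನ consonants ends up as an element of its own.
const parts = word.split("");

console.log(parts);        // ["ಕ", "ನ", "್", "ನ", "ಡ"]
console.log(parts.length); // 5
```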
I can't say that this is a complete solution, but it works to an extent, given a basic understanding of how Kannada words are formed:
As the comments in the code say, we keep appending characters to the previous cluster as long as they are not a swara (independent vowel) or a vyanjana (consonant), or when the previous character was a virama. You will need to test with different words to make sure you cover the different cases. This particular version doesn't cover the numbers (digits). For character codes, you can refer to the Unicode chart: http://www.unicode.org/charts/PDF/U0C80.pdf
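The rule described above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the function names (isSwara, isVyanjana, splitKannadaClusters) are hypothetical, and the ranges are assumptions read off the Kannada code chart, namely U+0C85 to U+0C94 for independent vowels, U+0C95 to U+0CB9 for consonants, and U+0CCD for the virama. Digits, rarer letters outside those ranges, and joiner characters such as ZWJ/ZWNJ are not handled specially.

```javascript
// Assumed code points from the Unicode Kannada block (U+0C80 to U+0CFF).
const VIRAMA = 0x0ccd;

// Swara: independent vowels, assumed to be U+0C85 to U+0C94.
function isSwara(cp) {
  return cp >= 0x0c85 && cp <= 0x0c94;
}

// Vyanjana: consonants, assumed to be U+0C95 to U+0CB9.
function isVyanjana(cp) {
  return cp >= 0x0c95 && cp <= 0x0cb9;
}

// Hypothetical helper: groups the characters of a word into syllabic clusters.
function splitKannadaClusters(word) {
  const clusters = [];
  let prevWasVirama = false;

  for (const ch of word) {
    const cp = ch.codePointAt(0);

    // A swara or vyanjana starts a new cluster, unless the previous
    // character was a virama (then it is part of a conjunct).
    const startsCluster = (isSwara(cp) || isVyanjana(cp)) && !prevWasVirama;

    if (startsCluster || clusters.length === 0) {
      clusters.push(ch);
    } else {
      // Vowel signs, viramas, and consonants following a virama
      // are appended to the current cluster.
      clusters[clusters.length - 1] += ch;
    }

    prevWasVirama = cp === VIRAMA;
  }

  return clusters;
}

console.log(splitKannadaClusters("ಕನ್ನಡ")); // ["ಕ", "ನ್ನ", "ಡ"]
```

Iterating with for...of walks the string by code point rather than by UTF-16 code unit; for the Kannada block the two coincide, but it keeps the loop safe if other characters appear in the input.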