I'm implementing lexical parsing for the Tamil language. I need to replace a text element value according to the following condition:
string[] ugaramStrings = { "கு", "சு", "டு", "து", "பு", "று" };
string[] tamilvowels =
{
"அ",// "\u0b85"
"ஆ",//"\u0b86"
"இ",//"\u0b87"
"ஈ",//"\u0b88"
"உ",//"\u0b89"
"ஊ",//"\u0b8A"
"எ",// "\u0b8E"
"ஏ",//"\u0b8F"
"ஐ",//"\u0b90"
"ஒ",//"\u0b92"
"ஓ",//"\u0b93"
"ஔ"//"\u0b94"
};
If a word contains an element of ugaramStrings immediately followed by a Tamil vowel, the ugaram string needs to be removed and the resulting value returned.
For example, அமர்ந்*துஇ*னிது is replaced with அமர்ந்*இ*னிது, i.e. துஇ => இ.
I've done this by checking the next string element using the TextElementEnumerator class. Is it possible to do the replacement with a regular expression instead?
Try this:
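A minimal sketch of a regex-based replacement along these lines, assuming C# 9 top-level statements. The pattern and the input variable `str` are illustrative assumptions built from the arrays in the question; `str2` and `matches` follow the names used in this answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Ugaram syllables and independent vowels, as listed in the question.
string[] ugaramStrings = { "கு", "சு", "டு", "து", "பு", "று" };
string[] tamilVowels = { "அ", "ஆ", "இ", "ஈ", "உ", "ஊ", "எ", "ஏ", "ஐ", "ஒ", "ஓ", "ஔ" };

// Group 1: one ugaram syllable. Group 2: the vowel that immediately follows it.
string pattern = "(" + string.Join("|", ugaramStrings) + ")("
               + string.Join("|", tamilVowels) + ")";

string str = "அமர்ந்துஇனிது";

// Keep only the vowel (group 2), which drops the ugaram syllable before it.
string str2 = Regex.Replace(str, pattern, "$2");

// Every ugaram + vowel pair found in the input.
MatchCollection matches = Regex.Matches(str, pattern);

Console.WriteLine(str2);          // அமர்ந்இனிது
Console.WriteLine(matches.Count); // 1
```

Building the pattern with `string.Join` keeps it in sync with the two arrays, so adding another syllable or vowel does not require editing the expression by hand.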
It seems to work correctly. `str2` will contain the replaced string, while `matches` will contain all the matches.
Note that ugaram characters are composed characters, so each ugaram "character" uses two C# `char`s. For example, கு is 'க' + 'ு'.
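This is illegal: a `char` literal such as `'கு'`, because a `char` holds a single UTF-16 code unit.
This is legal: a `string` literal such as `"கு"`.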
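To make the two-`char` point concrete, here is a small sketch (again assuming C# 9 top-level statements; the variable names are illustrative) that inspects the code units of கு. It also asks `StringInfo` for the number of text elements, which is the unit `TextElementEnumerator` iterates over.

```csharp
using System;
using System.Globalization;

string ugaram = "கு";

// Number of UTF-16 code units: the consonant plus the dependent vowel sign.
int codeUnits = ugaram.Length;

// Number of text elements, i.e. what TextElementEnumerator would step through.
int textElements = new StringInfo(ugaram).LengthInTextElements;

Console.WriteLine(codeUnits);                        // 2
Console.WriteLine(((int)ugaram[0]).ToString("X4"));  // 0B95 (க)
Console.WriteLine(((int)ugaram[1]).ToString("X4"));  // 0BC1 (ு)
Console.WriteLine(textElements);
```

A regular expression works on code units rather than text elements, which is why the ugaram syllables have to be spelled out as two-`char` sequences in the pattern.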
For this reason you can't simply use the character class `[குசுடுதுபுறு]`; you have to use the alternation `(கு|சு|டு|து|பு|று)`.
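The difference can be checked with a short sketch (C# 9 top-level statements assumed; the test word கடல் is an illustrative choice that contains bare consonants but no ugaram syllable):

```csharp
using System;
using System.Text.RegularExpressions;

string word = "கடல்"; // க + ட + ல + ்  (no vowel sign ு anywhere)

// A character class lists individual code units, so க, ட, ு, ... are each
// accepted on their own. The bare consonant க is enough to match.
bool classMatches = Regex.IsMatch(word, "[குசுடுதுபுறு]");

// The alternation requires a consonant immediately followed by ு.
bool alternationMatches = Regex.IsMatch(word, "(கு|சு|டு|து|பு|று)");

Console.WriteLine(classMatches);       // True
Console.WriteLine(alternationMatches); // False
```

The character class would therefore also strip plain consonants that happen to precede a vowel, which is not what the question asks for.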