What is the Tamil character range in UTF-8?

I am not a Tamil speaker, but for some simple NLP applications I am developing, I need to detect whether each character in a Python string (mixed with digits, punctuation, and HTML tags) is Tamil or not. If it is not, I simply have to remove it. The concept is simple, but even after much searching I have been unable to find the Tamil character range in UTF-8. Is it a contiguous block of numbers, like 65 to 90 for the ASCII capital letters? Or do I have to develop something more sophisticated to check each character?

There is 1 solution below

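Yes, it is a single contiguous block. From Wikipedia's article on Tamil script:

Unicode range: U+0B80–U+0BFF

Note that these are Unicode code points rather than UTF-8 byte values. In a Python 3 string, each character is a code point, so you can compare against this range directly.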
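
A minimal sketch of the filter described in the question, assuming Python 3 and assuming that whitespace should be kept so that words stay separated. The names `is_tamil`, `keep_tamil`, and `keep_tamil_re` are made up for illustration. The check treats every code point in the block as Tamil, including any that are currently unassigned.

```python
import re

# Tamil Unicode block as quoted above: U+0B80 to U+0BFF, inclusive.
TAMIL_START = 0x0B80
TAMIL_END = 0x0BFF


def is_tamil(ch: str) -> bool:
    """Return True if the single character ch lies in the Tamil block."""
    return TAMIL_START <= ord(ch) <= TAMIL_END


def keep_tamil(text: str, keep_whitespace: bool = True) -> str:
    """Drop every character that is not Tamil (optionally keeping whitespace)."""
    return "".join(
        ch for ch in text
        if is_tamil(ch) or (keep_whitespace and ch.isspace())
    )


# Equivalent regex version: remove runs of anything that is neither
# in the Tamil block nor whitespace.
_NON_TAMIL = re.compile(r"[^\u0B80-\u0BFF\s]+")


def keep_tamil_re(text: str) -> str:
    return _NON_TAMIL.sub("", text)
```

For example, `keep_tamil("<b>\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD</b> 123")` keeps only the five Tamil code points and the space. If you are working with raw UTF-8 bytes instead of a decoded string, the same block occupies the three-byte sequences `E0 AE 80` through `E0 AF BF`, but decoding first and comparing code points is usually simpler.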