Hindi words length


I am trying to find the length of Hindi words in Python. For example, as far as I know, 'प्रवीण' has a length of 3.

w1 = 'प्रवीण'
print(len(w1))

I tried this code but it didn't work.

3

There are 3 best solutions below

0
betelgeuse On

In Hindi, a single visible character need not have length one, as it does in English. For example, वी is not one character but two characters combined into one: the consonant व followed by the vowel sign ी.

So in your case, the word प्रवीण is not of length 3 but rather 6.

w1 = "प्रवीण"
for w in w1:
    print(w)

And the output would be

प
्
र
व
ी
ण
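
To see why each of those pieces counts separately, it can help to print the Unicode name and general category of every code point. This is only a sketch using the standard `unicodedata` module; `describe` is a hypothetical helper name, not something from the question.

```python
import unicodedata

# 'प्रवीण' written with escapes so that every code point is explicit
w1 = "\u092a\u094d\u0930\u0935\u0940\u0923"

def describe(word):
    """Return (code point, Unicode name, general category) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in word
    ]

print(len(w1))  # len() counts code points, not visible letters
for row in describe(w1):
    print(*row)
```

The letters come out with category `Lo`, while the virama (्) and the vowel sign (ी) are marks (`Mn` and `Mc`), which is why `len` reports 6 for a word that reads as 3 letters.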
0
Codeman On

As @betelgeuse said, Hindi text does not work the way you expect. Here is some rough but working code that does what you want:

w1 = 'प्रवीण'

def hindi_len(word):
    hindi_letts = 'कखगघङचछजझञटठडढणतथदधनपफबभमक़ख़ग़ज़ड़ढ़फ़यरलळवहशषसऱऴअआइईउऊऋॠऌॡएऐओऔॐऍऑऎऒ'
    # Hindi letters that are not half-letters or matras (vowel signs)
    count = 0
    for pos, ch in enumerate(word):
        # Count a letter only if it does not follow a virama (i.e. it is not part of a half-letter conjunct)
        if ch in hindi_letts and (pos == 0 or word[pos - 1] != '्'):
            count += 1
    return count

print(hindi_len(w1))

This outputs 3. It's up to you to customize it as you'd like, though.

Edit: Make sure you use Python 3.x, or prefix Hindi strings with u in Python 2.x. I have seen errors caused by non-Unicode string encoding in Python 2.x before.
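
A variation on the same idea, for anyone who would rather not maintain a hard-coded letter list: skip every code point whose Unicode category is a mark (matras, virama, nukta) and skip any letter that directly follows a virama. This is a sketch under the assumption that the input is a single Devanagari word with no spaces, digits, or zero-width joiners; `count_base_letters` is a hypothetical name.

```python
import unicodedata

VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA (halant)

def count_base_letters(word):
    """Count letters that are neither marks nor joined to the previous letter by a virama."""
    count = 0
    prev = ""
    for ch in word:
        # Categories Mn/Mc/Me cover vowel signs, the virama and the nukta
        is_mark = unicodedata.category(ch).startswith("M")
        if not is_mark and prev != VIRAMA:
            count += 1
        prev = ch
    return count

print(count_base_letters("प्रवीण"))
```

Tracking the previous character directly avoids looking positions up by value, so repeated letters in a word are handled correctly.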

1
Rohit Singla On

Here is working Kotlin code corresponding to the code provided by Codeman. It gives you two things:

  1. The length of the string in terms of base characters
  2. The string split into parts on the basis of base characters

const val HINDI_LETTERS = "कखगघङचछजझञटठडढणतथदधनपफबभमक़ख़ग़ज़ड़ढ़फ़यरलळवहशषसऱऴअआइईउऊऋॠऌॡएऐओऔॐऍऑऎऒ"

fun getHindiWordLength(word: String): Int {
    var count = 0
    for (i in word.indices) {
        println(word[i])    // Just to see what each character in the string looks like
        // Count a base letter only if it is not a half-letter (not preceded by a virama)
        if (word[i] in HINDI_LETTERS && (i == 0 || word[i - 1] != '्'))
            count++
    }
    return count
}

fun splitHindiWordOnBaseLetter(word: String): MutableList<String> {
    var curWord = ""
    val splitWords: MutableList<String> = mutableListOf()
    for (i in word.indices) {
        // Start a new part at each base letter that is not a half-letter
        if (word[i] in HINDI_LETTERS && i > 0 && word[i - 1] != '्') {
            splitWords.add(curWord)
            curWord = ""
        }
        curWord += word[i]
    }
    splitWords.add(curWord)    // last part
    return splitWords
}

I have tested this code on these inputs:

    println(getHindiWordLength("प्रवीण"))
    println(splitHindiWordOnBaseLetter("प्रवीण"))
    
    println(getHindiWordLength("आम"))
    println(splitHindiWordOnBaseLetter("आम"))
    
    println(getHindiWordLength("पेड़"))
    println(splitHindiWordOnBaseLetter("पेड़"))
    
    println(getHindiWordLength("अक्षर"))
    println(splitHindiWordOnBaseLetter("अक्षर"))
    
    println(getHindiWordLength("दिल"))
    println(splitHindiWordOnBaseLetter("दिल"))

This is the output I get:

प
्
र
व
ी
ण
3
[प्र, वी, ण]
आ
म
2
[आ, म]
प
े
ड
़
2
[पे, ड़]
अ
क
्
ष
र
3
[अ, क्ष, र]
द
ि
ल
2
[दि, ल]
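
Since the question was about Python, the split can be sketched there as well. Instead of a letter list, this version uses Unicode categories from the standard `unicodedata` module to decide where a new part starts, so it is a different technique from the Kotlin code above; `split_on_base_letters` is a hypothetical name, and the input is assumed to be a single Devanagari word.

```python
import unicodedata

VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA (halant)

def split_on_base_letters(word):
    """Split a word into parts, each starting at a letter that is not a half-letter."""
    parts = []
    current = ""
    prev = ""
    for ch in word:
        is_mark = unicodedata.category(ch).startswith("M")
        # A non-mark that does not follow a virama begins a new part
        if current and not is_mark and prev != VIRAMA:
            parts.append(current)
            current = ""
        current += ch
        prev = ch
    if current:
        parts.append(current)  # last part
    return parts

for w in ["प्रवीण", "आम", "पेड़", "अक्षर", "दिल"]:
    pieces = split_on_base_letters(w)
    print(len(pieces), pieces)
```

The number of parts is the same count as before, so one function can serve both purposes.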