I am trying to find the length of Hindi words in Python. For example, as far as I know, 'प्रवीण' has a length of 3.
w1 = 'प्रवीण'
print(len(w1))
I tried this code but it didn't work.
As @betelgeuse has said, Hindi does not work the way you think it does. Here's some working code that does what you expect, though:
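w1 = 'प्रवीण'

def hindi_len(word):
    # Hindi letters that aren't half-letters or matras (vowel signs)
    hindi_letts = 'कखगघङचछजझञटठडढणतथदधनपफबभमक़ख़ग़ज़ड़ढ़फ़यरलळवहशषसऱऴअआइईउऊऋॠऌॡएऐओऔॐऍऑऎऒ'
    count = 0
    for i, ch in enumerate(word):
        # Count a letter unless it follows a virama (्), which means the
        # previous consonant was a half-letter joined to this one.
        # Using the loop index avoids word.index(), which always finds the
        # first occurrence and miscounts words with repeated letters.
        if ch in hindi_letts and (i == 0 or word[i - 1] != '्'):
            count += 1
    return count

print(hindi_len(w1))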
This outputs 3. It's up to you to customize it as you'd like, though.
Edit: Make sure you use Python 3.x, or prefix Hindi strings with u in Python 2.x. In Python 2.x a plain string literal is a byte string rather than Unicode, so the loop would walk over bytes instead of characters.
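One possible customization, sketched here rather than taken from the answer above, is to drop the hard-coded letter list and ask Python's standard unicodedata module what each code point is. The sketch assumes that every base letter has Unicode category 'Lo' (true for Devanagari consonants and independent vowels) and that matras, the nukta, and the virama are marks. The names count_aksharas and VIRAMA are made up for illustration.

```python
import unicodedata

VIRAMA = '\u094D'  # Devanagari sign virama (halant), written ्

def count_aksharas(word):
    """Count base letters that start a new cluster (illustrative helper)."""
    count = 0
    prev = ''
    for ch in word:
        # 'Lo' = "Letter, other": consonants and independent vowels.
        # Matras, nukta and virama are marks ('Mn' or 'Mc') and are skipped.
        # A letter right after a virama belongs to the previous conjunct.
        if unicodedata.category(ch) == 'Lo' and prev != VIRAMA:
            count += 1
        prev = ch
    return count

print(count_aksharas('प्रवीण'))  # 3
print(count_aksharas('अक्षर'))   # 3
```

This does not handle every case: for instance, a zero-width joiner placed between the virama and the next consonant would break the "previous character is a virama" check.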
Here is working Kotlin code corresponding to the code provided by Codeman. It gives you two things: the length of a Hindi word, and the word split into groups that each start with a base letter.
const val HINDI_LETTERS = "कखगघङचछजझञटठडढणतथदधनपफबभमक़ख़ग़ज़ड़ढ़फ़यरलळवहशषसऱऴअआइईउऊऋॠऌॡएऐओऔॐऍऑऎऒ"

fun getHindiWordLength(word: String): Int {
    var count = 0
    for (i in word.indices) {
        println(word[i]) // Just to see what each character in the string looks like
        // Count a base letter unless it follows a virama (not a half-letter)
        if (word[i] in HINDI_LETTERS && (i == 0 || word[i - 1] != '्'))
            count++
    }
    return count
}

fun splitHindiWordOnBaseLetter(word: String): MutableList<String> {
    var curWord = ""
    val splitWords: MutableList<String> = mutableListOf()
    for (i in word.indices) {
        // A base letter that does not follow a virama starts a new group
        if (word[i] in HINDI_LETTERS && (i > 0 && word[i - 1] != '्')) {
            splitWords.add(curWord)
            curWord = ""
        }
        curWord += word[i]
    }
    splitWords.add(curWord) // last group
    return splitWords
}
I have tested this code on these inputs:
println(getHindiWordLength("प्रवीण"))
println(splitHindiWordOnBaseLetter("प्रवीण"))
println(getHindiWordLength("आम"))
println(splitHindiWordOnBaseLetter("आम"))
println(getHindiWordLength("पेड़"))
println(splitHindiWordOnBaseLetter("पेड़"))
println(getHindiWordLength("अक्षर"))
println(splitHindiWordOnBaseLetter("अक्षर"))
println(getHindiWordLength("दिल"))
println(splitHindiWordOnBaseLetter("दिल"))
This is the output I get:
प
्
र
व
ी
ण
3
[प्र, वी, ण]
आ
म
2
[आ, म]
प
े
ड
़
2
[पे, ड़]
अ
क
्
ष
र
3
[अ, क्ष, र]
द
ि
ल
2
[दि, ल]
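For readers following the question in Python, the splitting step could be sketched roughly as below. This is not a line-by-line port of the Kotlin above: it assumes base letters can be recognized by Unicode category 'Lo' through the standard unicodedata module instead of a letter list, and the function name split_on_base_letter is made up for illustration.

```python
import unicodedata

VIRAMA = '\u094D'  # Devanagari sign virama (halant), written ्

def split_on_base_letter(word):
    """Split a word into groups that each begin with a base letter."""
    clusters = []
    for i, ch in enumerate(word):
        # A base letter ('Lo') opens a new group unless the previous
        # code point is a virama, in which case it completes a conjunct.
        starts_cluster = (
            unicodedata.category(ch) == 'Lo'
            and (i == 0 or word[i - 1] != VIRAMA)
        )
        if starts_cluster or not clusters:
            clusters.append(ch)
        else:
            clusters[-1] += ch  # matra, nukta, virama, or joined consonant
    return clusters

print(split_on_base_letter('प्रवीण'))  # ['प्र', 'वी', 'ण']
print(split_on_base_letter('अक्षर'))   # ['अ', 'क्ष', 'र']
```

The number of groups equals the word length the question asks for. Unlike the Kotlin function, this sketch returns an empty list for an empty string.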
In Hindi, what looks like a single character need not have length one, as it does in English. For example, वी is not one character but rather two characters combined into one: व and ी. So in your case, Python sees the word प्रवीण as having length 6, not 3, and the output of your code would be 6.
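To see where the 6 comes from, a short sketch like the following lists each code point that Python counts. It relies only on the standard unicodedata module; the variable name w1 is taken from the question.

```python
import unicodedata

w1 = 'प्रवीण'

# len() counts Unicode code points, not the letters a reader perceives
print(len(w1))  # 6

for ch in w1:
    # unicodedata.name() returns the official Unicode name of a code point
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

The listing shows three letters (PA, RA, VA, NNA count as letters, with PA and RA joined) plus a virama and a vowel sign, which is why the visual length and len() disagree.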