Hello everyone, I have a problem.
I have the word "बन्दूक". Notepad counts it as 3 characters, but the code "_charaters = list(line)" gives me 6 characters.
How can I get only the 3 characters?
Example:
- ब
- न्दू
- क
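For context, the six items that list(line) returns are Unicode code points, not the units a text editor shows as characters. The sketch below is only an illustration using the standard unicodedata module; the helper name describe_code_points is made up for this example and is not part of the question.

```python
import unicodedata

def describe_code_points(text):
    """Return a (code point, Unicode name) pair for every code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

word = "बन्दूक"
for code, name in describe_code_points(word):
    print(code, name)
# U+092C DEVANAGARI LETTER BA
# U+0928 DEVANAGARI LETTER NA
# U+094D DEVANAGARI SIGN VIRAMA
# U+0926 DEVANAGARI LETTER DA
# U+0942 DEVANAGARI VOWEL SIGN UU
# U+0915 DEVANAGARI LETTER KA
```

The virama and the vowel sign are combining marks, which is why the middle unit "न्दू" alone accounts for four of the six code points.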
An alternative approach is to use PyICU for grapheme segmentation with a break iterator. ICU4C provides grapheme, word and sentence break iterators for a range of locales.
import icu

def get_boundaries(loc, s):
    bi = icu.BreakIterator.createCharacterInstance(loc)
    bi.setText(s)
    boundaries = [*bi]
    boundaries.insert(0, 0)
    return boundaries

def get_graphemes(loc, text):
    boundary_indices = get_boundaries(loc, text)
    return [text[boundary_indices[i]:boundary_indices[i+1]] for i in range(len(boundary_indices)-1)]

print(get_graphemes(icu.Locale('hi'), "बन्दूक"))
# ['ब', 'न्दू', 'क']
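If installing PyICU is not an option, a rough approximation is possible with the standard library alone. The sketch below is not the ICU algorithm and does not implement the full Unicode grapheme cluster rules. It assumes two simplifications: a combining mark always attaches to the preceding cluster, and a letter that directly follows a virama (canonical combining class 9) joins the cluster as part of a conjunct. The names approx_clusters and VIRAMA_CLASS are made up for this example.

```python
import unicodedata

# Canonical combining class that Unicode assigns to virama signs.
VIRAMA_CLASS = 9

def approx_clusters(text):
    """Group code points into approximate user-perceived characters.

    Simplified heuristic, not full grapheme segmentation:
    - combining marks (general category M*) join the previous cluster;
    - a letter right after a virama joins the previous cluster (conjunct).
    """
    clusters = []
    for ch in text:
        category = unicodedata.category(ch)
        is_mark = category.startswith("M")
        joins_conjunct = (
            bool(clusters)
            and category.startswith("L")
            and unicodedata.combining(clusters[-1][-1]) == VIRAMA_CLASS
        )
        if clusters and (is_mark or joins_conjunct):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

print(approx_clusters("बन्दूक"))
# ['ब', 'न्दू', 'क']
```

This heuristic ignores joiners such as ZWJ and ZWNJ, emoji sequences and Hangul syllable composition, so a real break iterator like the one above is the safer choice for general text.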