How to normalize Hindi text using Python?

486 Views Asked by n0obcoder At 07 June 2025 at 19:53

I am testing an Automatic Speech Recognition (ASR) model on audio files containing Hindi speech.

I am using Word Error Rate (WER) as the metric.

reference (ground truth) - वह शादीशुदा नहीं है
hypothesis (model output) - वह शादी शुदा नहीं है

I need some way to normalize the reference and hypothesis sentences so that the WER is more meaningful. The example above should have WER = 0, but because of the space in शादी शुदा, the hypothesis is scored as one substitution plus one insertion against a 4-word reference, so WER becomes 2/4 = 0.5.
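For reference, the 0.5 figure can be reproduced with a small word-level edit distance. This is a minimal sketch using only the standard library; the helper names `edit_distance` and `wer` are illustrative and not taken from any ASR toolkit.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


reference = "वह शादीशुदा नहीं है"
hypothesis = "वह शादी शुदा नहीं है"
print(wer(reference, hypothesis))  # 0.5
```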

I am not able to find any way to do it for Hindi text.

Can somebody please help me with this? Thanks


There is 1 best solution below

AudioBubble On 04 May 2021 at 11:37

I searched for 'Normalizing text in Hindi language using Python' on Google and found an NLP library for Hindi text developed at IIT Bombay. You can check out the links below:

https://www.cse.iitb.ac.in/~anoopk/pages/softwares.html

https://github.com/anoopkunchukuttan/indic_nlp_library

Maybe it will help you.
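If you want a starting point that does not depend on any external package, here is a rough sketch using only the standard library. It is not the API of the library linked above; the function names, the punctuation set, and the decision to drop zero-width characters are all assumptions made for illustration. It applies Unicode NFC normalization, strips punctuation such as the danda, collapses whitespace, and then scores at character level with spaces removed, so that a compound-word split like शादी शुदा vs शादीशुदा is not penalized.

```python
import re
import unicodedata

# Assumption: these characters carry no information for ASR scoring.
ZERO_WIDTH = "\u200b\u200c\u200d\ufeff"
PUNCTUATION = "।॥,.?!;:\"'()-"


def normalize_hindi(text):
    """Canonicalize Unicode, drop zero-width chars and punctuation, tidy spaces."""
    text = unicodedata.normalize("NFC", text)
    text = text.translate({ord(c): None for c in ZERO_WIDTH})
    text = text.translate({ord(c): " " for c in PUNCTUATION})
    return re.sub(r"\s+", " ", text).strip()


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cer_ignoring_spaces(reference, hypothesis):
    """Character error rate computed after normalizing and removing all spaces."""
    ref = normalize_hindi(reference).replace(" ", "")
    hyp = normalize_hindi(hypothesis).replace(" ", "")
    if not ref:
        raise ValueError("reference is empty after normalization")
    return edit_distance(ref, hyp) / len(ref)


reference = "वह शादीशुदा नहीं है।"
hypothesis = "वह शादी शुदा नहीं है"
print(normalize_hindi(reference))                  # वह शादीशुदा नहीं है
print(cer_ignoring_spaces(reference, hypothesis))  # 0.0
```

Keep in mind that ignoring spaces also hides genuine word-boundary errors, so it is probably better to report this score alongside plain WER rather than instead of it.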
