Statistical Machine Translation from Hindi to English using MOSES

2.2k Views Asked by AvinashK At 27 December 2014 at 17:01

I need to build a Hindi-to-English translation system using MOSES. I have a parallel corpus of about 10,000 Hindi sentences with their English translations. I followed the method described on the Baseline System page. But at the very first stage, when I tried to tokenise my Hindi corpus by running

~/mosesdecoder/scripts/tokenizer/tokenizer.perl -l hi < ~/corpus/training/hi-en.hi > ~/corpus/hi-en.tok.hi

the tokeniser gave me the following output:

Tokenizer Version 1.1
Language: hi
Number of threads: 1
WARNING: No known abbreviations for language 'hi', attempting fall-back to English version...

I even tried 'hin', but it still didn't recognise the language. Can anyone tell me the correct way to build the translation system?


alvas On 28 December 2014 at 22:21 BEST ANSWER

Moses does not support Hindi for tokenization. tokenizer.perl relies on the nonbreaking_prefix.* files (see https://github.com/moses-smt/mosesdecoder/blob/master/scripts/tokenizer/tokenizer.perl#L516), and there is none for Hindi, which is why the warning reports a fall-back to English.

The languages available with nonbreaking prefixes from Moses are:

ca: Catalan
cs: Czech
de: German
el: Greek
en: English
es: Spanish
fi: Finnish
fr: French
hu: Hungarian
is: Icelandic
it: Italian
lv: Latvian
nl: Dutch
pl: Polish
pt: Portuguese
ro: Romanian
ru: Russian
sk: Slovak
sl: Slovene
sv: Swedish
ta: Tamil

from https://github.com/moses-smt/mosesdecoder/tree/master/scripts/share/nonbreaking_prefixes
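
To check which languages a local Moses checkout covers, one option is to list the nonbreaking_prefix.* files directly. The following is only a minimal sketch: the function names are made up for illustration, and the directory path assumes Moses was cloned to ~/mosesdecoder as in the question.

```python
from pathlib import Path

MARKER = "nonbreaking_prefix."


def available_prefix_languages(prefix_dir):
    """Return the sorted language codes that have a nonbreaking_prefix.<code> file."""
    codes = []
    for path in Path(prefix_dir).expanduser().iterdir():
        if path.is_file() and path.name.startswith(MARKER):
            # Everything after the marker is the language code, e.g. "en".
            codes.append(path.name[len(MARKER):])
    return sorted(codes)


def has_prefix_file(prefix_dir, lang):
    """True if a nonbreaking prefix file exists for the given language code."""
    return lang in available_prefix_languages(prefix_dir)


if __name__ == "__main__":
    # Assumed install location, taken from the command in the question.
    prefix_dir = Path("~/mosesdecoder/scripts/share/nonbreaking_prefixes").expanduser()
    if prefix_dir.is_dir():
        print(available_prefix_languages(prefix_dir))
        print("hi supported:", has_prefix_file(prefix_dir, "hi"))
    else:
        print("Directory not found:", prefix_dir)
```

If "hi" is missing from the printed list, tokenizer.perl has no Hindi-specific prefix file to load.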

However, all hope is not lost: you can tokenize your text with another tokenizer before training the machine translation model with Moses. Try searching for "Hindi tokenizers"; there are plenty of them around.
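
As a rough illustration of what such an external tokenizer does, here is a naive regex-based sketch. It is not the Moses tokenizer and not any particular Hindi tokenizer; the function names and the pattern are assumptions made for this example. It keeps runs of Devanagari characters together (including vowel signs and the virama), and splits off punctuation such as the danda (।) as separate tokens. It does not handle abbreviations, URLs, or decimal numbers.

```python
import re

# A token is one of:
#   - a run of Devanagari characters, excluding the danda (U+0964) and
#     double danda (U+0965), plus zero-width joiners used inside words
#   - a run of ASCII letters or digits
#   - any other single non-space character (punctuation, symbols)
TOKEN_RE = re.compile(
    r"[\u0900-\u0963\u0966-\u097F\u200C\u200D]+"
    r"|[A-Za-z0-9]+"
    r"|\S"
)


def tokenize(sentence):
    """Split one sentence into a list of tokens."""
    return TOKEN_RE.findall(sentence)


def tokenize_line(sentence):
    """Return the sentence with tokens separated by single spaces."""
    return " ".join(tokenize(sentence))


if __name__ == "__main__":
    print(tokenize_line("मैं घर जा रहा हूँ।"))
```

The explicit Unicode range is used instead of `\w` because Python's `re` module does not count Devanagari vowel signs as word characters, so `\w+` would break words apart. To prepare the corpus, the same function could be applied to each line of hi-en.hi and the result written to hi-en.tok.hi, one sentence per line.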
