Is there a way in Polyglot to permanently "fix" the language code of a Hebrew text from "iw" to "he"?


I want to run a simple sentiment analysis on a Hebrew text using Polyglot in Python 3.6. The problem is that Polyglot detects the text's language code as "iw" rather than "he", and is therefore unable to process it.

As suggested in "use polyglot package for Named Entity Recognition in hebrew", I've already passed hint_language_code='he' to the Text constructor, but that only changes the language code of the top-level Text object, not of its sub-objects (such as sentences or words).

For example:

Input:

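from polyglot.text import Text

article = 'איך ניתן לנתח טקסט בעברית? והאם ניתן לשנות את הקידוד?'

# Without a hint, the detected language code is 'iw'
txt = Text(article)
print(txt.language.code)

# With the hint, the top-level Text object reports 'he'
txt = Text(article, hint_language_code='he')
print(txt.language.code)

# The second sentence does not inherit the hint and reports 'iw' again
sent = txt.sentences[1]
print(sent.language.code)
print(sent)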

Output:

iw
he
iw
והאם ניתן לשנות את הקידוד?

How can I permanently change the text language_code from 'iw' to 'he'?
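
For context, "iw" is the withdrawn ISO 639-1 code for Hebrew and "he" is its replacement, so the two codes name the same language. One possible workaround, offered only as an untested sketch, is to normalize legacy codes with a small lookup table and to re-wrap each sentence in its own Text object with the same hint. The names LEGACY_CODES, normalize_language_code and hebrew_sentences are hypothetical helpers, not part of Polyglot. The sketch also assumes that str(sentence) returns the raw sentence text and that a re-wrapped Text honours hint_language_code the same way the top-level call in the question does.

```python
# Withdrawn ISO 639-1 codes mapped to their current replacements.
LEGACY_CODES = {"iw": "he", "in": "id", "ji": "yi"}


def normalize_language_code(code):
    """Return the current ISO 639-1 code for a possibly withdrawn one."""
    return LEGACY_CODES.get(code, code)


def hebrew_sentences(article, code="he"):
    """Hypothetical helper: re-wrap every sentence with the language hint.

    Assumption: str(sentence) yields the raw sentence text, and Text
    accepts hint_language_code as it does in the question's example.
    """
    # Imported lazily so the lookup helper above works without polyglot.
    from polyglot.text import Text

    txt = Text(article, hint_language_code=code)
    return [Text(str(s), hint_language_code=code) for s in txt.sentences]
```

If those assumptions hold, each element returned by hebrew_sentences(article) would carry the hinted code itself, instead of relying on the sentence slices of the parent Text to inherit it.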
