Wrong language detection with Google Translate (multiple languages)


I am working on a project where I need to translate a paragraph that contains more than one language.

I have noticed that the Google Translate API detects "hello bye hola" as English, while it detects "hello hola adios" as Spanish.

In other words, whichever language has the most words in the sentence or paragraph is the one that gets detected. Oddly, the Google Translate website itself does seem to have this feature.
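The behaviour described above can be reproduced with a toy majority-vote model. This is only an illustration under stated assumptions: the word list TOY_LEXICON and the function detect_by_majority are hypothetical, and Google's detector is a statistical model whose internals are not public.

```python
from collections import Counter
from typing import Optional

# Hypothetical toy lexicon mapping words to ISO 639-1 codes.
# A real detector uses statistical models, not a fixed word list.
TOY_LEXICON = {
    "hello": "en",
    "bye": "en",
    "hola": "es",
    "adios": "es",
}


def detect_by_majority(text: str) -> Optional[str]:
    """Return the language with the most recognised words, or None."""
    votes = Counter(
        TOY_LEXICON[word]
        for word in text.lower().split()
        if word in TOY_LEXICON
    )
    if not votes:
        return None
    language, _count = votes.most_common(1)[0]
    return language


print(detect_by_majority("hello bye hola"))    # en
print(detect_by_majority("hello hola adios"))  # es
```

A single label per text means the minority language is simply outvoted, which matches what the question observes.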

Is there any way to fix this so that the API detects only the foreign language and not English?

Best answer

No, there is no way to do that with the Google Translate API; its public API simply does not expose a mechanism for it.

If you use an alternative language detection library, you can define a threshold below which the content of the less-represented language is removed. This would let you remove the English content if it makes up less than, say, 30% of the text in your overall sample.
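A minimal sketch of that threshold idea in Python, assuming you already have some way to label each word with a language. Here the labelling is faked with a hypothetical hard-coded lexicon (TOY_LEXICON), and remove_minority_languages is an illustrative helper, not part of any library. Splitting is on whitespace only; punctuation handling is left out for brevity.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a real per-word language detector.
TOY_LEXICON = {
    "hello": "en", "bye": "en", "thanks": "en",
    "hola": "es", "adios": "es", "gracias": "es", "amigo": "es",
}


def remove_minority_languages(text: str, threshold: float = 0.3) -> str:
    """Drop words whose language makes up less than `threshold` of the
    recognised words. Words not found in the lexicon are always kept."""
    words = text.split()
    labels = [TOY_LEXICON.get(word.lower()) for word in words]
    counts = Counter(label for label in labels if label is not None)
    total = sum(counts.values())
    if total == 0:
        # Nothing recognised, so there is nothing to filter.
        return text
    # Languages whose share of recognised words meets the threshold.
    keep = {lang for lang, n in counts.items() if n / total >= threshold}
    return " ".join(
        word for word, label in zip(words, labels)
        if label is None or label in keep
    )


print(remove_minority_languages("hello hola adios gracias amigo"))
```

With the default threshold of 0.3, "hello hola adios gracias amigo" becomes "hola adios gracias amigo", because English is only 20% of the recognised words; the filtered text could then be sent for detection. "hello bye hola" would be left unchanged, since Spanish (one word in three) is still above the threshold.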

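For example, see the RemoveMinorityScriptsTextFilterTest class in the optimaize/language-detector project. Note that this filter works on Unicode scripts (for example Latin vs. Cyrillic) rather than on individual languages, so on its own it cannot separate English from Spanish, which both use the Latin script.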