Language detection using pycld2

Asked by natt010 on 29 July 2025 at 06:13 (593 views)

I am trying to use the pycld2 package to detect multiple languages in a single text. This is the example I am testing:

import pycld2 as cld2

text = '''The universal connection with an additional advantage: Push-in connection. Terminate solid and stranded (Class B 7 strands or less), as well as ferruled conductors, by simply pushing them in – no tools required. La connessione universale con un ulteriore vantaggio: Connessione push-in. Terminare solido e incagliato (trefoli di classe B 7 o meno), così come i conduttori a puntale, semplicemente spingendoli in – nessun attrezzo richiesto. Der universelle Anschluss mit zusätzlichem Vorteil: Push-in-Anschluss Vollständig und verseilt abschließen (Klasse B 7 Stränge oder weniger), sowie Aderendhülsen durch einfaches Aufschieben in – kein Werkzeug erforderlich.'''

# detect() returns (isReliable, textBytesFound, details, vectors)
reliable, text_bytes_found, top_3_choices, vecs = cld2.detect(text, returnVectors=True)
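For mixed-language input, the `vecs` value requested with `returnVectors=True` can show which span of the text was assigned to which language, which may be easier to act on than the aggregate scores. The sketch below assumes, as the pycld2 README describes, that each vector is an `(offset, num_bytes, language_name, language_code)` tuple and that offsets and lengths count bytes of the UTF-8 encoded text. `split_by_language` is a hypothetical helper name, not part of pycld2.

```python
def split_by_language(text, vectors):
    """Return (language_code, segment) pairs from pycld2 result vectors.

    Assumption: each vector is (offset, num_bytes, language_name, language_code),
    with offset and num_bytes measured in bytes of text.encode("utf-8").
    """
    data = text.encode("utf-8")
    segments = []
    for offset, num_bytes, _name, code in vectors:
        chunk = data[offset:offset + num_bytes]
        # errors="replace" guards against a span boundary that splits a
        # multi-byte character (e.g. "ü" or "–" in the example text)
        segments.append((code, chunk.decode("utf-8", errors="replace")))
    return segments


# Usage with the variables from the question:
# for code, segment in split_by_language(text, vecs):
#     print(code, segment[:40])
```

Slicing the encoded bytes rather than the `str` matters here because the example contains non-ASCII characters, so byte offsets and character offsets diverge.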

The top 3 detected languages are the following:

print(top_3_choices)
(('GERMAN', 'de', 34, 1089.0), ('ITALIAN', 'it', 33, 355.0), ('ENGLISH', 'en', 32, 953.0))

According to the documentation, the fourth element of each tuple is the confidence score and the third element is the percentage of the original text detected in that language. However, I am struggling to interpret the score so that I can flag how confident each detection is. Can I somehow normalize the score to get some form of interpretable probability?
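The scores above (1089.0, 355.0, 953.0) are clearly not probabilities, so any normalization is a heuristic rather than a calibrated confidence. One simple option, which is an assumption and not something the pycld2 documentation prescribes, is to weight each language's score by its percentage of the text and divide by the total, giving each language's relative share of the evidence within a single `detect` call. `relative_shares` and the 0.5 threshold are hypothetical choices for illustration; the literal tuple is copied from the output printed above so the sketch runs without pycld2.

```python
def relative_shares(details):
    """Heuristic: weight each language's score by its text percentage, then normalize.

    `details` is the (name, code, percent, score) tuple-of-tuples from pycld2.
    The result sums to 1.0 but is NOT a calibrated probability.
    """
    weights = {code: percent * score for _name, code, percent, score in details}
    total = sum(weights.values())
    if total == 0:  # e.g. when every entry has a zero score
        return {code: 0.0 for code in weights}
    return {code: w / total for code, w in weights.items()}


# Literal copy of the `top_3_choices` output shown in the question.
top_3_choices = (('GERMAN', 'de', 34, 1089.0),
                 ('ITALIAN', 'it', 33, 355.0),
                 ('ENGLISH', 'en', 32, 953.0))

shares = relative_shares(top_3_choices)
for code, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{code}: {share:.3f}")

# Hypothetical flagging rule: call the top language "confident" only if it
# holds a clear majority of the weighted evidence.
top_code, top_share = max(shares.items(), key=lambda kv: kv[1])
is_confident = top_share >= 0.5
```

For this mixed German/Italian/English input, no language reaches the 0.5 threshold, which matches the intuition that the text has no single dominant language. The heuristic can be combined with the `reliable` flag that `detect` already returns; the shares are only comparable within one call, not across different texts.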
