Translate dataframe with DeepL


I would like to translate, using DeepL, the text in the df["text"] column, where each row contains one sentence. The rows are not all written in the same language, so I'd like the source language to be detected automatically and the translation stored in a new column called df["translated"].

Thank you

I have a free DeepL authentication key, but I can't figure out how to use it. I am a rookie.


There are 3 best solutions below

0
On BEST ANSWER

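You can use DeepL's official Python library (the deepl package).

According to the documentation, you can simply omit the source_lang parameter and DeepL will try to detect the language on its own. The target_lang parameter, however, is required.

import deepl

auth_key = "..."  # your DeepL API key
translator = deepl.Translator(auth_key)
# source_lang is omitted, so DeepL detects it; target_lang is required
result = translator.translate_text(text_to_translate, target_lang="EN-US")
translated_text = result.text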
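
To get from a single string to the whole column asked about in the question, here is a minimal sketch, not a tested recipe. It relies on translate_text also accepting a list of strings and returning one result per input, each with .text and .detected_source_lang. The function name translate_column, the extra detected_lang column, and the batch_size of 50 (chosen to stay within the per-request text limit described in the API documentation) are my own assumptions, and it expects every row to be a non-empty string.

```python
import pandas as pd


def translate_column(df, translator, target_lang="EN-US",
                     source_col="text", dest_col="translated", batch_size=50):
    """Translate df[source_col] in batches and return a copy with the results.

    `translator` is expected to be a deepl.Translator (or anything with a
    compatible translate_text method). All names here are illustrative.
    """
    texts = df[source_col].tolist()
    results = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        # No source_lang is passed, so the language is detected per text.
        results.extend(translator.translate_text(chunk, target_lang=target_lang))

    out = df.copy()
    out[dest_col] = [r.text for r in results]
    out["detected_lang"] = [r.detected_source_lang for r in results]
    return out


# Usage (needs the deepl package and a real key):
#   translator = deepl.Translator(auth_key)
#   df = translate_column(df, translator, target_lang="EN-US")
```

Sending lists instead of one row at a time means far fewer HTTP requests, which matters once the dataframe has more than a handful of rows.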
0
On

I can't test it because I don't have an API key, but the documentation of the free DeepL API explains everything clearly, including a curl example:

curl https://api-free.deepl.com/v2/translate \
    -H "Authorization: DeepL-Auth-Key [yourAuthKey]" \
    -d "text=Hello, world!" \
    -d "target_lang=DE"

The documentation indicates that the source_lang parameter is optional and if it is omitted, the API will attempt to detect the language of the text and translate it.

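So in Python, it should be:

import requests

url = "https://api-free.deepl.com/v2/translate"
headers = {"Authorization": f"DeepL-Auth-Key {yourAuthKey}"}
# Pass a dict so that requests URL-encodes the text (spaces, &, = and so on)
data = {"text": YourText, "target_lang": LanguageCode}
resp = requests.post(url, headers=headers, data=data)
resp.raise_for_status()
translated_text = resp.json()["translations"][0]["text"]
print(translated_text)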

with:

  • yourAuthKey - your API key.
  • YourText - the text you wish to translate.
  • LanguageCode - the language code (see API doc) into which the text should be translated.

This should work with nothing more than the requests package.

Or you can use the official DeepL Python library to make it even simpler.
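
For the plain HTTP route, the response body is JSON with a "translations" list, where each entry carries the translated "text" and the "detected_source_language". The sketch below is untested against the live API and keeps the parsing in its own function so it can be checked offline. The names parse_translations and translate are made up for illustration, and the 30 second timeout is an arbitrary choice.

```python
import requests

API_URL = "https://api-free.deepl.com/v2/translate"  # endpoint for free-plan keys


def parse_translations(payload):
    """Return (translated_text, detected_language) pairs from a response body."""
    return [
        (item["text"], item["detected_source_language"])
        for item in payload["translations"]
    ]


def translate(texts, auth_key, target_lang="DE"):
    """POST a list of texts to DeepL and return the parsed pairs.

    Illustrative helper: source_lang is left out so the API detects it.
    """
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"DeepL-Auth-Key {auth_key}"},
        # A list value is sent as repeated text=... form fields.
        data={"text": texts, "target_lang": target_lang},
        timeout=30,
    )
    resp.raise_for_status()  # raises on HTTP errors such as a rejected key
    return parse_translations(resp.json())


# Usage (needs a real key):
#   pairs = translate(["Hello, world!"], yourAuthKey, target_lang="DE")
```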

0
On

I needed to translate a column in a dataframe for a recent project, and thought I would share my approach using DeepL's Python client library in case it's helpful.

import pandas as pd
import deepl

auth_key = "..."  # your DeepL API key
translator = deepl.Translator(auth_key)

d = {'Source': ['This is some English source text.', 'Another sentence in English.']}
df = pd.DataFrame(data=d)

# translate_text returns a TextResult; .text is the translated string
df['Target'] = df['Source'].apply(lambda x: translator.translate_text(x,
               target_lang="DE").text if type(x) == str else x)

As mentioned above, the source_lang argument can be omitted if you'd like DeepL to auto-detect the source language, which is what I do here.

You'll end up with:

    Source                              Target
0   'This is some English source text.' 'Dies ist ein englischer Ausgangstext.'
1   'Another sentence in English.'      'Ein weiterer Satz auf Englisch.'

(The if type(x) == str else x isn't necessary, but could be helpful in case you have null or other non-string values in your source text column that you'd rather just skip over.)
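
If the column has many nulls or repeated sentences, one variation on the same idea is to translate only the distinct, non-blank strings and map the results back, so that duplicates are not sent (and counted against the character quota) more than once. This is a rough sketch under that assumption, not something I have run against the real API. translate_strings_only is a hypothetical helper name, and a large column would likely need to be sent in smaller batches.

```python
import pandas as pd


def translate_strings_only(series, translator, target_lang="DE"):
    """Translate the non-blank string entries of a Series; keep the rest as is.

    `translator` is expected to behave like deepl.Translator.
    """
    is_text = series.map(lambda x: isinstance(x, str) and x.strip() != "")
    unique_texts = series[is_text].drop_duplicates().tolist()
    if not unique_texts:
        return series.copy()

    # One call for all distinct texts; results come back in input order.
    results = translator.translate_text(unique_texts, target_lang=target_lang)
    lookup = {src: res.text for src, res in zip(unique_texts, results)}

    # Non-strings and blank strings are not in the lookup and pass through.
    return series.map(lambda x: lookup.get(x, x) if isinstance(x, str) else x)


# Usage (needs the deepl package and a real key):
#   df['Target'] = translate_strings_only(df['Source'], translator)
```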