Using Microsoft Translator API for multiple source languages in R

I have a large set of Tweets in 51 different languages which I would like to translate to English. The Tweets are organized in a dataframe, where one column is the actual text.

Each row has 17 columns with different information, such as user_id, status_id, language and text. It looks like this:

A tibble: 642,581 × 17
   user_id  status_id  created_at screen_name text    source  favorite_count retweet_count quote_count reply_count hashtags
   <chr>    <chr>      <chr>      <chr>       <chr>   <chr>   <chr>          <chr>         <chr>       <chr>       <chr>   
 1 1235580… 138959693… 2021-05-0… China_Lyon  "la bo… "Twitt… 0              15            NA          NA          "c(\"Ch…
 2 NA       135549494… 2021-01-30 Ambassador… "rt : … "<a hr… 0              0             0           0            NA     

I have an Azure account through my university, so I have a Microsoft API key. Translating from a single language to English works with this code:

translated_tweets <- translateR::translate(dataset = Tweets,
                                             content.field = 'text', 
                                             microsoft.api.key  = 'my.api.key',
                                             source.lang = 'de',
                                             target.lang = 'en',
                                             microsoft.token = TRUE)

My question now is: what would the code look like if I have multiple source languages? I tried to simply add another source language to the one that is already there, but that doesn't work. Would it be better to iterate this code snippet over every language, resulting in 51 dataframes which would then only need to be merged together?

I am quite new to programming, so please bear with me if these questions sound stupid.
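
For reference, the per-language iteration described above could be sketched roughly as follows. This is only a minimal sketch under some assumptions: the language code is assumed to live in a column named `lang` (the real column name may differ), `translate_by_language` and `translate_with_microsoft` are hypothetical helper names, and translateR is assumed to append its result in a column called `translatedContent`. Rows with a missing language code are dropped by `split()` and would need separate handling.

```r
# Sketch: translate each language group separately, then recombine.
# Assumption: the language code is stored in a column called `lang`.
translate_by_language <- function(tweets, translate_one,
                                  lang_col = "lang", target = "en") {
  # One sub-data-frame per source language (rows with NA language are dropped)
  groups <- split(tweets, tweets[[lang_col]])

  translated <- lapply(names(groups), function(src) {
    part <- groups[[src]]
    if (src == target) {
      # Already in the target language: copy the text instead of calling the API
      part$translatedContent <- part$text
      return(part)
    }
    translate_one(part, src)
  })

  # Stack the per-language results back into a single data frame
  do.call(rbind, translated)
}

# Hypothetical wrapper around the call from the question; only the
# source language changes between groups.
translate_with_microsoft <- function(part, src) {
  translateR::translate(dataset = part,
                        content.field = "text",
                        microsoft.api.key = "my.api.key",
                        source.lang = src,
                        target.lang = "en",
                        microsoft.token = TRUE)
}

# Usage (needs a valid API key and network access):
# translated_tweets <- translate_by_language(Tweets, translate_with_microsoft)
```

Passing the translation function as an argument keeps the splitting and merging logic testable without an API key, and the result is one dataframe rather than 51 separate ones.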
