Googletrans not working with large datasets


I'm trying to translate a column of a DataFrame of shape (13815, 2) containing English sentences into Persian ('fa'). But because the dataset is so large, I run into errors every time I try.

I tried multiple approaches with no progress, and asking ChatGPT didn't help either. Finally I found this piece of code and was able to translate about 1,800 rows, but again it stops after some time.

import copy
import pandas as pd
from googletrans import Translator

translatedList = []
for index, row in df.iterrows():
    # reinitialize the API client for every row
    translator = Translator()
    newrow = copy.deepcopy(row)
    try:
        # translate the 'question' column
        translated = translator.translate(row['question'], dest='fa')
        newrow['translated'] = translated.text
    except Exception as e:
        print(str(e))
        continue

    translatedList.append(newrow)

There is 1 answer below

Daviid On

The library's README claims that "Googletrans is a free and unlimited python library that implemented Google Translate API". I HIGHLY doubt Google would let you translate things endlessly.

It also says:

Note on library usage
DISCLAIMER: this is an unofficial library using the web API of translate.google.com and also is not associated with Google.

The maximum character limit on a single text is 15k.

Due to limitations of the web version of google translate, this API does not guarantee that the library would work properly at all times (so please use this library if you don’t care about stability).

Important: If you want to use a stable API, I highly recommend you to use Google’s official translate API.

If you get HTTP 5xx error or errors like #6, it’s probably because Google has banned your client IP address.
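To stay under that 15,000-character limit, you can check each string's length before sending it and split anything longer. A minimal sketch in pure Python (no googletrans calls; `chunk_text` is my own hypothetical helper, and the limit value is taken from the docs quoted above):

```python
def chunk_text(text, limit=15000):
    """Split text into pieces no longer than `limit` characters,
    breaking at the last space before the limit where possible."""
    chunks = []
    while len(text) > limit:
        # try to break at the last space before the limit
        cut = text.rfind(" ", 0, limit)
        if cut == -1:
            cut = limit  # no space found: hard cut
        chunks.append(text[:cut])
        text = text[cut:].lstrip()
    if text:
        chunks.append(text)
    return chunks
```

Each chunk can then be passed to `translator.translate` separately and the results joined afterwards.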

So check that no single text is over 15k characters, and that you're not getting IP banned or rate limited. Inspect 'translated._response' to see what response you get right before your program stops.
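One way to make the loop in the question more robust against transient failures and brief rate limits is to wrap each call in a retry with a growing delay. A hedged sketch; `translate_with_retry` and its parameters are my own names, and the translator is passed in as a callable so you could plug in `Translator().translate`:

```python
import time

def translate_with_retry(translate, text, dest="fa",
                         retries=3, delay=1.0, backoff=2.0):
    """Call `translate(text, dest=dest)`, retrying on failure.

    `translate` is any callable with googletrans' translate signature,
    e.g. Translator().translate. After each failure it waits `delay`
    seconds, multiplied by `backoff` on every subsequent attempt, and
    re-raises the last error once `retries` attempts are exhausted."""
    last_error = None
    for attempt in range(retries):
        try:
            return translate(text, dest=dest)
        except Exception as e:
            last_error = e
            time.sleep(delay * backoff ** attempt)
    raise last_error
```

Sleeping a second or two between rows even on success should also reduce the chance of an IP ban, at the cost of a much longer total run time.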