Calling GCP Translate API within Dataproc pyspark map


I am trying to call the language-detection method of the Translate client API from pyspark for each row in a file.

I created a map method as follows, but the job seems to just freeze with no error. If I remove the call to the Translate API, it executes fine. Is it possible to call Google client API methods within a pyspark map?

Mapping method to do the translation:

```python
from google.cloud import translate

def doTranslate(data):
    translate_client = translate.Client()

    # Get the message information
    messageId = data[0]
    messageContent = data[6]

    detectedLang = translate_client.detect_language(messageContent)

    r = []
    r.append(detectedLang)
    return r
```
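As an aside on the pattern itself: constructing a `translate.Client()` for every record is relatively expensive, and a common Spark idiom is to build the client once per partition with `mapPartitions` instead of once per row in `map`. A minimal sketch, with the client injected via a factory so the per-partition logic is testable; the column indices match the question's code, but `client_factory` is a name introduced here for illustration:

```python
def make_detect_partition(client_factory):
    """Return a function suitable for rdd.mapPartitions().

    client_factory is any zero-argument callable that builds the API
    client (e.g. translate.Client); it is called once per partition,
    not once per record.
    """
    def detect_partition(rows):
        client = client_factory()  # one client per partition
        for row in rows:
            # row[0] is the message id, row[6] the message content,
            # as in the question's doTranslate.
            yield (row[0], client.detect_language(row[6]))
    return detect_partition

# Usage on the cluster (sketch):
#   rdd.mapPartitions(make_detect_partition(translate.Client))
```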
1

1 Answer


Figured it out! Your question led me in the right direction, thanks!

It turns out I was getting an exception from the call because I was exceeding the default quota for message sizes. I added a try/except block and confirmed this was the problem. Then cutting the message size down (I am just testing, so I don't want to mess with the quota) fixed the issue.
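The fix described above can be sketched as a defensive wrapper around the detection call. This is a minimal sketch, not the author's exact code: `MAX_BYTES` is an assumed size limit (check your project's actual Cloud Translation quota), and the error dict shape is invented here for illustration:

```python
MAX_BYTES = 5000  # assumed per-request size limit; verify against your quota

def safe_detect(translate_client, message):
    """Truncate oversized messages and surface API errors instead of
    letting them silently stall the Spark task."""
    # Cut the payload down so it stays under the assumed limit,
    # truncating on byte boundaries without splitting a UTF-8 character.
    encoded = message.encode("utf-8")[:MAX_BYTES]
    truncated = encoded.decode("utf-8", errors="ignore")
    try:
        return translate_client.detect_language(truncated)
    except Exception as exc:
        # Return a sentinel the caller can log or filter out,
        # rather than letting the exception vanish inside the executor.
        return {"language": None, "error": str(exc)}
```

With a wrapper like this, a quota or size error shows up in the job output as a row with `"error"` set, instead of an apparently frozen job.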