I am trying to call the language detection method of the Google Translate client API from PySpark for each row in a file.
I created a map method, shown below, but the job just seems to freeze with no error. If I remove the call to the Translate API, it executes fine. Is it possible to call Google client API methods within a PySpark map?
Mapping method that does the translation:
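# Assumed import: newer releases expose this client as translate_v2;
# older releases used "from google.cloud import translate".
from google.cloud import translate_v2 as translate

def doTranslate(data):
    # A new client is created on every call, i.e. once per row
    translate_client = translate.Client()
    # Get the message information
    messageId = data[0]  # not used below
    messageContent = data[6]
    # Ask the API which language the message is written in
    detectedLang = translate_client.detect_language(messageContent)
    r = []
    r.append(detectedLang)
    return r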
Figured it out! Your question led me in the right direction, thanks!
It turns out the call was raising an exception because my messages exceeded the default quota for message size. I added a try/except block around the call and confirmed that this was the problem. Cutting the message size down then fixed the issue (I am only testing, so I don't want to mess with the quota).
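For anyone hitting the same thing, here is a minimal sketch of the try/except and size-trimming approach described above. It is not my exact code: the helper name detect_safely and the MAX_CHARS value are made up for illustration, and the limit is an arbitrary testing value, not the real quota.

```python
# Hypothetical cap used only for testing; NOT an official quota value.
MAX_CHARS = 1000


def detect_safely(detect, row, max_chars=MAX_CHARS):
    """Return (message_id, detection, error) for one row.

    `detect` is any callable that takes a string, for example the
    detect_language method of a translate client.
    """
    message_id = row[0]
    # Trim the message so the request stays small
    content = row[6][:max_chars]
    try:
        return (message_id, detect(content), None)
    except Exception as exc:
        # Broad on purpose: report whatever the API raised instead of
        # letting the failure go unnoticed inside the Spark task
        return (message_id, None, repr(exc))
```

Inside doTranslate, the direct call would become detect_safely(translate_client.detect_language, data). Rows that fail then come back with the error text in the third field, so they can be filtered and inspected after the job finishes.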