pywren and twint - Tweet download


The following code takes a username and scrapes that user's Twitter history starting from a given date:

import pandas as pd
import twint
import pywren

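def scrape_user(username):
    # Configure twint to search this user's tweets
    c = twint.Config()
    c.Username = username
    c.Lang = 'es'              # language filter (Spanish)
    c.Since = '2021-04-28'     # earliest date to fetch (YYYY-MM-DD)
    c.Hide_output = True       # don't print each tweet to stdout
    c.Pandas = True            # collect results into a pandas DataFrame

    twint.run.Search(c)

    # twint keeps the resulting DataFrame in a module-level attribute
    return twint.storage.panda.Tweets_df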

When I call the function directly, e.g. scrape_user("DeLaCalleHum"), I get the intended result: a pandas DataFrame. However, when I use pywren, even on a single username,

pwex = pywren.default_executor()
futures = pwex.map(scrape_user, ["DeLaCalleHum"])  # map takes an iterable of inputs, so wrap the single username in a list
tweet_list = pywren.get_all_results(futures)

I get this error.

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-31-15f9e00ead75> in <module>
----> 1 wc_list = pywren.get_all_results(futures)

~/macs30123/lib/python3.7/site-packages/pywren/wren.py in get_all_results(fs)
    117     """
    118     wait(fs, return_when=ALL_COMPLETED)
--> 119     return [f.result() for f in fs]

~/macs30123/lib/python3.7/site-packages/pywren/wren.py in <listcomp>(.0)
    117     """
    118     wait(fs, return_when=ALL_COMPLETED)
--> 119     return [f.result() for f in fs]

~/macs30123/lib/python3.7/site-packages/pywren/future.py in result(self, timeout, check_only, throw_except, storage_handler)
    146         if self._state == JobState.error:
    147             if throw_except:
--> 148                 raise self._exception
    149             else:
    150                 return None

OSError: [Errno 28] No space left on device

What am I doing wrong? I would appreciate any help.
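One thing worth checking in the call above: pywren's executor map(func, iterdata) iterates over iterdata and launches one invocation per element, the same way Python's built-in map does. A bare string is itself an iterable, so passing "DeLaCalleHum" directly would start one call per character rather than one call for the whole username. The sketch below illustrates this with the built-in map and a hypothetical stand-in function (scrape_stub is not part of twint or pywren):

```python
def scrape_stub(username):
    # Hypothetical stand-in for scrape_user: just returns the argument it received
    return username


# Passing a bare string: map iterates over it character by character
per_char = list(map(scrape_stub, "DeLaCalleHum"))

# Wrapping the username in a list: map sees exactly one input
single = list(map(scrape_stub, ["DeLaCalleHum"]))

print(len(per_char))  # 12
print(per_char[:3])   # ['D', 'e', 'L']
print(single)         # ['DeLaCalleHum']
```

With a one-element list, only a single remote call is made, which is what "even a single username" intends.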

Best answer

After some time, I found the answer: PyWren can automatically parallelize such function calls as long as I add the ComprehendFullAccess policy to my pywren_exec_role_1 role in IAM.
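The same policy change can also be made programmatically instead of through the IAM console. This is a minimal sketch using boto3's IAM client, assuming your AWS credentials are allowed to modify IAM roles. The helper names managed_policy_arn and attach_policy are hypothetical, and pywren_exec_role_1 is simply the role name from the answer; your PyWren setup may have created a role with a different name.

```python
def managed_policy_arn(policy_name):
    """Build the ARN of an AWS managed IAM policy from its name."""
    return f"arn:aws:iam::aws:policy/{policy_name}"


def attach_policy(role_name, policy_name):
    """Attach an AWS managed policy to an IAM role.

    Requires boto3 and AWS credentials with permission to modify IAM roles.
    """
    # Imported lazily so managed_policy_arn works even without boto3 installed
    import boto3

    iam = boto3.client("iam")
    iam.attach_role_policy(
        RoleName=role_name,
        PolicyArn=managed_policy_arn(policy_name),
    )
```

Usage, with the names from the answer: attach_policy("pywren_exec_role_1", "ComprehendFullAccess").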