The following code takes a username and scrapes that user's Twitter history starting from a given date:
import pandas as pd
import twint
import pywren
def scrape_user(username):
    c = twint.Config()
    c.Username = username
    c.Lang = 'es'
    c.Since = '2021-04-28'
    c.Hide_output = True
    c.Pandas = True
    twint.run.Search(c)
    return twint.storage.panda.Tweets_df
When I run the function directly, e.g. scrape_user("DeLaCalleHum"), I get the intended result: a pandas DataFrame. However, when I run it through pywren, even on a single username:
pwex = pywren.default_executor()
futures = pwex.map(scrape_user, ["DeLaCalleHum"])
tweet_list = pywren.get_all_results(futures)
I get this error.
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-31-15f9e00ead75> in <module>
----> 1 wc_list = pywren.get_all_results(futures)
~/macs30123/lib/python3.7/site-packages/pywren/wren.py in get_all_results(fs)
117 """
118 wait(fs, return_when=ALL_COMPLETED)
--> 119 return [f.result() for f in fs]
~/macs30123/lib/python3.7/site-packages/pywren/wren.py in <listcomp>(.0)
117 """
118 wait(fs, return_when=ALL_COMPLETED)
--> 119 return [f.result() for f in fs]
~/macs30123/lib/python3.7/site-packages/pywren/future.py in result(self, timeout, check_only, throw_except, storage_handler)
146 if self._state == JobState.error:
147 if throw_except:
--> 148 raise self._exception
149 else:
150 return None
OSError: [Errno 28] No space left on device
What am I doing wrong? I would appreciate any help.
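A side note on the map call, separate from the disk-space error: as far as I can tell, pywren's map consumes its second argument as an iterable, the way the built-in map does, so a single username is best passed as a one-element list. The sketch below only uses the built-in map and a hypothetical stand-in function (fake_scrape) to show the difference; it does not call pywren or twint.

```python
def fake_scrape(username):
    """Hypothetical stand-in for scrape_user: just echoes its argument."""
    return username


# A bare string is itself an iterable of characters, so map sees one
# "username" per character.
as_string = list(map(fake_scrape, "DeLaCalleHum"))

# Wrapping the string in a list yields exactly one item: the full username.
as_list = list(map(fake_scrape, ["DeLaCalleHum"]))

print(len(as_string), as_string[:3])
print(len(as_list), as_list)
```

If pywren behaves the same way, passing a bare string would launch one remote call per character instead of one call for the whole username.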
After some time I found the answer. I can parallelize such function calls with PyWren automatically, as long as I attach the ComprehendFullAccess policy to my pywren_exec_role_1 role in IAM.
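For anyone who prefers to script that step instead of using the IAM console, here is a minimal sketch with boto3. The helper name attach_policy is hypothetical. The ARN assumes ComprehendFullAccess refers to the AWS-managed policy of that name, and the role name is the one mentioned above. The function takes the IAM client as an argument, so it can be tried against a stub before touching a real account.

```python
# Assumed ARN of the AWS-managed ComprehendFullAccess policy.
MANAGED_POLICY_ARN = "arn:aws:iam::aws:policy/ComprehendFullAccess"


def attach_policy(iam_client,
                  role_name="pywren_exec_role_1",
                  policy_arn=MANAGED_POLICY_ARN):
    """Attach a managed policy to an IAM role (hypothetical helper).

    iam_client is expected to be a boto3 IAM client, i.e. the object
    returned by boto3.client("iam").
    """
    # attach_role_policy is the boto3 IAM client call that attaches a
    # managed policy to a role.
    iam_client.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return role_name, policy_arn
```

With AWS credentials configured, it could be called as attach_policy(boto3.client("iam")) after importing boto3. The caller needs permission to modify the role.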