I am scraping data from Twitter using a hashtag. My code below runs without errors. However, I would like to get 10,000 tweets and save them in a single JSON file (or save them in separate files and then combine them into one). When I run the code and print the length of my data frame, it shows only 100 tweets.
import json
credentials = {}
credentials['CONSUMER_KEY'] = ''
credentials['CONSUMER_SECRET'] = ''
credentials['ACCESS_TOKEN'] = ''
credentials['ACCESS_SECRET'] = ''
# Save the credentials object to file
with open("twitter_credentials.json", "w") as file:
    json.dump(credentials, file)
# Import the Twython class
from twython import Twython
import json
import pandas as pd
# Load credentials from json file
with open("twitter_credentials.json", "r") as file:
    creds = json.load(file)
# Instantiate an object
python_tweets = Twython(creds['CONSUMER_KEY'], creds['CONSUMER_SECRET'])
data = python_tweets.search(q='#python', result_type='mixed', count=10000)
with open('tweets_python.json', 'w') as fh:
    json.dump(data, fh)
data1 = pd.DataFrame(data['statuses'])
print("\nSample size:")
print(len(data1))
OUTPUT:
Sample size:
100
I have seen some answers that suggest using max_id. I tried to write the code below, but it raises a syntax error.
max_iters = 50
max_id = ""
for call in range(0,max_iters):
    data = python_tweets.search(q='#python', result_type='mixed', count=10000, 'max_id': max_id)
File "<ipython-input-69-1063cf5889dc>", line 4
data = python_tweets.search(q='#python', result_type='mixed', count=10000, 'max_id': max_id)
^
SyntaxError: invalid syntax
Could you please tell me how I can get 10,000 tweets saved into one JSON file?
As described in their docs here, you can use a generator to fetch as many results as are available.
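A minimal sketch of the generator approach, assuming Twython's `cursor` method wrapped around `search`. The helper names `collect_with_cursor` and `save_tweets` are made up for illustration. `count=100` is used because the standard search endpoint returns at most 100 tweets per request, which is why the original call came back with 100 rows.

```python
import json
from itertools import islice


def collect_with_cursor(client, query, target=10000):
    """Collect up to `target` tweets by iterating over client.cursor().

    `client` is expected to be a Twython instance (or anything exposing
    the same `cursor` and `search` methods).
    """
    # cursor() returns a generator that requests the next page on demand
    results = client.cursor(client.search, q=query,
                            result_type='mixed', count=100)
    # islice stops pulling from the generator once `target` items are read
    return list(islice(results, target))


def save_tweets(tweets, path):
    """Write the collected tweets to a single JSON file."""
    with open(path, 'w') as fh:
        json.dump(tweets, fh)


# Usage with the client from the question
# (needs valid credentials and network access):
# tweets = collect_with_cursor(python_tweets, '#python', target=10000)
# save_tweets(tweets, 'tweets_python.json')
```

Rate limits and the limited history of the search index may mean that fewer than 10,000 tweets come back, so it is worth checking `len(tweets)` before relying on the total.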
Also, if you want to use the max_id approach, the argument should be passed as a keyword argument, as follows:
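A sketch of the max_id loop under a few assumptions: `search` is the bound `python_tweets.search` method from the question, the response is a dict with a `statuses` list whose items carry an integer `id`, and the helper name `collect_with_max_id` is hypothetical. The key fix is `max_id=max_id` (a keyword argument) instead of `'max_id': max_id`, which is dict syntax and causes the SyntaxError.

```python
def collect_with_max_id(search, query, target=10000, max_iters=200):
    """Page backwards through search results using max_id."""
    collected = {}   # tweet id -> tweet, so repeated tweets are dropped
    max_id = None
    for _ in range(max_iters):
        params = {'q': query, 'result_type': 'mixed', 'count': 100}
        if max_id is not None:
            # Equivalent to calling search(..., max_id=max_id)
            params['max_id'] = max_id
        statuses = search(**params).get('statuses', [])
        if not statuses:
            break    # no older tweets are available
        for status in statuses:
            collected.setdefault(status['id'], status)
        if len(collected) >= target:
            break
        # max_id is inclusive, so step just below the oldest id seen
        max_id = min(status['id'] for status in statuses) - 1
    return list(collected.values())[:target]


# Usage with the client from the question
# (needs valid credentials and network access):
# import json
# tweets = collect_with_max_id(python_tweets.search, '#python')
# with open('tweets_python.json', 'w') as fh:
#     json.dump(tweets, fh)
```

Collecting everything into one list first and dumping it once keeps all tweets in a single valid JSON file, rather than appending several JSON documents to the same file.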