How to get more than 100 tweets using Twython in Python


I am scraping data from Twitter using a hashtag. My code below works. However, I would like to get 10,000 tweets and save them in the same JSON file (or save them in separate files and then combine them into one). When I run the code and print the length of my data frame, it shows only 100 tweets.

import json
credentials = {}
credentials['CONSUMER_KEY'] = ''
credentials['CONSUMER_SECRET'] = ''
credentials['ACCESS_TOKEN'] = ''
credentials['ACCESS_SECRET'] = ''

# Save the credentials object to file
with open("twitter_credentials.json", "w") as file:
    json.dump(credentials, file)

# Import the Twython class and pandas (used for the DataFrame below)
from twython import Twython
import pandas as pd

# Load credentials from json file
with open("twitter_credentials.json", "r") as file:
    creds = json.load(file)

# Instantiate an object
python_tweets = Twython(creds['CONSUMER_KEY'], creds['CONSUMER_SECRET'])

data = python_tweets.search(q='#python', result_type='mixed', count=10000)

with open('tweets_python.json', 'w') as fh:
    json.dump(data, fh)

data1 = pd.DataFrame(data['statuses'])

print("\nSample size:")
print(len(data1))

OUTPUT:
Sample size:
100

I have seen some answers suggesting max_id. I tried the code below, but it raises a syntax error:

max_iters = 50
max_id = ""
for call in range(0,max_iters):
       data = python_tweets.search(q='#python', result_type='mixed', count=10000, 'max_id': max_id)

 File "<ipython-input-69-1063cf5889dc>", line 4
    data = python_tweets.search(q='#python', result_type='mixed', count=10000, 'max_id': max_id)
                                                                                       ^
SyntaxError: invalid syntax

Could you please tell me how I can get 10,000 tweets saved into one JSON file?

George On

According to the Twython docs, you can use the cursor generator to fetch as many results as are available.

results = python_tweets.cursor(python_tweets.search, q='python', result_type='mixed')
with open('tweets_python.json', 'w') as fh:
    for result in results:
        # Write one tweet per line so the file can be parsed line by line
        json.dump(result, fh)
        fh.write('\n')
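
Since the cursor is a plain generator, it can be capped at a fixed number of tweets and written out as a single JSON array. Here is a minimal sketch of that idea; the helper name save_first_n is hypothetical (not part of Twython), and it assumes every yielded item is a JSON-serializable dict.

```python
import json
from itertools import islice


def save_first_n(results, n, path):
    """Write at most n items from an iterable of tweets to path as one JSON array."""
    # islice stops pulling from the generator once n items have been read
    tweets = list(islice(results, n))
    with open(path, "w") as fh:
        json.dump(tweets, fh)
    return len(tweets)


# Possible usage with the cursor from the answer (needs valid credentials):
# results = python_tweets.cursor(python_tweets.search, q='#python', result_type='mixed')
# save_first_n(results, 10000, 'tweets_python.json')
```

A file written this way can be read back with json.load and passed straight to pd.DataFrame.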

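Also, if you want to use the max_id approach, pass it as a keyword argument, as shown here. Note that a single search call returns at most 100 tweets, so a count above 100 has no effect.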

python_tweets.search(q='#python', result_type='mixed', count=10000, max_id=max_id)
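
The max_id approach can be sketched as a loop that asks for one page at a time and moves max_id just below the oldest tweet seen so far. This is only an illustration under some assumptions: collect_with_max_id is a hypothetical helper, and search stands for any callable shaped like python_tweets.search, meaning it takes keyword arguments and returns a dict with a 'statuses' list whose items have an 'id'.

```python
def collect_with_max_id(search, query, target=10000, page_size=100, **params):
    """Page backwards through search results using max_id."""
    tweets = []
    max_id = None  # no upper bound on the first request
    while len(tweets) < target:
        kwargs = dict(q=query, count=page_size, **params)
        if max_id is not None:
            kwargs["max_id"] = max_id
        statuses = search(**kwargs).get("statuses", [])
        if not statuses:
            break  # nothing older is available
        tweets.extend(statuses)
        # Next page: only tweets strictly older than the oldest one seen
        max_id = min(s["id"] for s in statuses) - 1
    return tweets[:target]


# Possible usage (needs valid credentials and the json module):
# tweets = collect_with_max_id(python_tweets.search, '#python', result_type='mixed')
# with open('tweets_python.json', 'w') as fh:
#     json.dump(tweets, fh)
```

The loop stops early when a page comes back empty, so the result may hold fewer than the requested number of tweets. Many consecutive requests can also run into the API's rate limits, which this sketch does not handle.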