Twitter API v2: How to get a list of all tweets instead of streaming them

I am new to the Twitter API and am testing it with Python. Here is the code I am using (which I got from Twitter's GitHub):

import requests
import os
import json

def create_headers(bearer_token):
    headers = {"Authorization": "Bearer {}".format(bearer_token)}
    return headers


def get_rules(headers, bearer_token):
    response = requests.get(
        "https://api.twitter.com/2/tweets/search/stream/rules", headers=headers
    )
    if response.status_code != 200:
        raise Exception(
            "Cannot get rules (HTTP {}): {}".format(response.status_code, response.text)
        )
    print(json.dumps(response.json()))
    return response.json()


def delete_all_rules(headers, bearer_token, rules):
    if rules is None or "data" not in rules:
        return None

    ids = list(map(lambda rule: rule["id"], rules["data"]))
    payload = {"delete": {"ids": ids}}
    response = requests.post(
        "https://api.twitter.com/2/tweets/search/stream/rules",
        headers=headers,
        json=payload
    )
    if response.status_code != 200:
        raise Exception(
            "Cannot delete rules (HTTP {}): {}".format(
                response.status_code, response.text
            )
        )
    print(json.dumps(response.json()))


def set_rules(headers, delete, bearer_token):
    # You can adjust the rules if needed
    sample_rules = [
        {"value": "dog has:images", "tag": "dog pictures"},
        {"value": "cat has:images -grumpy", "tag": "cat pictures"},
    ]
    payload = {"add": sample_rules}
    response = requests.post(
        "https://api.twitter.com/2/tweets/search/stream/rules",
        headers=headers,
        json=payload,
    )
    if response.status_code != 201:
        raise Exception(
            "Cannot add rules (HTTP {}): {}".format(response.status_code, response.text)
        )
    print(json.dumps(response.json()))


def get_stream(headers, set, bearer_token):
    response = requests.get(
        "https://api.twitter.com/2/tweets/search/stream", headers=headers, stream=True,
    )
    print(response.status_code)
    if response.status_code != 200:
        raise Exception(
            "Cannot get stream (HTTP {}): {}".format(
                response.status_code, response.text
            )
        )
    for response_line in response.iter_lines():
        # Skip the blank keep-alive lines the stream sends between Tweets
        if response_line:
            json_response = json.loads(response_line)
            print(json_response['data']['text'])


def main():
    bearer_token = '<BEARER_TOKEN>'
    headers = create_headers(bearer_token)
    rules = get_rules(headers, bearer_token)
    delete = delete_all_rules(headers, bearer_token, rules)
    set = set_rules(headers, delete, bearer_token)
    get_stream(headers, set, bearer_token)


if __name__ == "__main__":
    main()

I want to use the tweets I get for a sentiment analysis project. Is there any way I can get a list of tweets matching a certain keyword and from a certain time range, instead of the constant stream I get with this code (sort of like the GetOldTweets3 library)? Thank you in advance for your help.

Best answer

There are two main formats for accessing Twitter's API:

  • realtime (streaming): you open a connection and keep listening for everything that happens after that.
  • historical (RESTful): you query for content that already exists, from a moment ago back into the past, and the response ends once the results are returned.

What you're doing with a streaming connection is saying, "please deliver me all the Tweets matching this pattern / topic / query that happen from now on." It does not allow you to look backwards into the past.

What you're asking for is the ability to look backwards "from a certain time range". In Twitter API v1.1, and in Twitter API v2 right now (more is coming soon), the option for that is to search for Tweets matching your query. The search API reaches back up to 7 days, so you cannot ask for, say, all the Tweets from January to March 2019. For that, you need to look at the commercial APIs, such as full-archive premium search. In the future, v2 may enable a greater historical range.
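To make the 7-day limit concrete, here is a minimal sketch that trims a requested time range to the searchable window and formats the result as UTC timestamps. The helper names (`to_rfc3339`, `clamp_to_search_window`) are made up for illustration and are not part of any Twitter library; the sketch also assumes you pass timezone-aware datetimes.

```python
from datetime import datetime, timedelta, timezone

SEARCH_WINDOW = timedelta(days=7)  # the look-back limit described above


def to_rfc3339(moment):
    """Format an aware datetime as a UTC timestamp such as 2021-01-05T12:00:00Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def clamp_to_search_window(start, end, now=None):
    """Trim (start, end) to the last 7 days; return None if nothing is left.

    Hypothetical helper: all datetimes are assumed to be timezone-aware.
    """
    now = now or datetime.now(timezone.utc)
    earliest = now - SEARCH_WINDOW
    start = max(start, earliest)  # cannot reach further back than the window
    end = min(end, now)           # cannot search the future
    if start >= end:
        return None               # the whole range is older than 7 days
    return start, end
```

A range such as January to March 2019 comes back as None, which is the case where you would need the full-archive products instead.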

In API v2 today, you can use the Recent Search API sample to get Tweets from the past 7 days. Some third-party Python API libraries also support API v2 now.
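As a rough sketch of what that looks like, the code below calls the v2 recent search endpoint and follows `next_token` until it has collected a list of Tweets, instead of holding a stream open. It is not Twitter's official sample: the function names, the `limit` cap, and the injectable `fetch` argument are my own choices for illustration. It uses only the standard library so it runs without extra packages; `requests.get(url, headers=headers, params=params)` as in your code would work just as well.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def fetch_json(url, bearer_token):
    """Perform one authenticated GET and decode the JSON body.

    urlopen raises HTTPError on a non-2xx status, so no manual check is needed.
    """
    request = urllib.request.Request(
        url, headers={"Authorization": "Bearer {}".format(bearer_token)}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def search_recent(bearer_token, query, start_time=None, end_time=None,
                  limit=300, fetch=fetch_json):
    """Collect up to `limit` Tweets matching `query` into a plain list.

    start_time / end_time are UTC timestamps like "2021-01-03T00:00:00Z".
    `limit` and `fetch` are illustrative parameters, not API features.
    """
    params = {"query": query, "max_results": 100}  # 100 per page
    if start_time:
        params["start_time"] = start_time
    if end_time:
        params["end_time"] = end_time

    tweets = []
    while len(tweets) < limit:
        url = SEARCH_URL + "?" + urllib.parse.urlencode(params)
        body = fetch(url, bearer_token)
        tweets.extend(body.get("data", []))  # "data" is absent when nothing matches
        next_token = body.get("meta", {}).get("next_token")
        if not next_token:
            break  # no more pages
        params["next_token"] = next_token
    return tweets[:limit]
```

Calling `search_recent('<BEARER_TOKEN>', "dog has:images", start_time="2021-01-03T00:00:00Z")` would return a list of dicts, and `[t["text"] for t in tweets]` gives you the texts to feed into the sentiment analysis. The query uses the same rule syntax as the stream rules in your code.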

The GetOldTweets library used web scraping to get data, which is officially against Twitter's terms of service. It is better to use the official API, which is a supported method of access.