I am analysing a month of Twitter data and want to count the number of retweets, quote tweets and replies.
This should be the list of available fields in the dataset (standard v1.1):
['reply_count', 'in_reply_to_status_id', 'favorited',
'in_reply_to_status_id_str', 'entities', 'possibly_sensitive', 'id_str',
'in_reply_to_screen_name', 'coordinates', 'quote_count', 'retweeted',
'in_reply_to_user_id_str', 'contributors', 'id', 'truncated',
'in_reply_to_user_id', 'geo', 'timestamp_ms', 'text', 'user',
'favorite_count', 'retweet_count', 'created_at', 'place', 'lang',
'source', 'filter_level', 'is_quote_status', 'display_text_range',
'extended_entities', 'extended_tweet', 'quoted_status_id_str',
'quoted_status_permalink', 'quoted_status', 'quoted_status_id']
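Because it is not certain that every record carries every one of these fields, it can help to count how often each top-level key actually appears. This is a minimal sketch. It assumes the data is stored as JSON Lines, with one v1.1 tweet object per line; that file layout and the helper name `key_frequencies` are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def key_frequencies(lines: Iterable[str]) -> Counter:
    """Count in how many tweets each top-level key appears.

    Assumes each non-empty line is one JSON-encoded tweet object
    (JSON Lines layout, an assumption); blank lines are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        counts.update(tweet.keys())
    return counts


if __name__ == "__main__":
    sample = [
        '{"id_str": "1", "text": "hello", "retweeted": false}',
        '{"id_str": "2", "text": "RT @a: hi", "retweeted_status": {"id_str": "0"}}',
        "",
    ]
    freqs = key_frequencies(sample)
    # A key that never appears is not stored; Counter returns 0 for it.
    print(freqs["id_str"], freqs["retweeted_status"], freqs["geo"])
```

With real files, an open file handle can be passed directly, e.g. `with open(path, encoding="utf-8") as f: key_frequencies(f)`, since iterating over a file yields its lines.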
For replies I am using the field "in_reply_to_status_id_str". For quotes I have tried both "quoted_status_id_str" and "is_quote_status" (I am still trying to work out why they give two different results). Where I have problems is with retweets.
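To see where the two quote counts diverge, one option is to tabulate both quote fields side by side alongside the reply count. This sketch assumes the tweets are already parsed into Python dicts. The function name and the bucket names are hypothetical.

```python
from typing import Any, Dict, Iterable


def count_replies_and_quotes(tweets: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Count replies and tabulate the two quote fields independently.

    A tweet counts as a reply when "in_reply_to_status_id_str" is present
    and not null. Quotes are counted once per field, and the two kinds of
    disagreement between the fields are counted separately.
    """
    counts = {
        "replies": 0,
        "quotes_by_id": 0,     # "quoted_status_id_str" present and not null
        "quotes_by_flag": 0,   # "is_quote_status" is True
        "flag_without_id": 0,  # flag set, but no quoted id
        "id_without_flag": 0,  # quoted id present, but flag not set
    }
    for tweet in tweets:
        if tweet.get("in_reply_to_status_id_str") is not None:
            counts["replies"] += 1
        has_id = tweet.get("quoted_status_id_str") is not None
        has_flag = tweet.get("is_quote_status") is True
        # bools add as 0/1
        counts["quotes_by_id"] += has_id
        counts["quotes_by_flag"] += has_flag
        counts["flag_without_id"] += has_flag and not has_id
        counts["id_without_flag"] += has_id and not has_flag
    return counts


if __name__ == "__main__":
    sample = [
        {"in_reply_to_status_id_str": "10", "is_quote_status": False},
        {"in_reply_to_status_id_str": None, "is_quote_status": True,
         "quoted_status_id_str": "20"},
        {"in_reply_to_status_id_str": None, "is_quote_status": True},
    ]
    print(count_replies_and_quotes(sample))
```

Inspecting a few raw tweets from the "flag_without_id" or "id_without_flag" buckets should show which of the two fields behaves unexpectedly in this particular dataset.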
According to the documentation, the attribute "retweeted_status" should be present when a tweet is a retweet, but it does not appear anywhere in the dataset.
The attributes "retweeted" and "retweet_count" are always False and 0 respectively, so they cannot be used.
The only other option I have found so far is "truncated", which is True for some tweets.
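A possible fallback is sketched below, under the assumption that the dataset follows the usual v1.1 conventions. The documented signal is still "retweeted_status". Beyond that, the "text" of a classic retweet begins with "RT @screen_name:", so the prefix can serve as a rough heuristic. It will also match tweets where users typed "RT @" by hand, so treat it as an approximation. "truncated" is deliberately not used here: it only says that the "text" value was shortened, which can happen for reasons other than retweeting (for example, long tweets whose full text is carried in "extended_tweet"). The function name `looks_like_retweet` is hypothetical.

```python
from typing import Any, Dict


def looks_like_retweet(tweet: Dict[str, Any]) -> bool:
    """Return True if a v1.1 tweet appears to be a retweet.

    Prefers the documented "retweeted_status" field; otherwise falls back
    to the "RT @" text prefix (a heuristic, not a documented flag).
    """
    if tweet.get("retweeted_status") is not None:
        return True
    text = tweet.get("text") or ""  # treat a missing or null text as empty
    return text.startswith("RT @")


if __name__ == "__main__":
    sample = [
        {"text": "RT @someone: original text", "truncated": False},
        {"text": "a long original tweet ...", "truncated": True},
        {"text": "hello", "retweeted_status": {"id_str": "1"}},
    ]
    print(sum(looks_like_retweet(t) for t in sample))  # 2
```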
I am wondering whether there is any other attribute I could use to identify retweets.
I know this depends on how the dataset was created, which I have no information about, but I was hoping someone could suggest an attribute I haven't thought of.
Any idea?