I have a csv file containing thousands of tweets. Let's say the data is as follows:
Tweet_id hashtags_in_the_tweet
Tweet_1 [trump, clinton]
Tweet_2 [trump, sanders]
Tweet_3 [politics, news]
Tweet_4 [news, trump]
Tweet_5 [flower, day]
Tweet_6 [trump, impeach]
As you can see, the data contains the tweet_id and the hashtags in each tweet. What I want to do is go through all the rows and end up with something like a value count:
Hashtag count
trump 4
news 2
clinton 1
sanders 1
politics 1
flower 1
day 1
impeach 1
Considering that the csv file contains 1 million rows (1 million tweets), what is the best way to do this?
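To make the desired result concrete, here is a minimal sketch of the count on the sample rows above. It assumes each row's hashtags are already available as a Python list of strings (if they are read from the csv as plain text, they would need to be parsed first), and the variable names are made up for illustration:

```python
from collections import Counter
from itertools import chain

# Sample rows from the question as (tweet_id, hashtags) pairs.
# Assumption: the hashtags are real Python lists, not strings.
rows = [
    ("Tweet_1", ["trump", "clinton"]),
    ("Tweet_2", ["trump", "sanders"]),
    ("Tweet_3", ["politics", "news"]),
    ("Tweet_4", ["news", "trump"]),
    ("Tweet_5", ["flower", "day"]),
    ("Tweet_6", ["trump", "impeach"]),
]

# Flatten the per-tweet lists lazily and count every hashtag in one pass.
counts = Counter(chain.from_iterable(tags for _, tags in rows))

# most_common() returns (hashtag, count) pairs, highest count first.
for hashtag, count in counts.most_common():
    print(hashtag, count)
```

Because the lists are chained lazily, only the counter itself has to stay in memory, which should be manageable for a million rows. With a pandas dataframe the same idea would presumably be `Counter(chain.from_iterable(df['hashtags']))`, where the column name `hashtags` is an assumption.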
So all the answers above were helpful, but didn't actually work! The problem with my data is: 1) the value of the 'hashtags' field for some tweets is nan or []. 2) The value of the 'hashtags' field in the dataframe is one string! The answers above assumed that the values of the hashtags are lists of hashtags, e.g. ['trump', 'clinton'], while each value actually is only a str: '[trump, clinton]'. So I added some lines to @jpp's answer: