Hive - Hashtag Counting

375 Views Asked by At

I get stuck by counting hashtags with HiveQL. My problem: I have these format of the hashtags in one row:


jurassicworld;movie;night;dino

jurassicWorld;book;yourtickets;movie

jurassicWorld;movie


I looked at the https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF from Hive, but there is no function, which i can choose a delimiter (;) to seperat these hashtags and count them.

my result should be look like this:

+---------------+-----------+
| Hashtag       | Count     |
+---------------+-----------+
| jurassicworld | 300       |
| movie         | 200       |
| night         | 100       |
| dino          | 250       | 
| book          | 50        |  
| etc...        | 100       |
+---------------+-----------+
1

There are 1 best solutions below

0
On BEST ANSWER

I have created following dummy table deli -

hive> describe deli;
OK
row1                    string                  None

I used following query -

select hashTag, count(*) as data from deli LATERAL VIEW explode(split(row1,'\\;')) t1 AS hashTag group by hashTag;

And, its giving me following result for your data-

book    1
dino    1
jurassicWorld   2
jurassicworld   1
movie   3
night   1
yourtickets     1