Sum the occurrence counts output by uniq -c

I want to sum the occurrence counts in the output of the "uniq -c" command. How can I do that on the command line?

For example, if I get the following output, I need 250:

 45 a4
 55 a3
  1 a1
149 a5

Best answer:
awk '{sum+=$1} END{ print sum}'
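A minimal sketch of that one-liner run on the sample output from the question (the here-document stands in for real uniq -c output). The "+ 0" in the END block is my own addition, not part of the answer: it makes awk print 0 instead of an empty line when there is no input.

```shell
# Sum the first field (the count) of every line of uniq -c output.
# The here-document reproduces the sample output from the question.
awk '{ sum += $1 } END { print sum + 0 }' <<'EOF'
 45 a4
 55 a3
  1 a1
149 a5
EOF
```

This prints 250.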
Answer:

This should do the trick:

awk '{s+=$1} END {print s}' file

Or pipe the output of uniq -c straight into awk:

uniq -c whatever | awk '{s+=$1} END {print s}'
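For completeness, a sketch of the whole pipeline starting from raw lines; the input here is hypothetical sample data. uniq -c only merges adjacent duplicate lines, so the input is sorted first.

```shell
# Hypothetical input: "a" three times, "b" twice, "c" once.
printf '%s\n' a b a c b a | sort | uniq -c | awk '{ s += $1 } END { print s }'
```

This prints 6, which is also the number of input lines.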
Answer:

For each line, add the value of the first column to SUM, then print the value of SUM.

awk is the better choice:

uniq -c somefile | awk '{SUM+=$1}END{print SUM}'

but you can also implement the same logic in bash:

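# Feed the loop through process substitution (bash) so it runs in the
# current shell; with "uniq -c somefile | while ..." SUM would be lost.
SUM=0
while read -r num other
do
   (( SUM += num ))
done < <(uniq -c somefile)
echo "$SUM"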
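Why the loop should not simply be fed by a pipe: in bash, each part of a pipeline normally runs in a subshell, so a variable set inside "uniq -c somefile | while ..." is gone by the time a later echo runs. A portable way around this, which also works in a plain POSIX sh, is to group the loop and the echo in braces so they share one subshell. A sketch on hypothetical input:

```shell
# Hypothetical input piped through uniq -c.
# The braces run the loop and the final echo in the same subshell,
# so SUM still holds the total when it is printed.
printf '%s\n' a a b c c c | uniq -c | {
  SUM=0
  while read -r num other
  do
    SUM=$((SUM + num))
  done
  echo "$SUM"
}
```

This prints 6.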
Answer:

uniq -c is slow compared to awk, REALLY slow. Counting with an awk associative array is faster, and unlike uniq -c it does not need sorted input:

# Works with mawk, mawk2 or gawk. Counts the values of column 1; change $1
# (or FS) to pick the column to "uniq -c" on, or use $0 for whole lines.
mawk '{ freqL[$1]++ }
      END { for (x in freqL) printf("%8s\t%s\n", freqL[x], x) }' file
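To illustrate the difference in behaviour, a sketch on hypothetical unsorted input: uniq -c counts runs of adjacent identical lines, while the awk array counts every occurrence wherever it appears. Here $0 (the whole line) is used as the key so it matches what uniq -c compares.

```shell
# Hypothetical unsorted input: "x" occurs three times, but not contiguously.
# uniq -c merges only adjacent lines, so it reports 1 x, 1 y, 2 x.
printf '%s\n' x y x x | uniq -c

# The awk array counts all occurrences: 3 for x, 1 for y.
# for (x in ...) visits the keys in an unspecified order.
printf '%s\n' x y x x |
  awk '{ freqL[$0]++ } END { for (x in freqL) printf("%8s %s\n", freqL[x], x) }'
```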

If your input isn't large (under about 100 MB), gawk suffices once you add this to a BEGIN block, which makes "for (x in freqL)" visit the keys in sorted order:

PROCINFO["sorted_in"] = "@ind_num_asc";  # gawk-specific; just run gawk in -b (characters-as-bytes) mode

If it's really large, it's far faster to use mawk2 and then pipe the output to sort:

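   mawk '{ freqL[$1]++ } END { for (x in freqL) printf("%8s\t%s\n", freqL[x], x) }' file |
   sort -t$'\t' -k 2,2    # $'\t' is bash syntax for a literal tab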
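A sketch of the same idea that needs only POSIX awk and sort, on hypothetical input. It writes a tab between count and value so that sort's -t and -k 2,2 options select the value column; the tab is produced with printf, so this also works outside bash.

```shell
# Build a literal tab character portably (no bash-only $'\t').
tab=$(printf '\t')

# Count column 1, print "count<TAB>value", then sort by the value field.
printf '%s\n' a4 a3 a4 a1 a5 a4 |
  awk '{ freqL[$1]++ } END { for (x in freqL) printf("%8s\t%s\n", freqL[x], x) }' |
  sort -t "$tab" -k 2,2
```

The values come out in the order a1, a3, a4, a5, with a count of 3 for a4 and 1 for the rest.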
Answer:

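The awk answer above, uniq -c example-file | awk '{SUM+=$1}END{print SUM}', does sum the left column of the uniq -c output. But every input line is counted in exactly one group, so that sum is simply the number of lines in the file, and wc -l example-file gives the same total, as mentioned in the comments.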
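A quick sketch of why the two agree, on a hypothetical temporary file: every input line lands in exactly one uniq -c group, so the counts add up to the line count. (wc -l counts newline characters, so the two can differ by one if the last line lacks a trailing newline.)

```shell
# Hypothetical example file with 6 lines (3 distinct values).
f=$(mktemp)
printf '%s\n' a b a c b a > "$f"

sort "$f" | uniq -c | awk '{ SUM += $1 } END { print SUM }'   # sum of counts: 6
wc -l < "$f"                                                  # line count: 6

rm -f "$f"
```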

If what you are actually looking for is the number of unique lines in your file, use this command:

sort -h example-file | uniq | wc -l
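A sketch on hypothetical input showing that sort -u gives the same distinct-line count in one step. Plain sort is enough for this, since uniq only needs identical lines to be adjacent.

```shell
# Hypothetical input: 6 lines, 3 distinct values.
printf '%s\n' a b a c b a | sort | uniq | wc -l    # 3
printf '%s\n' a b a c b a | sort -u | wc -l        # also 3
```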