I want to sum the occurrence counts output by the "uniq -c" command. How can I do that on the command line?
For example, if I get the following output, I would need 250.
45 a4
55 a3
1 a1
149 a5
This should do the trick:
awk '{s+=$1} END {print s}' file
Or just pipe it into awk:
uniq -c whatever | awk '{s+=$1} END {print s}'
For each line, add the value of the first column to SUM, then print the value of SUM at the end.
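To check the one-liner against the numbers in the question, here is a minimal sketch. The function name sum_counts is made up for illustration, and printf stands in for real uniq -c output; the "+ 0" only makes awk print 0 instead of an empty line when there is no input.

```shell
# Hypothetical helper: sum the first column of whatever arrives on stdin.
sum_counts() {
  awk '{ s += $1 } END { print s + 0 }'
}

# Sample data copied from the question, standing in for `uniq -c` output.
result=$(printf '45 a4\n55 a3\n1 a1\n149 a5\n' | sum_counts)
echo "$result"
```

With real data you would presumably call it as: uniq -c somefile | sum_counts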
awk is a better choice:
uniq -c somefile | awk '{SUM+=$1}END{print SUM}'
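but you can also implement the logic in bash. The loop and the echo are grouped in braces because each side of a pipe runs in a subshell, so SUM would otherwise be lost once the loop ends:
uniq -c somefile | {
  SUM=0
  while read -r num other
  do
    let SUM+=num
  done
  echo "$SUM"
}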
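Another way to keep the total visible after the loop is to feed the loop from a here-document instead of a pipe, so the loop runs in the current shell. A minimal sketch, with made-up sample lines in place of somefile; the variable names are only illustrative:

```shell
# Made-up sample input: 6 lines with 3 distinct values.
sum=0
while read -r num rest; do
  # read strips the leading padding that uniq -c puts before each count
  sum=$((sum + num))
done <<EOF
$(printf 'b\na\nb\nc\nb\na\n' | sort | uniq -c)
EOF
echo "$sum"
```

This uses only POSIX shell features, so it should not depend on bash specifically.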
uniq -c is slow compared to awk. Like, REALLY slow.
# Works with mawk, mawk2, or gawk. Modify FS (and $1) for the column you want to "uniq -c" upon.
mawk 'BEGIN { OFS = "\t" }
{ freqL[$1]++ }
END { for (x in freqL) { printf("%8s%s%s\n", freqL[x], OFS, x) } }'
If your input isn't large (under roughly 100 MB), then gawk suffices once you add the following before the for loop:
PROCINFO["sorted_in"] = "@ind_num_asc"; # gawk-specific; just use gawk -b mode
If it's really large, it's far faster to use mawk2 and then pipe to sort:
{ mawk/mawk2 stuff... } | gnusort -t$'\t' -k 2,2
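As a rough illustration of the count-then-sort approach, here is a small sketch on made-up input. Plain awk and sort stand in for mawk2 and GNU sort (an assumption, so that it runs on most systems), and the tab is produced with printf rather than $'\t' to stay portable:

```shell
# Literal tab character, used as the field separator for sort.
tab=$(printf '\t')

# Count occurrences of each line in awk, emit "count<TAB>value",
# then sort on the value column.
counts=$(printf 'b\na\nb\nc\nb\na\n' |
  awk '{ freq[$1]++ } END { for (x in freq) printf("%d\t%s\n", freq[x], x) }' |
  sort -t "$tab" -k 2,2)

printf '%s\n' "$counts"
```

Unlike uniq -c, the awk step does not need sorted input, so only the much smaller list of distinct values has to be sorted.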
While the aforementioned answer, uniq -c example-file | awk '{SUM+=$1}END{print SUM}',
would work to sum the left column of the uniq -c output,
so would wc -l somefile,
as mentioned in the comment.
If what you are looking for is the number of unique lines in your file, then you can use this command:
sort -h example-file | uniq | wc -l
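To make the difference between the two numbers concrete, here is a small sketch on a made-up temporary file (6 lines, 3 distinct values). The variable names are only illustrative, and tr strips the padding that some wc implementations add:

```shell
# Made-up sample file: 6 lines, 3 distinct values.
tmp=$(mktemp)
printf 'b\na\nb\nc\nb\na\n' > "$tmp"

# Total lines: summing the uniq -c counts gives the same number as wc -l.
total_from_uniq=$(sort "$tmp" | uniq -c | awk '{ s += $1 } END { print s + 0 }')
total_from_wc=$(wc -l < "$tmp" | tr -d ' ')

# Distinct lines: count what is left after sort | uniq.
distinct=$(sort "$tmp" | uniq | wc -l | tr -d ' ')

echo "total (uniq -c + awk): $total_from_uniq"
echo "total (wc -l):         $total_from_wc"
echo "distinct lines:        $distinct"
rm -f "$tmp"
```

So the sum of the counts is just the line count of the file, while sort | uniq | wc -l answers the separate question of how many different lines there are.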