uniq -cd but as percentage


I have a file containing these lines:

"RedfishVersion":"1.6.0"
"RedfishVersion":"1.6.0"
"RedfishVersion":"1.6.0"
"RedfishVersion":"1.6.0"
"RedfishVersion":"1.6.0"
"RedfishVersion":"1.6.0"
"RedfishVersion":"1.15.0"
"RedfishVersion":"1.15.0"
"RedfishVersion":"1.15.0"
"RedfishVersion":"1.15.0"
"RedfishVersion":"1.15.0"
"RedfishVersion":"1.15.0"
"RedfishVersion":"1.15.0"

I was wondering whether there is a Unix way to get a percentage histogram of these lines, based on how many times each one is repeated. This is my attempt:

sort bmc-versions.txt | uniq -cd
    321 "RedfishVersion":"1.0.0"
     19 "RedfishVersion":"1.0.2"

I want output like this:

"1.0.0"  50%
"1.0.2"  40%

Answer by jared_mamrot (accepted answer):

Sorted by percentage (highest first) using GNU awk:

awk 'BEGIN{FS=":"; PROCINFO["sorted_in"] = "@val_num_desc"} {a[$2]++} END{for (i in a) {print i "  " int(a[i] / NR * 100 + 0.5) "%"}}' test.txt
"1.15.0"  54%
"1.6.0"  46%

Nicer formatting:

awk 'BEGIN {
    FS = ":"
    PROCINFO["sorted_in"] = "@val_num_desc"
}

{
    a[$2]++
}

END {
    for (i in a) {
        print i "  " int(a[i] / NR * 100 + 0.5) "%"
    }
}' test.txt
"1.15.0"  54%
"1.6.0"  46%
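
As a worked check of the rounding, using the 13 sample lines from the question (6 × "1.6.0" and 7 × "1.15.0"): int(x + 0.5) drops the fractional part after adding one half, which rounds a non-negative x to the nearest whole number.

```latex
\begin{aligned}
\tfrac{7}{13}\times 100 &\approx 53.85, & \lfloor 53.85 + 0.5 \rfloor &= 54\\
\tfrac{6}{13}\times 100 &\approx 46.15, & \lfloor 46.15 + 0.5 \rfloor &= 46
\end{aligned}
```

Because each percentage is rounded independently, the values can add up to 99 or 101 on other inputs.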

Sorted by percentage (highest first) using a non-GNU awk (e.g. POSIX awk), which lacks gawk's PROCINFO["sorted_in"]:

awk 'BEGIN{FS=":"} {a[$2]++} END{for (i=NR; i>=0; i--) {for (h in a) {if(a[h] == i) {print h, int(a[h] / NR * 100 + 0.5), "%"}}}}' test.txt
"1.15.0" 54 %
"1.6.0" 46 %

Nicer formatting:

awk 'BEGIN {
    FS = ":"
}

{
    a[$2]++
}

END {
    for (i = NR; i >= 0; i--) {
        for (h in a) {
            if (a[h] == i) {
                print h, int(a[h] / NR * 100 + 0.5), "%"
            }
        }
    }
}' test.txt
"1.15.0" 54 %
"1.6.0" 46 %
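
A possible portable alternative, sketched under the same assumption as above (the version is the second ":"-separated field): let awk print the percentages in any order and hand the ordering to sort, which avoids re-scanning every key for each count from NR down to 0. The function name version_pct_sorted is hypothetical.

```shell
# Hypothetical helper: prints each version's share of the lines in file "$1",
# highest percentage first, using only POSIX awk and sort features.
# Field 2 (split on ":") is the quoted version string, e.g. "1.6.0";
# int(x + 0.5) rounds to the nearest whole percent, and sort -k2,2nr
# orders the output numerically on the percentage column, descending.
version_pct_sorted() {
  awk -F: '
    { a[$2]++ }
    END {
      for (k in a) {
        printf "%s  %d%%\n", k, int(a[k] / NR * 100 + 0.5)
      }
    }' "$1" | sort -k2,2nr
}
```

Usage: version_pct_sorted bmc-versions.txt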

Answer by Aswin P.M:
   awk -F'"' '{count[$4]++} END {for (version in count) {sum += count[version]} for (version in count) {printf "\"%s\" %.2f%%\n", version, (count[version] / sum) * 100}}' bmc-versions.txt | sort -k2,2nr

Here's what this command does:

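  • -F'"': Sets the field separator to a double quote. A line such as "RedfishVersion":"1.6.0" then splits into an empty $1, RedfishVersion in $2, : in $3, the bare version 1.6.0 in $4, and an empty $5.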
  • {count[$4]++}: For each line, it increments the count for the corresponding version number.
  • END {for (version in count) {sum += count[version]} for (version in count) {printf "\"%s\" %.2f%%\n", version, (count[version] / sum) * 100}}: After all lines are read, this block first totals the counts, then prints the percentage for each version.
  • sum += count[version]: Adds up the per-version counts to get the total number of lines (the same value awk already keeps in NR).
  • printf "\"%s\" %.2f%%\n", version, (count[version] / sum) * 100: Prints the version in double quotes, followed by its percentage to two decimal places.
  • The sort -k2,2nr command sorts the lines based on the second column (the percentages) in descending numeric order.
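
If you would rather keep the sort | uniq -c pipeline from the question, one sketch is to feed its counts into awk and divide by their total. Note that uniq -d prints only lines that occur more than once, so a version seen exactly once would be missing from uniq -cd output and from the total; plain uniq -c keeps every line. This assumes, as in the sample data, that lines contain no spaces (so the count is $1 and the line is $2). The function name version_pct_uniq is hypothetical.

```shell
# Hypothetical helper: same percentages, computed from "uniq -c" counts.
# Assumes lines contain no spaces, so the count is $1 and the line is $2.
version_pct_uniq() {
  sort "$1" | uniq -c | awk '
    { n[NR] = $1; line[NR] = $2; total += $1 }
    END {
      for (i = 1; i <= NR; i++) {
        # Keep only the part after ":" (the quoted version).
        split(line[i], f, ":")
        printf "%s  %d%%\n", f[2], int(n[i] / total * 100 + 0.5)
      }
    }' | sort -k2,2nr
}
```

The trailing sort -k2,2nr is the same descending numeric ordering used in the one-liner above.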