Print the most frequently occurring letter in a string using AWK

I have a text file:

('1', 6310445)  [12, 20]_S:0.6:0:ACAAAAAAAAAAA_i_V
('1', 17704109) [12, 31]_S:0.387:0:CCCCCCCCCCCC_i_V
('1', 18922274) [8, 22]_S:0.364:0:AAAAAAAA_i_V
('1', 22750694) [8, 19]_S:0.421:0:TTTTTTTT_i_V
('1', 25564545) [9, 23]_S:0.391:0:AAAAAAAAA_i_V
('1', 29189562) [13, 34]_S:0.382:0:AAAAAAAAAAAAA_i_V
('1', 30166561) [14, 20]_S:0.7:0:TTTTTTTTTTTTTT_i_V
('1', 30450439) [9, 14]_S:0.643:0:AAAAAAAAA_i_V
('1', 30981321) [12, 23]_S:0.522:0:AAAAAAAAAAAA_i_V

I want to print the most frequently occurring letter between the last ":" and the first "_" after it.

That is:

"ACAAAAAAAAAAA" => A, "CCCCCCCCCCCC" => C, and so on.

The expected output is

A C A T A A T A A

How can I do this?

jhnc On BEST ANSWER

You can use a simple reduce-style approach:

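awk -F: -v ORS= '
    # the target is the part of the last ":"-field before the first "_"
    NF>1 && split($NF,a,/_/)>1 {
        # walk the string from its end, counting each character in n[];
        # r tracks the character with the highest count so far
        for (i=length(s=a[1]); i>0; i--)
            if (++n[c=substr(s,i,1)] > n[r])
                r=c
        print r OFS

        split(r="",n) # reset r and the counts for the next line
    }
    END { print "\n" }
' textfile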

If multiple characters are tied for most frequent (e.g. ABCABCABC), the first one to reach the maximum is printed. Because the loop scans each string from its end, that example prints C.
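
If you would rather have a tie go to the letter that reaches the maximum first when reading left to right, a forward-scanning variant might look like the sketch below. It is not part of the script above: the function name `most_frequent` and the second input line are made up for illustration, and it reads stdin rather than `textfile`.

```shell
# Hypothetical helper (an assumption, not from the answer above): reads lines
# on stdin and prints the most frequent letter of each target field on one line.
most_frequent() {
    awk -F: '
        NF > 1 && split($NF, a, /_/) > 1 {
            s = a[1]; best = ""; max = 0
            split("", n)                        # clear the counts for this line
            for (i = 1; i <= length(s); i++) {  # scan left to right
                c = substr(s, i, 1)
                if (++n[c] > max) { max = n[c]; best = c }
            }
            out = out (out == "" ? "" : " ") best
        }
        END { print out }
    '
}

# First line is taken from the sample data; the second is an invented tie case.
printf '%s\n' \
    "('1', 17704109) [12, 31]_S:0.387:0:CCCCCCCCCCCC_i_V" \
    "x:ABCABCABC_i_V" | most_frequent
# prints: C A
```

Keeping the running maximum in a separate variable avoids the lookup of `n[r]` on every step, and joining the results in `END` avoids the trailing space that `print r OFS` leaves.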