I am trying to write a one-line command in the terminal to count all the unique "gene-MIR" entries in a very large file. Each "gene-MIR" is followed by a series of digits, e.g. gene-MIR334223, gene-MIR633235, gene-MIR53453, etc., and the same entry can appear many times, e.g. gene-MIR342433 may show up 10 times in the file.
My question is, how do I write a command that will annotate the unique "gene-MIR" that are present in my file?
The commands I have been using so far are:
grep -c "gene-MIR" myfile.txt | uniq
grep "gene-MIR" myfile.txt | sort -u
The first command provides me with a count; however, I believe it does not include the number series after "MIR" and is only counting how many "gene-MIR" itself are present.
Thanks!
Assuming all the entries are on separate lines, try this:
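A minimal sketch of this case, assuming each gene-MIR entry sits on its own line with nothing else on it. The function name is made up for illustration:

```shell
# Hypothetical helper: print each distinct line containing "gene-MIR" once.
# Assumes one entry per line, so a whole line stands for one identifier.
unique_mir_lines() {
  grep 'gene-MIR' "$1" | sort -u
}
```

Calling `unique_mir_lines myfile.txt` is the same as running `grep 'gene-MIR' myfile.txt | sort -u` directly. Piping the result through `wc -l` turns the list into a count.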
If the entries are mixed up with other text, and the system has GNU grep, try this:
To get the total count:
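Both of these steps can be sketched as follows, assuming every identifier is "gene-MIR" followed by digits only, as in the question. The function names are made up for illustration, and `-o` (print only the matched text) is the GNU grep feature this relies on:

```shell
# Hypothetical helper: print each distinct gene-MIR identifier once,
# even when it is embedded in a longer line of other text.
# -o prints only the matched part; -E enables extended regular expressions.
unique_mir_ids() {
  grep -oE 'gene-MIR[0-9]+' "$1" | sort -u
}

# Hypothetical helper: count the distinct identifiers.
count_unique_mir_ids() {
  unique_mir_ids "$1" | wc -l
}
```

As a one-liner, the count is `grep -oE 'gene-MIR[0-9]+' myfile.txt | sort -u | wc -l`. Note that `grep -c` alone would not help here, because it counts matching lines rather than distinct identifiers.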