I have two files, each with two columns and sorted only by the second column, such as:
File 1:
176 AAATC
6 CCGTG
80 TTTCG
File 2:
20 AAATC
77 CTTTT
50 TTTTT
I would like to use comm command using options -13 and -23 to get two different files reporting the different lines between the two files with the corresponding count number, but only comparing the second columns (i.e. the strings). What I tried so far was something like:
comm -23 <(cut -d$'\t' -f2 file1.txt) <(cut -d$'\t' -f2 file2.txt)
But I could only have the strings in output, without the numbers:
CCGTG
TTTCG
While what I want would be:
6 CCGTG
80 TTTCG
Any suggestion?
You can use
joininstead ofcomm:It will output the matching lines, too, but you can remove them with
Explanation:
-1 2use the second column in file 1 for joining;-2 2similarly, use the second column in file 2;-a 1show also non-matching lines from file 1 - these are the ones you want in the end;-ospecifies the output format, here we want columns 1 and 2 from file 1 and column 2 from file 2 (this is just arbitrary, you can use column 1 as well, but the second command would be different:| grep -v '[ACTG] [0-9]').