I want to delete the lines of 1.txt that also appear in 2.txt and save the result to 3.txt. I am using this bash command:
comm -23 1.txt 2.txt > 3.txt
When I check 3.txt, I find that some lines common to 1.txt and 2.txt are still there; the word "registry" is one example. What is the problem?
You can download the two files below:
file 1.txt : https://ufile.io/n7vn6
file 2.txt : https://ufile.io/p4s58
I'm not sure how you generated your text files, but the problem is that some of the lines in your `1.txt` and `2.txt` don't have consistent line terminations. Some contain a CR character (Ctrl-M) at the end rather than only the single line feed that Linux expects for text files. For example, one of them has `registry^M`, which doesn't match `registry`. Linux programs that examine text treat `^M` as just another character or as white space, not as part of a line termination that gets ignored. When you look at the file in some text editors, the `^M` isn't visible, so `registry` appears to be the same in both places, but it isn't.

You could try converting the line endings with `dos2unix` and re-sorting both files before running `comm`.
`dos2unix` will make all of the line terminations correct (assuming the stray characters are DOS-style CRs). Note that this can affect the sort order a little, so it is worth re-sorting the files as well. You can try this without re-sorting; if there's an issue, `comm` will complain that one of the files isn't sorted.
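The steps above can be sketched as a small, self-contained script. It is only an illustration under stated assumptions: the sample data is made up (it is not the content of the linked files), `tr -d '\r'` stands in for `dos2unix` in case that tool isn't installed, and the locale is pinned to `C` so that `sort` and `comm` agree on ordering. The intermediate file names (`1.sorted.txt`, `1.clean.txt`, and so on) are hypothetical.

```shell
#!/bin/sh
set -eu

# Assumption: use the C locale so sort and comm order lines the same way.
LC_ALL=C
export LC_ALL

# Work in a scratch directory so real files are never touched.
workdir=$(mktemp -d)
cd "$workdir"

# Made-up sample data: in 1.txt, "apple" and "registry" end in CR+LF.
printf 'apple\r\nregistry\r\nzebra\n' > 1.txt
printf 'registry\n' > 2.txt

# Before the fix: "registry<CR>" is not equal to "registry",
# so comm -23 still reports it as unique to the first file.
sort 1.txt > 1.sorted.txt
sort 2.txt > 2.sorted.txt
comm -23 1.sorted.txt 2.sorted.txt > before.txt
grep -c registry before.txt    # prints 1

# The fix: strip every CR, re-sort, then compare.
tr -d '\r' < 1.txt | sort > 1.clean.txt
tr -d '\r' < 2.txt | sort > 2.clean.txt
comm -23 1.clean.txt 2.clean.txt > 3.txt

cat 3.txt                      # prints apple, then zebra
```

On the real files, running `dos2unix 1.txt 2.txt` converts both in place and can replace the two `tr` calls, after which the same sort and `comm -23` steps apply. One difference to keep in mind: `tr -d '\r'` removes every CR in the file, whereas `dos2unix` by default only converts CR+LF pairs.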