I have a file with duplicate records (duplicates are determined by the first column). I want to keep only the last occurrence of each duplicated record in one file and move all the other duplicates to another file.
File : input
foo j
bar bn
bar b
bar bn
bar bn
bar bn
kkk hh
fjk ff
foo jj
xxx tt
kkk hh
I have used the following awk statement to keep the last occurrence --
awk '{line=$0; x[$1]=line;} END{ for (key in x) print x[key];}' input > output
File : output
foo jj
xxx tt
fjk ff
kkk hh
bar bn
How can I move the repeating records to another file (leaving the last occurrence)? That is, moving foo j to one file, say d_output, while keeping foo jj in the output file.
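One way to extend the one-liner above into a full answer (a sketch, assuming duplicates are keyed on the first column as in that one-liner, and using the output and d_output file names from the question): whenever x[$1] is overwritten, the value being replaced is an earlier occurrence, so it can be collected and written out in the END block.

```shell
awk '{
    if ($1 in x) dup[n++] = x[$1]   # the previous line for this key is now a duplicate
    x[$1] = $0                      # remember the latest line seen for this key
}
END {
    for (key in x) print x[key] > "output"             # last occurrences (unordered)
    for (i = 0; i < n; i++) print dup[i] > "d_output"  # all earlier occurrences
}' input
```

Note that, like the original one-liner, `for (key in x)` emits the kept lines in arbitrary order.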
Another option you could try, which keeps the input order, is reading the input file twice:
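The answer's code block appears to have been lost here; a sketch of such a two-pass approach (file names output and d_output assumed from the question): the first pass records the line number of the last occurrence of each key, and the second pass routes each line to the right file based on that.

```shell
awk 'NR == FNR { last[$1] = FNR; next }          # pass 1: last line number per key
     FNR == last[$1] { print > "output"; next }  # pass 2: this is the last occurrence
     { print > "d_output" }                      # every earlier occurrence is a dup
' input input
```

Both output files preserve the original input order.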
output:
bar bn
fjk ff
foo jj
xxx tt
kkk hh
Duplicates:
foo j
bar bn
bar b
bar bn
bar bn
kkk hh
@Sudo_O @WilliamPursell @user2018441. Sudo_O, thank you for the performance test. I tried to reproduce it on my system, but it does not have tac available, so I tested with Kent's version and mine, but I could not reproduce those differences on my system. Update: I tested with Sudo_O's version using cat instead of tac. Although on a system with tac there was a difference of 0.2 seconds between tac and cat when outputting to /dev/null (see at the bottom of this post), I got:
--
when using a file instead of the seq, I got:

Probably due to caching effects, which would be present also for larger files. Creating the infile took:
Tested on a different system: