Let me start off by saying I don't want to print only the duplicate lines nor do I want to remove them.
I am trying to use grep with a pattern file to parse a large data file.
The Pattern file for example may look like this:
1243
1234
1234
1234
1354
1356
1356
1677
etc. with more single and duplicate entries.
The Input data file might look like this:
aatta 1243 qqqqqq
yyyyy 1234 vvvvvv
ttttt 1555 bbbbbb
ppppp 1354 pppppp
yyyyy 3333 zzzzzz
qqqqq 1677 eeeeee
iiiii 4444 iiiiii
etc. for 27000 lines.
When I use
grep -f 'Patternfile.txt' 'Inputfile.txt' > 'Outputfile.txt'
I get an output file that resembles this:
aatta 1243 qqqqqq
yyyyy 1234 vvvvvv
ppppp 1354 pppppp
How can I get it to also report the duplicates, so that I end up with something like this?
aatta 1243 qqqqqq
yyyyy 1234 vvvvvv
yyyyy 1234 vvvvvv
yyyyy 1234 vvvvvv
ppppp 1354 pppppp
qqqqq 1677 eeeeee
Additionally I would also like to print a blank line should a query in the pattern file not match a substring in the input file.
Thank you!
One solution, not with grep, but with perl:
With patternfile.txt and inputfile.txt containing the data from your original post, the following content of script.pl should do the job (I assume that the string to match is the second column; otherwise it should be modified to use a regexp instead. Matching on the column is faster):
Run it like:
And it gives the following output:
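For comparison, the same second-column lookup can be sketched with awk instead of Perl. This is not the script.pl from this answer, only a minimal sketch under a few assumptions: the function name lookup_patterns is made up for illustration, the key is the second whitespace-separated column of the input file, and if a key occurs more than once in the input only its first line is kept. Each pattern line is looked up in order, so duplicates in the pattern file repeat the matching line, and a pattern with no match yields a blank line.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): lookup_patterns PATTERNFILE INPUTFILE
# For each line of PATTERNFILE, in order, print the INPUTFILE line whose
# second column equals it, or an empty line when there is no such line.
lookup_patterns() {
    awk '
        # First file on the command line is the input data:
        # remember each line under its second column (first occurrence wins).
        NR == FNR {
            if (NF >= 2 && !($2 in line)) line[$2] = $0
            next
        }
        # Second file is the pattern list: one output line per pattern,
        # so repeated patterns produce repeated output lines.
        {
            if ($1 in line) print line[$1]
            else print ""
        }
    ' "$2" "$1"
}
```

It could be called as lookup_patterns Patternfile.txt Inputfile.txt > Outputfile.txt. Because the input file is read once into a lookup table, this should stay quick for 27000 lines, at the cost of matching the whole column exactly rather than any substring.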