file1

3   1234581 A   C   rs123456

file2 (gzip-compressed, .gz)

1   1256781 rs987656    T   C
3   1234581 rs123456    A   C
22  1792471 rs928376    G   T

Expected output

3   1234581 rs123456    A   C

I tried:

zcat file2.gz | awk 'NR==FNR{a[$1,$2,$5]++;next} a[$1,$2,$3]' file1.txt - > output.txt

but it is not working.

Best answer

Please try the following awk code, written for the samples you have shown. zcat decompresses your .gz file to standard output, and awk reads that stream as its second input (the - argument stands for standard input) once it has finished reading file1.

zcat your_file.gz | awk 'FNR==NR{arr[$1,$2,$5];next} (($1,$2,$3) in arr)' file1 -
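To check the command against the sample rows from the question, a small reproduction can be sketched as follows. The awk_demo directory name and the tab-separated layout are assumptions made for illustration; the real files may use other whitespace. gzip -dc is used in place of zcat because it behaves the same for .gz files and is available wherever gzip is.

```shell
# Scratch directory for the demo (hypothetical name, not from the question).
mkdir -p awk_demo

# Recreate the sample inputs; tab separation is an assumption.
printf '3\t1234581\tA\tC\trs123456\n' > awk_demo/file1
printf '1\t1256781\trs987656\tT\tC\n3\t1234581\trs123456\tA\tC\n22\t1792471\trs928376\tG\tT\n' \
    | gzip > awk_demo/file2.gz

# First input (file1): store chr, pos, rsid as the key.
# Second input (- = decompressed file2 on stdin): print lines whose key exists.
gzip -dc awk_demo/file2.gz \
    | awk 'FNR==NR{arr[$1,$2,$5];next} (($1,$2,$3) in arr)' awk_demo/file1 - \
    > awk_demo/output.txt

cat awk_demo/output.txt
```

With these sample rows, only the chromosome 3 line of file2 should reach output.txt, since it is the only one whose chromosome, position and rsID all appear in file1.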

Fixes in OP's attempt:

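  • There is no need to increment the array value while reading file1. The existence of the index in the array is enough.
  • While reading file2 (supplied by the zcat command), test with the in operator whether the key built from the respective fields is present in the array; if it is, the line is printed. Unlike a bare a[...] reference, the in operator does not create a new array element for every line that fails to match.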
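The difference between the two lookup styles can be seen in a small sketch. The function names and the keys k1, k2, k3 are made up for illustration; each function looks up three keys that were never stored and then counts how many elements the array holds.

```shell
# Bare reference: evaluating a[$1] creates the element as a side effect.
count_after_bare_reference() {
    printf 'k1\nk2\nk3\n' | awk '
        { if (a[$1]) hit++ }
        END { n = 0; for (k in a) n++; print n }
    '
}

# Membership test: ($1 in b) only checks for the key and leaves b unchanged.
count_after_in_test() {
    printf 'k1\nk2\nk3\n' | awk '
        { if ($1 in b) hit++ }
        END { n = 0; for (k in b) n++; print n }
    '
}

echo "bare reference: $(count_after_bare_reference) elements"
echo "in operator:    $(count_after_in_test) elements"
```

The bare reference leaves three empty elements behind, while the in test leaves none. On a large file2 this matters mainly for memory use, because every non-matching line would otherwise add an entry to the array.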