How to replace a column in a .fam file with a column from a .txt file in Unix


I am looking for a way in Unix (maybe awk or sed) to replace the last column of my .fam file with the last column (V8) of a .txt file, similar to the merge function in R.

My .fam file looks like this

20481 20481 0 0 2 -9
20483 20483 0 0 1 1
20488 20488 0 0 2 1
20492 20492 0 0 1 1

and my .txt file looks like this.

V1       V2     V3   V4   V6   V7_Pheno   V8
2253792  20481  NA   DNA  1    Yes        2
2253802  20483  NA   DNA  4    Yes        2
2253816  20488  NA   DNA  0    No         1
2253820  20492  NA   DNA  4    Yes        2

My outcome.fam file should look like this:

20481 20481 0 0 2 2
20483 20483 0 0 1 2
20488 20488 0 0 2 1
20492 20492 0 0 1 2

There are 2 best solutions below

Mathieu On
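  • paste merges the two files line by line

  • awk lets you select columns, so

    paste foo.fam bar.txt | awk '{ print $1, $2, $3, $4, $5, $13 }'
    

should do what you want, as long as line N of one file corresponds to line N of the other. Each pasted line has 13 fields: 6 from the .fam file followed by 7 from the .txt file, so $13 is V8.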


The .txt file has a header line that the .fam file lacks, so the rows only line up once it is removed. You can call tail to skip the first line:

tail -n +2 bar.txt

You can then integrate it into your command line (the <(...) process substitution assumes you use bash):

paste foo.fam <(tail -n +2 bar.txt) | awk '{ print $1, $2, $3, $4, $5, $13 }'
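
If bash is not available, the same idea can probably be written with a plain pipe, because paste accepts "-" as a stand-in for standard input. The following is only a sketch on the sample data from the question: the scratch directory and the outcome.fam file name are assumptions made for the demonstration, and it keeps the first five .fam columns plus V8 so that the result has six columns.

```shell
# Scratch directory so no existing files are overwritten (assumption for this demo).
dir=$(mktemp -d)

# Sample data copied from the question.
cat > "$dir/foo.fam" <<'EOF'
20481 20481 0 0 2 -9
20483 20483 0 0 1 1
20488 20488 0 0 2 1
20492 20492 0 0 1 1
EOF

cat > "$dir/bar.txt" <<'EOF'
V1       V2     V3      V4      V6     V7_Pheno   V8
    2253792 20481   NA      DNA     1       Yes    2
    2253802 20483   NA      DNA     4       Yes    2
    2253816 20488   NA      DNA     0       No     1
    2253820 20492   NA      DNA     4       Yes    2
EOF

# Drop the header, hand the remaining rows to paste on stdin ("-"),
# then print .fam columns 1-5 followed by the last .txt column ($13).
tail -n +2 "$dir/bar.txt" | paste "$dir/foo.fam" - |
  awk '{ print $1, $2, $3, $4, $5, $13 }' > "$dir/outcome.fam"

cat "$dir/outcome.fam"
```

On the sample data this prints the four lines shown as the desired outcome in the question. It still relies on both files listing the samples in the same order.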
Paul Hodges On

awk can do it alone.

$: awk 'BEGIN{ getline < "f.txt" } 
     { gsub("[^ ]+$",""); l=$0; getline < "f.txt"; print l$7; }' f.fam
20481 20481 0 0 2 2
20483 20483 0 0 1 2
20488 20488 0 0 2 1
20492 20492 0 0 1 2

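The BEGIN block reads (and so skips) the header record of the .txt file.
Then, for each line of the .fam file, it strips off the last field and saves the rest to l.
getline used this way also splits the record it reads into fields, so print l$7; prints the shortened record from the .fam file followed by the last field from the .txt file.
Like paste, this relies on both files listing the same samples in the same order.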
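
If the two files might not be in the same order, a keyed lookup is closer to what merge does in R. The sketch below is one possible way to do it, not part of the answer above. It assumes that column V2 of the .txt file holds the same ID as column 2 of the .fam file, which is true for the sample data but should be checked for real data. The scratch directory, the outcome.fam name, and the array name pheno are made up for the example, and the .txt rows are deliberately listed in reverse order to show that order no longer matters.

```shell
# Scratch directory so no existing files are overwritten (assumption for this demo).
dir=$(mktemp -d)

# .fam sample copied from the question.
cat > "$dir/f.fam" <<'EOF'
20481 20481 0 0 2 -9
20483 20483 0 0 1 1
20488 20488 0 0 2 1
20492 20492 0 0 1 1
EOF

# Same .txt rows as in the question, but in reverse order on purpose.
cat > "$dir/f.txt" <<'EOF'
V1       V2     V3      V4      V6     V7_Pheno   V8
    2253820 20492   NA      DNA     4       Yes    2
    2253816 20488   NA      DNA     0       No     1
    2253802 20483   NA      DNA     4       Yes    2
    2253792 20481   NA      DNA     1       Yes    2
EOF

awk '
  # First file (f.txt): skip the header, remember V8 keyed by the ID in V2.
  NR == FNR { if (FNR > 1) pheno[$2] = $7; next }
  # Second file (f.fam): overwrite column 6 when the ID was seen in f.txt.
  $2 in pheno { $6 = pheno[$2] }
  # Print every .fam line; lines without a match keep their original value.
  { print }
' "$dir/f.txt" "$dir/f.fam" > "$dir/outcome.fam"

cat "$dir/outcome.fam"
```

The output follows the order of the .fam file. Assigning to $6 makes awk rebuild the line with single spaces between fields, which matches the layout of the sample .fam file.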