I have a file in which I want to update the first field of specific columns (say columns 1 and 2) when the 5th field of that column contains a pipe (|).

I can use Python, but splitting the lines, substituting the values, and joining them back would make for a long script. I am looking for a short solution, preferably using awk, though others are fine too. I also want to embed this within a Python script.

Below are two columns from my data; the fields within each column are separated by colons (:).

0/1:42,19:61:99:0|1:5185_T_TTCTATC:560,0,1648       0/1:38,34:72:99:0|1:5185_T_TTCTATC:1145,0,1311

0/0:124,0,0:124:99:0,120,1800,120,1800,1800    0/0:165,0,0:165:99:0,120,1800,120,1800,1800

0/0:152,0:152:99:.:.:0,120,1800    0/1:145,34:179:99:0|1:5398_A_G:973,0,6088

So, when the 5th field of a column contains '|', the first field of that column is replaced with the value of the 5th field.

Expected result:

0|1:42,19:61:99:0|1:5185_T_TTCTATC:560,0,1648       0|1:38,34:72:99:0|1:5185_T_TTCTATC:1145,0,1311

0/0:124,0,0:124:99:0,120,1800,120,1800,1800    0/0:165,0,0:165:99:0,120,1800,120,1800,1800

0/0:152,0:152:99:.:.:0,120,1800    0|1:145,34:179:99:0|1:5398_A_G:973,0,6088

Actually, there are many columns. Say columns of this kind appear after Python index position 5, and I want to do the substitution in every column after the 5th column; how can I approach the problem?

Thanks,

There are 1 best solutions below

On BEST ANSWER
$ awk '{ for (i=1;i<=NF;i++) { split($i,f,/:/); if (f[5]~/\|/) sub(/^[^:]+/,f[5],$i) } }1' file
0|1:42,19:61:99:0|1:5185_T_TTCTATC:560,0,1648 0|1:38,34:72:99:0|1:5185_T_TTCTATC:1145,0,1311
0/0:124,0,0:124:99:0,120,1800,120,1800,1800    0/0:165,0,0:165:99:0,120,1800,120,1800,1800
0/0:152,0:152:99:.:.:0,120,1800 0|1:145,34:179:99:0|1:5398_A_G:973,0,6088

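One caveat is that the 5th subfield can't contain an &, since that is the matched-text metacharacter in the replacement string of sub(). Also note that assigning to a field makes awk rebuild the record with single-space separators, which is why the spacing in the modified lines above is collapsed while the untouched line keeps its original spacing.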

If you want to start the replacements at column 5, change i=1 to i=5 in the loop initialization. awk fields are numbered from 1, so Python index 5 corresponds to i=6.
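For comparison, the same rule can be written directly in Python without much code. The following is only a minimal sketch, not part of the awk answer: the function name `promote_phased` and its `start` parameter are hypothetical, and `start` is assumed to be a 0-based Python column index. Because it assigns a plain string instead of calling a regex substitution, the & caveat does not apply, and the original column spacing is kept.

```python
import re

def promote_phased(line, start=0):
    """Copy subfield 5 over subfield 1 in every column at index >= start
    when subfield 5 contains '|'. `start` is a 0-based column index
    (hypothetical parameter, chosen for this sketch)."""
    # Splitting on a captured group keeps the whitespace runs, so the
    # original spacing survives when the pieces are joined again.
    parts = re.split(r"(\s+)", line)
    col = 0
    for k, part in enumerate(parts):
        if not part or part.isspace():
            continue  # separator or empty edge piece, not a column
        sub = part.split(":")
        if col >= start and len(sub) >= 5 and "|" in sub[4]:
            sub[0] = sub[4]
            parts[k] = ":".join(sub)
        col += 1
    return "".join(parts)

row = "0/0:152,0:152:99:.:.:0,120,1800    0/1:145,34:179:99:0|1:5398_A_G:973,0,6088"
print(promote_phased(row))
# 0/0:152,0:152:99:.:.:0,120,1800    0|1:145,34:179:99:0|1:5398_A_G:973,0,6088
```

Calling `promote_phased(row, start=5)` would leave the first five columns (indices 0 to 4) untouched.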

The same script, broken into lines:

$ awk '{
    for (i=1;i<=NF;i++) {
        split($i,f,/:/)
        if (f[5]~/\|/)
            sub(/^[^:]+/,f[5],$i)
    }
}1' file
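Since the question also asks about embedding this in a Python script, one possible way is to pass the awk program to `subprocess.run` as an argument list, which avoids shell quoting problems. This is a hedged sketch under a few assumptions: an `awk` binary is on the PATH, the helper name `fix_with_awk` is hypothetical, and the starting column is handed to awk through a `-v start=...` variable (1-based, as in awk) rather than by editing the loop.

```python
import shutil
import subprocess

# Same logic as the awk script above, with the loop start taken from
# the awk variable `start` (an assumption of this sketch).
AWK_PROGRAM = r"""
{
    for (i = start; i <= NF; i++) {
        split($i, f, /:/)
        if (f[5] ~ /\|/)
            sub(/^[^:]+/, f[5], $i)
    }
}
1
"""

def fix_with_awk(text, start=1):
    """Run the awk program on `text` and return its output.
    `start` is the 1-based awk column at which replacements begin."""
    if shutil.which("awk") is None:
        raise RuntimeError("awk not found on PATH")
    result = subprocess.run(
        ["awk", "-v", f"start={start}", AWK_PROGRAM],
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if awk exits non-zero
    )
    return result.stdout
```

Usage would look like `fixed = fix_with_awk(open("file").read(), start=6)`. As with the command-line version, lines that awk modifies come back with single-space separators.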