Changing string in a specific column for a range of lines without losing spaces/format

I have a file with many lines. In the first 4635 lines, I want to change the string X in the fifth column into another string, A, without losing the original tabs/spacing between the columns.

That is, for that range of lines, I want to change

ATOM   2732  HN  SER X 176     181.410 174.270 311.410  0.00  0.00
ATOM   2733  CA  SER X 176     180.170 172.920 310.330  0.00  0.00
ATOM   2734  HA  SER X 176     179.860 171.950 310.720  0.00  0.00
ATOM   2735  CB  SER X 176     179.010 173.910 310.790  0.00  0.00
ATOM   2736  HB1 SER X 176     178.020 173.710 310.340  0.00  0.00
ATOM   2737  HB2 SER X 176     178.910 173.930 311.900  0.00  0.00

into

ATOM   2732  HN  SER A 176     181.410 174.270 311.410  0.00  0.00
ATOM   2733  CA  SER A 176     180.170 172.920 310.330  0.00  0.00
ATOM   2734  HA  SER A 176     179.860 171.950 310.720  0.00  0.00
ATOM   2735  CB  SER A 176     179.010 173.910 310.790  0.00  0.00
ATOM   2736  HB1 SER A 176     178.020 173.710 310.340  0.00  0.00
ATOM   2737  HB2 SER A 176     178.910 173.930 311.900  0.00  0.00

I came up with the following code,

awk '{if (NR>=1&&NR<=4635) split($0, a, FS, seps); a[5]="A"; for (i=1;i<=NF;i++) printf("%s%s", a[i], seps[i]); print ""}' dat > tmp

but it seems that every line in the file now has A in the fifth column, not only lines 1-4635. Any suggestions would be much appreciated!


BEST ANSWER

Add curly brackets/braces and an else branch:

awk '{if (NR>=1&&NR<=4635) {split($0, a, FS, seps); a[5]="A"; for (i=1;i<=NF;i++) printf("%s%s", a[i], seps[i]); print ""} else {print}}' dat > tmp

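Without the curly brackets/braces, the body of the if consists of the split call only. The assignment a[5]="A", the for loop and the final print are not part of the if, so they run on every line of the file.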
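To see the scoping rule in isolation, here is a minimal sketch on a three-line toy input, with the range shortened to lines 1-2. The function names are made up for the demo, and n++ merely stands in for the split call. It uses only POSIX awk features, so it should not depend on GNU awk.

```shell
# Hypothetical helpers for illustration only.
toy_input() { printf 'X\nX\nX\n'; }

# No braces: only n++ belongs to the if, so $1="A" runs on every line.
without_braces() { toy_input | awk '{ if (NR<=2) n++; $1="A"; print }'; }

# Braces: both statements are conditional, so line 3 is printed unchanged.
with_braces() { toy_input | awk '{ if (NR<=2) { n++; $1="A" } print }'; }

without_braces   # prints A, A, A
with_braces      # prints A, A, X
```

The second form behaves like the corrected command above: the lines outside the range fall through to a plain print.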


With GNU awk for the 3rd arg to match() and \s/\S shorthand:

$ awk 'NR<4636{match($0,/((\S+\s+){4}).(.*)/,a); $0=a[1] "A" a[3]} 1' file
ATOM   2732  HN  SER A 176     181.410 174.270 311.410  0.00  0.00
ATOM   2733  CA  SER A 176     180.170 172.920 310.330  0.00  0.00
ATOM   2734  HA  SER A 176     179.860 171.950 310.720  0.00  0.00
ATOM   2735  CB  SER A 176     179.010 173.910 310.790  0.00  0.00
ATOM   2736  HB1 SER A 176     178.020 173.710 310.340  0.00  0.00
ATOM   2737  HB2 SER A 176     178.910 173.930 311.900  0.00  0.00

or with any awk:

$ awk 'NR<4636{match($0,/([^[:space:]]+[[:space:]]+){4}./); $0=substr($0,1,RLENGTH-1) "A" substr($0,RLENGTH+1)} 1' file
ATOM   2732  HN  SER A 176     181.410 174.270 311.410  0.00  0.00
ATOM   2733  CA  SER A 176     180.170 172.920 310.330  0.00  0.00
ATOM   2734  HA  SER A 176     179.860 171.950 310.720  0.00  0.00
ATOM   2735  CB  SER A 176     179.010 173.910 310.790  0.00  0.00
ATOM   2736  HB1 SER A 176     178.020 173.710 310.340  0.00  0.00
ATOM   2737  HB2 SER A 176     178.910 173.930 311.900  0.00  0.00
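The same match()/substr() idea can be generalized to an arbitrary column and a replacement of any length. The following is only a sketch: replace_col and replace_field are hypothetical names, blanks are assumed to be spaces or tabs, and it walks over the fields one at a time instead of using {4}, which some older awk builds reportedly do not support.

```shell
# Usage: replace_col LAST_LINE COLUMN NEW_VALUE < input > output
replace_col() {
    awk -v last="$1" -v col="$2" -v new="$3" '
    # Return $0 with field n replaced by val, keeping the original blanks.
    function replace_field(n, val,    i, head, rest) {
        head = ""; rest = $0
        for (i = 1; i < n; i++) {
            # consume optional leading blanks plus one field
            match(rest, /^[ \t]*[^ \t]+/)
            head = head substr(rest, 1, RLENGTH)
            rest = substr(rest, RLENGTH + 1)
        }
        # consume the blanks in front of field n, if any
        if (match(rest, /^[ \t]+/)) {
            head = head substr(rest, 1, RLENGTH)
            rest = substr(rest, RLENGTH + 1)
        }
        # skip the old field value and append whatever follows it
        match(rest, /^[^ \t]+/)
        return head val substr(rest, RLENGTH + 1)
    }
    # only touch lines in range that actually have that many fields
    NR <= last && NF >= col { $0 = replace_field(col, new) }
    1'
}
```

For the file in the question this would be called as replace_col 4635 5 A < dat > tmp. Because the line is rebuilt from substr() pieces rather than by assigning to $5, awk never re-joins the fields with OFS and the spacing survives.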

If your input has fixed-width fields as shown in the sample, then you can use FIELDWIDTHS with GNU awk:

awk -v FIELDWIDTHS='21 1 *' -v OFS= 'NR<=4635{$2="A"} 1'

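Here, the first field is made up of the first 21 characters, the second field is the single character that follows, and the rest of the line is the third field (the * width needs GNU awk 4.2 or later). Setting OFS to an empty string joins the fields back together without adding anything, so you can change the second field for the required lines only.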
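If GNU awk is not available, the same fixed-width idea can be sketched with substr() in any POSIX awk. This assumes, as in the sample, that the value to change is the single character at position 22. fix_col22 is a hypothetical name, and the length check is an extra guard of my own so that short lines are left alone.

```shell
# Usage: fix_col22 LAST_LINE NEW_CHAR < input > output
fix_col22() {
    awk -v last="$1" -v new="$2" '
        # keep characters 1-21, swap character 22, keep everything from 23 on
        NR <= last && length($0) >= 22 {
            $0 = substr($0, 1, 21) new substr($0, 23)
        }
        1'
}
```

For the question this would be fix_col22 4635 A < dat > tmp.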


If the input is not fixed width, you can use sed or perl:

# GNU sed
sed -E '1,4635 s/^((\S+\s+){4})\S+/\1A/'

# if \s and \S aren't supported
sed -E '1,4635 s/^(([^[:space:]]+[[:space:]]+){4})[^[:space:]]+/\1A/'

perl -pe 's/^(\S+\s+){4}\K\S+/A/ if $.<=4635'