Capitalize specific indices of string using awk or python


I have an input file where each line contains 99 lowercase letters:

bccdddcdccddddddabcdabcabdbacbdcaaccbbcabacbccabcacbcdcccbdbacdcbbcbcbcccacadaaccababadbcbaabbbccbb 
bccdddcdcddddcddabcdabcabdbacbddaacdbbcabacbcdbbcacbcccccbdbacdbbbcbcbacbacacaacccbabadbcbaabbbccbb 
bccdddcdcddddccdabcdabcabdbacbddaaddbbcabacbcdbbcacbcccccbdbacdbbbcbcbaccacadaaccbbabadbccacbbbccbb 
bccdddcdccdddccdabcdabcdbdbacbdcaaddcbcabacbccabcacbcdcccbdbacdbbbcbcbbccacadaaccbbabadbccaaabbccbb 

I have a list of positions, for example p = [10, 14, 89, 99].

I'd like to capitalize the letters at these positions in my input file.

Desired output:

bccdddcdcCdddDddabcdabcabdbacbdcaaccbbcabacbccabcacbcdcccbdbacdcbbcbcbcccacadaaccababadbCbaabbbccbB 
bccdddcdcDdddCddabcdabcabdbacbddaacdbbcabacbcdbbcacbcccccbdbacdbbbcbcbacbacacaacccbabadbCbaabbbccbB 
bccdddcdcDdddCcdabcdabcabdbacbddaaddbbcabacbcdbbcacbcccccbdbacdbbbcbcbaccacadaaccbbabadbCcacbbbccbB 
bccdddcdcCdddCcdabcdabcdbdbacbdcaaddcbcabacbccabcacbcdcccbdbacdbbbcbcbbccacadaaccbbabadbCcaaabbccbB 

I'm using this awk command:

awk -vFS= -vOFS= '{$10=toupper($10)}1' input > output

But I'm not sure how to loop this over all the positions.


You can use a generator expression with .upper() and enumerate() to capitalize only the specified indices:

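p = {10, 14, 89, 99}  # 1-based positions; a set gives constant-time lookup
with open('in.txt') as file:
    for line in file:
        line = line.rstrip()  # drop the newline and any trailing whitespace
        # enumerate from 1 so that i matches the 1-based positions in p
        result = ''.join(c.upper() if i in p else c for i, c in enumerate(line, 1))
        print(result)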

This outputs:

bccdddcdcCdddDddabcdabcabdbacbdcaaccbbcabacbccabcacbcdcccbdbacdcbbcbcbcccacadaaccababadbCbaabbbccbB
bccdddcdcDdddCddabcdabcabdbacbddaacdbbcabacbcdbbcacbcccccbdbacdbbbcbcbacbacacaacccbabadbCbaabbbccbB
bccdddcdcDdddCcdabcdabcabdbacbddaaddbbcabacbcdbbcacbcccccbdbacdbbbcbcbaccacadaaccbbabadbCcacbbbccbB
bccdddcdcCdddCcdabcdabcdbdbacbdcaaddcbcabacbccabcacbcdcccbdbacdbbbcbcbbccacadaaccbbabadbCcaaabbccbB
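
As a possible variant (a minimal sketch, not part of the answer above), the line can be turned into a list so that only the listed positions are touched instead of testing every character. The function name capitalize_positions is hypothetical, and positions are assumed to be 1-based as in the question; positions beyond the end of the line are silently skipped.

```python
def capitalize_positions(line, positions):
    """Return line with the characters at the given 1-based positions upper-cased."""
    chars = list(line)  # strings are immutable, so work on a list of characters
    for pos in positions:
        # Ignore positions that fall outside the line (an assumption of this sketch)
        if 1 <= pos <= len(chars):
            chars[pos - 1] = chars[pos - 1].upper()
    return ''.join(chars)


print(capitalize_positions("abcdefghij", [2, 10]))  # aBcdefghiJ
```

For short position lists on long lines this does less work per line, since the loop runs once per position rather than once per character.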

One awk idea:

awk -v p="10,14,89,99" '
BEGIN { split(p,arr,",") }
      { for (i in arr)
            $0=substr($0,1,arr[i]-1) toupper(substr($0,arr[i],1)) substr($0,arr[i]+1)
        print
      }
' input

This generates:

bccdddcdcCdddDddabcdabcabdbacbdcaaccbbcabacbccabcacbcdcccbdbacdcbbcbcbcccacadaaccababadbCbaabbbccbB
bccdddcdcDdddCddabcdabcabdbacbddaacdbbcabacbcdbbcacbcccccbdbacdbbbcbcbacbacacaacccbabadbCbaabbbccbB
bccdddcdcDdddCcdabcdabcabdbacbddaaddbbcabacbcdbbcacbcccccbdbacdbbbcbcbaccacadaaccbbabadbCcacbbbccbB
bccdddcdcCdddCcdabcdabcdbdbacbdcaaddcbcabacbccabcacbcdcccbdbacdbbbcbcbbccacadaaccbbabadbCcaaabbccbB

I would use GNU AWK for this task in the following way. Let file.txt contain

bccdddcdccddddddabcdabcabdbacbdcaaccbbcabacbccabcacbcdcccbdbacdcbbcbcbcccacadaaccababadbcbaabbbccbb 
bccdddcdcddddcddabcdabcabdbacbddaacdbbcabacbcdbbcacbcccccbdbacdbbbcbcbacbacacaacccbabadbcbaabbbccbb 
bccdddcdcddddccdabcdabcabdbacbddaaddbbcabacbcdbbcacbcccccbdbacdbbbcbcbaccacadaaccbbabadbccacbbbccbb 
bccdddcdccdddccdabcdabcdbdbacbdcaaddcbcabacbccabcacbcdcccbdbacdbbbcbcbbccacadaaccbbabadbccaaabbccbb

then

awk 'BEGIN{FPAT=".";OFS="";arr[10];arr[14];arr[89];arr[99]}{for(i in arr){$i=toupper($i)};print}' file.txt

gives output

bccdddcdcCdddDddabcdabcabdbacbdcaaccbbcabacbccabcacbcdcccbdbacdcbbcbcbcccacadaaccababadbCbaabbbccbB 
bccdddcdcDdddCddabcdabcabdbacbddaacdbbcabacbcdbbcacbcccccbdbacdbbbcbcbacbacacaacccbabadbCbaabbbccbB 
bccdddcdcDdddCcdabcdabcabdbacbddaaddbbcabacbcdbbcacbcccccbdbacdbbbcbcbaccacadaaccbbabadbCcacbbbccbB 
bccdddcdcCdddCcdabcdabcdbdbacbdcaaddcbcabacbccabcacbcdcccbdbacdbbbcbcbbccacadaaccbbabadbCcaaabbccbB

Explanation: I tell GNU AWK via FPAT that a field is any single character, and I set the output field separator (OFS) to the empty string. I then mention the keys of the array arr without assigning values, since only the keys are used. For every line I iterate over the keys of that array, apply toupper to the field at each of those positions, and then print the line.

(tested in GNU Awk 5.0.1)


Here is an awk solution using the FIELDWIDTHS variable of GNU awk. Written and tested only with the shown samples.

awk -v FIELDWIDTHS="9 1 3 1 74 1 9 1" -v OFS="" '
function toUppeR(value){
  $value=toupper($value)
}
{
  for(i=2;i<=8;i+=2){
    toUppeR(i)
  }
}
1
' Input_file

Explanation: FIELDWIDTHS is a GNU awk variable that defines fixed widths for the fields. Since the letters at positions 10, 14, 89 and 99 need to be capitalized, the widths are chosen so that each of those positions becomes its own one-character field: 9 1 catches the 10th position, 9 1 3 1 catches the 14th, 9 1 3 1 74 1 catches the 89th, and the final 9 1 catches the 99th. The target letters then sit in fields 2, 4, 6 and 8, which the loop upper-cases. Assigning to a field makes awk rebuild the line using OFS (the output field separator), which can cause problems in general, but not for your shown samples, where OFS is set to the empty string.
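
The width arithmetic above can be sketched in Python, in case the position list changes and the FIELDWIDTHS string has to be rebuilt. This is only an illustration under stated assumptions: the function name field_widths is hypothetical, positions are 1-based, and the line length (99 in the question) is passed in explicitly.

```python
def field_widths(positions, line_length):
    """Build a FIELDWIDTHS string in which every listed 1-based position
    is its own one-character field, plus the field numbers to upper-case."""
    widths = []
    upper_fields = []  # field numbers that hold the target characters
    prev = 0           # last position already covered by a field
    for pos in sorted(set(positions)):
        gap = pos - prev - 1  # characters between the previous target and this one
        if gap > 0:
            widths.append(gap)
        widths.append(1)      # the target character itself
        upper_fields.append(len(widths))
        prev = pos
    if line_length > prev:
        widths.append(line_length - prev)  # remainder of the line
    return ' '.join(map(str, widths)), upper_fields


widths, fields = field_widths([10, 14, 89, 99], 99)
print(widths)  # 9 1 3 1 74 1 9 1
print(fields)  # [2, 4, 6, 8]
```

The field numbers are returned separately because adjacent positions (for example 10 and 11) would produce no gap field between them, so the targets are not always the even-numbered fields.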