Splitting and renaming protein chain with biopython's BioPDB

I have a heterodimeric protein PDB file. However, unfortunately all residues have a chain ID of "A". I wish to change the chain ID to "B" for residues with residue number over 275.

Below are two attempts at this. I'm able to change the chain names so that, when I get the full residue IDs of the model, I see the correct chain ID associated with the correct residues. However, when I try to export the edited PDB, the new chain IDs aren't in the output. Hoping it's a simple solution. (I will try an easier solution with bash in the meantime.)

structure = PDBParser().get_structure(dirPath, 'heterodimer/heterodimer.pdb')
model = structure[0]
chains = structure.get_chains()

chainA = 275

for model in structure:
 for chains in model:
  for residues in chains:
   if residues.get_id()[1] > chainA:
    chains.id = "B"
   else:
    chains.id = "A"
   io.set_structure(chains)
   savename = "{}_edit.pdb".format(name)
   io.save(savename)

or

structure = PDBParser().get_structure(dirPath, 'heterodimer/heterodimer.pdb')
model = structure[0]
chainOriginal = model["A"]
residues = chainOriginal.get_residues()

chainA = 275

for res in residues:
 if res.get_full_id()[3][1] > chainA:
  chainOriginal.id = "B"
 else:
  chainOriginal.id = "A"
 io.set_structure(structure)
 savename = "{}_edit.pdb".format(name)
 io.save(savename)

Thank you in advance for any help. Seems like I'm missing something simple.

P.S. I've also tried converting the tuple from .get_full_id() to a list, editing the value of the chain index, then converting back to tuple. However, after that point I'm stuck.
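
For the text-based fallback mentioned in the question, the same edit can be sketched in plain Python instead of bash. This is only a minimal sketch, assuming a standard fixed-column PDB file (chain ID in column 22, residue number in columns 23-26); the function name `split_chain_by_resseq` and the sample lines are illustrative and not part of Biopython.

```python
# Record types that carry a chain ID and residue number in fixed columns.
RECORDS = ("ATOM", "HETATM", "ANISOU", "TER")


def split_chain_by_resseq(pdb_lines, cutoff, new_chain="B"):
    """Return a copy of pdb_lines where residues numbered above `cutoff`
    get `new_chain` as chain ID. Hypothetical helper, not a Biopython API."""
    out = []
    for line in pdb_lines:
        if line.startswith(RECORDS) and len(line) >= 26:
            try:
                resseq = int(line[22:26])  # columns 23-26: residue number
            except ValueError:
                resseq = None  # e.g. a TER record without residue fields
            if resseq is not None and resseq > cutoff:
                # column 22 (index 21) holds the one-letter chain ID
                line = line[:21] + new_chain + line[22:]
        out.append(line)
    return out


# Two atoms taken from the example coordinates used later in this thread.
sample = [
    "ATOM     50  CD2 LEU A  33      18.368  58.046 121.804  1.00 65.12           C  ",
    "ATOM     51  N   LEU A  34      17.108  54.111 123.216  1.00 62.63           N  ",
]
edited = split_chain_by_resseq(sample, cutoff=33)
for line in edited:
    print(line)
```

Note that this only rewrites the chain column; it does not insert a TER record between the two chains or renumber any atoms.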

There is 1 solution below.

OK, I kind of figured out a way to do it.

My input test.pdb file, which goes from residue 28 to residue 38:

ATOM      1  N   ARG A  28       5.140  67.453 130.620  1.00 92.14           N  
ATOM      2  CA  ARG A  28       6.590  67.605 130.291  1.00 92.98           C  
ATOM      3  C   ARG A  28       7.073  66.494 129.345  1.00 92.59           C  
ATOM      4  O   ARG A  28       6.604  65.355 129.412  1.00 94.41           O  
ATOM      5  CB  ARG A  28       7.424  67.592 131.579  1.00 94.44           C  
ATOM      6  CG  ARG A  28       8.892  68.009 131.399  1.00 98.47           C  
ATOM      7  CD  ARG A  28       9.057  69.524 131.173  1.00100.38           C  
ATOM      8  NE  ARG A  28       8.447  70.007 129.929  1.00103.21           N  
ATOM      9  CZ  ARG A  28       8.993  69.901 128.716  1.00102.43           C  
ATOM     10  NH1 ARG A  28      10.180  69.327 128.558  1.00100.66           N  
ATOM     11  NH2 ARG A  28       8.343  70.366 127.654  1.00101.21           N  
ATOM     12  N   THR A  29       8.016  66.838 128.470  1.00 90.22           N  
ATOM     13  CA  THR A  29       8.577  65.908 127.487  1.00 87.63           C  
ATOM     14  C   THR A  29       9.851  65.196 127.960  1.00 87.54           C  
ATOM     15  O   THR A  29      10.650  65.762 128.708  1.00 86.81           O  
ATOM     16  CB  THR A  29       8.899  66.653 126.179  1.00 86.80           C  
ATOM     17  OG1 THR A  29       7.677  67.052 125.551  1.00 86.42           O  
ATOM     18  CG2 THR A  29       9.692  65.775 125.233  1.00 86.58           C  
ATOM     19  N   VAL A  30      10.038  63.955 127.509  1.00 86.06           N  
ATOM     20  CA  VAL A  30      11.214  63.162 127.876  1.00 85.61           C  
ATOM     21  C   VAL A  30      11.819  62.420 126.680  1.00 83.99           C  
ATOM     22  O   VAL A  30      11.138  61.626 126.027  1.00 83.81           O  
ATOM     23  CB  VAL A  30      10.869  62.127 128.962  1.00 85.41           C  
ATOM     24  CG1 VAL A  30      10.443  62.836 130.236  1.00 86.86           C  
ATOM     25  CG2 VAL A  30       9.761  61.216 128.474  1.00 85.88           C  
ATOM     26  N   LYS A  31      13.095  62.687 126.400  1.00 82.01           N  
ATOM     27  CA  LYS A  31      13.801  62.049 125.285  1.00 80.21           C  
ATOM     28  C   LYS A  31      14.443  60.767 125.783  1.00 78.14           C  
ATOM     29  O   LYS A  31      15.316  60.794 126.657  1.00 76.23           O  
ATOM     30  CB  LYS A  31      14.896  62.962 124.733  1.00 80.16           C  
ATOM     31  CG  LYS A  31      15.442  62.532 123.376  1.00 79.27           C  
ATOM     32  CD  LYS A  31      16.752  63.253 123.060  1.00 81.24           C  
ATOM     33  CE  LYS A  31      16.866  63.612 121.584  1.00 79.14           C  
ATOM     34  NZ  LYS A  31      15.879  64.670 121.218  1.00 78.95           N  
ATOM     35  N   LEU A  32      14.023  59.646 125.211  1.00 76.32           N  
ATOM     36  CA  LEU A  32      14.540  58.355 125.631  1.00 74.56           C  
ATOM     37  C   LEU A  32      15.356  57.681 124.543  1.00 71.45           C  
ATOM     38  O   LEU A  32      14.991  57.707 123.367  1.00 70.89           O  
ATOM     39  CB  LEU A  32      13.373  57.457 126.034  1.00 76.38           C  
ATOM     40  CG  LEU A  32      13.683  56.242 126.895  1.00 78.46           C  
ATOM     41  CD1 LEU A  32      14.416  56.685 128.150  1.00 82.41           C  
ATOM     42  CD2 LEU A  32      12.383  55.547 127.262  1.00 79.03           C  
ATOM     43  N   LEU A  33      16.467  57.078 124.944  1.00 68.54           N  
ATOM     44  CA  LEU A  33      17.325  56.379 124.003  1.00 65.98           C  
ATOM     45  C   LEU A  33      17.421  54.882 124.255  1.00 64.69           C  
ATOM     46  O   LEU A  33      17.763  54.439 125.360  1.00 61.69           O  
ATOM     47  CB  LEU A  33      18.735  56.957 124.020  1.00 64.32           C  
ATOM     48  CG  LEU A  33      18.944  58.227 123.202  1.00 65.71           C  
ATOM     49  CD1 LEU A  33      20.435  58.515 123.129  1.00 62.52           C  
ATOM     50  CD2 LEU A  33      18.368  58.046 121.804  1.00 65.12           C  
ATOM     51  N   LEU A  34      17.108  54.111 123.216  1.00 62.63           N  
ATOM     52  CA  LEU A  34      17.203  52.662 123.271  1.00 57.92           C  
ATOM     53  C   LEU A  34      18.521  52.328 122.608  1.00 56.33           C  
ATOM     54  O   LEU A  34      18.633  52.392 121.388  1.00 58.81           O  
ATOM     55  CB  LEU A  34      16.069  52.013 122.482  1.00 58.79           C  
ATOM     56  CG  LEU A  34      14.715  51.943 123.175  1.00 61.10           C  
ATOM     57  CD1 LEU A  34      13.685  51.426 122.200  1.00 58.98           C  
ATOM     58  CD2 LEU A  34      14.815  51.031 124.402  1.00 59.55           C  
ATOM     59  N   LEU A  35      19.527  51.999 123.402  1.00 54.20           N  
ATOM     60  CA  LEU A  35      20.825  51.653 122.846  1.00 53.19           C  
ATOM     61  C   LEU A  35      21.157  50.197 123.158  1.00 54.07           C  
ATOM     62  O   LEU A  35      20.560  49.593 124.054  1.00 54.18           O  
ATOM     63  CB  LEU A  35      21.911  52.556 123.425  1.00 52.87           C  
ATOM     64  CG  LEU A  35      21.745  54.072 123.284  1.00 55.11           C  
ATOM     65  CD1 LEU A  35      23.086  54.737 123.612  1.00 52.86           C  
ATOM     66  CD2 LEU A  35      21.309  54.436 121.872  1.00 49.52           C  
ATOM     67  N   GLY A  36      22.118  49.642 122.421  1.00 52.59           N  
ATOM     68  CA  GLY A  36      22.527  48.263 122.639  1.00 50.76           C  
ATOM     69  C   GLY A  36      23.074  47.644 121.370  1.00 47.34           C  
ATOM     70  O   GLY A  36      22.770  48.118 120.284  1.00 46.63           O  
ATOM     71  N   ALA A  37      23.863  46.583 121.494  1.00 45.98           N  
ATOM     72  CA  ALA A  37      24.441  45.937 120.322  1.00 45.81           C  
ATOM     73  C   ALA A  37      23.359  45.294 119.479  1.00 46.39           C  
ATOM     74  O   ALA A  37      22.184  45.266 119.866  1.00 46.64           O  
ATOM     75  CB  ALA A  37      25.472  44.887 120.742  1.00 45.26           C  
ATOM     76  N   GLY A  38      23.756  44.768 118.323  1.00 45.96           N  
ATOM     77  CA  GLY A  38      22.785  44.149 117.436  1.00 43.61           C  
ATOM     78  C   GLY A  38      22.108  42.942 118.049  1.00 44.49           C  
ATOM     79  O   GLY A  38      22.775  42.085 118.640  1.00 47.30           O  

My code, which aims to move residues numbered above 33 from chain A to chain B:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Fri Dec  9 20:35:20 2022

@author: bob

https://stackoverflow.com/questions/74735845/splitting-and-renaming-protein-chain-with-biopythons-biopdb

"""


from Bio.PDB import PDBParser, PDBIO

from Bio.PDB.Chain import Chain

from Bio.PDB.Model import Model

from Bio.PDB.Structure import Structure

dirPath = ''

name = 'result_A_'

structure = PDBParser().get_structure(dirPath, 'test.pdb')


chainA = 33

res_to_change = []

for model in structure:

    # print('\n model : ', model , model.get_id(),'\n')

    for chains in model:
        for residues in chains:

            print('residues.get_id() : ', residues.get_id())

            # collect residues numbered above the cutoff
            if residues.get_id()[1] > chainA:
                res_to_change.append(residues)
     

   
print('residue to change chain : ', res_to_change)

print('\n model : ', model , model.get_id(),'\n')



### SEE https://stackoverflow.com/questions/25884758/deleteing-residue-from-pdb-using-biopython-library

for model in structure:
    for chain in model:
        # detach every collected residue from its original chain
        for res in res_to_change:
            chain.detach_child(res.get_id())
    


### SEE https://stackoverflow.com/questions/33364370/how-to-add-chain-id-in-pdb

my_chain = Chain("B")

model.add(my_chain)

for res in res_to_change:
    my_chain.add(res)
    
    
io = PDBIO()
io.set_structure(model)
savename = "{}_edit.pdb".format(name)


io.save(savename,  write_end = True, preserve_atom_numbering = True)


### above I detached B chain residues and reattached a new chain B to my model made with detached residues

### below I create an empty structure and attach both old structure chain A with deleted residues and chain B as above

my_structure = Structure('1')

my_model = Model('1')

my_structure.add(my_model)

my_model.add(model['A'])

my_model.add(my_chain)

print(my_model)

for i in my_model:
    print(i)
    
    for ii in i:
        print(ii)
        for iii in ii:
            print(iii)

io2 = PDBIO()

io2.set_structure(my_model)
savename = "{}_edit_new_model.pdb".format(name)

io2.save(savename,  write_end = False, preserve_atom_numbering = True)
    

It saves two PDB files, both of which look correct to me. I tried both approaches because I was getting what looked like wrong atom numbering in the chain A TER record, such as:

ATOM     49  CD1 LEU A  33      20.435  58.515 123.129  1.00 62.52           C  
ATOM     50  CD2 LEU A  33      18.368  58.046 121.804  1.00 65.12           C  
TER      51      LEU A  33     <-------------- ERROR ??                                                       
ATOM     51  N   LEU B  34      17.108  54.111 123.216  1.00 62.63           N  
ATOM     52  CA  LEU B  34      17.203  52.662 123.271  1.00 57.92           C

I wasn't able to figure out why, but I got better results using:

io.save(savename, write_end = False, preserve_atom_numbering = True)

Whether preserve_atom_numbering is True or False (the default) makes the difference; see:

ATOM     49  CD1 LEU A  33      20.435  58.515 123.129  1.00 62.52           C  
ATOM     50  CD2 LEU A  33      18.368  58.046 121.804  1.00 65.12           C  
TER      50      LEU A  33   <------------------- Here !!!                                                    
ATOM     51  N   LEU B  34      17.108  54.111 123.216  1.00 62.63           N  
ATOM     52  CA  LEU B  34      17.203  52.662 123.271  1.00 57.92           C

I don't understand why; hopefully somebody here can help.
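
For comparison with the two outputs above, here is a small sketch of the numbering convention as I understand it from the wwPDB format description: a TER record takes the serial number one greater than the atom before it, and the next atom continues from there. It is plain Python that does not call or reproduce PDBIO; the helper names `ter_line` and `renumber_with_ter` are made up for illustration, and emitting a TER at every chain change is a simplification.

```python
def ter_line(serial, last_atom_line):
    """Build a TER record from the last atom line of a chain (hypothetical helper).
    Indices 17-26 of an ATOM line hold resName, chain ID, resSeq and iCode."""
    return f"TER   {serial:>5}      {last_atom_line[17:27]}"


def renumber_with_ter(atom_lines, start=1):
    """Renumber atom serials and add a TER whenever the chain ID changes."""
    out = []
    serial = start
    prev = None
    for line in atom_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue  # this sketch ignores every other record type
        if prev is not None and line[21] != prev[21]:
            out.append(ter_line(serial, prev))
            serial += 1  # the TER record consumes a serial number
        # serial number lives in columns 7-11 (indices 6-10)
        out.append(f"{line[:6]}{serial:>5}{line[11:]}")
        serial += 1
        prev = line
    if prev is not None:
        out.append(ter_line(serial, prev))  # close the final chain
    return out


atoms = [
    "ATOM     50  CD2 LEU A  33      18.368  58.046 121.804  1.00 65.12           C  ",
    "ATOM     51  N   LEU B  34      17.108  54.111 123.216  1.00 62.63           N  ",
]
renumbered = renumber_with_ter(atoms, start=50)
for line in renumbered:
    print(line)
```

Under this convention the first chain B atom would carry serial 52 rather than 51, so neither output shown above matches it exactly. Whether that matters depends on the downstream tool reading the file.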