High dot density gives negative interaction area in PyMOL

98 Views Asked by At

I am working out the area of interaction between a protein and ligand (in this example, PDB ID = 1r0r), and am experimenting with different dot densities. The file I am using has been fixed with PDBFixer and gives slightly different results to the one you can download from the PDB.

I am doing the following in PyMOL to calculate the area of interaction:

remove solvent
set dot_density, x
create 'sel1', E// #First selection is chain E
create 'sel2', * and not E// #Second selection is everything else, in this case chain I
create 'whole', * #To get the entire surface

print cmd.get_area('sel1') + cmd.get_area('sel2') - cmd.get_area('whole')

This should in theory give the area of the chains as standalone objects minus the area of them complexed together, thus leaving me with the area of interaction between them. This works fine with dot density 1 and 2 (though gives me vastly different answers), yet when I ramp it up to dot density 3 I start getting negative numbers and I can't work out why that is. Here is a visualisation of the molecule (selection 1 in pink, selection 2 is teal and the whole selection is pale green):

Whole complex

enter image description here

Interaction area

enter image description here

enter image description here


Using my own fixed file:

[in]
set dot_density, 1
print cmd.get_area('sel1') + cmd.get_area('sel2') - cmd.get_area('whole')
[out]
6462.04248046875

[in]
set dot_density, 2
print cmd.get_area('sel1') + cmd.get_area('sel2') - cmd.get_area('whole')
[out]
2663.0439453125

[in]
set dot_density, 3
print cmd.get_area('sel1') + cmd.get_area('sel2') - cmd.get_area('whole')
[out]
-357.0478515625

[in]
set dot_density, 4
print cmd.get_area('sel1') + cmd.get_area('sel2') - cmd.get_area('whole')
[out]
-152.4716796875

Using the file obtained directly from the PDB:

[in]
set dot_density, 1
print cmd.get_area('sel1') + cmd.get_area('sel2') - cmd.get_area('whole')
[out]
14053.78369140625

[in]
set dot_density, 2
print cmd.get_area('sel1') + cmd.get_area('sel2') - cmd.get_area('whole')
[out]
2272.84228515625

[in]
set dot_density, 3
print cmd.get_area('sel1') + cmd.get_area('sel2') - cmd.get_area('whole')
[out]
-735.10107421875

[in]
set dot_density, 4
print cmd.get_area('sel1') + cmd.get_area('sel2') - cmd.get_area('whole')
[out]
124.2021484375

I was hoping for these numbers to be at least similar (there's a difference of over 14,000 square Angstroms between all of these) and it would be great if they were all positive too. The higher dot density is supposed to give a more accurate measurement but I am not buying it right now :P

The end goal is to use a Python script that can take a list of PDB codes and do all this in one go and the script is all written out, I just need to figure out which PyMOL settings I should be using for the results to be reliable. I am also looking to use set dot_solvent, on and compare these results but clearly I'm not quite at that stage yet!

Many thanks,

Amy

1

There are 1 best solutions below

0
pippo1980 On

this is not an answer, just trying to figure out the question better. To help people to answer. Using same .pdb file 1r0r.pdb, https://www.rcsb.org/structure/1r0r; must underline that is .pdb cmd.fetch will dowload 1r0r.cif that could give different numbers.

with this code:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Fri Jul 21 16:07:51 2023

@author: bob



"""
import pymol
import sys

from pymol import ( cmd ,
                    stored 
                    )

print('########## PYMOL VERSION ##########################################')
print('         ',  cmd.get_version() )
print('###################################################################')

pymol.finish_launching()

cmd.set('internal_gui_width' ,  '500')

cmd.load('1r0r.pdb' , '1r0r')

cmd.remove('resn HOH')

cmd.h_add('1r0r')

cmd.flag('ignore' , 'none')

cmd.flag('ignore' , 'solvent') #☺ redundant already removed HOH above

cmd.select('sel1' , 'chain E' )

cmd.create('sel1_obj' , 'sel1')

cmd.select('sel2' , '*  and not chain E' )

cmd.create('sel2_obj' , 'sel2')

cmd.select('whole' , '*' )

cmd.create('whole_obj' , 'whole')

cmd.set('solvent_radius' , 1.4)  # default 1.4 for solvent accessible surface area (SASA) 

cmd.set('surface_mode' ,'1')

# cmd.set('dot_hydrogens' , 0) #(boolean, default: on),
                               # controls whether or not dots are drawn on hydrogen atoms.
                               # not used by get_area 

print('\n\n----------------------------------------------------------------------------------')
print('Setting dot_solvent = 0 ,use vdW')
cmd.set('dot_solvent' , 0)

stored.vdw_list  =[]

for i in range(-2 , 10 , 1) :
    
    cmd.set('dot_density' , i)
    
    sel1 = cmd.get_area('sel1_obj') 
    
    sel2 = cmd.get_area('sel2_obj') 
    
    whole = cmd.get_area('whole_obj')

    print('\nfor dot_density : ' , i )
    print('result is : ' , sel1 ,' + ', sel2 ,' - ' , whole ,' = ' , sel1+sel2-whole)
    
    stored.vdw_list.append((sel1, sel2, whole, sel1+sel2-whole))

print('\n-\n---------------------------------------------------------------------------------')
print('Setting dot_solvent = 1 , use solvent accessible surface area (SASA) ')
cmd.set('dot_solvent' , 1)

stored.sasa_list = []

for i in range(-2 , 10 , 1) :
    
    cmd.set('dot_density' , i)
       
    sel1 = cmd.get_area('sel1_obj') 
    
    sel2 = cmd.get_area('sel2_obj') 
    
    whole = cmd.get_area('whole_obj')

    print('\nfor dot_density : ' , i )
    print('result is : ' , sel1 ,' + ', sel2 ,' - ' , whole ,' = ' , sel1+sel2-whole)

    stored.sasa_list.append((sel1, sel2, whole, sel1+sel2-whole))
 
cmd.sync(timeout=2000 , poll=0.5)

cmd.do('run runnable.py')

and this added code (same folder of previous script) , runnable.py:

#!/usr/bin/env python3

import sys
import matplotlib
matplotlib.use('Qt5Agg')

from pymol.Qt import QtWidgets, QtCore
from matplotlib.backends.backend_qt5agg import FigureCanvasQTAgg
from matplotlib.figure import Figure
from matplotlib import pyplot as plt

class MplCanvas(FigureCanvasQTAgg):

    def __init__(self, parent=None, width=5, height=4, dpi=100):
        fig = Figure(figsize=(width, height), dpi=dpi)
        self.axes = fig.add_subplot(111)
        super(MplCanvas, self).__init__(fig)


class MainWindow(QtWidgets.QMainWindow):

    def __init__(self, vdw_list, sasa_list , *args, **kwargs):
        super(MainWindow, self).__init__(*args, **kwargs)
        
        # Create the maptlotlib FigureCanvas object,
        # which defines a single set of axes as self.axes.
        sc = MplCanvas(self, width=5, height=4, dpi=100)
        # sc.axes.hist(self.y , bins = 5)
        
        fig , axes = plt.subplots(4,2)

        labels = [i for i in range(-2 , 10 , 1)]

        axes[0,0].plot(labels , [i[3] for i in vdw_list], 'tab:red')
        axes[0,0].set_title('vdW_interaction_surface')
        axes[1,0].plot(labels , [i[2] for i in vdw_list], 'tab:orange')
        axes[1,0].set_title('vdW_whole')
        axes[2,0].plot(labels , [i[1] for i in vdw_list], 'tab:orange')
        axes[2,0].set_title('vdW_sel2')
        axes[3,0].plot(labels , [i[0] for i in vdw_list], 'tab:orange')
        axes[3,0].set_title('vdW_sel1')
        axes[0,1].plot(labels , [i[3] for i in sasa_list], 'tab:blue')
        axes[0,1].set_title("SASA_interaction_surface")
        axes[1,1].plot(labels , [i[2] for i in sasa_list], 'tab:cyan')
        axes[1,1].set_title('SASA_whole')                 
        axes[2,1].plot(labels , [i[1] for i in sasa_list], 'tab:cyan')
        axes[2,1].set_title('SASA_sel2')  
        axes[3,1].plot(labels , [i[0] for i in sasa_list], 'tab:cyan')
        axes[3,1].set_title('SASA_sel1')
            
        for ax in axes.flat:
            ax.xaxis.label.set_size(16)
            ax.yaxis.label.set_size(16)
            ax.set(xlabel = 'dot_density' , ylabel = 'Surface Area')
            
        for ax in axes.flat:
            ax.label_outer()

        plt.show()
        
        self.setCentralWidget(sc)
        self.show()
        # matplotlib.plot.show()

vdw_list = stored.vdw_list
sasa_list = stored.sasa_list

w = MainWindow(vdw_list, sasa_list)
w.setWindowTitle('run runnable.py')

I get these results from pymol.cmd.get_area(.... calculating both vand der Walls Surface (vdW) and Solvent Accessible Surface (SASA) , didn't find a way to get the Solvent Excluded Surface / Molecular Surface.

NOTE: Not sure I got the names above right I am assuming this pic is correct, please correct me if wrong:

enter image description here

Results:

----------------------------------------------------------------------------------
Setting dot_solvent = 0 ,use vdW

for dot_density :  -2
result is :  27676.630859375  +  5604.74755859375  -  27798.70703125  =  5482.67138671875
for dot_density :  -1
result is :  27676.630859375  +  5604.74755859375  -  27798.70703125  =  5482.67138671875
for dot_density :  0
result is :  27676.630859375  +  5604.74755859375  -  27798.70703125  =  5482.67138671875
for dot_density :  1
result is :  27535.646484375  +  5617.2861328125  -  26300.912109375  =  6852.0205078125
for dot_density :  2
result is :  27579.734375  +  5631.95458984375  -  30639.333984375  =  2572.35498046875
for dot_density :  3
result is :  27638.85546875  +  5648.96044921875  -  33593.3515625  =  -305.53564453125
for dot_density :  4
result is :  27423.462890625  +  5653.36474609375  -  33075.28515625  =  1.54248046875
for dot_density :  5
result is :  27423.462890625  +  5653.36474609375  -  33075.28515625  =  1.54248046875
for dot_density :  6
result is :  27423.462890625  +  5653.36474609375  -  33075.28515625  =  1.54248046875
for dot_density :  7
result is :  27423.462890625  +  5653.36474609375  -  33075.28515625  =  1.54248046875
for dot_density :  8
result is :  27423.462890625  +  5653.36474609375  -  33075.28515625  =  1.54248046875
for dot_density :  9
result is :  27423.462890625  +  5653.36474609375  -  33075.28515625  =  1.54248046875

-
---------------------------------------------------------------------------------
Setting dot_solvent = 1 , use solvent accessible surface area (SASA) 

for dot_density :  -2
result is :  9964.12890625  +  3685.79638671875  -  15456.37109375  =  -1806.44580078125
for dot_density :  -1
result is :  9964.12890625  +  3685.79638671875  -  15456.37109375  =  -1806.44580078125
for dot_density :  0
result is :  9964.12890625  +  3685.79638671875  -  15456.37109375  =  -1806.44580078125
for dot_density :  1
result is :  9981.83203125  +  3607.128662109375  -  7254.7294921875  =  6334.231201171875
for dot_density :  2
result is :  9967.2080078125  +  3661.2802734375  -  12123.0185546875  =  1505.4697265625
for dot_density :  3
result is :  9951.7001953125  +  3649.372802734375  -  11556.5126953125  =  2044.560302734375
for dot_density :  4
result is :  9954.16796875  +  3648.069091796875  -  11638.8994140625  =  1963.337646484375
for dot_density :  5
result is :  9954.16796875  +  3648.069091796875  -  11638.8994140625  =  1963.337646484375
for dot_density :  6
result is :  9954.16796875  +  3648.069091796875  -  11638.8994140625  =  1963.337646484375
for dot_density :  7
result is :  9954.16796875  +  3648.069091796875  -  11638.8994140625  =  1963.337646484375
for dot_density :  8
result is :  9954.16796875  +  3648.069091796875  -  11638.8994140625  =  1963.337646484375
for dot_density :  9
result is :  9954.16796875  +  3648.069091796875  -  11638.8994140625  =  1963.337646484375

and the accompaining plot:

enter image description here

that tries to describe pymol calculated surfaces.

Note aside

set dot_density accepts both negative and positive integers (not float) but 0 and below seems to give same results while above 4 at visually and get_area results are same too. Wasn't able to get any google result for the algorithm used by pymol.

Some picture to visually depict the problem:

VdW Surface :

VdW Surface

SASA Surface:

SASA Surface

dots, dot density -> 0 ; solvent on (like SASA)

dots, dot density -> 0 ; solvent on (like SASA)

dots , dot density -> 4 ; solvent on (like SASA)

dots, dot density -> 4 ; solvent on (like SASA)

dots, dot density -> 4 ; solvent off (like vdW)

dots, dot density -> 4 ; solvent off (like vdW)

Biopython seems to have a SASA module https://biopython.org/docs/dev/api/Bio.PDB.SASA.html .

here the code, not sure its right, please check throughly:

from pymol import ( cmd ,
                    stored 
                    )

print('########## PYMOL VERSION ##########################################')
print('         ',  cmd.get_version() )
print('###################################################################')



cmd.load('1r0r.pdb' , '1r0r')

cmd.select('sel1' , 'chain E' )  # for 1r0r chain E , I 

cmd.select('sel2' , '*  and not chain E' ) # for 1r0r

cmd.select('whole' , '*' )


# cmd.load('1yu6_A_C.pdb' , '1yu6') 

# cmd.select('sel1' , 'chain A' ) #for 1yu6 chains A,C

# cmd.select('sel2' , '*  and not chain A' ) #for 1yu6 chains A,C

# cmd.select('whole' , '*' )



cmd.create('sel1_obj' , 'sel1')

cmd.create('sel2_obj' , 'sel2')

cmd.create('whole_obj' , 'whole')

cmd.remove('resn HOH')

cmd.save('whole_obj.pdb' , 'whole_obj')

cmd.save('sel1_obj.pdb' , 'sel1_obj')

cmd.save('sel2_obj.pdb' , 'sel2_obj')


from Bio.PDB import PDBParser

from Bio.PDB.SASA import ShrakeRupley as srm


parser_whole = PDBParser(PERMISSIVE = 1, QUIET=True)   ### sopprime warnings

structure_whole = parser_whole.get_structure("whole", "whole_obj.pdb") 

parser_sel1 = PDBParser(PERMISSIVE = 1, QUIET=True)   ### sopprime warnings

structure_sel1 = parser_whole.get_structure("sel1", "sel1_obj.pdb") 

parser_sel2 = PDBParser(PERMISSIVE = 1, QUIET=True)   ### sopprime warnings

structure_sel2 = parser_whole.get_structure("sel2", "sel2_obj.pdb") 



sasa_list = []

start = 1

n_points_max = 150

step = 2

baseline = 0

delta = []

for n_points in range(start ,n_points_max, step) :    # ValueError: Number of sphere points must be larger than 1: -5

    sr_whole = srm(probe_radius =1.40, n_points = n_points, radii_dict = None)
    sr_whole.compute(structure_whole, level = "S")
    
    # sasa_whole = structure_whole[0]['E'][1]['N'].sasa # for level 'A' (if level S , calculate Atom value)
                                                        #               ( if lever R , doest not give Structure value)
    
    sasa_whole = structure_whole.sasa
    
    # print('\n\n .------------> ', sasa_whole ,'\n\n')
    
    sr_sel1 = srm(probe_radius =1.40, n_points = n_points, radii_dict = None)
    sr_sel1.compute(structure_sel1, level="S")
    sasa_sel1 = structure_sel1.sasa
    
    sr_sel2 = srm(probe_radius =1.40, n_points = n_points, radii_dict = None)
    sr_sel2.compute(structure_sel2, level="S")
    sasa_sel2 = structure_sel2.sasa
    

    
    
    print('\n-\n---------------------------------------------------------------------------------')
    print('Printing solvent accessible surface area (SASA)')
    print("""n_points (int) – resolution of the surface of each atom """)
    print('\nfor n_points: ' , n_points )
    print('result is : ' , sasa_sel1 ,' + ', sasa_sel2 ,' - ' , sasa_whole ,' = ' ,
          
           sasa_sel1+sasa_sel2-sasa_whole)

    sasa_list.append((sasa_sel1, sasa_sel2, sasa_whole, sasa_sel1+sasa_sel2-sasa_whole))
    
    
    delta.append(abs((sasa_sel1+sasa_sel2-sasa_whole) -baseline))
    
    baseline = (sasa_sel1+sasa_sel2-sasa_whole)


print('\n-\n---------------------------------------------------------------------------------')
print('Printing solvent accessible surface area (SASA)')   
print(sasa_list[-1], '   --> size ' , len(sasa_list))

print(delta[-1], '   --> size ' , len(delta))


from runnable_CONVERGE import graph, graph_conv,  plt


graph(sasa_list , start , n_points_max , step)

graph_conv(delta, start, n_points_max, step)


plt.show()

and graph code runnable_CONVERGE.py:

from matplotlib import pyplot as plt


def graph(sasa_list, start, n_points_max, step):
        
        # fig , axes = plt.subplots(2,2, sharex=True, sharey = True)
        
        fig , axes = plt.subplots(2,2, sharex=True, sharey = False)


        labels = [i for i in range(start, n_points_max, step)]


        axes[0,1].plot(labels , [i[3] for i in sasa_list], 'tab:red')
        axes[0,1].set_title("SASA_interaction_surface")

        axes[0,0].plot(labels , [i[2] for i in sasa_list], 'tab:blue')
        axes[0,0].set_title('SASA_whole')
                            
        axes[1,1].plot(labels , [i[1] for i in sasa_list], 'tab:cyan')
        axes[1,1].set_title('SASA_sel2')
            
        axes[1,0].plot(labels , [i[0] for i in sasa_list], 'tab:cyan')
        axes[1,0].set_title('SASA_sel1')
            
        for ax in axes.flat:
            
            ax.xaxis.label.set_size(16)
            ax.yaxis.label.set_size(16)
            ax.set(xlabel = 'n_points' , ylabel = 'Surface Area')
            
        # for ax in axes.flat:
            
        #     ax.label_outer()


        # plt.show()
        
def graph_conv(delta,  start, n_points_max, step) :
    
    if len(delta) >= 20:

        import matplotlib.pyplot as plt
        
        import matplotlib.font_manager as font_manager
        
        y = [i for i in delta][-20 :]
        
        x = [n_points for n_points in range(start ,n_points_max, step)][-20 :]
        
        
        fig, axs = plt.subplots()
        
        axs.set_xticks(x)
        
        axs.set_xticklabels(x)
        
        axs.axhline(y=0 , color = 'red' , linestyle = 'dotted')
        
        # axs.axhline(y=25 , color = 'blue' , linestyle = 'dotted')
        
        axs.axhline(y=max(delta[-20:]),  color = 'blue' , linestyle = 'dotted', 
                    
                                label = str(round(max(delta[-20:]),2)))
        
        
        font = font_manager.FontProperties(weight = 'bold',
                                           size = 16)
        
        axs.legend(loc = 'lower center', prop = font)
        
        # axs.set_ylim(-1000,1000)
        
        axs.set_ylim(-100,100)
        
        # axs.set_ylim(-10,10)
        
        axs.xaxis.label.set_size(2)
        
        axs.plot(x,y, color = 'black')
        
        # plt.show()

that uses Bio.PDB.SASA.ShrakeRupley(probe_radius=1.40, n_points=100, radii_dict=None) with n_points [100 optimum as per Biopython documentation] going from 1 to 150 in step of 2 : enter image description here

and to better visualize how SASA goes toward the end of the alghoritm:

enter image description here

with calculated difference between last and second-last SASA values of 3.4508421817808994 and maximum difference between SASA and previous SASA in th last 20 calculated SASA of 17.7

Biopython SASA.py is at https://github.com/biopython/biopython/blob/master/Bio/PDB/SASA.py , quoting it :

"""Calculation of solvent accessible surface areas for Bio.PDB entities. Uses the "rolling ball" algorithm developed by Shrake & Rupley algorithm, which uses a sphere (of equal radius to a solvent molecule) to probe the surface of the molecule. Reference: Shrake, A; Rupley, JA. (1973). J Mol Biol "Environment and exposure to solvent of protein atoms. Lysozyme and insulin". """

Retriving the atom coordinates from a hacked version of SASA.py we can compare the number of atoms having a Solvent Accessible Surface (calculate with parameter n_points = 800) with the surface (surface_solvent on) generated by pymol for the whole 1r0r.pdb.

All the atoms (2310) sphere cgo placed at atom coords vs pymol surface:

enter image description here

Only the atoms (995) with a SASA > 0 :

enter image description here

Using parameter n_points = 1200 , and adding to the cgo spheres (around 1100 circa) the radius of relevant atom element plus the one of the probe , I get this pics of the entire SASA surface in green , compared to the one generated by pymol in rainbow by element (superposed to the first one) :

enter image description here

It was kind of tricky to get this image because I did not find a way to combine all the spheres to a single cgo object (https://pymol-users.narkive.com/wtPZzPwg/pymol-combine-two-objects-in-pymol )