Calculating pairwise distances from external file

521 Views Asked by At

I have data in a text file of the following format:

t  x      y      z
Pb 1.0000 1.5000 2.5000
Pb 1.1000 1.6000 2.6000
S  1.2000 1.4000 2.4000

I would like to be able to calculate pairwise distances for all of the coordinates I have, but somehow retain the atomic species identity (i.e. attach a string describing with t values were used in the calculation). The pairwise distance formula I am using is just the Euclidean distance matrix:

r_ij = abs(ri - rj)

Where ri/rj are the coordinates in 3D space.

I can easily find the values of r_ij using this method (and reformatting the data so it a 3xN numpy array of just coordinate data):

def r_ij_numpy(coords):
r = np.dot(coords, coords.T)
m = np.tile(np.diag(r), (N,1))
r = np.sqrt(m + m.T - 2*r)
r = np.triu(r).ravel()
return r[np.nonzero(r)]

But I can't seem to find a way to tag atom types with it (i.e. refactor the output to an array of tuples, with each tuple being the value of r_ij attached to a string of both atom types (i.e. 'Pb-S').

Thank you!

1

There are 1 best solutions below

0
On

I would use cdist to calculate pairwise distances and then add each pair to a dictionary of pairs, eg. 'Pb-Pb' and 'Pb-S'.

import pandas as pd
import numpy as np
from scipy.spatial.distance import cdist
from collections import defaultdict

# load data in question
df = pd.read_csv('atoms.txt', delimiter='\t')
# get atom strings
atoms = df['t'].values

# get xyzzy values as array
loc = df[['x', 'y', 'z']].values

# compute pairwise distances
dists = cdist(loc, loc)
# only keep one count of each
dists = np.triu(dists, k=1)

# get indices of non-self distances
indices = np.column_stack(np.nonzero(dists))

# add pairs to dictionary
pairs = defaultdict(list)

for i, j in indices:
    pair = f'{atoms[i]}-{atoms[j]}'
    print(f'{pair}, distance: {dists[i, j]}')
    pairs[pair].append(dists[i, j])

>>> Pb-Pb, distance: 0.1732050807568879
    Pb-S, distance: 0.24494897427831783
    Pb-S, distance: 0.30000000000000016

pairs
>>> {'Pb-Pb': [0.1732050807568879],
     'Pb-S': [0.24494897427831783, 0.30000000000000016]}

This way you can see all the atomic spacings between different chemicals. By keeping the array order dists you keep the atom order from your dataset.