Anyone knows a more efficient way to run a pairwise comparison of hundreds of trajectories?


So I have two files, each containing multiple trajectories on a square map (512x512 pixels). Each file contains the spatial position of every particle within a track/trajectory (X and Y coordinates) and the track/trajectory that spot belongs to (TRACK_ID). My goal is to find a way to cluster similar trajectories between the two files. I found a nice way to do this (distance clustering comparison), but the code is too slow. I was just wondering if someone has suggestions to make it faster.

My files look something like this:

[screenshot of the input tables: one row per spot, with TRACK_ID, POSITION_X, POSITION_Y and FRAME columns]

The approach that I implemented finds similar trajectories based on the Fréchet distance (maybe not too relevant here). Below you can find the function that I wrote, but briefly this is the rationale:

  • group all the spots by track using the pandas.groupby function for file1 (growth_xml) and file2 (shrinkage_xml)
  • for each trajectory in growth_xml (outer loop) I compare against each trajectory in shrinkage_xml (inner loop)
  • if they pass the Fréchet distance criterion that I defined (an if statement) I save both tracks in a new table. You can see an additional filter condition that I called delay, but I guess that is not important to explain here.

so really simple:

import numpy as np
import pandas as pd

# frechetDist(p, q): discrete Fréchet distance between two lists of (x, y) points,
# defined elsewhere in my script

def distance_clustering(growth_xml, shrinkage_xml):

    coords_g = pd.DataFrame()  # empty dataframes to save filtered tracks
    coords_s = pd.DataFrame()

    counter = 0  # initialize counter to count number of filtered tracks

    for track_g, param_g in growth_xml.groupby('TRACK_ID'):

        # define growing track as multi-point line object
        traj1 = [(x, y) for x, y in zip(param_g.POSITION_X.values, param_g.POSITION_Y.values)]

        for track_s, param_s in shrinkage_xml.groupby('TRACK_ID'):

            # define shrinking track as a second multi-point line object
            traj2 = [(x, y) for x, y in zip(param_s.POSITION_X.values, param_s.POSITION_Y.values)]

            # compute delay between shrinkage and growing ends to use as an extra filter
            delay = (param_s.FRAME.iloc[0] - param_g.FRAME.iloc[0])

            # keep the pair only if the Fréchet distance is lower than 0.2 microns
            if frechetDist(traj1, traj2) < 0.2 and delay > 0:

                counter += 1

                param_g = param_g.assign(NEW_ID = np.ones(param_g.shape[0]) * counter)
                coords_g = pd.concat([coords_g, param_g])

                param_s = param_s.assign(NEW_ID = np.ones(param_s.shape[0]) * counter)
                coords_s = pd.concat([coords_s, param_s])

    coords_g.reset_index(drop = True, inplace = True)
    coords_s.reset_index(drop = True, inplace = True)
    return coords_g, coords_s

The main problem is that most of the time I have more than two thousand tracks (!!), and this pairwise comparison takes forever. I'm wondering if there's a simple and more efficient way to do this. Perhaps by doing the pairwise comparison in multiple small areas instead of the whole map? Not sure...
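Just to make that idea concrete, this is the kind of bucketing I have in mind (untested sketch, not part of my current code, using the growth_xml / shrinkage_xml DataFrames from above; the cell size is only a guess and assumes POSITION_X/POSITION_Y are in the same units as the 0.2 threshold):

    CELL = 0.5  # grid cell size, same units as POSITION_X/POSITION_Y; should be >= the 0.2 threshold

    def start_cell(param):
        # grid cell of a track's first spot
        return (int(param.POSITION_X.iloc[0] // CELL), int(param.POSITION_Y.iloc[0] // CELL))

    # bucket shrinkage tracks by the cell of their first spot
    buckets = {}
    for track_s, param_s in shrinkage_xml.groupby('TRACK_ID'):
        buckets.setdefault(start_cell(param_s), []).append(param_s)

    # compare each growth track only against shrinkage tracks
    # that start in the same or a neighbouring cell
    for track_g, param_g in growth_xml.groupby('TRACK_ID'):
        cx, cy = start_cell(param_g)
        candidates = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                candidates.extend(buckets.get((cx + dx, cy + dy), []))
        # ... run the existing Fréchet/delay test only on `candidates`

If I understand the Fréchet distance correctly, the first points of a matching pair can be at most 0.2 apart, so bucketing on the start point with a cell size of at least 0.2 shouldn't throw away any valid pairs.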


There are 2 best solutions below

Have you tried making a (DeltaX, DeltaY) lookup table for the pairwise distances? It will take a long time to compute the LUT once, but you can write it to a file and load it when the algorithm starts. Then you only have to look up the right entry to get the result instead of recomputing it each time.
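One possible reading of that suggestion, as a rough sketch (using your frechetDist and the two DataFrames from the question; the file name is arbitrary): compute the full track-by-track distance matrix once, cache it on disk, and reload it on later runs.

    import os
    import numpy as np

    def cached_distance_matrix(growth_xml, shrinkage_xml, path='frechet_lut.npy'):
        # reload the precomputed matrix if it already exists on disk
        if os.path.exists(path):
            return np.load(path)

        growth_tracks = [list(zip(p.POSITION_X.values, p.POSITION_Y.values))
                         for _, p in growth_xml.groupby('TRACK_ID')]
        shrink_tracks = [list(zip(p.POSITION_X.values, p.POSITION_Y.values))
                         for _, p in shrinkage_xml.groupby('TRACK_ID')]

        # expensive part: done only once, then cached
        lut = np.array([[frechetDist(g, s) for s in shrink_tracks] for g in growth_tracks])
        np.save(path, lut)
        return lut

Rows and columns follow the groupby order of TRACK_ID, so you still need to keep that mapping around when you read results back out.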

You could also fit a polynomial regression for the distance calculation; it will be less precise but definitely faster.


Maybe not an outright answer, but it's been a while. Could you not segment the lines and use a minimum bounding box around each segment to assess similarities? I might be thinking of your problem the wrong way around, I'm not sure. Right now I'm trying to work with polygons from two different data sets and want to optimize the processing by first identifying the polygons in both geometries that overlap. In your case, I think segments would leave you with some edge artifacts. Maybe look at this paper: https://drops.dagstuhl.de/opus/volltexte/2021/14879/pdf/OASIcs-ATMOS-2021-10.pdf or this paper (with python code): https://www.austriaca.at/0xc1aa5576_0x003aba2b.pdf
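A rough, untested sketch of the bounding-box prefilter idea, written in terms of the param_g / param_s variables from your question (padding one box by the 0.2 Fréchet threshold means the test should never discard a pair that would have passed your if statement):

    def bbox(param, pad=0.2):
        # axis-aligned bounding box of a track, padded by the distance threshold
        x, y = param.POSITION_X.values, param.POSITION_Y.values
        return (x.min() - pad, y.min() - pad, x.max() + pad, y.max() + pad)

    def boxes_overlap(a, b):
        return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

    # inside the double loop from the question, skip pairs whose padded boxes don't touch:
    # if not boxes_overlap(bbox(param_g), bbox(param_s)):
    #     continue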