Compute "substring" distances between sequences

Question

66 Views Asked by Thomas At 28 July 2025 at 01:30

My dataset (first line = header) is the following:

ID;Activity 1;Activity 2; ... ;Activity 20;
Company_X;A1A3T1D1O1R1R8;A1A3T2O1R2;...;A1A3T6D2O1O2R2
Company_Y;A1A3T1O1R1;A1A3T2O1R2;...;A1A3T11O1O3R5
Company_Z;A1A3T1D8O1R1R8;A1A3T2O1R2;...;A1A3T6D2O1R2

where, for each activity, each pair (one letter + one number) represents one part of a sequence: A1 = actor 1, A3 = actor 3, O1 = object 1. I am trying to compute the difference between the activities of companies. For instance, activity 1 of Company_X should have a difference of, e.g., 2 with activity 1 of Company_Y, since they have A1A3T1O1R1 in common but not D1 and R8.

Can any function in TraMineR do that, that is, compare a predefined number of characters within each event?

Thank you very much for your help

**Gilbert** · Accepted Answer

From what I understand, each string (activity) like A1A3T6D2O1O2R2 should be considered as a sequence of pairs and you want to compare such sequences.

The seqdef function of TraMineR can read sequences in string form. However, when each element is defined by more than a single character, you have to introduce a separator (e.g., A1-A3-T6) for that. Then, to pair your sequences with company names you may also need to organize your data in table form with each sequence (activity) in a separate row, something like

ID         Activity
company_x  A1-A3-T6-D2-O1-O2-R2
company_y  A1-A3-T1-O1-R1
...
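
The separator step and the reshaping into one activity per row can be prepared before the data is handed to seqdef. The following Python snippet is only an illustrative sketch, not TraMineR code: the helper names `add_separators` and `to_long` are hypothetical, the extra activity-index column is an addition for bookkeeping, and it assumes every element is one uppercase letter followed by one or more digits, as in the question's data.

```python
import re

# Assumption: each element is one uppercase letter followed by digits (e.g. A1, T11).
PAIR = re.compile(r"[A-Z]\d+")

def add_separators(activity, sep="-"):
    """Turn 'A1A3T11O1' into 'A1-A3-T11-O1'."""
    pairs = PAIR.findall(activity)
    # Fail loudly if the string contains anything other than letter+number pairs.
    if "".join(pairs) != activity:
        raise ValueError(f"unexpected characters in {activity!r}")
    return sep.join(pairs)

def to_long(rows):
    """Reshape (ID, activity 1, activity 2, ...) rows into one
    (ID, activity index, separated sequence) row per activity."""
    long_rows = []
    for company, *activities in rows:
        for k, activity in enumerate(activities, start=1):
            long_rows.append((company, k, add_separators(activity)))
    return long_rows

# Two rows taken from the question, truncated to the first two activities.
wide = [
    ("Company_X", "A1A3T1D1O1R1R8", "A1A3T2O1R2"),
    ("Company_Y", "A1A3T1O1R1", "A1A3T2O1R2"),
]
for row in to_long(wide):
    print(*row)
```

The resulting rows can be written to a delimited file and read into R, where the `-` separator lets seqdef recognize multi-character elements.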

Then, you can compute dissimilarities using measures applicable to sequences of different lengths. Optimal matching (OM), for instance, is the minimal cost of transforming one sequence into the other given the indel and substitution costs. This should give you what you expect: with an indel cost of 1, the two deletions (D1 and R8) that separate activity 1 of Company_X from activity 1 of Company_Y yield a distance of 2. Depending on the substitution costs, the distance between A1A3T6D2O1O2R2 and A1A3T6D2O1R2 could differ from the distance between A1A3T6D2O1O2R2 and A3T4.
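
To make the OM definition concrete, here is a minimal Python sketch of the underlying dynamic program. It is not TraMineR's implementation: the function name `om_distance` is hypothetical, and the default costs (indel 1, one constant substitution cost of 2 for any pair of different elements) are assumptions chosen for illustration.

```python
def om_distance(seq_a, seq_b, indel=1.0, sub=2.0, sep="-"):
    """Minimal cost of turning seq_a into seq_b using insertions/deletions
    (cost `indel`) and substitutions (cost `sub` for differing elements)."""
    a, b = seq_a.split(sep), seq_b.split(sep)
    # d[i][j] = minimal cost of turning the first i elements of a
    # into the first j elements of b.
    d = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        d[i][0] = i * indel  # delete everything
    for j in range(1, len(b) + 1):
        d[0][j] = j * indel  # insert everything
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else sub
            d[i][j] = min(
                d[i - 1][j] + indel,     # delete a[i-1]
                d[i][j - 1] + indel,     # insert b[j-1]
                d[i - 1][j - 1] + cost,  # match or substitute
            )
    return d[-1][-1]

# Activity 1 of Company_X vs. Company_Y from the question: D1 and R8 are deleted.
print(om_distance("A1-A3-T1-D1-O1-R1-R8", "A1-A3-T1-O1-R1"))  # 2.0
```

Changing the `sub` argument shows how the substitution cost affects pairs that share few elements, such as A1-A3-T6-D2-O1-O2-R2 and A3-T4, while pairs that differ only by missing elements depend on the indel cost alone.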
