My dataset (first line = header) is the following:
ID;Activity 1;Activity 2; ... ;Activity 20;
Company_X;A1A3T1D1O1R1R8;A1A3T2O1R2;...;A1A3T6D2O1O2R2
Company_Y;A1A3T1O1R1;A1A3T2O1R2;...;A1A3T11O1O3R5
Company Z;A1A3T1D8O1R1R8;A1A3T2O1R2;...;A1A3T6D2O1R2
where, for each activity, each pair (one letter + one number) represents one part of a sequence: A1 = actor1, A3 = actor3, O1 = object1. What I am trying to do is compute the difference between the activities of companies. For instance, activity 1 of Company_X should have a difference of, e.g., 2 from activity 1 of Company_Y, since they have A1A3T1O1R1 in common but not D1 and R8.
Can anything in TraMineR do that? That is, compare a predefined number of characters within each event?
Thank you very much for your help
From what I understand, each string (activity) like
A1A3T6D2O1O2R2
should be considered as a sequence of pairs, and you want to compare such sequences. The
seqdef
function of TraMineR can read sequences in string form. However, when each element is defined by more than a single character, you have to introduce a separator (e.g., A1-A3-T6) for that. Then, to pair your sequences with company names, you may also need to organize your data in table form with each sequence (activity) in a separate row, for example with one column for the company, one for the activity, and one for the separated sequence.

Then, you can compute dissimilarities using measures applicable to sequences of different lengths. Optimal matching (OM), for instance, is the minimal cost of transforming one sequence into the other given the indel and substitution costs. This should give you what you expect. Depending on the substitution costs, the distance between A1A3T6D2O1O2R2 and A1A3T6D2O1R2 could differ from the distance between A1A3T6D2O1O2R2 and A3T4.
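To make the two steps concrete, here is a minimal sketch in plain Python. It is an illustration only, not TraMineR code. The function names (`split_pairs`, `with_separator`, `om_distance`) are hypothetical, and the costs (indel = 1, constant substitution cost = 2) are assumptions chosen for the example. It also assumes every element is exactly one letter followed by one or more digits, as in the question.

```python
import re


def split_pairs(activity):
    """Split 'A1A3T11' into ['A1', 'A3', 'T11'] (one letter + digits each)."""
    tokens = re.findall(r"[A-Za-z]\d+", activity)
    # Guard against strings that do not follow the assumed letter+digits pattern.
    if "".join(tokens) != activity:
        raise ValueError(f"unexpected format: {activity!r}")
    return tokens


def with_separator(activity, sep="-"):
    """Rewrite 'A1A3T6' as 'A1-A3-T6', the separated string form."""
    return sep.join(split_pairs(activity))


def om_distance(a, b, indel=1.0, sub=2.0):
    """Edit distance between two token lists with the assumed indel and
    constant substitution costs (the idea behind optimal matching)."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel          # delete all of a[:i]
    for j in range(1, m + 1):
        d[0][j] = j * indel          # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else sub
            d[i][j] = min(
                d[i - 1][j] + indel,      # deletion
                d[i][j - 1] + indel,      # insertion
                d[i - 1][j - 1] + cost,   # match or substitution
            )
    return d[n][m]


x = split_pairs("A1A3T1D1O1R1R8")   # activity 1 of Company_X
y = split_pairs("A1A3T1O1R1")       # activity 1 of Company_Y
print(with_separator("A1A3T6"))      # A1-A3-T6
print(om_distance(x, y))             # 2.0
```

With these assumed costs, the two activities from the question differ by 2: deleting D1 and R8 from the first sequence yields the second. Other cost settings would give other values.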