I hope you can help me with this problem. I have data like this:
ID,colour
1,base_yellow
1,blue
1,base_red
1,blue
1,pink
1,blue
1,base_yellow
2,base_yellow
2,blue
2,base_red
2,blue
2,pink
2,blue
2,base_yellow
3,base_yellow
3,blue
3,pink
3,blue
3,base_yellow
4,base_yellow
4,blue
4,green
4,blue
4,green
4,blue
4,pink
4,blue
4,base_yellow
Every time a base colour (base_yellow, base_red) is encountered, a new group should start. The expected output, with the grouped colours collected into a new variable, is shown below:
ID,colour
1,base_yellow; blue; base_red
1,base_red; blue; pink;blue;base_yellow
2,base_yellow; blue; base_red
2,base_red; blue; pink;blue; base_yellow
3,base_yellow;blue;pink;blue;base_yellow
4,base_yellow; blue;green;blue;green;blue;pink;blue;base_yellow
This is something you might be able to adapt for your needs.
First, create a vector `vec` that includes the row positions where `colour` starts with "base".
Then, you can use `map2_dfr` from `purrr` to extract the `colour` values that range from each start position to the matching end position in `vec`. This handles cases where the same `colour` row ends up in more than one output row. A grouping variable `group` is also created in this step.
After grouping by `group`, you can keep only the groups that have more than one `colour` and use `str_c` to collapse the values together within each `group`.
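The steps described above could be sketched roughly as follows. This is a reconstruction under assumptions, not necessarily the exact code originally posted: the data frame name `df`, the `"; "` separator, and the use of `n_distinct(colour) > 1` as the "more than one colour" filter are all choices made here for illustration.

```r
library(dplyr)
library(purrr)
library(stringr)

# Sample data from the question (the name `df` is an assumption)
id1 <- c("base_yellow", "blue", "base_red", "blue", "pink", "blue", "base_yellow")
id3 <- c("base_yellow", "blue", "pink", "blue", "base_yellow")
id4 <- c("base_yellow", "blue", "green", "blue", "green", "blue", "pink",
         "blue", "base_yellow")
df <- data.frame(
  ID = c(rep(1, 7), rep(2, 7), rep(3, 5), rep(4, 9)),
  colour = c(id1, id1, id3, id4),
  stringsAsFactors = FALSE
)

# Row positions where colour starts with "base"
vec <- which(str_detect(df$colour, "^base"))

# For each pair of consecutive base positions, take the rows from start to end.
# A base row can therefore close one group and open the next.
pieces <- map2_dfr(
  head(vec, -1),
  tail(vec, -1),
  ~ tibble(ID = df$ID[.x], group = .x, colour = df$colour[.x:.y])
)

# Keep groups with more than one distinct colour, then collapse each group
result <- pieces %>%
  group_by(ID, group) %>%
  filter(n_distinct(colour) > 1) %>%
  summarise(colour = str_c(colour, collapse = "; "), .groups = "drop") %>%
  select(ID, colour)

print(result)
```

With the sample data this should give six rows, two each for IDs 1 and 2 and one each for IDs 3 and 4. Note that the filter only drops the spans that straddle two IDs because, in this data, each ID ends and the next begins with the same base colour. If that does not hold for your data, it may be safer to compare `df$ID[.x]` with `df$ID[.y]` instead, or to compute `vec` separately within each `ID`.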