I hope you can help me with this problem. I have data like this:
ID,colour
1,base_yellow
1,blue
1,base_red
1,blue
1,pink
1,blue
1,base_yellow
2,base_yellow
2,blue
2,base_red
2,blue
2,pink
2,blue
2,base_yellow
3,base_yellow
3,blue
3,pink
3,blue
3,base_yellow
4,base_yellow
4,blue
4,green
4,blue
4,green
4,blue
4,pink
4,blue
4,base_yellow
Every time a base colour (base_yellow, base_red) is encountered, a new group starts. The expected output, which gives a new variable, is shown below:
ID,colour
1,base_yellow; blue; base_red
1,base_red; blue; pink; blue; base_yellow
2,base_yellow; blue; base_red
2,base_red; blue; pink; blue; base_yellow
3,base_yellow; blue; pink; blue; base_yellow
4,base_yellow; blue; green; blue; green; blue; pink; blue; base_yellow
This is something you might be able to adapt for your needs.

First, create a vector `vec` that holds the row positions where `colour` starts with "base".

Then, you can use `map2_dfr` from `purrr` to pull out the `colour` values for each range of rows, with start and end positions taken from consecutive elements of `vec`. Because consecutive ranges share an endpoint, a base row that closes one group also opens the next, which covers the case where the same `colour` row is needed in more than one output row. A grouping variable `group` is also created in this step.

After grouping by `group`, you can keep only the groups that have more than one `colour` and use `str_c` to collapse the values together within each `group`.
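The steps above can be sketched roughly as follows. This is a minimal sketch rather than a definitive implementation: the data frame name `df`, the argument names `from` and `to`, and the use of `n_distinct(colour) > 1` as the reading of "more than one colour" are all assumptions. That filter drops the ranges that span two IDs only because, in the sample data, every ID begins and ends with the same base colour.

```r
library(dplyr)
library(purrr)
library(stringr)

# Sample data from the question (the name `df` is an assumption)
df <- tibble(
  ID = rep(1:4, times = c(7, 7, 5, 9)),
  colour = c(
    "base_yellow", "blue", "base_red", "blue", "pink", "blue", "base_yellow",
    "base_yellow", "blue", "base_red", "blue", "pink", "blue", "base_yellow",
    "base_yellow", "blue", "pink", "blue", "base_yellow",
    "base_yellow", "blue", "green", "blue", "green", "blue", "pink", "blue",
    "base_yellow"
  )
)

# Row positions where colour starts with "base"
vec <- which(grepl("^base", df$colour))

result <- map2_dfr(
  head(vec, -1),   # start position of each range
  tail(vec, -1),   # end position of each range
  # Take the rows from one base row to the next and label them with a group id
  function(from, to) mutate(df[from:to, ], group = from)
) %>%
  group_by(group) %>%
  # Assumption: a range made of a single repeated colour (the base_yellow pair
  # straddling two IDs) is not a real group, so it is dropped
  filter(n_distinct(colour) > 1) %>%
  summarise(
    ID = first(ID),
    colour = str_c(colour, collapse = "; "),
    .groups = "drop"
  ) %>%
  select(ID, colour)

print(result)
```

On the sample data this should give six rows, one per expected output line. If an ID could end with one base colour and the next ID start with a different one, grouping by both `group` and `ID` and filtering on `n() > 1` would be a safer variant.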