From a list of URLs, I want to check, for each value in complete_path, whether it is a subfolder of the path in another row.
The criteria for a subfolder are:
- A subfolder starts with, and therefore fully contains, the URL of its parent row.
- A subfolder contains more backslashes (\) than its parent.
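The two criteria translate directly into a string test. This is a minimal sketch; the function name `is_subfolder` is hypothetical, and it implements the criteria literally as stated (prefix match plus a higher backslash count).

```python
def is_subfolder(child: str, parent: str) -> bool:
    """Return True if `child` meets both subfolder criteria relative to `parent`."""
    # Criterion 1: the child path starts with (fully contains) the parent path.
    # Criterion 2: the child path has more backslashes than the parent path.
    return child.startswith(parent) and child.count("\\") > parent.count("\\")


print(is_subfolder(r"Ajax\991\1", "Ajax"))        # True
print(is_subfolder(r"Ajax\991", r"Ajax\991\1"))   # False: the shorter path cannot be the child
print(is_subfolder("BVB_Christy", "BVB"))         # False: same number of backslashes
```

Note that, taken literally, these criteria would also treat a path like `Ajax2\x` as a subfolder of `Ajax`. If that matters, the stricter test `child.startswith(parent + "\\")` requires the match to end at a folder boundary and implies both criteria.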
Here is a sample of my pandas DataFrame:
ID complete_path
1 Ajax
2 Ajax\991\1
3 Ajax\991
4 BVB
5 BVB\Christy
6 BVB_Christy
Here is the expected output, where dependency lists the IDs of the parent rows:
ID complete_path dependency
1 Ajax None
2 Ajax\991\1 1,3
3 Ajax\991 1
4 BVB None
5 BVB\Christy 4
6 BVB_Christy None
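For a small DataFrame, the expected dependency column can be produced with a plain pairwise scan that applies both criteria to every pair of rows. This is a sketch assuming the column names `ID` and `complete_path` from the sample; `find_dependencies` is a hypothetical helper, and the scan is O(n²), so it is only meant for modest row counts.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "complete_path": ["Ajax", r"Ajax\991\1", r"Ajax\991", "BVB", r"BVB\Christy", "BVB_Christy"],
})


def find_dependencies(df: pd.DataFrame) -> pd.Series:
    """For each row, join the IDs of all rows whose path is a parent of it."""
    ids = df["ID"].tolist()
    paths = df["complete_path"].tolist()
    result = []
    for child in paths:
        parents = sorted(
            pid
            for pid, parent in zip(ids, paths)
            # Same two criteria as in the question: prefix match and more backslashes.
            if child.startswith(parent) and child.count("\\") > parent.count("\\")
        )
        # Rows without any parent get None, as in the expected output.
        result.append(",".join(map(str, parents)) if parents else None)
    return pd.Series(result, index=df.index, name="dependency")


df["dependency"] = find_dependencies(df)
print(df)
```

On the sample data this yields None, "1,3", "1", None, "4", None, matching the expected output.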
This sounds like a network (graph) problem, so networkx is helpful.
Output:
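One way to apply networkx here, offered as a sketch rather than the answer's exact code: add an edge from each path to its nearest existing parent folder, then read every row's dependencies off `nx.ancestors`. The helper name `dependencies_via_graph` is hypothetical, and the sketch assumes paths are unique and split on single backslashes. It matches at folder boundaries (like `startswith(parent + "\\")`), which gives the same result as the stated criteria on the sample data.

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "complete_path": ["Ajax", r"Ajax\991\1", r"Ajax\991", "BVB", r"BVB\Christy", "BVB_Christy"],
})


def dependencies_via_graph(df: pd.DataFrame) -> pd.Series:
    """Build a parent -> child DiGraph of paths and list each row's ancestor IDs."""
    path_to_id = dict(zip(df["complete_path"], df["ID"]))  # assumes unique paths
    graph = nx.DiGraph()
    graph.add_nodes_from(path_to_id)
    for path in path_to_id:
        parts = path.split("\\")
        # Walk up one folder at a time until an existing row is found (nearest parent).
        for depth in range(len(parts) - 1, 0, -1):
            candidate = "\\".join(parts[:depth])
            if candidate in path_to_id:
                graph.add_edge(candidate, path)  # edge direction: parent -> child
                break
    deps = []
    for path in df["complete_path"]:
        # All ancestors in the graph are the row's direct and indirect parents.
        ancestor_ids = sorted(path_to_id[a] for a in nx.ancestors(graph, path))
        deps.append(",".join(map(str, ancestor_ids)) if ancestor_ids else None)
    return pd.Series(deps, index=df.index, name="dependency")


df["dependency"] = dependencies_via_graph(df)
print(df)
```

Linking only to the nearest parent keeps the graph small, and the transitive closure via `nx.ancestors` recovers indirect parents such as ID 1 for `Ajax\991\1`, even when an intermediate folder has no row of its own.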