Complex partial string matching in pandas

989 Views Asked by William At 05 March 2022 at 06:20

Given a dataframe with the following structure and json_path values:

json_path	Reporting Group	Entity/Grouping
data.attributes.total.children.[0]	Christian Family	Abraham Family
data.attributes.total.children.[0].children.[0]	Christian Family	In Estate
data.attributes.total.children.[0].children.[0].children.[0].children.[0]	Christian Family	Cash
data.attributes.total.children.[0].children.[0].children.[1].children.[0]	Christian Family	Investment Grade Fixed Income

How would I filter for the json_path rows that contain children four times? That is, I want to keep index positions 2-3:

json_path	Reporting Group	Entity/Grouping
data.attributes.total.children.[0].children.[0].children.[0].children.[0]	Christian Family	Cash
data.attributes.total.children.[0].children.[0].children.[1].children.[0]	Christian Family	Investment Grade Fixed Income

I know how to obtain a partial match. However, the integers in the square brackets will be inconsistent, so my instinct is to count the instances of children (i.e., children appearing 4 times) and use that count as the basis for the filter.

Any suggestions or resources on how I can achieve this?

There is 1 best solution below

Shubham Sharma On 05 March 2022 at 06:52 BEST ANSWER

As you said, a naive approach is to count the occurrences of .children and compare the count with 4. This gives a boolean mask that can be used to filter the rows:

df[df['json_path'].str.count(r'\.children').eq(4)]
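For reference, here is a minimal self-contained sketch of the counting approach. It rebuilds the dataframe from the rows shown in the question. The intermediate names child_counts, mask and filtered are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "json_path": [
        "data.attributes.total.children.[0]",
        "data.attributes.total.children.[0].children.[0]",
        "data.attributes.total.children.[0].children.[0].children.[0].children.[0]",
        "data.attributes.total.children.[0].children.[0].children.[1].children.[0]",
    ],
    "Reporting Group": ["Christian Family"] * 4,
    "Entity/Grouping": [
        "Abraham Family",
        "In Estate",
        "Cash",
        "Investment Grade Fixed Income",
    ],
})

# Count how many times ".children" appears in each path.
child_counts = df["json_path"].str.count(r"\.children")

# Boolean mask: True where the path has exactly four occurrences.
mask = child_counts.eq(4)

# Keep only the rows where the mask is True (index 2 and 3 here).
filtered = df[mask]
print(filtered)
```

Keeping the counts in their own variable makes it easy to inspect them, or to switch the comparison to something like child_counts.ge(4) if deeper paths should match too.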

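A more robust approach is to check for four consecutive occurrences of .children followed by a bracketed index. A non-capturing group, (?:...), avoids the UserWarning that str.contains raises when the pattern has match groups:

df[df['json_path'].str.contains(r'(?:\.children\.\[\d+\]){4}')]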

                                                                   json_path   Reporting Group                Entity/Grouping
2  data.attributes.total.children.[0].children.[0].children.[0].children.[0]  Christian Family                           Cash
3  data.attributes.total.children.[0].children.[0].children.[1].children.[0]  Christian Family  Investment Grade Fixed Income
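To see where the two approaches differ, here is a small sketch on hypothetical paths. They are made up for illustration and do not come from the question. Note that the {4} quantifier with str.contains matches four or more consecutive segments, so the sketch combines both masks to get exactly four.

```python
import pandas as pd

# Hypothetical paths, invented for this comparison (not from the question).
paths = pd.Series([
    # four consecutive .children.[n] segments
    "data.children.[0].children.[1].children.[2].children.[3]",
    # four .children in total, but interrupted by ".meta"
    "data.children.[0].children.[1].meta.children.[2].children.[3]",
    # five consecutive .children.[n] segments
    "data.children.[0].children.[1].children.[2].children.[3].children.[4]",
])

# Approach 1: total number of ".children" occurrences equals 4.
by_count = paths.str.count(r"\.children").eq(4)

# Approach 2: at least four consecutive ".children.[n]" segments.
# The non-capturing group (?:...) avoids the match-groups warning.
consecutive = paths.str.contains(r"(?:\.children\.\[\d+\]){4}")

# Combining both masks keeps paths with exactly four consecutive segments.
exactly_four = by_count & consecutive

print(pd.DataFrame({
    "by_count": by_count,
    "consecutive": consecutive,
    "exactly_four": exactly_four,
}))
```

On the data in the question both approaches select the same rows. Which one is preferable depends on whether interrupted or deeper paths can occur in the real data.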
