I have a dataset from our campaign that looks like this:
| Customer | Province | District | City | Age | No. of Order |
| -------- | ------- | -------- | -----| ----| ------- |
| A | P1 | D1 | C1 | 21 | 5 |
| B | P2 | D2 | C2 | 22 | 9 |
....
I need to find the most impactful group of customers (usually there will be >20 categorical groups). For example: "Customers from Province P1, District D1, Age 25 are the most promising group because they contributed 50% of total orders while making up only 10% of our customer base".
I'm currently using pandas to loop through all combinations of 2, 3, and 4 of my categorical features and calculate the sales proportion for each group, but this is very time-consuming.
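For reference, the brute-force approach described above could look roughly like the following. This is only a minimal sketch on made-up toy data: the column name `Orders` (standing in for "No. of Order"), the `lift` ratio (order share divided by customer share), and all values are assumptions for illustration.

```python
from itertools import combinations

import pandas as pd

# Hypothetical toy data mirroring the table above.
df = pd.DataFrame({
    "Customer": ["A", "B", "C", "D"],
    "Province": ["P1", "P2", "P1", "P2"],
    "District": ["D1", "D2", "D1", "D2"],
    "City": ["C1", "C2", "C1", "C3"],
    "Age": [21, 22, 21, 30],
    "Orders": [5, 9, 4, 2],
})

features = ["Province", "District", "City", "Age"]
total_orders = df["Orders"].sum()
total_customers = len(df)

results = []
for k in (2, 3, 4):
    for cols in combinations(features, k):
        cols = list(cols)
        # One groupby per feature combination instead of a row-by-row loop.
        g = (
            df.groupby(cols)
            .agg(orders=("Orders", "sum"), customers=("Customer", "count"))
            .reset_index()
        )
        g["order_share"] = g["orders"] / total_orders
        g["customer_share"] = g["customers"] / total_customers
        # lift > 1 means the group orders more than its size alone suggests.
        g["lift"] = g["order_share"] / g["customer_share"]
        # Human-readable labels so all combinations fit in one table.
        g["features"] = ", ".join(cols)
        g["values"] = g[cols].astype(str).apply(", ".join, axis=1)
        results.append(
            g[["features", "values", "orders", "customers",
               "order_share", "customer_share", "lift"]]
        )

summary = pd.concat(results, ignore_index=True).sort_values("lift", ascending=False)
print(summary.head())
```

The number of combinations grows quickly with more than 20 features, which is where the time goes.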
Is there already a Python package that can help find this kind of group?
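You can automate this with decision trees. Each leaf of a fitted tree corresponds to a combination of feature conditions, so leaves with unusually high order counts point to candidate groups.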
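Not all features may be useful. You can reduce them with PCA (principal component analysis), but note that PCA works on numeric inputs and produces combined components rather than dropping individual columns, so categorical features must be encoded first.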
You can use the scikit-learn package for both of the above.
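As a rough illustration of the decision tree idea, here is a minimal sketch using scikit-learn's `DecisionTreeRegressor`. The toy data, the column name `Orders`, the one-hot encoding via `pd.get_dummies`, and the hyperparameters (`max_depth=3`, `min_samples_leaf=2`) are all assumptions chosen for the example, not recommendations.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

# Hypothetical toy data; real data would have many more rows and features.
df = pd.DataFrame({
    "Customer": list("ABCDEFGH"),
    "Province": ["P1", "P2", "P1", "P2", "P1", "P2", "P1", "P2"],
    "District": ["D1", "D2", "D1", "D2", "D3", "D2", "D1", "D4"],
    "City": ["C1", "C2", "C1", "C3", "C4", "C2", "C1", "C5"],
    "Age": [21, 22, 25, 30, 41, 23, 25, 35],
    "Orders": [5, 9, 12, 2, 1, 8, 11, 3],
})

# One-hot encode the categorical columns; keep Age as a numeric feature.
X = pd.get_dummies(df[["Province", "District", "City"]]).astype(int)
X["Age"] = df["Age"]
y = df["Orders"]

# A shallow tree keeps the resulting rules short and readable.
tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=2, random_state=0)
tree.fit(X, y)

# Print the learned splits as nested if/else rules.
print(export_text(tree, feature_names=list(X.columns)))

# Assign each customer to a leaf and summarise each leaf as a group.
df["leaf"] = tree.apply(X)
leaf_summary = df.groupby("leaf").agg(
    customers=("Customer", "count"), orders=("Orders", "sum")
)
leaf_summary["order_share"] = leaf_summary["orders"] / df["Orders"].sum()
leaf_summary["customer_share"] = leaf_summary["customers"] / len(df)
leaf_summary["lift"] = leaf_summary["order_share"] / leaf_summary["customer_share"]
print(leaf_summary.sort_values("lift", ascending=False))
```

Leaves with a high order share relative to their customer share are the candidate groups, and the printed rules show which feature conditions define them. A tree is greedy, so it may miss some groups that an exhaustive search would find.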