Hi would be great to get some help on this for Python Pandas.TIA I have a dataframe which has 1M rows with below columns:
| PID | lurn_fls | locality | Comparision |
|---|---|---|---|
| ACT933 | 2 | Kambah | mbn:match both non-empty |
| ACT934 | 3F | Charwood | xne:mismatch neither empty |
| ACT935 | 3R | Glenden | mbe:match both empty |
| . | . | . | . |
| . | . | . | . |
| ACT155 | 4 | Glebe | xhe:mismatch h_empty |
I need to do a groupby "lurn_fls" and also count the different 'Comparision' column results per group, so that my result should look like below table. for eg:
| lurn_fls | mbn:match both non-empty | xhe:mismatch h_empty | xne:mismatch neither empty | Total |
|---|---|---|---|---|
| 1 | 600 | 12 | 15 | XXX |
| 2 | 700 | 10 | 14 | XXX |
| 3F | 800 | 8 | 10 | XXX |
| 3R | 900 | 6 | 12 | XXX |
| 4 | 500 | 4 | 10 | XXX |
| 5 | 400 | 2 | 14 | XXX |
IIUC, you can use
pd.crosstab:Prints: