I have a historical table that keeps track of the status of a task over time. The table looks similar to the one below, where 'ID' is unique to the task, 'Date' changes whenever an action is taken on the task, and 'Factor1', 'Factor2', etc. are columns that contain details of the underlying task.
I want to flag, at the 'ID' level, which 'Factor' columns are changing over time. Once I identify which 'Factor' columns are changing, I plan to analyze which 'Factor' columns change the most, the least, etc.
I am looking to:
- Sort by 'Date' ascending
- Group by 'ID'
- Loop through each column whose name contains 'Factor' and, for each column, determine whether the 'Factor' data changed by checking each row within each ID
- Create a new column for each 'Factor' column to flag whether the underlying factor changed over time for that specific ID
Python code for sample data:
import pandas as pd
data = [[1, '12/12/2021', 'A', 500], [2, '10/20/2021', 'D', 200],
        [3, '7/2/2022', 'E', 300], [1, '5/2/2022', 'B', 500],
        [1, '8/2/2022', 'B', 500], [3, '10/2/2022', 'C', 200],
        [2, '1/5/2022', 'D', 200]]
df = pd.DataFrame(data, columns=['ID', 'Date','Factor1','Factor2'])
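The steps listed above can be sketched without an explicit row loop by using `groupby` with `transform('nunique')`: a factor "changed" for an ID exactly when it takes more than one distinct value within that ID's group. This is one possible approach, not necessarily the intended one; the `_changed` column names are an assumption for illustration. Note that 'Date' is stored as strings in the sample data, so it is parsed with `pd.to_datetime` before sorting to get true chronological order:

```python
import pandas as pd

data = [[1, '12/12/2021', 'A', 500], [2, '10/20/2021', 'D', 200],
        [3, '7/2/2022', 'E', 300], [1, '5/2/2022', 'B', 500],
        [1, '8/2/2022', 'B', 500], [3, '10/2/2022', 'C', 200],
        [2, '1/5/2022', 'D', 200]]
df = pd.DataFrame(data, columns=['ID', 'Date', 'Factor1', 'Factor2'])

# Parse the string dates so sorting is chronological, not lexicographic
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
df = df.sort_values(['ID', 'Date'])

# All columns whose name contains 'Factor'
factor_cols = [c for c in df.columns if 'Factor' in c]
for col in factor_cols:
    # Flag True for every row of an ID if that factor ever takes
    # more than one distinct value for that ID (hypothetical naming)
    df[f'{col}_changed'] = df.groupby('ID')[col].transform('nunique') > 1
```

With the sample data, 'Factor1' is flagged as changed for IDs 1 (A → B) and 3 (E → C), and 'Factor2' only for ID 3 (300 → 200); summing the flag columns per ID would then support the "which factor changes most" analysis.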
My desired output is this:
OUTPUT: