I have the following Pandas data frame (called df).
+--------+--------+------+--------+
| Person | Animal | Year | Number |
+--------+--------+------+--------+
| John | Dogs | 2000 | 2 |
| John | Dogs | 2001 | 2 |
| John | Dogs | 2002 | 2 |
| John | Dogs | 2003 | 2 |
| John | Dogs | 2004 | 2 |
| John | Dogs | 2005 | 2 |
| John | Cats | 2000 | 1 |
| John | Cats | 2001 | NaN |
| John | Cats | 2002 | NaN |
| John | Cats | 2003 | 4 |
| John | Cats | 2004 | 5 |
| John | Cats | 2005 | 6 |
| Peter | Dogs | 2000 | NaN |
| Peter | Dogs | 2001 | 1 |
| Peter | Dogs | 2002 | NaN |
| Peter | Dogs | 2003 | 5 |
| Peter | Dogs | 2004 | 5 |
| Peter | Dogs | 2005 | 5 |
| Peter | Cats | 2000 | NaN |
| Peter | Cats | 2001 | 4 |
| Peter | Cats | 2002 | 4 |
| Peter | Cats | 2003 | 4 |
| Peter | Cats | 2004 | 4 |
| Peter | Cats | 2005 | 4 |
+--------+--------+------+--------+
My goal is to fill the NaN values with the interpolate method, but separately within each group defined by the other columns. In other words, it should:
- partition the df using the Person and Animal columns
- order by Year (asc)
- apply the interpolate method

The expected result is:
+--------+--------+------+--------+
| Person | Animal | Year | Number |
+--------+--------+------+--------+
| John | Dogs | 2000 | 2 |
| John | Dogs | 2001 | 2 |
| John | Dogs | 2002 | 2 |
| John | Dogs | 2003 | 2 |
| John | Dogs | 2004 | 2 |
| John | Dogs | 2005 | 2 |
| John | Cats | 2000 | 1 |
| John | Cats | 2001 | 2 |
| John | Cats | 2002 | 3 |
| John | Cats | 2003 | 4 |
| John | Cats | 2004 | 5 |
| John | Cats | 2005 | 6 |
| Peter | Dogs | 2000 | NaN |
| Peter | Dogs | 2001 | 1 |
| Peter | Dogs | 2002 | 3 |
| Peter | Dogs | 2003 | 5 |
| Peter | Dogs | 2004 | 5 |
| Peter | Dogs | 2005 | 5 |
| Peter | Cats | 2000 | NaN |
| Peter | Cats | 2001 | 4 |
| Peter | Cats | 2002 | 4 |
| Peter | Cats | 2003 | 4 |
| Peter | Cats | 2004 | 4 |
| Peter | Cats | 2005 | 4 |
+--------+--------+------+--------+
What I have done
I can filter for each Person and each Animal, apply the interpolate method to each subset, and finally merge everything back together, but this is tedious and long-winded when there are many columns.
You can try:
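A minimal sketch of one way to do this: sort by Year, group by Person and Animal, and interpolate Number within each group using `groupby(...).transform`. The frame is rebuilt here from the table in the question so the snippet runs on its own; the default linear interpolation is assumed, which leaves a leading NaN in a group untouched.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Person": ["John"] * 12 + ["Peter"] * 12,
    "Animal": (["Dogs"] * 6 + ["Cats"] * 6) * 2,
    "Year": list(range(2000, 2006)) * 4,
    "Number": [
        2, 2, 2, 2, 2, 2,            # John / Dogs
        1, np.nan, np.nan, 4, 5, 6,  # John / Cats
        np.nan, 1, np.nan, 5, 5, 5,  # Peter / Dogs
        np.nan, 4, 4, 4, 4, 4,       # Peter / Cats
    ],
})

# Sort by Year so each group is interpolated in chronological order,
# then interpolate within every (Person, Animal) group.
# transform keeps the original index, so the assignment aligns the
# filled values back to the original rows.
df["Number"] = (
    df.sort_values("Year")
      .groupby(["Person", "Animal"])["Number"]
      .transform(lambda s: s.interpolate())
)

print(df)
```

With the default settings, the first row of each Peter group stays NaN because there is no earlier value to interpolate from, which matches the expected output in the question.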
For multiple columns, just use the same operation:
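As a sketch, assume the frame has a second numeric column to fill. The name `Weight` below is hypothetical and is not part of the original data; the smaller frame is only there to keep the example short. Selecting a list of columns on the grouped object applies the same per-group interpolation to each of them.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; "Weight" is a made-up second column.
df = pd.DataFrame({
    "Person": ["John"] * 4 + ["Peter"] * 4,
    "Animal": ["Cats"] * 4 + ["Dogs"] * 4,
    "Year": [2000, 2001, 2002, 2003] * 2,
    "Number": [1, np.nan, np.nan, 4, np.nan, 1, np.nan, 5],
    "Weight": [10.0, np.nan, 14.0, 16.0, 3.0, np.nan, np.nan, 9.0],
})

cols = ["Number", "Weight"]  # columns to interpolate

# Same operation as for a single column: transform is applied to each
# selected column within every (Person, Animal) group.
df[cols] = (
    df.sort_values("Year")
      .groupby(["Person", "Animal"])[cols]
      .transform(lambda s: s.interpolate())
)

print(df)
```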