I have the following problem: I have a dataframe with several columns. (See below) I am trying to fill in missing values. Concretely, I only want to fill in values when I have a datapoint before and one after the missing value and when they are equal. Note, I also have different IDs, so I want to do this for each ID. In the example below for example I would like to fill in a 1 for row 2 in year 2017 and in the penultimate row for ID 2 I would like to fill in a 2 (in both cases I have the same value "surrounding" the missing value). I do not want to fill in the last rows' value.
ID | Year | value |
---|---|---|
1 | 2016 | 1 |
1 | 2017 | -8 |
1 | 2018 | 1 |
2 | 2016 | -8 |
2 | 2017 | 2 |
2 | 2018 | 2 |
2 | 2019 | - 8 |
2 | 2020 | 2 |
3 | 2017 | 4 |
3 | 2018 | 4 |
3 | 2019 | -9 |
I am somewhat clueless as to how to approach this problem. I have tried using group_by but I don't see a clear way to do it.
Thank you so much for the help!
Using an
if_else
andlead
andlag
you could do:Note 1: Instead of the default
NA
I used a-99
and a-999
as thedefault
forlag
andlead
. Otherwise theif_else
would replace some values with anNA
.Note 2:: In row 7 of your example data the missing value was a
- 8
. I assumed that this was a typo and replaced it by a-8
.DATA