I am looking to compute the null values 'inside' a dataframe. Basically, each of the boundary 'cells' of this dataframe contains a value, and all the interior values are null.
So I want to fill these null values by summing the surrounding 4 cells and dividing by 4, such that the value at any given cell is $h_{i,j} = \frac{1}{4}\left(h_{i-1,j} + h_{i+1,j} + h_{i,j-1} + h_{i,j+1}\right)$.
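To read the four neighbours of every interior cell in one step, rather than cell by cell, you can use shifted NumPy slices. This is a minimal sketch that assumes the grid is held as a 2-D NumPy array (`df.to_numpy()` gives one). `neighbour_average` is a hypothetical helper name, not an existing API:

```python
import numpy as np


def neighbour_average(h: np.ndarray) -> np.ndarray:
    """Mean of the cells above, below, left and right of each interior cell.

    Returns an array of shape (rows - 2, cols - 2); border cells have no
    full set of four neighbours, so they are not included.
    """
    up = h[:-2, 1:-1]     # cell above:          h[i-1, j]
    down = h[2:, 1:-1]    # cell below:          h[i+1, j]
    left = h[1:-1, :-2]   # cell to the left:    h[i, j-1]
    right = h[1:-1, 2:]   # cell to the right:   h[i, j+1]
    return (up + down + left + right) / 4


# Tiny check: one interior cell surrounded by 8 (above), 6 (below), 4 (left), 2 (right).
grid = np.array([
    [0.0, 8.0, 0.0],
    [4.0, 0.0, 2.0],
    [0.0, 6.0, 0.0],
])
print(neighbour_average(grid))  # [[5.]]
```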
Col0 | Col1 | Col2 | Col4 |
---|---|---|---|
100 | 95 | 90 | 85 |
95 | NaN | NaN | 80 |
90 | NaN | NaN | 75 |
85 | NaN | NaN | 70 |
80 | NaN | NaN | 65 |
75 | 70 | 65 | 60 |
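To reproduce this in pandas, you can type the input table in directly. The column names are copied from the table, including its `Col4` header. The `isna()` mask marks the cells that still need a value:

```python
import numpy as np
import pandas as pd

nan = np.nan

# The input table: known boundary values, NaN in the interior.
df = pd.DataFrame({
    "Col0": [100, 95, 90, 85, 80, 75],
    "Col1": [95, nan, nan, nan, nan, 70],
    "Col2": [90, nan, nan, nan, nan, 65],
    "Col4": [85, 80, 75, 70, 65, 60],
})

# True where a value still has to be computed.
missing = df.isna()
print(int(missing.to_numpy().sum()))  # 8
```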
I am unsure how to iterate over this dataset and apply the above formula.
My expected output, based on my Excel version of this:
Col0 | Col1 | Col2 | Col4 |
---|---|---|---|
100 | 95 | 90 | 85 |
95 | 90 | 85 | 80 |
90 | 85 | 80 | 75 |
85 | 80 | 75 | 70 |
80 | 75 | 70 | 65 |
75 | 70 | 65 | 60 |
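The Excel result is consistent with the formula. The boundary values fall by 5 per row and per column, so the expected grid is the plane $h_{i,j} = 100 - 5i - 5j$, with $i$ and $j$ counting rows and columns from zero. A plane satisfies the four-neighbour average exactly. Here is a quick check with the expected table typed in by hand:

```python
import numpy as np

# Expected output from the table above (rows top to bottom, columns left to right).
expected = np.array([
    [100, 95, 90, 85],
    [95, 90, 85, 80],
    [90, 85, 80, 75],
    [85, 80, 75, 70],
    [80, 75, 70, 65],
    [75, 70, 65, 60],
], dtype=float)

# Four-neighbour average for every interior cell.
avg = (expected[:-2, 1:-1] + expected[2:, 1:-1]
       + expected[1:-1, :-2] + expected[1:-1, 2:]) / 4

# Each interior value equals the average of its four neighbours.
print(np.array_equal(avg, expected[1:-1, 1:-1]))  # True
```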
My initial idea was to use the following loop:
```python
for i in df:
    i.fillna(
        (i[:, :, 1:] + i[:, :, :-1] + i[:, :-1, :] + i[:, 1:, :])/4, inplace=True
    )
```
I.e. fill each NaN value with the sum of the four surrounding cells divided by four.
But this doesn't work; it just fails with the error 'cannot unpack non-iterable int object'.
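One reason the loop cannot work is that `for i in df` iterates over the DataFrame's column labels. So `i` is a label such as `'Col0'`, not the grid of values, and it cannot be sliced with `[:, :, 1:]`. The three-index slices also assume a 3-D array, while the table is 2-D. This small demonstration uses a toy two-column frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Col0": [100, 95], "Col1": [95, np.nan]})

# Iterating over a DataFrame yields its column labels, not rows or cells.
labels = [i for i in df]
print(labels)  # ['Col0', 'Col1']

# Inside the loop, `i` is therefore a string, and array-style slicing fails.
raised = False
try:
    labels[0][:, :, 1:]
except TypeError:
    raised = True
print(raised)  # True
```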
Does anyone have an idea of how I can (a) correctly write a formula that accesses all four surrounding cell values, and (b) actually apply it to calculate the missing values?
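For (b), note that the formula is a linear equation in the unknown interior values. Moving the unknown neighbours to the left-hand side gives $4h_{i,j} - \sum(\text{unknown neighbours}) = \sum(\text{known neighbours})$, one equation per NaN cell. Instead of iterating, you can build that system and solve it directly. This is a sketch that assumes every NaN lies strictly inside a fully known border; `solve_interior` is a hypothetical name. The dense solve is fine for a small table like this one, but a large grid would call for a sparse solver.

```python
import numpy as np
import pandas as pd


def solve_interior(df: pd.DataFrame) -> pd.DataFrame:
    """Fill every NaN so it equals the mean of its four neighbours.

    Assumes all NaN cells lie strictly inside the border. Builds one linear
    equation per NaN cell and solves the system directly with NumPy.
    """
    h = df.to_numpy(dtype=float, copy=True)
    unknown = np.argwhere(np.isnan(h))  # (row, col) of each NaN cell
    position = {(int(r), int(c)): k for k, (r, c) in enumerate(unknown)}
    n = len(position)
    A = np.zeros((n, n))
    b = np.zeros(n)
    for (r, c), k in position.items():
        A[k, k] = 4.0  # 4 * h[r, c] ...
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (nr, nc) in position:
                A[k, position[(nr, nc)]] = -1.0  # ... minus each unknown neighbour
            else:
                b[k] += h[nr, nc]  # ... equals the sum of the known neighbours
    h[unknown[:, 0], unknown[:, 1]] = np.linalg.solve(A, b)
    return pd.DataFrame(h, index=df.index, columns=df.columns)


nan = np.nan
df = pd.DataFrame({
    "Col0": [100, 95, 90, 85, 80, 75],
    "Col1": [95, nan, nan, nan, nan, 70],
    "Col2": [90, nan, nan, nan, nan, 65],
    "Col4": [85, 80, 75, 70, 65, 60],
})
result = solve_interior(df)
print(result.round(6))
```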
I can do this straightforwardly in Excel, which lets you iterate this type of calculation relatively easily, but I am struggling to transfer the concept to Python.
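Excel's iterative calculation keeps recalculating until the values stop changing or an iteration limit is hit. In Python, the same idea can be written as Jacobi iteration: each sweep replaces every interior NaN with the mean of its neighbours from the previous sweep, until the largest change is tiny. This sketch assumes the border is fully known. The function name `fill_by_iteration`, the `tol` and `max_iter` parameters, and the starting guess are choices made here for illustration.

```python
import numpy as np
import pandas as pd


def fill_by_iteration(df: pd.DataFrame, tol: float = 1e-10,
                      max_iter: int = 10_000) -> pd.DataFrame:
    """Jacobi iteration: replace each NaN cell by the mean of its four
    neighbours from the previous sweep until the largest change < tol.

    Assumes all NaN cells lie strictly inside a fully known border.
    """
    h = df.to_numpy(dtype=float, copy=True)
    mask = np.isnan(h)            # the cells being solved for
    h[mask] = np.nanmean(h)       # starting guess: mean of the known cells
    for _ in range(max_iter):
        avg = h.copy()
        # Right-hand side uses only the previous sweep's values (Jacobi).
        avg[1:-1, 1:-1] = (h[:-2, 1:-1] + h[2:, 1:-1]
                           + h[1:-1, :-2] + h[1:-1, 2:]) / 4
        change = np.max(np.abs(avg[mask] - h[mask]))
        h[mask] = avg[mask]       # border cells are never overwritten
        if change < tol:
            break
    return pd.DataFrame(h, index=df.index, columns=df.columns)


nan = np.nan
df = pd.DataFrame({
    "Col0": [100, 95, 90, 85, 80, 75],
    "Col1": [95, nan, nan, nan, nan, 70],
    "Col2": [90, nan, nan, nan, nan, 65],
    "Col4": [85, 80, 75, 70, 65, 60],
})
result = fill_by_iteration(df)
print(result.round(2))
```

Jacobi is the simplest scheme to write with array slices. On large grids it converges slowly, and variants such as Gauss-Seidel or successive over-relaxation usually need fewer sweeps.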
I tried the above code, but it doesn't work and I can't apply my conceptual understanding to Python well.
Given the input csv as below:
And given the desired output csv as below:
This is the code:
This is different from your sample output because I am not sure of the logic of your sample output.