Fill null values in simple dataframe with sum of surrounding values

Question


75 Views Asked by Ross Clark At 16 June 2025 at 00:33

I am looking to compute null values 'inside' a dataframe. Basically, each of the boundary 'cells' of this dataframe contains a value, and all the interior values are null.

So I want to fill these null values by summing the surrounding 4 cells and dividing by 4, such that the value at any given cell is $h_{i,j} = \frac{1}{4}\left(h_{i-1,j} + h_{i+1,j} + h_{i,j-1} + h_{i,j+1}\right)$.

Col0	Col1	Col2	Col4
100	95	90	85
95	NaN	NaN	80
90	NaN	NaN	75
85	NaN	NaN	70
80	NaN	NaN	65
75	70	65	60

I am unsure how to iterate over this dataset and apply the above formula.

My expected output, based on my Excel version of this:

Col0	Col1	Col2	Col4
100	95	90	85
95	90	85	80
90	85	80	75
85	80	75	70
80	75	70	65
75	70	65	60
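As a quick sanity check, the expected table can be tested against the averaging formula directly. This is only a sketch: the grid is copied by hand from the table above, and `satisfies_average_rule` is a hypothetical helper name, not something from pandas.

```python
# Expected table from the question, as a plain list of lists (rows x columns).
expected = [
    [100, 95, 90, 85],
    [95, 90, 85, 80],
    [90, 85, 80, 75],
    [85, 80, 75, 70],
    [80, 75, 70, 65],
    [75, 70, 65, 60],
]

def satisfies_average_rule(grid, tol=1e-9):
    """Return True if every interior cell equals the mean of its 4 neighbours."""
    for i in range(1, len(grid) - 1):
        for j in range(1, len(grid[0]) - 1):
            mean = (grid[i - 1][j] + grid[i + 1][j]
                    + grid[i][j - 1] + grid[i][j + 1]) / 4
            if abs(grid[i][j] - mean) > tol:
                return False
    return True

print(satisfies_average_rule(expected))  # True
```

So the expected values are consistent with the formula holding at every interior cell at once, which is what an iterated calculation settles on.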

My initial idea was to use the following loop:

for i in df:
    i.fillna(
        (i[:, :, 1:] + i[:, :, :-1] + i[:, :-1, :] + i[:, 1:, :]) / 4, inplace=True
    )

I.e. fill each NaN value with the sum of the four surrounding cells divided by four.

But this doesn't work; it just fails with the error 'cannot unpack non-iterable int object'.

Does anyone have an idea of how I can (a) correctly develop a formula to access all surrounding cell values, and (b) actually apply it to calculate these values?

I can do this straightforwardly in Excel which allows you to iterate this type of calculation relatively easily, but I am struggling to conceptually transfer it to Python.

I tried the above code, but it doesn't work and I can't apply my conceptual understanding to Python well.


**workhandle** · Answer 1

Given the input csv as below:

Col0,Col1,Col2,Col4
100,95,90,85
95,NaN,NaN,80
90,NaN,NaN,75
85,NaN,NaN,70
80,NaN,NaN,65
75,70,65,60

And given the output csv that this code produces, as below:

Col0,Col1,Col2,Col4
100.0,95.0,90.0,85.0
95.0,95.0,88.3,80.0
90.0,92.5,85.3,75.0
85.0,88.8,81.3,70.0
80.0,79.6,72.7,65.0
75.0,70.0,65.0,60.0

This is the code:

import sys

with open('input.csv', 'r') as file:
    lines1 = file.readlines()

lines2 = []

for x in range(len(lines1)):
    lines1[x] = lines1[x].strip()
    if len(lines1[x]) > 0:
        # Store each row as a list of trimmed fields (not a one-shot generator).
        lines2.append([y.strip() for y in lines1[x].split(",")])

header_list = list(lines2[0])

lines2 = [[float(x) if x != 'NaN' else None for x in inner_list] for inner_list in lines2[1:]]

row_index_exceeded = len(lines2)
column_index_exceeded = len(lines2[0])

def generate_neighbours(row_index,column_index):
    global row_index_exceeded
    global column_index_exceeded
    return_list = []
    if row_index-1 != -1:
        return_list.append([row_index-1,column_index])
    if row_index+1 != row_index_exceeded:
        return_list.append([row_index+1,column_index])
    if column_index-1 != -1:
        return_list.append([row_index,column_index-1])
    if column_index+1 != column_index_exceeded:
        return_list.append([row_index,column_index+1])
    return return_list

def process_list_of_lists(input_list):
    # Fill the first empty cell (row-major order) with the mean of its
    # already-known neighbours, then return so the caller can rescan.
    for row_index in range(len(input_list)):
        for column_index in range(len(input_list[row_index])):
            current_cell = input_list[row_index][column_index]
            if current_cell is None:
                neighbours = generate_neighbours(row_index,column_index)
                num_values = 0
                sum_values = 0
                for x in neighbours:
                    if input_list[x[0]][x[1]] is not None:
                        num_values = num_values + 1
                        sum_values = sum_values + input_list[x[0]][x[1]]
                if num_values == 0:
                    print("Critical error. Top-left cells require values.")
                    sys.exit()
                input_list[row_index][column_index] = float(sum_values/num_values)
                return input_list

# Keep filling one empty cell per pass until no empty cells remain.
while True:
    break_inner_loop = False
    for x in range(len(lines2)):
        for y in lines2[x]:
            if (y is None) and (not break_inner_loop):
                lines2 = process_list_of_lists(lines2)
                break_inner_loop = True
    if not break_inner_loop:
        break

with open('output.csv', 'w') as file:
    file.write(','.join(header_list)+"\n")
    for x in lines2:
        y = [f'{num:.1f}' for num in x]
        file.write(','.join(y)+"\n")

print("Execution complete. Check output csv file.")
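To see where the numbers in the output above come from: the script fills empty cells in row-major order and averages only the neighbours that already hold a value, so each filled cell feeds into the ones after it. A minimal sketch of the first two fills (`mean_of_known` is a hypothetical helper used only for illustration):

```python
def mean_of_known(values):
    """Average the neighbours that already have a value (None = still empty)."""
    known = [v for v in values if v is not None]
    return sum(known) / len(known)

# Row 1, Col1: up=95, down=empty, left=95, right=empty.
first = mean_of_known([95, None, 95, None])

# Row 1, Col2: up=90, down=empty, left=the value just filled, right=80.
second = mean_of_known([90, None, first, 80])

print(f"{first:.1f} {second:.1f}")  # 95.0 88.3
```

Because each cell is computed once from whatever happens to be known at that moment, the result depends on the fill order.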

This is different from your sample output because I am not sure of the logic of your sample output.
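One possible reading of the sample output is that the formula from the question is meant to hold for every interior cell simultaneously, which is what Excel's iterative calculation converges towards. Below is a minimal sketch of that idea (a Jacobi-style relaxation) in plain Python. The function name `relax`, the tolerance, the iteration cap, and the starting guess of 0.0 are all assumptions made for illustration, and it assumes every empty cell is strictly inside the grid so that all four neighbours exist.

```python
def relax(grid, tol=1e-9, max_iter=10_000):
    """Repeatedly set each empty cell to the mean of its 4 neighbours.

    `grid` is a list of lists where empty cells are None. Cells that
    already hold a value are never changed. Assumes no empty cell lies
    on the outer boundary of the grid.
    """
    rows, cols = len(grid), len(grid[0])
    unknown = [(i, j) for i in range(rows) for j in range(cols)
               if grid[i][j] is None]
    # Start empty cells from 0.0; the starting guess only affects how
    # many sweeps are needed, not the value the sweeps settle on.
    current = [[0.0 if v is None else float(v) for v in row] for row in grid]
    for _ in range(max_iter):
        updated = [row[:] for row in current]
        biggest_change = 0.0
        for i, j in unknown:
            updated[i][j] = (current[i - 1][j] + current[i + 1][j]
                             + current[i][j - 1] + current[i][j + 1]) / 4
            biggest_change = max(biggest_change,
                                 abs(updated[i][j] - current[i][j]))
        current = updated
        if biggest_change < tol:
            break  # values have stopped moving
    return current

grid = [
    [100, 95, 90, 85],
    [95, None, None, 80],
    [90, None, None, 75],
    [85, None, None, 70],
    [80, None, None, 65],
    [75, 70, 65, 60],
]
result = relax(grid)
for row in result:
    print([round(v, 1) for v in row])
```

For the input in the question, the sweeps settle on the sample output (90, 85, 80, ... in the interior) to within the tolerance. If the data starts in a pandas dataframe, the NaN cells would first need converting to None (or the `is None` test swapped for a NaN check) before using this sketch.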
