I'm new to Python and I'm hoping someone can help me understand my problem.
I'm working with statistics (count, min, max, std) extracted from a one-year LAI (Leaf Area Index) time series. My aim is to smooth the mean of the maximum LAI for each segment_id (plot), using linear interpolation followed by a Savitzky-Golay filter. Even after adjusting the window and polyorder parameters, the results were not satisfactory.
import pandas as pd
import matplotlib.pyplot as plt
from scipy.signal import savgol_filter
import numpy as np
# import data
file_path = '......../LAI.csv'
df = pd.read_csv(file_path).drop(['Unnamed: 0'], axis=1)
# convert to dataframe
df = pd.DataFrame(df)
# sort by segment_id
df = df.sort_values(by=['segment_id'], ascending=True)
# convert date to datetime
df['date'] = pd.to_datetime(df['date'], format = '%Y%m%d')
df['date'] = df['date'].dt.dayofyear
# calculate mean by each segment_id
mean_max = df.groupby(['segment_id']).mean()
# Define segment_ids
segment_ids = list(range(0, 1))  # plots only segment_id 0; use range(0, 2) for two segment_ids
# Iterate over each segment_id in segment_ids
for segment_id in segment_ids:
    df_filtered = df[df['segment_id'] == segment_id].copy()
    # linear interpolation
    x = df_filtered['date']
    y = df_filtered['mean']
    # remove negative values max
    y = y.mask(y < 0)
    if y.isna().all():  # if all values are NaN, skip this segment_id
        continue
    xnew = np.linspace(40, 400)
    nan_indices = y.isna()
    yinterp = np.interp(xnew, x[~nan_indices], y[~nan_indices])
    # savgol filter
    yinterp1 = savgol_filter(yinterp, 15, 3, mode='interp', cval=1)  # window size 15, polynomial order 3
    plt.figure(figsize=(10, 5))
    plt.plot(x, y, 'o', label='data')
    plt.plot(xnew, yinterp1, '-', label='linear')
    plt.legend(loc='best')
    plt.xlabel('Day of year')
    plt.ylabel('Mean of Max LAI')
    plt.show()
Both images with and without smoothing:
I'd like to obtain a smoothed curve along the data points, as shown in the image, by adjusting the filter parameters:
My problem is fixed; it was linked to data sorting.
I just changed these two lines:
x = df_filtered['date'].sort_index()
y = df_filtered['mean'].sort_index()/1000
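For anyone hitting the same issue, here is a minimal sketch of why the ordering matters. np.interp expects its x coordinates in increasing order and does not check this itself, so rows that are out of date order produce a meaningless interpolated curve before the Savitzky-Golay filter is even applied. The data below is synthetic and the variable names (doy, lai, xnew) are made up for illustration; they are not taken from the original CSV, and the window of 31 is just one possible choice.

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic stand-in for one segment: a smooth seasonal curve sampled every 10 days.
# These values are hypothetical, not real LAI measurements.
doy = np.arange(40, 361, 10)                      # day of year, increasing
lai = 3.0 * np.exp(-((doy - 200) / 60.0) ** 2)    # bell-shaped seasonal profile

# Shuffle the rows to mimic data that is no longer in date order.
rng = np.random.default_rng(0)
perm = rng.permutation(doy.size)
doy_shuffled, lai_shuffled = doy[perm], lai[perm]

# Daily grid restricted to the observed range of days.
xnew = np.arange(40, 361)

# Unsorted x: np.interp raises no error, but the values are not a valid interpolation.
y_unsorted = np.interp(xnew, doy_shuffled, lai_shuffled)

# Sort by day of year first, then interpolate and smooth.
order = np.argsort(doy_shuffled)
y_sorted = np.interp(xnew, doy_shuffled[order], lai_shuffled[order])
y_smooth = savgol_filter(y_sorted, window_length=31, polyorder=3)
```

With a pandas DataFrame, sorting with df_filtered.sort_values('date') before extracting x and y should have the same effect as the argsort step here, and it does not rely on the index already being in chronological order.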