scipy: interpolation, cubic & linear

Question

scipy: interpolation, cubic & linear

1.9k Views Asked by erthalion At 18 June 2015 at 10:34

I'm trying to interpolate my set of data (first columnt is the time, third columnt is the actual data):

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d

data = np.genfromtxt("data.csv", delimiter=" ")

x = data[:, 0]
y = data[:, 2]

xx = np.linspace(x.min(), x.max(), 1000)
y_smooth = interp1d(x, y)(xx)
#y_smooth = interp1d(x, y, kind="cubic")(xx)

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(xx, y_smooth, "r-")

plt.show()

but I see some strange difference between linear and cubic interpolation. Here is the result for linear: enter image description here

Here is the same for cubic: enter image description here

I'm not sure, why is graph jumping all the time and y_smooth contains incorrect values?

ipdb> y_smooth_linear.max()
141.5481144
ipdb> y_smooth_cubic.max()
1.2663431888584225e+18

Can anybody explain to me, how can I change my code to achieve correct interpolation?

UPD: here is data.cvs file

Original Q&A

There are 2 best solutions below

unutbu On 18 June 2015 at 11:50

Given cfh's observation that x has duplicate values, you could use np.unique to select a unique value of y for each x:

x2, idx = np.unique(x, return_index=True)
y2 = y[idx]

return_index=True causes np.unique to return not only the unique values, x2, but also the locations, idx, of the unique xs in the original x array. Note that this selects the first value of y for each unique x.

If you'd like to average all the y values for each unique x, you could use stats.binned_statistic:

import scipy.stats as stats
x2, inv = np.unique(x, return_inverse=True)
y2, bin_edges, binnumber = stats.binned_statistic(
    x=inv, values=y, statistic='mean', bins=inv.max()+1)

return_inverse=True tells np.unique to return indices from which the original array can be reconstructed. Those indices can also serve as categorical labels or "factors", which is how they are being used in the call to binned_statistic above.

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
import scipy.stats as stats

data = np.genfromtxt("data.csv", delimiter=" ")

x = data[:, 0]
y = data[:, 1]

x2, idx, inv = np.unique(x, return_index=True, return_inverse=True)
y_uniq = y[idx]
y_ave, bin_edges, binnumber = stats.binned_statistic(
    x=inv, values=y, statistic='mean', bins=inv.max()+1)

xx = np.linspace(x.min(), x.max(), 1000)
y_smooth = interp1d(x, y)(xx)
y_smooth2 = interp1d(x2, y_uniq, kind="cubic")(xx)
y_smooth3 = interp1d(x2, y_ave, kind="cubic")(xx)

fig, ax = plt.subplots(nrows=3, sharex=True)

ax[0].plot(xx, y_smooth, "r-", label='linear')
ax[1].plot(xx, y_smooth2, "b-", label='cubic (first y)')
ax[2].plot(xx, y_smooth3, "b-", label='cubic (ave y)')
ax[0].legend(loc='best')
ax[1].legend(loc='best')
ax[2].legend(loc='best')
plt.show()

enter image description here

**cfh** · Accepted Answer · 2015-06-18T11:38:33.957000

Your data contains several y values for the same x value. This violates the assumptions of most interpolation algorithms.

Either discard the rows with duplicate x values, average the y values for each individual x, or obtain a better resolution for the x values such that they aren't the same anymore.

scipy: interpolation, cubic & linear

There are 2 best solutions below

Related Questions in PYTHON

Related Questions in SCIPY

Related Questions in INTERPOLATION

Trending Questions

Popular # Hahtags

Popular Questions