I want to calculate the modulus by fitting a straight line to the stress-strain curve. However, the pressure data are very noisy, so the fit is sometimes poor. I think two things could improve the situation. First, extremely large or small values should be ignored. Second, the pressure data could be processed in some way before the fit, for example with smoothing or a running average.
The following is the Python code I used for linear fitting.
    import numpy as np
    from scipy.optimize import curve_fit

    def f(x, a, b):
        # Straight line: a is the intercept, b is the slope (the modulus)
        return a + b * x

    def get_modulus(BoxValue, PressureValue, strain_cutoff=0.012):
        # Strain is the relative change of the box length
        strain_all = np.asarray(BoxValue) / BoxValue[0] - 1
        # Stress is the negative pressure; dividing by 10000 presumably converts bar to GPa
        stress_all = -np.asarray(PressureValue) / 10000
        # Keep only the small-strain region for the linear fit
        mask = strain_all <= strain_cutoff
        strain = strain_all[mask]
        stress = stress_all[mask]
        popt, pcov = curve_fit(f, strain, stress)
        # Return the slope and its standard deviation
        return popt[1], np.sqrt(np.diag(pcov))[1]

    # Second column of each file holds the values
    BoxValue = np.loadtxt("box-xx_0water_150peg.dat")[:, 1]
    PressureValue = np.loadtxt("pres-xx_0water_150peg")[:, 1]
    modulus, modulus_std = get_modulus(BoxValue, PressureValue)
Two sets of data can be downloaded here.
The first one: box data, and pressure data
The second one: box data, and pressure data
The pressure data of the second one has an abnormal point on the 180th line, which I think should be excluded from the fitting.
Could you please tell me what is the best practice to do this and provide the code as well?
Any suggestions or comments are welcome.
TL;DR
Your dataset is too far from linearity to be handled by robust regression alone. You definitely need to pre-process the dataset before you can regress a modulus.
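As one possible pre-processing step, the sketch below flags isolated spikes with a rolling-median (Hampel-style) test and optionally smooths the remaining points with a running mean. The function names `mask_outliers` and `running_mean`, the window sizes, the threshold and the synthetic pressure series are all assumptions made for illustration; they are not taken from the original data or analysis.

```python
import numpy as np

def mask_outliers(values, window=11, n_sigmas=5.0):
    """Return True for points to keep (Hampel-style rolling-median test)."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    values = np.asarray(values, dtype=float)
    half = window // 2
    # Pad with edge values so that every point has a full window around it.
    padded = np.pad(values, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    median = np.median(windows, axis=1)
    # Median absolute deviation, scaled to match a Gaussian standard deviation.
    mad = 1.4826 * np.median(np.abs(windows - median[:, None]), axis=1)
    return np.abs(values - median) <= n_sigmas * mad

def running_mean(values, window=5):
    """Centered running mean that preserves the input length."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    values = np.asarray(values, dtype=float)
    padded = np.pad(values, window // 2, mode="edge")
    return np.convolve(padded, np.ones(window) / window, mode="valid")

# Synthetic stand-in for a pressure series (hypothetical numbers):
# a linear trend, Gaussian noise, and one spike on the 180th line.
rng = np.random.default_rng(0)
pressure = -50.0 * np.arange(300) + rng.normal(0.0, 200.0, 300)
pressure[179] += 1.0e5

keep = mask_outliers(pressure)          # boolean mask, False at flagged points
smoothed = running_mean(pressure[keep])
print("flagged indices:", np.flatnonzero(~keep))
```

The same mask would have to be applied to the box data so that strain and stress stay aligned. Note that smoothing before an unweighted linear fit changes the slope very little but correlates neighbouring points, so the standard deviation reported by the fit is likely to be too optimistic.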
Robust regression
First we process your data:
Then we perform Robust Linear Regression on it:
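One way to perform such a regression, not necessarily the method used for the results reported here, is `scipy.optimize.least_squares` with the `soft_l1` loss, which limits the influence of large residuals. The helper name `robust_linear_fit`, the choice of loss, the MAD-based scale estimate and the synthetic stress-strain data are assumptions for this sketch.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(params, x, y):
    # Residuals of the straight line intercept + slope * x.
    intercept, slope = params
    return intercept + slope * x - y

def robust_linear_fit(x, y):
    """Fit a line with a soft-L1 loss; returns (slope, intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Ordinary least squares provides the starting point.
    slope0, intercept0 = np.polyfit(x, y, 1)
    # Estimate the residual scale from the median absolute deviation.
    r0 = y - (intercept0 + slope0 * x)
    scale = 1.4826 * np.median(np.abs(r0 - np.median(r0)))
    if scale <= 0.0:
        scale = 1.0  # fall back to the default scale for degenerate data
    result = least_squares(
        residuals,
        x0=[intercept0, slope0],
        loss="soft_l1",
        f_scale=scale,
        args=(x, y),
    )
    return result.x[1], result.x[0]

# Synthetic example (hypothetical numbers): true slope 2.0, Gaussian noise,
# and one strong outlier on the 180th point.
rng = np.random.default_rng(0)
strain = np.linspace(0.0, 0.012, 200)
stress = 2.0 * strain + rng.normal(0.0, 0.002, strain.size)
stress[179] += 1.0

slope_ols = np.polyfit(strain, stress, 1)[0]
slope_robust, intercept_robust = robust_linear_fit(strain, stress)
print("OLS slope:", slope_ols, "robust slope:", slope_robust)
```

On this synthetic series the single outlier pulls the ordinary least-squares slope well away from the true value, while the robust fit stays close to it. A robust loss only protects against isolated outliers; it does not help if the underlying curve is not linear.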
First dataset returns (probably not linear at all):
Second dataset returns (multiple setup):
Where the strong outlier does not affect the regression:
But if we zoom in, we can see at least two different behaviours:
Conclusions