Let's say we have the following set of data: 2.33, 2.19, 4.7, 2.69, 2.8, 2.12, 3.01, 2.5, 1.98, 2.34

How do I pick the consistent data from the above sample by eliminating the outliers using JavaScript or any other mathematical method which can be implemented in JavaScript?

My approach was to calculate the average value, the standard deviation, a min value (avg - std dev) and a max value (avg + std dev), and then to keep only the data that falls between the min and max values.
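The approach described above could be sketched roughly as follows. This is only an illustration: the names `sample`, `mean`, `stdDev` and `withinStdDev` are made up here, and the population standard deviation (dividing by n) is an assumption, since the variant actually used is not stated.

```javascript
// Sample from the question.
const sample = [2.33, 2.19, 4.7, 2.69, 2.8, 2.12, 3.01, 2.5, 1.98, 2.34];

// Arithmetic mean.
const mean = values => values.reduce((sum, v) => sum + v, 0) / values.length;

// Population standard deviation (divides by n); this choice is an assumption.
const stdDev = values => {
    const avg = mean(values);
    const variance = mean(values.map(v => (v - avg) ** 2));
    return Math.sqrt(variance);
};

// Keep only the values within k standard deviations of the mean.
const withinStdDev = (values, k = 1) => {
    const avg = mean(values);
    const sd = stdDev(values);
    return values.filter(v => v >= avg - k * sd && v <= avg + k * sd);
};

console.log(withinStdDev(sample));
// [2.33, 2.19, 2.69, 2.8, 2.12, 3.01, 2.5, 1.98, 2.34]
```

On this sample the range works out to roughly 1.92 to 3.41, so only 4.7 is dropped.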

Are there any better approaches we can follow to get more accurate results?

There are 1 best solutions below


I don't think your approach is sufficient: the mean and the standard deviation are themselves pulled toward the outliers, so you need to make sure a number is really extremely high or extremely low before deciding whether it's an outlier. To achieve this, we need to find Q1 and Q3 and calculate the IQR, which is Q3 - Q1.
Q1 and Q3 are quartiles (learn more: https://www.statisticshowto.com/what-are-quartiles/) and IQR is the interquartile range (learn more: https://www.statisticshowto.com/probability-and-statistics/interquartile-range/).

With Q1, Q3 and the IQR we can check for outliers, i.e. extremely low and extremely high values:
An extremely high value is any value that is greater than Q3 + (1.5 * IQR).
An extremely low value is any value that is lower than Q1 - (1.5 * IQR).

So, in code:

const dataSet = [2, 2.5, 2.25, 4, 1, -3, 10, 20];

// Return an ascending sorted copy, so the input array is not modified
const asc = arr => [...arr].sort((a, b) => a - b);

// Quantile q (between 0 and 1), interpolating linearly between neighbours
const quartile = (arr, q) => {
    const sorted = asc(arr);
    const pos = (sorted.length - 1) * q;
    const base = Math.floor(pos);
    const rest = pos - base;
    if (sorted[base + 1] !== undefined) {
        return sorted[base] + rest * (sorted[base + 1] - sorted[base]);
    } else {
        return sorted[base];
    }
};

const Q1 = quartile(dataSet, .25);
const Q3 = quartile(dataSet, .75);
const IQR = Q3 - Q1;

// Keep every value that lies inside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]
const noneOutliers = [];
dataSet.forEach(number => {
    if (number > (Q3 + (1.5 * IQR)) || number < (Q1 - (1.5 * IQR))) {
        console.log(`${number} is an outlier`);
    } else {
        noneOutliers.push(number);
    }
});
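As a rough sketch, the same rule can be wrapped in a reusable helper and applied to the data from the question. The names `quantileSorted`, `splitOutliers` and `readings` are hypothetical, and the multiplier `k` defaults to the usual 1.5; the interpolation is the same one the `quartile` function above uses.

```javascript
// Interpolated quantile of an array that is already sorted ascending.
const quantileSorted = (sorted, q) => {
    const pos = (sorted.length - 1) * q;
    const base = Math.floor(pos);
    const rest = pos - base;
    const next = sorted[base + 1];
    return next !== undefined
        ? sorted[base] + rest * (next - sorted[base])
        : sorted[base];
};

// Hypothetical helper: split values into kept values and outliers using
// the fences Q1 - k * IQR and Q3 + k * IQR. Does not modify `values`.
const splitOutliers = (values, k = 1.5) => {
    const sorted = [...values].sort((a, b) => a - b);
    const q1 = quantileSorted(sorted, 0.25);
    const q3 = quantileSorted(sorted, 0.75);
    const iqr = q3 - q1;
    const lower = q1 - k * iqr;
    const upper = q3 + k * iqr;
    return {
        lower,
        upper,
        kept: values.filter(v => v >= lower && v <= upper),
        outliers: values.filter(v => v < lower || v > upper),
    };
};

// Data from the question.
const readings = [2.33, 2.19, 4.7, 2.69, 2.8, 2.12, 3.01, 2.5, 1.98, 2.34];
const result = splitOutliers(readings);
console.log(result.outliers); // [4.7]
console.log(result.kept);     // the other nine values, in their original order
```

For this sample Q1 is about 2.225 and Q3 about 2.7725, so the fences are roughly 1.40 and 3.59 and only 4.7 falls outside them.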

The quartile function I used is from this answer: How to get median and quartiles/percentiles of an array in JavaScript (or PHP)?

For the method, you can check this video: https://www.youtube.com/watch?v=9aDHbRb4Bf8