I have a list of sensor measurements for air quality with geo-coordinates, and I would like to implement outlier detection. The list of sensors is relatively small (~50).
Air quality can change gradually with distance, but abrupt local spikes are likely outliers. If one sensor in a group of closely located sensors shows a higher value, it could be an outlier. If the same higher value is also shown by more distant sensors, it might be OK.
Of course, I can ignore the coordinates and do simple outlier detection assuming a normal distribution, but I was hoping to do something more sophisticated. What would be a good statistical way to model this and implement outlier detection?
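The neighbor comparison described in the question can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a standard or definitive method. Each sensor's residual is its value minus the median of all sensors within a hypothetical `radius_km` (great-circle distance via the haversine formula). The residuals are then scored with a robust z-score (median and MAD, scaled by 1.4826) across all sensors. The function name `neighbor_residual_outliers`, the 1 km radius, and the threshold of 3 are illustrative choices, not tuned values.

```python
import numpy as np


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))  # mean Earth radius in km


def neighbor_residual_outliers(lat, lon, values, radius_km=1.0,
                               threshold=3.0, min_neighbors=2):
    """Flag sensors whose value departs strongly from their close neighbors.

    Hypothetical helper: radius_km, threshold and min_neighbors are
    illustrative defaults, not tuned values.
    """
    lat, lon, values = (np.asarray(a, dtype=float) for a in (lat, lon, values))
    n = len(values)
    d = haversine_km(lat[:, None], lon[:, None], lat[None, :], lon[None, :])

    # Residual = own value minus the median of neighbors within radius_km.
    resid = np.full(n, np.nan)
    for i in range(n):
        nb = d[i] <= radius_km
        nb[i] = False  # a sensor is not its own neighbor
        if nb.sum() >= min_neighbors:
            resid[i] = values[i] - np.median(values[nb])

    z = np.full(n, np.nan)
    flags = np.zeros(n, dtype=bool)
    ok = ~np.isnan(resid)  # sensors with too few neighbors stay unscored
    if not ok.any():
        return z, flags

    # Robust z-score of the residuals across all scored sensors (median/MAD).
    center = np.median(resid[ok])
    scale = 1.4826 * np.median(np.abs(resid[ok] - center))
    if scale == 0:
        scale = np.std(resid[ok]) or 1.0  # fallback when the MAD is zero
    z[ok] = (resid[ok] - center) / scale
    flags[ok] = np.abs(z[ok]) > threshold
    return z, flags


if __name__ == "__main__":
    # Synthetic 5x5 grid (~0.56 km spacing): smooth gradient plus one spike.
    rows, cols = np.meshgrid(np.arange(5), np.arange(5), indexing="ij")
    lat = (0.005 * rows).ravel()
    lon = (0.005 * cols).ravel()
    values = (10.0 + rows + cols).ravel()
    values[12] += 20.0  # spike at the center sensor
    z, flags = neighbor_residual_outliers(lat, lon, values)
    print("flagged sensors:", np.flatnonzero(flags))
```

Using the neighbor median rather than the mean means a single spike barely shifts the baseline of the sensors around it, so the spike itself stands out while its neighbors are not dragged along.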
The statement above ("If one sensor in the group of closely located sensors shows a higher value it could be an outlier. If the same higher value is shown by more distant sensors it might be OK.") suggests that sensors closer to each other tend to have values that are more alike. This is Tobler’s first law of geography: “everything is related to everything else, but near things are more related than distant things.”
You can quantify an answer to this question. The focus should not be on the locations and values of the outlier sensors. Instead, use global spatial autocorrelation to measure the degree to which sensors that are near each other tend to be more alike. As a first step, you will need to define neighbors for each sensor.
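As one possible way to do this, the sketch below defines each sensor's neighbors as its k nearest sensors (k = 4 is an arbitrary assumption; a fixed distance band is a common alternative) and row-standardizes the weights. It then computes global Moran's I, I = (n / S0) · Σ_i Σ_j w_ij z_i z_j / Σ_i z_i², where z_i = x_i - mean(x) and S0 is the sum of all weights. Significance is assessed with a permutation test that shuffles the values over the fixed sensor locations. The helper names `knn_weights`, `morans_i` and `moran_permutation_test` are hypothetical and written only for illustration. For real work, PySAL's `libpysal` (spatial weights) and `esda` (Moran's I) packages provide tested implementations.

```python
import numpy as np


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))


def knn_weights(lat, lon, k=4):
    """Row-standardized k-nearest-neighbor spatial weights matrix W (n x n)."""
    lat, lon = np.asarray(lat, dtype=float), np.asarray(lon, dtype=float)
    n = len(lat)
    d = haversine_km(lat[:, None], lon[:, None], lat[None, :], lon[None, :])
    np.fill_diagonal(d, np.inf)  # a sensor is not its own neighbor
    nearest = np.argsort(d, axis=1)[:, :k]  # ties are broken arbitrarily
    w = np.zeros((n, n))
    w[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return w / w.sum(axis=1, keepdims=True)  # each row sums to 1


def morans_i(values, w):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with z = x - mean(x)."""
    x = np.asarray(values, dtype=float)
    z = x - x.mean()
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)


def moran_permutation_test(values, w, n_perm=999, seed=0):
    """Moran's I and a one-sided pseudo p-value for positive autocorrelation."""
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float)
    observed = morans_i(x, w)
    # Shuffling values over fixed locations simulates "no spatial pattern".
    perms = np.array([morans_i(rng.permutation(x), w) for _ in range(n_perm)])
    p_value = (np.sum(perms >= observed) + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    # Synthetic 6x6 grid with a smooth gradient: nearby values are alike.
    rows, cols = np.meshgrid(np.arange(6), np.arange(6), indexing="ij")
    lat = (0.005 * rows).ravel()
    lon = (0.005 * cols).ravel()
    values = (10.0 + rows + cols).ravel()
    w = knn_weights(lat, lon, k=4)
    i_obs, p = moran_permutation_test(values, w)
    print(f"Moran's I = {i_obs:.3f}, pseudo p-value = {p:.3f}")
    print("E[I] under no autocorrelation:", -1 / (len(values) - 1))
```

Under no spatial autocorrelation the expected value of I is -1/(n - 1), close to zero. A clearly positive I with a small pseudo p-value indicates that nearby sensors are more alike than chance would produce, which is what justifies comparing each sensor with its neighbors in the first place.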