Numpy: Comparing two data sets for fitness


I'm drawing a blank on this.

I have two data sets:

d1 = [(x1,y1), (x2,y2)...] 
d2 = [(x1,y1), (x2,y2)...]

I would like to get some type of statistical value, maybe something like an r-value, that tells me how well d2 fits to d1.


There are 1 best solutions below


It depends on what those two data sets represent; you may want to be more specific.

If they are something like X-Y coordinates in a Cartesian system, distance correlation is probably the most appropriate measure (http://en.wikipedia.org/wiki/Distance_correlation#Alternative_formulation:_Brownian_covariance).
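For the coordinate case, here is a minimal NumPy sketch of the sample distance correlation, following the double-centred distance-matrix definition on that Wikipedia page. The function name `distance_correlation` and the sample points are made up for illustration, and the sketch assumes d1 and d2 have the same number of points, paired in the same order.

```python
import numpy as np

def distance_correlation(a, b):
    """Sample distance correlation between two paired point sets (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Treat 1-D input as n points with one coordinate each.
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    if len(a) != len(b):
        raise ValueError("both data sets need the same number of points")

    def centred_distances(m):
        # Pairwise Euclidean distances, then double centring.
        d = np.linalg.norm(m[:, None, :] - m[None, :, :], axis=-1)
        return (d - d.mean(axis=0, keepdims=True)
                  - d.mean(axis=1, keepdims=True) + d.mean())

    A = centred_distances(a)
    B = centred_distances(b)
    dcov2 = (A * B).mean()                           # squared distance covariance
    dvar = np.sqrt((A * A).mean() * (B * B).mean())  # product of distance std devs
    if dvar == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / dvar))

# Illustrative data only: d2_pts is a scaled and shifted copy of d1_pts.
d1_pts = np.array([(0.0, 1.0), (1.0, 3.0), (2.0, 2.0), (3.0, 5.0), (4.0, 4.0)])
d2_pts = 2.0 * d1_pts + 1.0
dcor_same = distance_correlation(d1_pts, d1_pts)
dcor_scaled = distance_correlation(d1_pts, d2_pts)
print(dcor_same, dcor_scaled)
```

The result lies between 0 and 1. It is unaffected by shifting or uniformly scaling either data set, so it measures dependence rather than point-by-point agreement.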

If the x values are the same, d1 holds the expected y for each x under a certain model (e.g. a linear model), and d2 holds the observed y values, then Pearson's r may be a good choice: see scipy.stats.pearsonr (http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient).
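A short sketch of the Pearson case, assuming d1 and d2 are lists of (x, y) tuples that share the same x values in the same order. The numbers are invented for illustration.

```python
import numpy as np
from scipy import stats

# Illustrative data only: d1 is the model (y = 2x + 1), d2 is "observed".
d1 = [(0, 1.0), (1, 3.0), (2, 5.0), (3, 7.0), (4, 9.0)]
d2 = [(0, 1.2), (1, 2.9), (2, 5.3), (3, 6.8), (4, 9.1)]

# Split the (x, y) pairs into separate arrays.
x1, y1 = np.array(d1).T
x2, y2 = np.array(d2).T

# Pearson's r only makes sense here if the y values are paired by x.
if not np.array_equal(x1, x2):
    raise ValueError("x values differ; interpolate onto a common grid first")

r, p_value = stats.pearsonr(y1, y2)
print(f"r = {r:.4f}, p = {p_value:.4g}")
```

Keep in mind that r measures linear association only: d2 could be offset or scaled relative to d1 and still give an r close to 1.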

If both d1 and d2 are relative frequency data (y is the observed count of events with value x), then some type of goodness-of-fit test may be the right direction to go: scipy.stats.chisquare, scipy.stats.chi2_contingency, and scipy.stats.ks_2samp, to name a few.
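A rough sketch of how those three functions might be called. All of the counts and samples are invented, and which test applies depends on the data: `chisquare` compares observed counts against expected counts, `chi2_contingency` compares two sets of observed counts, and `ks_2samp` works on the raw samples rather than on binned counts.

```python
import numpy as np
from scipy import stats

# Illustrative counts per bin (the y values of d1 and d2).
expected_counts = np.array([10, 20, 40, 20, 10])
observed_counts = np.array([12, 18, 41, 21, 8])

# chisquare expects both arrays to have the same total, so rescale the
# expected counts to the observed total (a no-op here, both sum to 100).
expected_scaled = expected_counts * observed_counts.sum() / expected_counts.sum()
chi2, p_chi2 = stats.chisquare(observed_counts, f_exp=expected_scaled)

# If both rows are observed counts, test whether they share one distribution.
table = np.vstack([observed_counts, expected_counts])
result = stats.chi2_contingency(table)
chi2_c, p_c = result[0], result[1]

# With the raw, unbinned samples, a two-sample Kolmogorov-Smirnov test works.
rng = np.random.default_rng(0)
sample_a = rng.normal(size=200)
sample_b = rng.normal(size=200)
ks_stat, p_ks = stats.ks_2samp(sample_a, sample_b)

print(chi2, p_chi2)
print(chi2_c, p_c)
print(ks_stat, p_ks)
```

In each case a small p-value suggests that the two data sets do not follow the same distribution.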