Is linear regression the best fit for this study?


I am studying the assessment of linked data quality dimensions, and I am trying to estimate one quality dimension (Timeliness) from another (Semantic Accuracy). Since I found a correlation between Timeliness and Semantic Accuracy, I presumed that regression analysis was the next step, and I ran it in IBM SPSS Statistics.

- The Semantic Accuracy formula I used is: msemTriple = |G ∧ S| / |G|

msemTriple measures the extent to which the triples in the repository G (the original LOD dataset) and in the gold standard S have the same values.
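As a minimal sketch of this metric, assuming that |G ∧ S| means the number of triples that appear with identical values in both G and S (a set intersection), it could be computed as below. The function name `msem_triple` and the example triples are hypothetical and only illustrate the arithmetic.

```python
def msem_triple(G, S):
    """Share of triples in G that also appear, with the same value, in S."""
    G, S = set(G), set(S)
    if not G:
        raise ValueError("G must contain at least one triple")
    return len(G & S) / len(G)


# Hypothetical (subject, predicate, value) triples, invented for illustration.
G = {
    ("CountryA", "cases", 100),
    ("CountryA", "deaths", 5),
    ("CountryB", "cases", 40),
    ("CountryB", "deaths", 2),
}
S = {
    ("CountryA", "cases", 100),
    ("CountryA", "deaths", 6),  # value differs from G, so this triple does not match
    ("CountryB", "cases", 40),
    ("CountryB", "deaths", 2),
}

score = msem_triple(G, S)  # 3 matching triples out of 4
print(score)
```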

- The Timeliness formula I used is:

Timeliness(de) = 1 - max{1 - Currency(de) / Volatility(de), 0}

where:

Currency(de) = (1 - (lastmodificationTime(de) - lastmodificationTime(pe)) / (currentTime - startTime)) * Ratio

(Ratio measures the extent to which the triples in the LOD dataset, Wikidata in my case, and in the gold standard, Wikipedia, have the same values.)

and

Volatility(de) = (ExpiryTime(de) - InputTime(de)) / (ExpiryTime(pe) - InputTime(pe))

(de is the entity document of the datum in the linked data dataset, and pe is the corresponding entity document in the gold standard.)
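To make the three formulas above concrete, here is a small sketch of how they combine. It assumes all times are plain numbers on a common scale (for example days since some reference date); the function and parameter names, and the example values, are hypothetical and not taken from my SPSS file.

```python
def currency(lastmod_de, lastmod_pe, current_time, start_time, ratio):
    """Currency(de): lag of de behind pe, scaled to the observation window, times Ratio."""
    return (1 - (lastmod_de - lastmod_pe) / (current_time - start_time)) * ratio


def volatility(expiry_de, input_de, expiry_pe, input_pe):
    """Volatility(de): validity period of de relative to that of pe."""
    return (expiry_de - input_de) / (expiry_pe - input_pe)


def timeliness(currency_value, volatility_value):
    """Timeliness(de) = 1 - max{1 - Currency / Volatility, 0}."""
    return 1 - max(1 - currency_value / volatility_value, 0)


# Hypothetical values, in days, chosen only to show the arithmetic.
c = currency(lastmod_de=10, lastmod_pe=8, current_time=100, start_time=0, ratio=0.9)
v = volatility(expiry_de=30, input_de=0, expiry_pe=30, input_pe=0)
t = timeliness(c, v)
print(c, v, t)
```

Note that the max{..., 0} term caps Timeliness at 1 whenever Currency exceeds Volatility, so the variable is bounded above. That may matter when judging whether a straight line is a reasonable model for it.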

NB: As a sample dataset I used Covid-19 statistics per country, specifically the number of cases, recoveries, and deaths.

This is my SPSS file: https://drive.google.com/file/d/1DqMqVv4JHPbo3-pAXmavuC91pMlImFlu/view?usp=drive_link

This is the output of my SPSS file: https://drive.google.com/file/d/1JxVf542Kq9KfxeWIqmm1deLfJv67HOUh/view?usp=drive_link

This is the normality test for the two variables: [image: normality test output]

And this is the scatterplot: [image: scatterplot]
