Vertex AI AutoML training: Cramér's V correlation is above 1


I am training an AutoML data model on a set of power plant output data. In this case, AT is the ambient temperature (and one of the features), and the target is PE (the power output by the plant).

I am in the process of creating a Vertex AI pipeline run, and in the AutoML "Training options" section, I have requested the calculation of the correlation between each feature and the target.

The correlation given between AT and PE is 1.157. However, this is supposed to be a Cramer's V number, and therefore somewhere between 0 (no correlation) and 1 (perfect correlation). The tooltip for the correlation column says "The Cramér's V correlation statistic between this column and the target column. This ranges from zero to one, where zero indicates no correlation and one indicates perfect correlation. A low correlation suggests that the column can be excluded from the model without much performance penalty. An unusually high correlation is indicative of target leakage and/or a categorical feature with a very high cardinality relative to number of rows."

So how can the correlation value given for the AT feature be above 1? Has anyone seen this before?

I have done a very basic linear regression between AT and PE in Google Sheets, and it gives an R² of 89.89%, so the correlation is indeed high, but I don't know why the Cramér's V reported by Google Cloud is above 1. There are no missing values for AT or PE in my data.
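For reference, here is a minimal sketch of the textbook Cramér's V, V = sqrt(chi² / (n · (min(r, c) − 1))), which I would use as a local sanity check. It is not Vertex AI's implementation (I don't know how Vertex AI bins numeric columns or whether it applies any correction). The helper names `quantile_bins` and `cramers_v`, the choice of 10 equal-frequency bins, and the synthetic AT/PE values are all my own assumptions for illustration; they are not my real data.

```python
import math
import random
from collections import Counter


def quantile_bins(values, n_bins=10):
    """Assign each value to one of n_bins roughly equal-frequency bins by rank.

    Assumption: ties are broken by sort order, which is fine for a sketch.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = min(rank * n_bins // len(values), n_bins - 1)
    return bins


def cramers_v(x_labels, y_labels):
    """Textbook (uncorrected) Cramér's V between two categorical sequences."""
    n = len(x_labels)
    joint = Counter(zip(x_labels, y_labels))  # observed cell counts
    row = Counter(x_labels)                   # row marginals
    col = Counter(y_labels)                   # column marginals
    k = min(len(row), len(col))
    if k < 2:
        return 0.0  # V is undefined with a single category; report 0 here
    chi2 = 0.0
    for r, r_count in row.items():
        for c, c_count in col.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


# Synthetic stand-in data (NOT the real power plant data): PE falls roughly
# linearly with AT, plus Gaussian noise.
random.seed(0)
at = [random.uniform(2.0, 37.0) for _ in range(2000)]
pe = [500.0 - 2.0 * t + random.gauss(0.0, 5.0) for t in at]

at_bins = quantile_bins(at)
pe_bins = quantile_bins(pe)
v = cramers_v(at_bins, pe_bins)
print(f"Cramér's V (10 quantile bins): {v:.3f}")
```

With this definition the statistic cannot exceed 1, because chi² is bounded above by n · (min(r, c) − 1). That is why the 1.157 looks wrong to me rather than merely high.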
