I am slightly confused by what the following code returns for X and y:
from sklearn import datasets
X, y = datasets.load_iris(return_X_y=True)
I see that print(X) gives the iris data, with shape 150x4, which seems correct. However, I am trying to understand what exactly print(y) gives. It simply returns this vector:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2]
I assume that 0, 1 and 2 refer to the class labels in the iris data, i.e. 'setosa', 'versicolor' and 'virginica'. Am I correct? Could someone elaborate on this and perhaps make it slightly more intuitive?
Broadly speaking, there are two types of datasets: those for regression and those for classification. Here you have classification, where X are the predictors and y are the group memberships.
Output:
As you can also see from the comments, Setosa corresponds to 0, and so on.
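If you want to check the mapping yourself, here is a minimal sketch. It assumes you call load_iris() without return_X_y=True, so that you get the full dataset object back, including its target_names attribute. The variable name species is only illustrative.

```python
from sklearn import datasets

# Load the full dataset object instead of only the (X, y) tuple
iris = datasets.load_iris()
X, y = iris.data, iris.target

# target_names[i] is the species name for the integer label i
for label, name in enumerate(iris.target_names):
    count = int((y == label).sum())
    print(label, name, count)

# Translate the integer labels back into species names (illustrative name)
species = iris.target_names[y]
print(species[:3], species[-3:])
```

So y holds one integer per row of X, and each integer is just an index into target_names.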