I am reading the book "The Elements of Statistical Learning", Chapter 2, and on page 16 there is this passage: 'First we generate 10 means m_k from a bivariate Gaussian distribution N((1,0),I) and labelled BLUE. Similarly, 10 more were drawn from N((0,1),I) and labelled ORANGE. Then for each class we generate 100 observations,...'
I'm unable to understand this paragraph. I have the following questions:

Q1. What does generating 10 means from a bivariate Gaussian distribution imply? How can we generate a mean? If there is a mathematical formula for this, please share it.

Q2. What is the difference between N((1,0),I) and N((0,1),I)? Does the first imply mean = 1 and variance = 0, and the second one vice versa?
I don't know about clustering yet, since I thought I was studying supervised learning and clustering falls under unsupervised learning. Should I learn about clustering first to understand this paragraph?
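Based on what you have said, you have two distributions: the first is N((1,0), I) (BLUE points) and the second is N((0,1), I) (ORANGE points), where I is the 2×2 identity matrix. In this notation the first argument is the mean vector and the second is the covariance matrix, so N((1,0), I) does not mean "mean 1, variance 0": its mean is the point (1, 0) in the plane, and the covariance I says each coordinate has variance 1 and the two coordinates are uncorrelated. N((0,1), I) has the same spread but is centred at (0, 1).

There is nothing special about "generating a mean": each m_k is simply one random point drawn from the corresponding bivariate normal distribution, and it then serves as the mean around which later observations are generated. You can draw from a bivariate normal using any rudimentary technique, such as the `mvrnorm` function in R's MASS package, or MCMC techniques. Since you have 10 means from the first distribution and 10 from the second, if you plot them on a 2-D graph the blue points will be scattered in a roughly circular cloud around (1,0) and the orange points around (0,1).

Then you have a classification task, which is supervised learning: every point comes with a BLUE or ORANGE label. You do not need to learn clustering first to follow this example.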
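For the "mathematical formula" part of Q1: the density of a bivariate normal with mean vector μ and identity covariance is

$$f(\mathbf{x}) = \frac{1}{2\pi}\exp\!\left(-\tfrac{1}{2}\lVert \mathbf{x}-\boldsymbol{\mu}\rVert^2\right), \qquad \boldsymbol{\mu} = (1,0) \text{ for BLUE},\ (0,1) \text{ for ORANGE}.$$

Because the covariance is I, the two coordinates are independent normals with standard deviation 1, so one draw from N(μ, I) reduces to two independent univariate draws. Below is a minimal sketch of the mean-generation step in Python using only the standard library. It is an illustration under that assumption, not the code the book's authors used; the function name `draw_means` and the seed are hypothetical choices. For a general covariance matrix you would use something like R's `MASS::mvrnorm` instead.

```python
import random
import statistics


def draw_means(center, rng, k=10):
    """Draw k points from the bivariate normal N(center, I).

    With identity covariance the coordinates are independent with
    variance 1, so each coordinate is a separate univariate normal
    draw centred on the matching entry of `center`.
    """
    cx, cy = center
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)) for _ in range(k)]


rng = random.Random(0)  # hypothetical fixed seed, only for reproducibility
blue_means = draw_means((1.0, 0.0), rng)    # 10 BLUE means m_1..m_10
orange_means = draw_means((0.0, 1.0), rng)  # 10 ORANGE means

# Each point keeps its class label; the labels are what make this supervised.
dataset = [(p, "BLUE") for p in blue_means] + [(p, "ORANGE") for p in orange_means]

for label, pts in (("BLUE", blue_means), ("ORANGE", orange_means)):
    avg_x = statistics.mean(p[0] for p in pts)
    avg_y = statistics.mean(p[1] for p in pts)
    print(f"{label}: {len(pts)} means, sample average ({avg_x:.2f}, {avg_y:.2f})")
```

With only 10 draws per class the sample averages will be noticeably off from (1, 0) and (0, 1), and the two clouds overlap; drawing many more points (say `k=20000`) makes the averages settle close to the true centres.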