In RStudio, you will generate simulated data and then perform PCA and k-means clustering on the data. First run the fol

Posted: Fri Nov 26, 2021 8:33 am
by answerhappygod
In RStudio, you will generate simulated data and then perform
PCA and k-means clustering on the data. First run the following to
obtain the data.
library(mvtnorm)
n <- 20
p <- 10
x <- rmvnorm(n*3, rep(0, p))
# shift means
x[seq_len(n), ] <- x[seq_len(n), ] +
  matrix(rep(runif(p, min = 1, max = 3), n), nrow = n, byrow = TRUE)
x[seq_len(n) + 2*n, ] <- x[seq_len(n) + 2*n, ] +
  matrix(rep(runif(p, min = -3, max = -1), n), nrow = n, byrow = TRUE)
# add class labels
y <- c(rep("-1", n), rep("0", n), rep("1", n))
a) Perform PCA on the 60 observations and plot the first
two principal component score vectors. Use a different color to
indicate the observations in each of the true classes (`y`).
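One possible approach to part (a), offered as a minimal sketch rather than the expected solution: `prcomp()` from base R computes the principal components, and its `x` element holds the score vectors. To keep the example self-contained, the data are regenerated with base `rnorm()` as a stand-in for `rmvnorm()` (equivalent here because the covariance is the identity), and `set.seed()` is added only for reproducibility; both are assumptions not in the original. The names `pca`, `scores`, and `cols` are illustrative, and leaving the variables unscaled is a choice made because they are all simulated on the same scale.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
n <- 20
p <- 10
# Stand-in for rmvnorm(n*3, rep(0, p)): independent standard normal draws
x <- matrix(rnorm(n * 3 * p), nrow = n * 3, ncol = p)
x[seq_len(n), ] <- x[seq_len(n), ] +
  matrix(rep(runif(p, min = 1, max = 3), n), nrow = n, byrow = TRUE)
x[seq_len(n) + 2 * n, ] <- x[seq_len(n) + 2 * n, ] +
  matrix(rep(runif(p, min = -3, max = -1), n), nrow = n, byrow = TRUE)
y <- c(rep("-1", n), rep("0", n), rep("1", n))

# PCA on the centered (unscaled) data
pca <- prcomp(x, center = TRUE, scale. = FALSE)

# First two principal component score vectors (60 x 2 matrix)
scores <- pca$x[, 1:2]

# One color per true class, looked up by the class label
cols <- c("-1" = "red", "0" = "darkgreen", "1" = "blue")
plot(scores, col = cols[y], pch = 19, xlab = "PC1", ylab = "PC2")
legend("topright", legend = names(cols), col = cols, pch = 19,
       title = "True class")
```

If `x` and `y` from the data-generation code are already in the workspace, only the lines from `prcomp()` onward are needed.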
b) Perform K-means clustering of the observations with K =
3. How well do the clusters you obtained in K-means clustering
compare to the true class labels? (**Hint:** `table()` may be
useful here.)
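For part (b), a sketch along the following lines may help; it is one way to do it, not the only one. It uses base R's `kmeans()` with `centers = 3` and cross-tabulates the cluster assignments against `y` with `table()`, as the hint suggests. The data are again regenerated with `rnorm()` as a stand-in for `rmvnorm()`, and `set.seed()`, `nstart = 20`, and the `purity` summary are assumptions added for illustration.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
n <- 20
p <- 10
# Stand-in for rmvnorm(n*3, rep(0, p)): independent standard normal draws
x <- matrix(rnorm(n * 3 * p), nrow = n * 3, ncol = p)
x[seq_len(n), ] <- x[seq_len(n), ] +
  matrix(rep(runif(p, min = 1, max = 3), n), nrow = n, byrow = TRUE)
x[seq_len(n) + 2 * n, ] <- x[seq_len(n) + 2 * n, ] +
  matrix(rep(runif(p, min = -3, max = -1), n), nrow = n, byrow = TRUE)
y <- c(rep("-1", n), rep("0", n), rep("1", n))

# K-means with K = 3; several random starts reduce the risk of a poor
# local optimum (nstart = 20 is an illustrative choice)
km <- kmeans(x, centers = 3, nstart = 20)

# Rows are k-means clusters, columns are true classes
tab <- table(cluster = km$cluster, truth = y)
print(tab)

# Hypothetical summary: share of observations that fall in the majority
# true class of their cluster (1 means a perfect match up to relabeling)
purity <- sum(apply(tab, 1, max)) / sum(tab)
print(purity)
```

The cluster numbers returned by `kmeans()` are arbitrary, so agreement should be judged by whether each row of the table is concentrated in a single column, not by whether cluster 1 lines up with class "-1".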