4. In this question, we study the effect of rescaling in clustering through a simulation study. (a) Run the following R
Posted: Wed May 11, 2022 12:00 pm
4. In this question, we study the effect of rescaling in clustering through a simulation study.

(a) Run the following R code to generate a dataset following a Gaussian mixture:

    set.seed(2022)
    n = 500
    y = rbinom(n, 1, prob = 0.6)
    x1 = rnorm(n, mean = 0, sd = 1) * y + rnorm(n, mean = 3, sd = 1) * (1 - y)
    x2 = rnorm(n, mean = 0, sd = 2) * y + rnorm(n, mean = 3, sd = 2) * (1 - y)
    X = cbind(x1, x2)

Here, y indicates the true group label. Present a scatter plot of X so that different groups are illustrated with different colors.
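One possible way to draw the plot for part (a) is sketched below in base R. The data-generating lines are repeated so the block runs on its own. The colour names, the `group_cols` vector, and the plot title are illustrative choices, not part of the question.

```r
# Regenerate the data from part (a)
set.seed(2022)
n <- 500
y <- rbinom(n, 1, prob = 0.6)
x1 <- rnorm(n, mean = 0, sd = 1) * y + rnorm(n, mean = 3, sd = 1) * (1 - y)
x2 <- rnorm(n, mean = 0, sd = 2) * y + rnorm(n, mean = 3, sd = 2) * (1 - y)
X <- cbind(x1, x2)

# y takes values 0/1, so y + 1 picks the first or second colour
# (colour choice is an assumption made for illustration)
group_cols <- c("steelblue", "tomato")

plot(X, col = group_cols[y + 1], pch = 19,
     xlab = "x1", ylab = "x2",
     main = "Simulated Gaussian mixture, coloured by true label y")
legend("topleft", legend = c("y = 0", "y = 1"),
       col = group_cols, pch = 19)
```

Because x2 has twice the standard deviation of x1 in both groups, the point clouds should look stretched along the vertical axis, which is the feature that rescaling in part (c) acts on.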
(b) For this part, apply the K-means algorithm to the dataset X (with 2 clusters). Please report the Rand index between the cluster labels obtained by the K-means algorithm and the true label y.

(c) In this part, please first rescale the data using the function scale, and then apply the K-means algorithm. Please report the Rand index between the cluster labels from the clustering algorithm and y. Does the method in this part perform better than that of part (b), based on the Rand index?

(d) In this part, repeat the analysis in parts (a) to (c) 100 times by Monte Carlo simulation (note that you need to remove the set.seed call to get new random numbers each time). Do your findings in part (c) still hold?
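A minimal sketch of parts (b) to (d) follows, using only base R. `rand_index` and `simulate_once` are hypothetical helpers written for this sketch. The Rand index here is the plain (unadjusted) version computed from the contingency table; packages such as fossil or mclust offer related functions, but none is assumed. `nstart = 20` is also an assumption, added to make K-means less sensitive to its random starting centers.

```r
# Hypothetical helper: unadjusted Rand index between two label vectors,
# i.e. the fraction of point pairs on which the two labelings agree
rand_index <- function(a, b) {
  tab <- table(a, b)
  total <- choose(length(a), 2)             # all pairs of points
  same_both <- sum(choose(tab, 2))          # pairs together in both labelings
  same_a <- sum(choose(rowSums(tab), 2))    # pairs together in a
  same_b <- sum(choose(colSums(tab), 2))    # pairs together in b
  (total + 2 * same_both - same_a - same_b) / total
}

# Hypothetical helper: one run of parts (a) to (c)
simulate_once <- function(n = 500) {
  y <- rbinom(n, 1, prob = 0.6)
  x1 <- rnorm(n, mean = 0, sd = 1) * y + rnorm(n, mean = 3, sd = 1) * (1 - y)
  x2 <- rnorm(n, mean = 0, sd = 2) * y + rnorm(n, mean = 3, sd = 2) * (1 - y)
  X <- cbind(x1, x2)

  km_raw <- kmeans(X, centers = 2, nstart = 20)            # part (b)
  km_scaled <- kmeans(scale(X), centers = 2, nstart = 20)  # part (c)

  c(raw = rand_index(km_raw$cluster, y),
    scaled = rand_index(km_scaled$cluster, y))
}

# Parts (b) and (c): a single run with the seed from the question
set.seed(2022)
single <- simulate_once()
print(single)

# Part (d): 100 replications; the seed is not reset inside the loop,
# so each replication draws fresh random numbers
results <- t(replicate(100, simulate_once()))
print(colMeans(results))                             # average Rand index per method
print(mean(results[, "scaled"] > results[, "raw"]))  # share of runs where scaling wins
```

Comparing the two column means, together with the share of runs in which the scaled version scores higher, gives one way to judge whether the conclusion from part (c) holds up across replications.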