4. In this question, we study the effect of rescaling in clustering through a simulation study.

(a) Run the following R code to generate a dataset following a Gaussian mixture:

    set.seed(2022)
    n = 500
    y = rbinom(n, 1, prob = 0.6)
    x1 = rnorm(n, mean = 0, sd = 1) * y + rnorm(n, mean = 3, sd = 1) * (1 - y)
    x2 = rnorm(n, mean = 0, sd = 2) * y + rnorm(n, mean = 3, sd = 2) * (1 - y)
    X = cbind(x1, x2)

Here, y indicates the true group label. Present a scatter plot of X so that different groups are illustrated with different colors.
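One possible way to produce the scatter plot in part (a) is sketched below using base R graphics. It assumes the data-generating lines read as x1 with sd = 1 and x2 with sd = 2 (both with mean 0 when y = 1 and mean 3 when y = 0); the colour names, the plot title, and the variable group_cols are illustrative choices, not part of the question.

```r
# Regenerate the data so the sketch is self-contained
# (assumed reading of the question's code: sd = 1 for x1, sd = 2 for x2).
set.seed(2022)
n <- 500
y <- rbinom(n, 1, prob = 0.6)
x1 <- rnorm(n, mean = 0, sd = 1) * y + rnorm(n, mean = 3, sd = 1) * (1 - y)
x2 <- rnorm(n, mean = 0, sd = 2) * y + rnorm(n, mean = 3, sd = 2) * (1 - y)
X <- cbind(x1, x2)

# Colour each point by its true label: y = 0 gets the first colour,
# y = 1 gets the second (hypothetical colour choices).
group_cols <- c("steelblue", "tomato")
plot(X, col = group_cols[y + 1], pch = 19,
     xlab = "x1", ylab = "x2", main = "Simulated Gaussian mixture")
legend("topleft", legend = c("y = 0", "y = 1"), col = group_cols, pch = 19)
```

Indexing group_cols by y + 1 maps the 0/1 labels onto R's 1-based vector positions, which keeps the legend and the point colours in the same order.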
(b) For this part, apply the K-means algorithm to the dataset X (with 2 clusters). Please report the Rand index between the cluster labels obtained by the K-means algorithm and the true label y.

(c) In this part, please first rescale the data using the function scale, and then apply the K-means algorithm. Please report the Rand index between the cluster labels obtained by the clustering algorithm and y. Does the method in this part perform better than that of part (b), based on the Rand index?

(d) In this part, repeat the analysis in parts (a) to (c) 100 times by Monte Carlo simulation (note that you need to remove the set.seed call to get different random numbers each time). Do your findings in part (c) still hold?
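Parts (b) to (d) could be approached along the following lines. This is only a sketch under stated assumptions: rand_index and simulate_once are hypothetical helper names written for this example, the Rand index is computed by hand as the fraction of point pairs on which two labelings agree, and nstart = 20 is an assumed setting that the question does not specify. The data-generating step again assumes sd = 1 for x1 and sd = 2 for x2.

```r
# Rand index: share of point pairs that both labelings treat the same way
# (both put the pair together, or both put it apart). Hypothetical helper.
rand_index <- function(a, b) {
  n <- length(a)
  same_a <- outer(a, a, "==")
  same_b <- outer(b, b, "==")
  agree <- same_a == same_b
  sum(agree[upper.tri(agree)]) / choose(n, 2)
}

# One run: simulate the mixture, cluster raw and rescaled data,
# and return both Rand indices. Hypothetical helper.
simulate_once <- function(n = 500) {
  y <- rbinom(n, 1, prob = 0.6)
  x1 <- rnorm(n, mean = 0, sd = 1) * y + rnorm(n, mean = 3, sd = 1) * (1 - y)
  x2 <- rnorm(n, mean = 0, sd = 2) * y + rnorm(n, mean = 3, sd = 2) * (1 - y)
  X <- cbind(x1, x2)
  km_raw <- kmeans(X, centers = 2, nstart = 20)            # part (b)
  km_scaled <- kmeans(scale(X), centers = 2, nstart = 20)  # part (c)
  c(raw = rand_index(km_raw$cluster, y),
    scaled = rand_index(km_scaled$cluster, y))
}

# Parts (b) and (c): a single seeded data set.
set.seed(2022)
single <- simulate_once()
print(single)

# Part (d): 100 replications with no reseeding inside the loop,
# so every replication draws a fresh data set.
results <- t(replicate(100, simulate_once()))
print(colMeans(results))                                # average Rand index per method
print(mean(results[, "scaled"] > results[, "raw"]))     # share of runs where scaling wins
```

Comparing the two columns of results, both on average and run by run, is one way to judge whether the conclusion from the single seeded data set carries over to repeated samples.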