Question 2 (20 marks) 2 To develop an algorithm which can identify whether a banknote is genuine or fake, data were extr

Post by **answerhappygod** » Sat Feb 26, 2022 11:17 am


Question 2 (20 marks)

To develop an algorithm which can identify whether a banknote is genuine or fake, data were extracted from images that were taken from genuine (G) and fake (F) banknotes. An image processing tool was then used to extract the following variables:

| Variable | Description |
|----------|-------------|
| variance | Describes how each pixel varies from the neighbouring pixels |
| skewness | A measure of the lack of symmetry |
| entropy | Amount of information which must be coded for by a compression algorithm |
| class | G = genuine, F = fake |

An analyst would like to use K-Means clustering to study the characteristics of the notes.

a (2 marks) What is the value of k to be used for K-Means clustering? Briefly explain.

b (2 marks) A parallel coordinates plot of the data is given below. Comment on any attributes which can help characterize the clusters.

[Figure: parallel coordinates plot with axes variance, skewness, entropy and class; lines grouped by class F and G]

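c (2 marks) Explain why normalization is important in K-Means clustering.

d (4 marks) A small subset of the data is given below. Use Min-max normalization to fill in the blanks (A), (B), (C) and (D) below.

Original values:

| ID | Variance | Skewness | Class |
|----|----------|----------|-------|
| 1 | 1.635 | 3.286 | G |
| 2 | 3.23 | 7.838 | G |
| 3 | 3.912 | 2.974 | G |
| 4 | 3.78 | -3.311 | G |
| 5 | -1.6 | -9.583 | F |
| 6 | -3.59 | -6.572 | F |
| 7 | -0.878 | 3.257 | F |

After Min-Max Normalisation:

| ID | Variance | Skewness | Class |
|----|----------|----------|-------|
| 1 | (B) | 0.739 | G |
| 2 | 0.909 | 1 | G |
| 3 | 1 | 0.721 | G |
| 4 | 0.982 | 0.360 | G |
| 5 | 0.262 | (D) | F |
| 6 | 0 | 0.173 | F |
| 7 | 0.361 | 0.737 | F |

(i) $(B) = \dfrac{1.635 - (-3.59)}{(A)}$

(ii) $(D) = \dfrac{-9.583 - (C)}{17.421}$

(A) =
(B) =
(C) =
(D) =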
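For reference, min-max normalization rescales each column with (x - min) / (max - min), so the smallest value maps to 0 and the largest to 1. The snippet below is only a minimal sketch of that formula applied to the seven rows in part (d); the names `min_max`, `variance_norm` and `skewness_norm` are made up for illustration and are not part of the question. Results may differ from the printed table in the third decimal place because of rounding in the source.

```python
# Raw values copied from the table in part (d), in ID order 1..7.
variance = [1.635, 3.23, 3.912, 3.78, -1.6, -3.59, -0.878]
skewness = [3.286, 7.838, 2.974, -3.311, -9.583, -6.572, 3.257]


def min_max(values):
    """Rescale a column to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


variance_norm = min_max(variance)
skewness_norm = min_max(skewness)

# Print one line per note so the values can be compared with the table.
for note_id, (v, s) in enumerate(zip(variance_norm, skewness_norm), start=1):
    print(f"ID {note_id}: variance={v:.3f}, skewness={s:.3f}")
```

The denominators used in workings (i) and (ii) are the column ranges, that is `max(values) - min(values)` for the variance and skewness columns respectively.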

Statistics and Analytics for Engineers MS2215/MS4215/MS6215

e (4 marks) A scatterplot of the data in part (d) above with two clusters is given below. The cluster centroid, F is (0.208, 0.303). Write down the cluster centroid for cluster G. Show your workings clearly. (Hint: Refer to table in part (d) above)

[Figure: scatterplot of the normalised data from part (d), showing the two clusters]

f (6 marks) Suppose a new note has the measurements variance = 2.20 and skewness = 6.00 (before standardization). Compute the Euclidean distance of the new note from each of the centroids, F and G. Which cluster is the new note likely to belong to? Explain and show your workings clearly.
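One possible way to check parts (e) and (f) numerically is sketched below. It assumes that the "standardization" in part (f) means the same min-max scaling as in part (d), that the new note is scaled with the minimum and maximum of the seven original rows, and that ID 1 belongs to class G as the normalised table shows. The helper names `scale` and `centroid` are hypothetical. The centroid for F is taken as stated in part (e), not recomputed.

```python
import math

# Raw values and classes from the table in part (d), in ID order 1..7.
variance = [1.635, 3.23, 3.912, 3.78, -1.6, -3.59, -0.878]
skewness = [3.286, 7.838, 2.974, -3.311, -9.583, -6.572, 3.257]
labels = ["G", "G", "G", "G", "F", "F", "F"]  # ID 1 assumed to be G


def scale(x, column):
    """Min-max scale x using the min and max of the given column."""
    lo, hi = min(column), max(column)
    return (x - lo) / (hi - lo)


# Normalised (variance, skewness) point for every note.
points = [(scale(v, variance), scale(s, skewness))
          for v, s in zip(variance, skewness)]


def centroid(label):
    """Mean of the normalised points that carry the given class label."""
    members = [p for p, c in zip(points, labels) if c == label]
    return tuple(sum(coord) / len(members) for coord in zip(*members))


centroid_g = centroid("G")
centroid_f = (0.208, 0.303)  # as given in part (e)

# New note from part (f), scaled with the same min and max as the data.
new_note = (scale(2.20, variance), scale(6.00, skewness))

dist_f = math.dist(new_note, centroid_f)
dist_g = math.dist(new_note, centroid_g)
nearest = "G" if dist_g < dist_f else "F"

print(f"centroid G = ({centroid_g[0]:.3f}, {centroid_g[1]:.3f})")
print(f"distance to F = {dist_f:.3f}, distance to G = {dist_g:.3f}")
print(f"nearest centroid: {nearest}")
```

K-Means assigns a point to the cluster whose centroid is closest, so the smaller of the two distances indicates the likely cluster.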