I need help with parts b, c and d in the question below. My answer to part (a) is attached in the bottom figure below th
Posted: Mon Jun 06, 2022 6:24 pm
I need help with parts b, c and d in the question below. My
answer to part (a) is attached in the bottom figure below the
question. Thanks
In the following table, we have 8 instances with 3 attributes (Suburb, Area, New) and a Class label. Each row shows one instance. (N.B. Give calculations to two decimal places.)

     Suburb  Area    New  Class
  1  S1      Large   N    1
  2  S2      Large   N    1
  3  S3      Large   Y    1
  4  S4      Large   Y    2
  5  S5      Medium  Y    2
  6  S6      Large   Y    3
  7  S4      Large   Y    3
  8  S7      Small   N    3

(a) Calculate the information gain and gain ratio of the "New" feature on the dataset. [7 marks] (N.B. Use log₂ to compute the results of each step to get full marks.)

(b) Does a decision tree exist which can perfectly classify the given instances? If yes, draw that decision tree; otherwise, explain why not by referring to the data. [2 marks]

(c) If we use "Area" to build a decision stump, what is the predicted label of the decision stump for each of the 8 instances in the data set? [4 marks]

(d) If we use "Suburb" to build a decision stump, what would you expect to see for the accuracy of the decision stump given an evaluation dataset that you have not seen before? Explain why the stump has good/bad accuracy. [2 marks]
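For parts (b) to (d), a minimal Python sketch may help to check the reasoning by hand. It assumes the rows are transcribed exactly as in the table, that a decision stump predicts the majority class for each attribute value, and that ties go to the smallest class label (an arbitrary choice, not something the question specifies). The names DATA, ATTRS, conflicting_groups, fit_stump and predict are made up for this illustration.

```python
from collections import Counter, defaultdict

# Rows transcribed from the table in the question: (Suburb, Area, New, Class).
DATA = [
    ("S1", "Large", "N", 1),
    ("S2", "Large", "N", 1),
    ("S3", "Large", "Y", 1),
    ("S4", "Large", "Y", 2),
    ("S5", "Medium", "Y", 2),
    ("S6", "Large", "Y", 3),
    ("S4", "Large", "Y", 3),
    ("S7", "Small", "N", 3),
]
ATTRS = {"Suburb": 0, "Area": 1, "New": 2}


def conflicting_groups(rows):
    """Return attribute combinations that occur with more than one class."""
    groups = defaultdict(set)
    for *attrs, label in rows:
        groups[tuple(attrs)].add(label)
    return {key: sorted(labels) for key, labels in groups.items() if len(labels) > 1}


def fit_stump(rows, attr):
    """Map each value of `attr` to its majority class.

    Assumption: ties are broken in favour of the smallest class label.
    """
    idx = ATTRS[attr]
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[idx]][row[-1]] += 1
    return {
        value: min(c, key=lambda label: (-c[label], label))
        for value, c in counts.items()
    }


def predict(stump, rows, attr, default=None):
    """Predict a class per row; values with no branch in the stump get `default`."""
    idx = ATTRS[attr]
    return [stump.get(row[idx], default) for row in rows]


if __name__ == "__main__":
    print("Conflicting instances:", conflicting_groups(DATA))
    area_stump = fit_stump(DATA, "Area")
    print("Area stump:", area_stump)
    print("Area predictions:", predict(area_stump, DATA, "Area"))
```

If conflicting_groups returns anything, two instances share every attribute value but differ in class, so no tree built on these attributes can separate them. Calling predict with a stump fitted on "Suburb" and a row whose suburb was never seen returns the default, which shows why such a stump has no useful branch for new suburbs.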
Q6 (a)

New:    N  N  Y  Y  Y  Y  Y  N
Class:  1  1  1  2  2  3  3  3

H(Root) = -(3/8 log₂ 3/8 + 2/8 log₂ 2/8 + 3/8 log₂ 3/8) = 1.56
H(N) = -(2/3 log₂ 2/3 + 1/3 log₂ 1/3 + 0) = 0.918
H(Y) = -(1/5 log₂ 1/5 + 2/5 log₂ 2/5 + 2/5 log₂ 2/5) = 1.52
Mean info (New) = P(Y) H(Y) + P(N) H(N) = 5/8 × 1.52 + 3/8 × 0.918 = 1.29
Information gain = H(Root) - Mean info (New) = 1.56 - 1.29 = 0.27
Split info = -(P(N) log₂ P(N) + P(Y) log₂ P(Y)) = -(3/8 log₂ 3/8 + 5/8 log₂ 5/8) = 0.95
Gain ratio = 0.27 / 0.95 = 0.28
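As a cross-check on the part (a) arithmetic, here is a minimal Python sketch of entropy, information gain, split info and gain ratio for the "New" feature. The lists NEW and CLS are transcribed from the table in the question, and the function names are made up for this illustration.

```python
from collections import Counter
from math import log2

# "New" and "Class" columns, in row order, transcribed from the question.
NEW = ["N", "N", "Y", "Y", "Y", "Y", "Y", "N"]
CLS = [1, 1, 1, 2, 2, 3, 3, 3]


def entropy(values):
    """Shannon entropy (base 2) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def info_gain_and_ratio(feature, labels):
    """Return (information gain, split info, gain ratio) for one feature."""
    n = len(labels)
    mean_info = 0.0
    for value in set(feature):
        # Class labels of the instances that take this feature value.
        subset = [y for x, y in zip(feature, labels) if x == value]
        mean_info += len(subset) / n * entropy(subset)
    gain = entropy(labels) - mean_info
    split_info = entropy(feature)  # entropy of the feature's own value distribution
    return gain, split_info, gain / split_info


gain, split_info, ratio = info_gain_and_ratio(NEW, CLS)
print(f"H(Root)    = {entropy(CLS):.2f}")
print(f"Info gain  = {gain:.2f}")
print(f"Split info = {split_info:.2f}")
print(f"Gain ratio = {ratio:.2f}")
```

With the eight rows from the question, this prints H(Root) = 1.56, Info gain = 0.27, Split info = 0.95 and Gain ratio = 0.28.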