Ensemble of Decision Trees. Say we have the dataset shown below. There are 15 data points, 7 labeled "-" and 8 labeled "+". We would like to use Decision Trees to solve this binary classification problem. However, in our problem setting, each Decision Tree has access to only ONE line. That is, each Decision Tree has access to only one attribute, and so has a max-depth of 1. By accessing this line, the Decision Tree knows (and only knows) whether a data point is on the right side or the left side of the line. (Unofficial definition: let's assume the right side of a line is the side that the green normal vector of that line points to.) Finally, please use the majority vote strategy to make the classification decision at each leaf.

[Figure 3: Decision Tree Ensemble]

(a) In Figure 3, if we train only 1 Decision Tree, what is the best/lowest error rate? Note that we have 15 data points in total. Round to 4 decimal places.

(b) If we could use 2 Decision Trees in Figure 3, what is the best/lowest error rate? With two Decision Trees, each one predicts '+' or '-' for every data point, and we combine the two predictions into the final result. If both trees predict '+', the result is '+'. If both predict '-', the result is '-'. However, if one predicts '+' while the other predicts '-', we always choose '-' as the final result to break the tie. Round to 4 decimal places.
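The setup above (one line per tree, majority vote at each leaf, error rate over all points) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the points, the line, and the helper names `side`, `fit_stump`, and `error_rate` are hypothetical and are not taken from Figure 3, and the handling of a tied or empty leaf (fall back to '-') is a guess, since the problem does not specify it.

```python
from collections import Counter

def side(point, normal, offset):
    """Return 'right' if the point lies on the side the normal vector points to."""
    x, y = point
    nx, ny = normal
    return "right" if nx * x + ny * y + offset > 0 else "left"

def fit_stump(points, labels, normal, offset):
    """Depth-1 tree on one line: each leaf takes the majority label of its points."""
    votes = {"left": Counter(), "right": Counter()}
    for p, lab in zip(points, labels):
        votes[side(p, normal, offset)][lab] += 1
    # Assumption: a tied or empty leaf predicts '-'.
    return {s: ("+" if c["+"] > c["-"] else "-") for s, c in votes.items()}

def error_rate(points, labels, normal, offset, leaf):
    """Fraction of misclassified points, rounded to 4 decimal places."""
    wrong = sum(leaf[side(p, normal, offset)] != lab
                for p, lab in zip(points, labels))
    return round(wrong / len(labels), 4)

# Hypothetical toy data, NOT the 15 points from Figure 3.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
labels = ["-", "-", "+", "+", "-"]

# Hypothetical line x = 1.5 whose normal vector points in the +x direction.
normal, offset = (1, 0), -1.5
leaf = fit_stump(points, labels, normal, offset)
print(leaf, error_rate(points, labels, normal, offset, leaf))
```

For the actual problem, the same computation would be repeated for every candidate line in the figure, keeping the line with the fewest misclassified points out of 15.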
(c) Now let's train 3 Decision Trees as a forest in Figure 3. What is the best/lowest error rate? The ensemble strategy is now unanimous voting: we train each Decision Tree individually, and each Decision Tree chooses one unique line as its decision boundary, trying its best to achieve maximum accuracy. If all the Decision Trees agree, we assign the point a positive label. However, if one of them has a different answer from the other two, we predict negative. Round to 4 decimal places.
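The combination step can be sketched as follows. The sketch reads the unanimous rule as "predict '+' only when every tree predicts '+', otherwise '-'", which is an interpretation: the question does not say what happens when all trees agree on '-', and '-' is assumed here. Under that reading, the two-tree tie-break rule from part (b) is the same function applied to two trees. The per-tree predictions and the names `combine_unanimous` and `ensemble_error_rate` are hypothetical, not derived from Figure 3.

```python
def combine_unanimous(predictions):
    """Predict '+' only when every tree predicts '+'; otherwise predict '-'."""
    return "+" if all(p == "+" for p in predictions) else "-"

def ensemble_error_rate(per_tree_predictions, labels):
    """per_tree_predictions[t][i] is tree t's predicted label for point i."""
    combined = [combine_unanimous(votes) for votes in zip(*per_tree_predictions)]
    wrong = sum(c != y for c, y in zip(combined, labels))
    return round(wrong / len(labels), 4)

# Hypothetical labels and per-tree predictions, NOT taken from Figure 3.
labels = ["+", "+", "-", "-"]
tree_a = ["+", "+", "+", "-"]
tree_b = ["+", "+", "-", "+"]
tree_c = ["+", "-", "-", "-"]

print(ensemble_error_rate([tree_a, tree_b, tree_c], labels))
```

One consequence of this rule is that a point is labeled '+' only if it lies in the '+' leaf of every tree, so the ensemble's positive region is the intersection of the individual trees' positive half-planes.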