Consider again the TopUniversities data used in class. In addition to the existing attributes, U.S. News & World Report
Posted: Tue Jul 05, 2022 9:57 am
Consider again the TopUniversities data used in class. In addition to the existing attributes, U.S. News & World Report also provided rankings for the 25 universities. The rank order is the same as the position of the university in the dataset, e.g., Harvard is ranked #1, Princeton #2, …, and Texas A&M #25 (see the list on the horizontal axis in the last screen in Question 29). The first output screen below is generated by Weka’s SVR algorithm (SMOreg), using the university’s rank as the target attribute. Then, we replaced the numeric ranking attribute with a 2-class attribute by grouping the first 15 universities into class A and the remaining 10 universities into class B. Based on this grouped dataset, the second output screen is generated by Weka’s SVM algorithm (SMO) and the third output screen is generated by Weka’s decision tree algorithm (J48). Answer questions (a), (b), (c) and (d) following the output screens.
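The two-class grouping described above can be sketched in a few lines of Python. This is only an illustration: the function name rank_to_class is hypothetical, and it assumes the rank is an integer from 1 to 25, matching the position of the university in the dataset.

```python
def rank_to_class(rank: int) -> str:
    """Map a rank (1..25) to the two-class label described in the question.

    Assumption: ranks 1-15 form class A and ranks 16-25 form class B.
    """
    if not 1 <= rank <= 25:
        raise ValueError("rank must be between 1 and 25")
    return "A" if rank <= 15 else "B"


# Label all 25 universities by their position in the dataset.
labels = [rank_to_class(r) for r in range(1, 26)]
print(labels.count("A"), labels.count("B"))  # prints: 15 10
```

The resulting class sizes (15 and 10) agree with the leaf counts in the J48 output, which sum to 15 instances of A and 10 of B.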
Output screen 1: Weka Explorer, Classify tab (SMOreg)

Classifier: SMOreg -C 1.0 -N 0 -I "weka.classifiers.functions.supportVector.RegSMOImproved -T 0.001 -V -P 1.0E-12 -L 0.001 -W
Test options: Cross-validation, Folds 10
Target attribute: (Num) Rank
Attributes: 7 (AvgSAT, PctTop10Student, PctAccept, StuFacRatio, Expenses, GradRate, Rank)
Test mode: 10-fold cross-validation

=== Classifier model (full training set) ===
SMOreg
weights (not support vectors); the leading +/- signs of the terms are not legible in this capture:
    0.5489 * (normalized) AvgSAT
    0.1014 * (normalized) PctTop10Student
    0.3414 * (normalized) PctAccept
    0.2047 * (normalized) StuFacRatio
    0.3871 * (normalized) Expenses
    0.2656 * (normalized) GradRate
    1.0324
Number of kernel evaluations: 325 (93.169% cached)
Time taken to build model: 0 seconds

=== Cross-validation ===
=== Summary ===
Correlation coefficient     0.8499
Mean absolute error         3.4401
Root mean squared error     4.2907
Output screen 2: Weka Explorer, Classify tab (SMO)

Classifier: SMO -C 1.5 -L 0.001 -P 1.0E-12 -N 0 -V -1 -W 1 -K "weka.classifiers.functions.supportVector.PolyKernel -E 1.0 -C 250007" -ca
Test options: Cross-validation, Folds 10
Target attribute: (Nom) Class
Attributes: 7 (AvgSAT, PctTop10Student, PctAccept, StuFacRatio, Expenses, GradRate, Class)
Test mode: 10-fold cross-validation

=== Classifier model (full training set) ===
SMO
Kernel used: Linear Kernel: K(x, y) = <x, y>
Classifier for classes: A, B
BinarySMO
Machine linear: showing attribute weights, not support vectors.
    -1.0853 * (normalized) AvgSAT
    -0.6185 * (normalized) PctTop10Student
     1.2777 * (normalized) PctAccept
     0.589  * (normalized) StuFacRatio
    -1.7641 * (normalized) Expenses
    -1.1692 * (normalized) GradRate
     1.4822 (constant term; its leading +/- sign is not legible in this capture)
Number of kernel evaluations: 48 (56.757% cached)
Time taken to build model: 0 seconds

=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances      21    84 %
Incorrectly Classified Instances     4    16 %
Output screen 3: Weka Explorer, Classify tab (J48)

Classifier: J48 -C 0.25 -M 1
Test options: Cross-validation, Folds 10
Target attribute: (Nom) Class
Attributes: 7 (AvgSAT, PctTop10Student, PctAccept, StuFacRatio, Expenses, GradRate, Class)
Test mode: 10-fold cross-validation

=== Classifier model (full training set) ===
J48 pruned tree
AvgSAT <= 12.6
|   Expenses <= 25.026: B (10.0)
|   Expenses > 25.026: A (1.0)
AvgSAT > 12.6: A (14.0)

Number of Leaves: 3
Size of the tree: 5
Time taken to build model: 0 seconds

=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances      21    84 %
Incorrectly Classified Instances     4    16 %
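As a reading aid, the J48 pruned tree in the third screen can be written out as a small Python function, and the reported accuracy can be checked by hand. This is a sketch, not Weka's implementation: the name j48_predict is hypothetical, and the inputs are assumed to be on the same scale as the AvgSAT and Expenses columns of the dataset (the thresholds 12.6 and 25.026 are copied from the output).

```python
def j48_predict(avg_sat: float, expenses: float) -> str:
    """Follow the pruned tree from the J48 output screen.

    Assumption: avg_sat and expenses use the dataset's own units.
    """
    if avg_sat <= 12.6:
        # Lower-SAT branch: Expenses decides between B and A.
        return "B" if expenses <= 25.026 else "A"
    # Higher-SAT branch: always class A.
    return "A"


# Cross-validation summary from the screen: 21 correct, 4 incorrect.
correct, incorrect = 21, 4
accuracy = correct / (correct + incorrect)
print(f"{accuracy:.0%}")  # prints: 84%
```

The tree has three leaves and two internal tests, which matches the reported "Number of Leaves: 3" and "Size of the tree: 5".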