Consider the data given in the CSV file HW6DataA and the following data description.

Table 1: Data Description
Field: Description
StdID: Student ID (index)
Statistical background: Whether the student has a background in statistics
Python background: The student's background in Python (Excellent, Good, Fair)
Gender: The student's gender (Male or Female)
Class level: The student's class level (Freshman, Sophomore, Junior, Senior)
Weekly studying hours: Average number of hours the student studies per week
Previous exams: Number of previous exams solved
Absences: Number of absences throughout the semester
Class size: Number of students in the class
Mid: Midterm score
Project score: Project score
Final: Final score (output variable)

Note: Solve all the questions using Python. Use the Pandas, Seaborn, Sklearn, etc. libraries for the analysis.

Do the following tasks using the data given in HW6DataA and Table 1:

A-1: Regression. Given a regression problem along with the input columns and the output column, describe the steps to build a regression model. Explain how the regression model can be used to predict the output column values.

A-2: Regularization. Discuss in detail the potential uses of both Ridge and LASSO regression. How do they differ from OLS regression?

A-3: Cross-Validation. In both Ridge and LASSO regression, which technique do we use to select the best value for α?

A-4: Given Data. Read and display the data given in HW6DataA. Refer to Table 1 for the data description.

A-5: OLS Regression. Build an OLS regression model for predicting the Final score of each student. Consider the following:
• All the variables except StdID, Gender, and Final shall be considered as input variables.
• Train the model using 70% of the data and use the rest for testing. Set random_state to 42.

A-6: LASSO and Ridge. Using the same training data as the OLS model (task A-5), estimate the coefficients (betas) using LASSO and Ridge regression. Obtain the best value of α among {10^-3, 10^-2, 10^-1, 10^0, 10^1, 10^2, 10^3} using 10-fold cross-validation. Compare and comment on the coefficients of the three models. Compare the performance of the OLS model against the LASSO and Ridge models on the testing data.

A-7: SISO Regression. Using the closed-form method (formula), build a SISO regression model to predict the Final score. Use the variable with the highest regression coefficient obtained by LASSO as the input variable (say, top variable). Using the corresponding testing data, compare the performance of the SISO model (top variable vs. Final score) with that of LASSO reported in A-6. Also, plot top variable vs. Final score.
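One possible way to approach tasks A-5 and A-6 is sketched below. It is not a reference solution: the helper name fit_and_compare is hypothetical, the column names are taken from the data description and may not match the CSV headers exactly, and one-hot encoding the categorical inputs with pd.get_dummies is an assumption, since the assignment does not say how to encode them. GridSearchCV with cv=10 stands in for the 10-fold cross-validation over the α grid.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Candidate regularization strengths: 10^-3, 10^-2, ..., 10^3.
ALPHAS = [10.0 ** k for k in range(-3, 4)]


def fit_and_compare(df, target="Final", drop=("StdID", "Gender")):
    """Fit OLS, LASSO and Ridge on a 70/30 split; return test scores and coefficients."""
    # Assumption: categorical inputs are one-hot encoded (first level dropped).
    X = pd.get_dummies(df.drop(columns=[target, *drop]), drop_first=True, dtype=float)
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.7, random_state=42
    )

    grid = {"alpha": ALPHAS}
    models = {
        "OLS": LinearRegression(),
        "LASSO": GridSearchCV(Lasso(max_iter=10_000), grid, cv=10,
                              scoring="neg_mean_squared_error"),
        "Ridge": GridSearchCV(Ridge(), grid, cv=10,
                              scoring="neg_mean_squared_error"),
    }

    scores, coefs = {}, {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # GridSearchCV refits the best estimator on the full training set.
        best = getattr(model, "best_estimator_", model)
        pred = best.predict(X_test)
        scores[name] = {
            "alpha": model.best_params_["alpha"] if name != "OLS" else float("nan"),
            "test_mse": mean_squared_error(y_test, pred),
            "test_r2": r2_score(y_test, pred),
        }
        coefs[name] = pd.Series(best.coef_, index=X.columns)

    return pd.DataFrame.from_dict(scores, orient="index"), pd.DataFrame(coefs)
```

With the real data, this could be called as scores, coefs = fit_and_compare(pd.read_csv("HW6DataA.csv")), where the file name with its extension is assumed. The sketch does not standardize the inputs; since LASSO and Ridge penalties depend on feature scale, scaling before fitting may be worth considering when comparing coefficients.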
Sample rows from HW6DataA:

StdID   Statistical background  Python background  Gender  Class level  Weekly studying hours  Previous exams  Absences  Class size  Mid score  Project score  Final
s38893  Yes                     Good               M       FR           7                      4               5         12          22         5              38
s13237  Yes                     Excellent          F       SR           1                      2               0         12          29         10             50
S42562  No                      Good               M       JR           7                      2               4         17          18         6              39
S43697  No                      Fair               M       JR           5                      2               4         13          21         6              32
S37267  No                      Fair               M       FR           7                      3               5         11          21         5              35
s14869  Yes                     Excellent          M       SR           1                      2               1         18          24         9              50
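The closed-form SISO fit asked for in A-7 can be sketched on the six sample rows above. Treating Mid score as the top variable is purely an assumption for illustration (the actual top variable depends on the LASSO fit), and the helper names siso_fit and siso_predict are hypothetical. The slope is the ratio of the sample covariance of x and y to the sample variance of x, and the intercept follows from the means.

```python
import numpy as np

# Mid and Final values as read from the six sample rows; using "Mid score"
# as the top variable is an assumption, not a result of the LASSO fit.
mid = np.array([22, 29, 18, 21, 21, 24], dtype=float)
final = np.array([38, 50, 39, 32, 35, 50], dtype=float)


def siso_fit(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return intercept, slope


def siso_predict(x, intercept, slope):
    """Predict y for inputs x from the fitted intercept and slope."""
    return intercept + slope * x


b0, b1 = siso_fit(mid, final)
# In-sample error only; the assignment asks for the error on the test split.
mse = np.mean((final - siso_predict(mid, b0, b1)) ** 2)
print(f"Final ~ {b0:.2f} + {b1:.2f} * Mid (in-sample MSE = {mse:.2f})")
```

For the actual task, siso_fit would be applied to the training split of the top variable and siso_predict evaluated on the test split, so that the error is comparable with the LASSO test error from A-6.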
answerhappygod