Please answer with Python code.
Question 1
We want to use a linear regression model to predict the housing
price (column 8) based on six predictors (columns 2 to 7) [5
marks]:
The data is given below.
Write scripts to fit a linear regression to all data and print
the R^2. Based on the results of the fitted model, which predictors
do NOT have coefficients significantly different from zero (at a
significance level of 0.05)? Use statsmodels to fit the data.
Remember to add a constant for the intercept.
Write scripts to use the fit model to predict the training data
and obtain the predicted y. Calculate the Pearson’s correlation
coefficient R between predicted and observed y. What is the
difference between the Pearson’s R^2 and the SSE based R^2
calculated above? Use the “predict()” method of the fitted
statsmodels results object.
We want to perform 5-fold cross-validation for linear regression
and obtain the predicted y on the validation folds instead of the
training set. Calculate the R^2 from Pearson’s correlation coefficient
between predicted and observed y. Use utility functions in scikit-learn.
Please don’t add a constant, as scikit-learn fits an intercept by default.
(from sklearn.linear_model import LinearRegression, from
sklearn.model_selection import cross_val_predict)
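
A minimal sketch of the cross-validated prediction, assuming the six predictors and the outcome are columns of df_reg. cross_val_predict returns, for each row, the prediction made by a model trained on the other four folds. The data is a synthetic placeholder (make_placeholder_data is a hypothetical helper), so the printed value is illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

TARGET = "Y house price of unit area"

def make_placeholder_data(n=300, seed=0):
    """Synthetic stand-in using the table's column names. NOT the real data."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "X1 transaction date": rng.uniform(2012.6, 2013.6, n),
        "X2 house age": rng.uniform(0, 40, n),
        "X3 distance to the nearest MRT station": rng.lognormal(6.5, 1.0, n),
        "X4 number of convenience stores": rng.integers(0, 11, n),
        "X5 latitude": rng.uniform(24.93, 25.01, n),
        "X6 longitude": rng.uniform(121.47, 121.57, n),
    })
    df[TARGET] = (80 - 0.3 * df["X2 house age"]
                  - 6 * np.log(df["X3 distance to the nearest MRT station"])
                  + df["X4 number of convenience stores"]
                  + rng.normal(0, 5, n))
    return df

df_reg = make_placeholder_data()  # assumption: replace with the real data set

# No constant column here: LinearRegression has fit_intercept=True by default.
X = df_reg.drop(columns=TARGET)
y = df_reg[TARGET]

# Out-of-fold predictions from 5-fold cross-validation.
y_cv = cross_val_predict(LinearRegression(), X, y, cv=5)

r_cv = np.corrcoef(y, y_cv)[0, 1]
r2_cv = r_cv ** 2
print(f"5-fold CV Pearson R^2 = {r2_cv:.3f}")
```

With cv=5 the folds are consecutive blocks of rows. If the rows of the real file are ordered in some meaningful way, passing KFold(n_splits=5, shuffle=True, random_state=0) as cv is one option to consider.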
Please visualize the observed y (x-axis) and the predicted y
(y-axis) from above using the function seaborn.regplot().
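
One way this plot might be produced, as a sketch. The cross-validated predictions are recomputed here so the script stands alone, the data is a synthetic placeholder (make_placeholder_data is a hypothetical helper), and the output file name is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to show the plot interactively
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

TARGET = "Y house price of unit area"

def make_placeholder_data(n=300, seed=0):
    """Synthetic stand-in using the table's column names. NOT the real data."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "X1 transaction date": rng.uniform(2012.6, 2013.6, n),
        "X2 house age": rng.uniform(0, 40, n),
        "X3 distance to the nearest MRT station": rng.lognormal(6.5, 1.0, n),
        "X4 number of convenience stores": rng.integers(0, 11, n),
        "X5 latitude": rng.uniform(24.93, 25.01, n),
        "X6 longitude": rng.uniform(121.47, 121.57, n),
    })
    df[TARGET] = (80 - 0.3 * df["X2 house age"]
                  - 6 * np.log(df["X3 distance to the nearest MRT station"])
                  + df["X4 number of convenience stores"]
                  + rng.normal(0, 5, n))
    return df

df_reg = make_placeholder_data()  # assumption: replace with the real data set
X = df_reg.drop(columns=TARGET)
y = df_reg[TARGET]
y_cv = cross_val_predict(LinearRegression(), X, y, cv=5)

# Scatter of observed (x) against cross-validated predictions (y),
# with seaborn's fitted regression line and confidence band.
ax = sns.regplot(x=y.to_numpy(), y=y_cv)
ax.set_xlabel("Observed y")
ax.set_ylabel("Predicted y (5-fold CV)")
ax.figure.savefig("observed_vs_predicted.png", dpi=150)
```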
If one predictor can be non-linearly transformed, e.g., by
logarithm, which one would you transform to improve the
prediction? What is the new R^2 between the observed and predicted
y, based on 5-fold cross-validation as above? Visualize the
distribution of the predictor and outcome with
seaborn.pairplot(df_reg, y_vars="Y house price of unit area")
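
A sketch of how the comparison could be run. Picking X3 (distance to the nearest MRT station) is an assumption made for illustration: distances are often right-skewed, but the pairplot on the real data should guide the choice. The placeholder data below is synthetic (make_placeholder_data is a hypothetical helper) and was built with a log-distance effect, so the improvement it shows is by construction and says nothing about the real data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to show plots interactively
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

TARGET = "Y house price of unit area"
DIST = "X3 distance to the nearest MRT station"

def make_placeholder_data(n=300, seed=0):
    """Synthetic stand-in using the table's column names. NOT the real data."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "X1 transaction date": rng.uniform(2012.6, 2013.6, n),
        "X2 house age": rng.uniform(0, 40, n),
        DIST: rng.lognormal(6.5, 1.0, n),
        "X4 number of convenience stores": rng.integers(0, 11, n),
        "X5 latitude": rng.uniform(24.93, 25.01, n),
        "X6 longitude": rng.uniform(121.47, 121.57, n),
    })
    df[TARGET] = (80 - 0.3 * df["X2 house age"] - 6 * np.log(df[DIST])
                  + df["X4 number of convenience stores"]
                  + rng.normal(0, 5, n))
    return df

def cv_r2(X, y):
    """Squared Pearson correlation between y and 5-fold CV predictions."""
    y_cv = cross_val_predict(LinearRegression(), X, y, cv=5)
    return np.corrcoef(y, y_cv)[0, 1] ** 2

df_reg = make_placeholder_data()  # assumption: replace with the real data set
y = df_reg[TARGET]

# Inspect each predictor against the outcome to spot skew or curvature.
X_raw = df_reg.drop(columns=TARGET)
g = sns.pairplot(df_reg, x_vars=list(X_raw.columns), y_vars=[TARGET],
                 diag_kind=None)
g.savefig("pairplot.png")

# Same model with the chosen predictor log-transformed (requires values > 0).
X_log = X_raw.copy()
X_log[DIST] = np.log(X_log[DIST])

r2_raw = cv_r2(X_raw, y)
r2_log = cv_r2(X_log, y)
print(f"CV R^2, raw predictors : {r2_raw:.3f}")
print(f"CV R^2, log(X3)        : {r2_log:.3f}")
```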
No  X1 transaction date  X2 house age  X3 distance to the nearest MRT station  X4 number of convenience stores  X5 latitude  X6 longitude  Y house price of unit area
1   2012.917             32            84.87882                                10                               24.98298     121.54024     37.9
2   2012.917             19.5          306.5947                                9                                24.98034     121.53951     42.2
3   2013.583             13.3          561.9845                                5                                24.98746     121.54391     47.3
4   2013.5               13.3          561.9845                                5                                24.98746     121.54391     54.8
5   2012.833             5             390.5684                                5                                24.97937     121.54245     43.1
6   2012.667             7.1           2175.03                                 3                                24.96305     121.51254     32.1
7   2012.667             34.5          623.4731                                7                                24.97933     121.53642     40.3