Predicting housing median prices. The file BostonHousing.xls
contains information on over 500 census tracts in Boston, where for
each tract 14 variables are recorded. The last column (CAT.MEDV)
was derived from MEDV, such that it takes the value 1 if
MEDV > 30 and 0 otherwise. Consider the goal of predicting the
median value (MEDV) of a tract, given the information in the first
13 columns.
a. Perform a k-NN prediction with all 13 predictors (ignore the
CAT.MEDV column), trying values of k from 1 to 5. Make sure to
normalize the data (click “normalize input data”). What is the best
k chosen? What does it mean?
b. Predict the MEDV for a tract with the following information,
using the best k:
c. Why is the error of the training data zero?
d. Why is the validation data error overly optimistic compared
to the error rate when applying this k-NN predictor to new
data?
e. If the purpose is to predict MEDV for several thousand
new tracts, what would be the disadvantage of using k-NN
prediction? List the operations that the algorithm goes through in
order to produce each prediction.
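
For anyone who wants to follow the procedure in parts a through e outside the spreadsheet tool, it can be sketched in plain Python. This is only a minimal sketch under stated assumptions, not the add-in's actual implementation: a small synthetic two-predictor data set stands in for BostonHousing.xls, the split is 60% training / 40% validation, normalization is z-scoring with training-set statistics, distance is Euclidean, and the error measure is RMSE. The function names (zscore_fit, zscore_apply, knn_predict, rmse) are made up for illustration.

```python
import math
import random


def zscore_fit(rows):
    """Return column means and standard deviations of the given rows."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def zscore_apply(rows, means, stds):
    """Normalize rows with previously fitted means and standard deviations."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def knn_predict(train_x, train_y, query, k):
    """Predict a numeric response as the mean of the k nearest training responses."""
    # Step 1: distance from the new record to EVERY training record.
    dists = [(math.dist(x, query), y) for x, y in zip(train_x, train_y)]
    # Step 2: order the training records by distance.
    dists.sort(key=lambda pair: pair[0])
    # Step 3: average the response of the k closest records.
    return sum(y for _, y in dists[:k]) / k


def rmse(train_x, train_y, xs, ys, k):
    """Root mean squared error of k-NN predictions for records xs with responses ys."""
    sq = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(sq) / len(sq))


# Synthetic stand-in data (assumption): two predictors on very different scales.
random.seed(0)
X = [[random.uniform(0, 1), random.uniform(0, 100)] for _ in range(200)]
y = [10 + 20 * a + 0.1 * b + random.gauss(0, 1) for a, b in X]

train_x_raw, valid_x_raw = X[:120], X[120:]
train_y, valid_y = y[:120], y[120:]

# Normalize both partitions using statistics from the training partition only.
means, stds = zscore_fit(train_x_raw)
train_x = zscore_apply(train_x_raw, means, stds)
valid_x = zscore_apply(valid_x_raw, means, stds)

# Try k = 1..5 and keep the k with the lowest validation RMSE.
train_rmse = {k: rmse(train_x, train_y, train_x, train_y, k) for k in range(1, 6)}
valid_rmse = {k: rmse(train_x, train_y, valid_x, valid_y, k) for k in range(1, 6)}
best_k = min(valid_rmse, key=valid_rmse.get)

print("training RMSE by k:  ", train_rmse)
print("validation RMSE by k:", valid_rmse)
print("best k:", best_k)
```

A few things to look for when running it. With k = 1 the training RMSE is zero, because each training record is its own nearest neighbor at distance 0 (assuming no two records share identical predictor values). The validation RMSE at best_k was itself used to pick k, so it tends to understate the error on genuinely new data. And knn_predict does all of its work at prediction time: every new record is compared against the entire training set, which is the cost to keep in mind for part e.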