In This Assignment You Will Conduct KNN Classification On A Dataset The Data Include Customer Demographic Information
In this assignment you will conduct KNN classification on a dataset. The data include customer demographic information (age, income, etc.), the customer's relationship with the bank (mortgage, securities account, etc.), and the customer's response to the last personal loan campaign (Personal Loan).

In [ ]:
%matplotlib inline

In [ ]:
from pathlib import Path
import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.neighbors import NearestNeighbors, KNeighborsClassifier
import matplotlib.pylab as plt

Let's start by loading the dataset.

In [ ]:
df = pd.read_excel('UniversalBank.xlsx', sheet_name='Data')
df.shape

In [ ]:
df.head(1)

Let's check the proportion of the two classes in the column used as the label (i.e., Personal Loan). There is no need to conduct oversampling or undersampling in this assignment.

In [ ]:
df['Personal Loan'].value_counts()

Select columns
Exclude the ID and ZIP Code columns.

In [ ]:

Missing values
Check for missing values. Drop them if needed.

In [ ]:
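The column-selection and missing-value steps described above could be sketched roughly as follows. This is only an illustration, not the assignment's reference solution: the small DataFrame is a made-up stand-in for UniversalBank.xlsx, and apart from ID, ZIP Code and Personal Loan the column names (Age, Income) are assumptions based on the description.

```python
import pandas as pd

# Tiny synthetic stand-in for UniversalBank.xlsx (values are invented;
# the Age and Income column names are assumed for illustration).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Age": [25, 45, None, 52],
    "Income": [49, 34, 11, 100],
    "ZIP Code": [91107, 90089, 94720, 94112],
    "Personal Loan": [0, 0, 0, 1],
})

# Select columns: exclude the identifier-like columns.
df = df.drop(columns=["ID", "ZIP Code"])

# Missing values: count them per column, then drop incomplete rows if any exist.
missing_per_column = df.isna().sum()
print(missing_per_column)
if missing_per_column.sum() > 0:
    df = df.dropna().reset_index(drop=True)

print(df.shape)
```

Dropping rows is only one option; on the real data it is worth looking at the counts first, since there may be nothing to drop.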
Dummies
Create dummies if any are needed.

In [ ]:

In [ ]:

Partitioning
Partition the dataset into train and validation partitions. Use 40% for validation. There is no need to make up artificial records.

Preprocessing
Conduct all required preprocessing, which includes (1) selecting features and (2) normalization. Use 'Personal Loan' as the label and the rest of the columns as predictors. At the end of this cell you should have 2 variables named trainNorm and validNorm representing the train and validation partitions.
Tip: create a list of features and name it features. Use it when needed instead of copy-pasting column names each time.

In [ ]:

More partitioning
Create 4 variables train_X, train_y, valid_X, valid_y representing training features, training label, validation features, and validation label respectively.

In [ ]:

Run KNN
Run KNN. Examine k values in the range 1 to 15. Remember that the end index of the range() function is excluded.

In [ ]:

Select the best value for K
Select the best value for K and write it below. Justify your selection.
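One possible way to chain the steps above (dummies, a 60/40 partition, normalization, and the loop over k) is sketched below. It is a hedged illustration rather than the expected answer: the data are randomly generated instead of read from UniversalBank.xlsx, the Age, Income and Education columns are assumed, treating Education as categorical is an assumption, and the random_state values are arbitrary. The scaler is fitted on the training partition only and then applied to both partitions, and the loop uses range(1, 16) on the assumption that k = 15 is meant to be included.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the bank data (invented values, assumed columns).
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(23, 67, n),
    "Income": rng.integers(8, 224, n),
    "Education": rng.integers(1, 4, n),  # assumed categorical code 1..3
})
# Label loosely tied to income so the classifier has something to learn.
df["Personal Loan"] = (df["Income"] + rng.normal(0, 30, n) > 150).astype(int)

# Dummies: one-hot encode the (assumed) categorical column.
df = pd.get_dummies(df, columns=["Education"], prefix="Education", dtype=int)

# Partitioning: 60% training, 40% validation.
train_df, valid_df = train_test_split(df, test_size=0.4, random_state=1)

# Preprocessing: every column except the label is a predictor.
label = "Personal Loan"
features = [c for c in df.columns if c != label]

# Fit the scaler on the training partition only, then transform both.
scaler = preprocessing.StandardScaler().fit(train_df[features])

def normalize(part):
    """Return a copy of `part` with standardized features and the raw label."""
    scaled = pd.DataFrame(scaler.transform(part[features]),
                          columns=features, index=part.index)
    return pd.concat([scaled, part[label]], axis=1)

trainNorm = normalize(train_df)
validNorm = normalize(valid_df)

# More partitioning: split features and label.
train_X, train_y = trainNorm[features], trainNorm[label]
valid_X, valid_y = validNorm[features], validNorm[label]

# Run KNN for k = 1..15 and record the validation accuracy of each model.
rows = []
for k in range(1, 16):
    knn = KNeighborsClassifier(n_neighbors=k).fit(train_X, train_y)
    rows.append({"k": k, "accuracy": accuracy_score(valid_y, knn.predict(valid_X))})
results = pd.DataFrame(rows)
print(results)

# A simple selection rule: the k with the highest validation accuracy
# (idxmax returns the smallest such k when several tie).
best_k = int(results.loc[results["accuracy"].idxmax(), "k"])
print("best k:", best_k)
```

Highest validation accuracy is only one way to justify the choice. Because the classes in Personal Loan are likely imbalanced, it may also be worth checking how stable the accuracy is across neighbouring k values before settling on one.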