I have the following CSV file, which looks like the preview below. Please help me
work through the parts step by step, so I can see how you arrived at the answer for each part.
     Wine  Alcohol  Malic.acid   Ash   Acl   Mg  Phenols  Flavanoids  Nonflavanoid.phenols  Proanth  Color.int   Hue    OD  Proline
0       1    14.23        1.71  2.43  15.6  127     2.80        3.06                  0.28     2.29       5.64  1.04  3.92     1065
1       1    13.20        1.78  2.14  11.2  100     2.65        2.76                  0.26     1.28       4.38  1.05  3.40     1050
2       1    13.16        2.36  2.67  18.6  101     2.80        3.24                  0.30     2.81       5.68  1.03  3.17     1185
3       1    14.37        1.95  2.50  16.8  113     3.85        3.49                  0.24     2.18       7.80  0.86  3.45     1480
4       1    13.24        2.59  2.87  21.0  118     2.80        2.69                  0.39     1.82       4.32  1.04  2.93      735
...
173     3    13.71        5.65  2.45  20.5   95     1.68        0.61                  0.52     1.06       7.70  0.64  1.74      740
174     3    13.40        3.91  2.48  23.0  102     1.80        0.75                  0.43     1.41       7.30  0.70  1.56      750
175     3    13.27        4.28  2.26  20.0  120     1.59        0.69                  0.43     1.35      10.20  0.59  1.56      835
176     3    13.17        2.59  2.37  20.0  120     1.65        0.68                  0.53     1.46       9.30  0.60  1.62      840
177     3    14.13        4.10  2.74  24.5   96     2.05        0.76                  0.56     1.35       9.20  0.61  1.60      560

178 rows x 14 columns
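The preview above reads like a printed data frame, so the file itself probably has a header row followed by one wine per line. As a starting point for the task's read_data, here is a minimal sketch using only the standard library. It assumes wine.csv is comma-separated, has a single header row, and stores the class label (Wine) in the first column; none of that is confirmed by the post, and the (label, features) return shape is my own choice.

```python
import csv


def read_data(path="wine.csv"):
    """Return a list of (label, features) pairs read from a CSV file.

    Assumption: comma-separated values, one header row, class label
    in the first column and numeric features in the remaining columns.
    """
    samples = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader)  # skip the header row
        for row in reader:
            if not row:
                continue  # ignore blank lines
            label = int(row[0])
            features = [float(value) for value in row[1:]]
            samples.append((label, features))
    return samples
```

If the file turns out to use a different delimiter, passing `delimiter=";"` (or similar) to `csv.reader` should be enough.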
Task 1 - k Nearest Neighbours Implementation

Requirements:

a. Implement the K-Nearest-Neighbours algorithm. Your code should include at least the following functions:
   1. read_data: reads the wine.csv dataset, which includes the results of a chemical analysis of 178 wine samples grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 different features found in each of the three types of wines. (Some additional information on the dataset can be found in the attached file wines.names).
   2. split_data: takes a percentage value as a parameter, which represents the relative size of the testing set. The function should randomly split the dataset into two groups: testing and training. For example, if the dataset includes 100 data items, then the function call split_data(0.3) should return two groups of data items: one that includes 70 randomly selected items for training, and another that includes the remaining 30 items for testing. Note: You may use the Python function random.sample to split the data set.
   3. euclidean_distance function: measures the distance between two wines based on their attributes.
   4. KNN function: takes a training set, a single wine and an integer k, and returns the k nearest neighbours of the wine in the training set.
   5. A classification function that finds the type of the wine. Your function should return the type (1, 2 or 3) based on the majority of its k nearest neighbours.
   6. A function that returns the prediction accuracy, i.e. the percentage of the wines in the test set that were correctly identified.
b. The output of your program should include:
   1. For each sample in each group (training and testing), print its real type, the classifier prediction and whether the prediction was correct (true/false). For each group, print the prediction accuracy.
   For example:
   sample class = 1, prediction class = 1, prediction correct: True
   sample class = 1, prediction class = 2, prediction correct: False
   Training set accuracy: 90.47619047619048 %
   sample class = 1, prediction class = 1, prediction correct: True
   sample class = 1, prediction class = 2, prediction correct: False
   Testing set accuracy: 88.76543646533220 %
c. Run your algorithm using different k values.
d. Plot a graph that shows the accuracy of both sets (training and testing) with respect to k. Note: To make plots, you can use the Python library matplotlib.
e. Try to use a different distance function (replacing the euclidean_distance from (4.) above). Does it change the results? In what way? (Does it improve or worsen the accuracy?) The results should be included in the report.
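To make the requirements above concrete, here is a minimal sketch of the core functions. It is one possible reading of the task, not a reference solution. Assumptions of mine: each sample is a (label, features) pair, split_data takes the sample list as an extra first argument (the task's split_data(0.3) takes only the fraction), the seed and verbose parameters are added for convenience, and manhattan_distance is just one candidate for the alternative distance in part e.

```python
import math
import random
from collections import Counter


def split_data(samples, test_fraction, seed=None):
    """Randomly split samples into (training, testing) lists."""
    rng = random.Random(seed)
    shuffled = rng.sample(samples, len(samples))  # shuffled copy
    n_test = round(len(samples) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute differences (a possible choice for part e)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn(training, features, k, distance=euclidean_distance):
    """Return the k training samples closest to the given features."""
    return sorted(training, key=lambda s: distance(s[1], features))[:k]


def classify(training, features, k, distance=euclidean_distance):
    """Return the majority label among the k nearest neighbours."""
    neighbours = knn(training, features, k, distance)
    votes = Counter(label for label, _ in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(training, group, k, distance=euclidean_distance, verbose=False):
    """Percentage of samples in group that are classified correctly."""
    correct = 0
    for label, features in group:
        predicted = classify(training, features, k, distance)
        correct += predicted == label
        if verbose:
            print(f"sample class = {label}, prediction class = {predicted}, "
                  f"prediction correct: {predicted == label}")
    return 100.0 * correct / len(group)


# Tiny made-up data set, only to show the call pattern.
toy = [(1, [0.0, 0.0]), (1, [0.1, 0.2]), (2, [5.0, 5.0]), (2, [5.1, 4.9])]
print(classify(toy, [0.2, 0.1], k=3))  # prints 1
```

For parts c and d, one approach is to call accuracy(train, train, k) and accuracy(train, test, k) in a loop over k and pass the two resulting lists to matplotlib's plot function. Note that with k = 1 every training sample is its own nearest neighbour, so the training accuracy will be 100 % unless the data contains identical feature vectors with different labels.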