Q2. Perceptrons Instead of using Naive Bayes, you decide to try applying Perceptron to the interrogation data. You gener

Post by **answerhappygod** » Wed Nov 24, 2021 10:05 am


Q2. Perceptrons

Instead of using Naive Bayes, you decide to try applying Perceptron to the interrogation data. You generate features from the training data as follows:

• $\phi_1(x) = n$ where "not" appears $n$ times.
• $\phi_2(x) = m$ where "swear" appears $m$ times.
• $\phi_3(x) = 1$, a bias term.

We use the labels +1 for G and -1 for I. Given a weight vector $w = (w_1, w_2, w_3)$, our classifier returns +1 if $w_1\phi_1(x) + w_2\phi_2(x) + w_3 \ge 0$ and -1 otherwise. Our training set from part 1 yields the following features and labels:

| Training Statements | $\phi_1$ | $\phi_2$ | $\phi_3$ | Label |
|---|---|---|---|---|
| I am definitely innocent, officer | 0 | 0 | 1 | +1 |
| Officer, I swear I am not lying | 1 | 1 | 1 | +1 |
| I am not lying, I swear | 1 | 1 | 1 | +1 |
| I am innocent, officer, I swear | 0 | 1 | 1 | -1 |
| Officer, I am definitely not lying | 1 | 0 | 1 | -1 |

(a) [2 pts] Compute the first two updates of the Perceptron algorithm and fill in the following table, using the given initial Perceptron weights $w = (w_1, w_2, w_3)$ and data points $(\phi_1, \phi_2, \phi_3, \text{Label})$.

| | $w_1$ | $w_2$ | $w_3$ |
|---|---|---|---|
| Initial | 1 | 2 | -0.5 |
| Observing (0, 0, 1, +1) | | | |
| Observing (1, 0, 1, -1) | | | |

(b) [2 pts] What convergence guarantees can you give for the Perceptron algorithm applied to this data set?

(c) [2 pts] Linear classifiers are often insufficient to represent a dataset using a given set of features. However, it is often possible to find new features using nonlinear functions of our existing features which do allow linear classifiers to separate the data. Nonlinear features result in more expressive linear classifiers. For example, consider the following data set, where +'s represent positive examples and o's represent negative examples.

[Figure: number line with + at -1 and +1, and o at 0]

No linear classifier can separate the positive examples (-1, 0) and (1, 0) from the negative example (0, 0). Rather than using a single feature, if we perform a nonlinear mapping $\phi(x_1, x_2) = (x_1^2, 1)$, the positive examples are both mapped to (1, 1) and the negative example is mapped to (0, 1), and we see the data can be separated by a linear classifier. One example is the line $w = [1, -0.5]$, i.e. the classifier $w^\top \phi(x) = x_1^2 - 0.5 \ge 0$.

[Figure: the mapped points, with + at (1, 1) and o at (0, 1)]

For what values of the weight vector $w = (w_1, w_2)$ does the classifier $w^\top \phi(x) \ge 0$ separate the given data?

(d) [3 pts] Which of the following feature sets allows a linear classifier $w = (w_1, w_2, w_3)$ to separate the original interrogation data set? Justify your answer briefly.

(i) [1 pt] $\phi' = (\phi_1 + \phi_2,\ \phi_1 - \phi_2,\ 1)$
(ii) [1 pt] $\phi'$ = (010201)
(iii) [1 pt] $\phi' = (\phi_1 \text{ xor } \phi_2,\ \phi_2,\ 1)$, where a xor b is 1 if either a = 1 or b = 1 but not both.

(e) [2 pts] Given the features $\phi(x) = [x^2, x, 1]$, how many data points are we guaranteed to be able to separate with zero error using a linear classifier $w^\top \phi(x) = w_1 x^2 + w_2 x + w_3$? Assume that a data point x cannot have conflicting labels. Justify your answer briefly.

(f) [2 pts] In general, if we use features $\phi(x) = [x^{N-1}, x^{N-2}, \ldots, 1]$, i.e. an (N-1)th order polynomial, how many points can we separate with zero error using a linear classifier $w = [w_1, \ldots, w_N]$? Justify your answer briefly.

(g) [2 pts] Assume we have N labeled training data points, which we would like to use for classification, i.e. to predict the labels of unseen test data points. What are the disadvantages of using an Nth order polynomial to fit this data?
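For anyone who wants to check the arithmetic in part (a), here is a minimal Python sketch of the standard mistake-driven perceptron update, w ← w + y·φ(x) on a misclassified point, using the decision rule from the question (predict +1 when the score is >= 0). It rests on assumptions: the initial weights (1, 2, -0.5) are my reading of the garbled table, and the names `predict`, `perceptron_step`, and `mistakes_per_epoch` are made up for illustration, not taken from the course.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Tuple[int, int, int], int]

# (phi1, phi2, phi3) and label for the five training statements, in table order.
DATA: List[Example] = [
    ((0, 0, 1), +1),
    ((1, 1, 1), +1),
    ((1, 1, 1), +1),
    ((0, 1, 1), -1),
    ((1, 0, 1), -1),
]


def predict(w: Sequence[float], phi: Sequence[int]) -> int:
    """Return +1 if w . phi >= 0, else -1 (the rule stated in the question)."""
    score = sum(wi * fi for wi, fi in zip(w, phi))
    return 1 if score >= 0 else -1


def perceptron_step(w: Sequence[float], phi: Sequence[int], label: int) -> List[float]:
    """Leave w unchanged on a correct prediction; otherwise add label * phi."""
    if predict(w, phi) == label:
        return list(w)
    return [wi + label * fi for wi, fi in zip(w, phi)]


def mistakes_per_epoch(w: Sequence[float], data: List[Example], epochs: int) -> List[int]:
    """Run full passes over the data and count the mistakes made in each pass."""
    w = list(w)
    counts = []
    for _ in range(epochs):
        mistakes = 0
        for phi, label in data:
            if predict(w, phi) != label:
                mistakes += 1
                w = perceptron_step(w, phi, label)
        counts.append(mistakes)
    return counts


# ASSUMPTION: initial weights as read from the garbled table in the post.
initial = [1.0, 2.0, -0.5]
after_first = perceptron_step(initial, (0, 0, 1), +1)
after_second = perceptron_step(after_first, (1, 0, 1), -1)
print(after_first)
print(after_second)
print(mistakes_per_epoch(initial, DATA, 20))
```

Under those assumed initial weights both observations are misclassified, so each one triggers an update. The per-pass mistake counts are a quick way to explore part (b): in the (φ1, φ2) plane the labels follow an XNOR pattern, which no single line separates, so every full pass should record at least one mistake.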