Problem 13 - 10 points. Consider a classification problem of the following kind: You are given n numeric features of an
A linear classifier consists of an n-dimensional weight vector W and a numeric threshold T. The classifier predicts that a given entity X belongs to the category just if W·X > T. Suppose, now, that you have a data set consisting of a collection D of m entities together with the associated correct labels L. Specifically, D is an m x n matrix, where each row is the feature vector for one entity. Vector L is an m-dimensional column vector, where L[i] = 1 if the ith entity is in the category and 0 otherwise.

There are a number of different ways of evaluating how well a given classifier fits a given labelled data set. The simplest is the overall accuracy, which is just the fraction of the instances in the data set where the classifier gets the right answer. Overall accuracy, however, is often a very unhelpful measure. Suppose you are trying to locate pictures of camels in a large collection of images collected across the internet. Then, since images of camels constitute only a very small fraction of the collection, you can achieve a high degree of overall accuracy simply by rejecting all the images. Clearly, this is not a useful retrieval engine.

In this kind of case, the most commonly used measures are precision and recall. Let C be the set of entities which are actually in the category (i.e. labelled so by L); let R be the set of entities which the classifier predicts are in the category; and let Q = C ∩ R. Then precision is defined as |Q|/|R| and recall is defined as |Q|/|C|. In the camel example, the precision is the fraction of images that are actually camels, out of all the images that the classifier identifies as camels; and the recall is the fraction of the images of camels in the collection that the classifier accepts as camels.

A. Write a MATLAB function evaluate(D,L,W,T) which takes as arguments D, L, W, and T, as described above, and which returns the overall accuracy, the precision, and the recall.
For example, let m = 6, n = 4,

    D = [1 1 0 4
         2 0 1 1
         2 3 0 0
         0 2 3 1
         4 0 2 0
         3 0 1 3]

    L = [1; 1; 0; 0; 0; 0]
    W = [...]
    T = 9

Then the classifications returned by the classifier are [1, 0, 0, 1, 0, 1]; the first, third, and fifth rows are correctly classified and the rest are misclassified. Thus, the accuracy is 3/6 = 0.5. The precision is 1/3: of the three instances identified by the classifier, only one is correct. The recall is 1/2: of the two actual instances of the category, one is identified by the classifier.
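One way part (A) could be approached is sketched below. It is only a sketch under stated assumptions, not a reference solution: it assumes L contains only 0s and 1s, it returns the three measures as separate outputs, and it leaves precision or recall as NaN when the corresponding set R or C is empty (the problem statement does not say what to return in that case).

```matlab
function [accuracy, precision, recall] = evaluate(D, L, W, T)
% EVALUATE  Accuracy, precision and recall of one linear classifier.
%   D: m-by-n feature matrix, L: m-element vector of 0/1 labels,
%   W: n-element weight vector, T: scalar threshold.
    L = logical(L(:));              % actual membership (set C)
    R = (D * W(:)) > T;             % predicted membership (set R)
    Q = L & R;                      % accepted and actually in the category
    accuracy  = sum(R == L) / numel(L);
    precision = sum(Q) / sum(R);    % NaN if the classifier accepts nothing
    recall    = sum(Q) / sum(L);    % NaN if the category is empty
end
```

A possible call, using the matrix D from the example. The label vector here is the one implied by the stated classifications, and the weight vector is hypothetical, chosen only because it yields the classifications [1, 0, 0, 1, 0, 1] with T = 9:

```matlab
D = [1 1 0 4; 2 0 1 1; 2 3 0 0; 0 2 3 1; 4 0 2 0; 3 0 1 3];
L = [1; 1; 0; 0; 0; 0];   % implied by which rows are said to be correct
W = [1; 2; 2; 2];         % hypothetical weights, not taken from the source
[a, p, r] = evaluate(D, L, W, 9);   % a = 0.5, p = 1/3, r = 0.5
```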
B. Write a function evaluate2(D,L,W,T). Here the input arguments D and L are as in part (A). W and T, however, represent a collection of q classifiers. W is an n x q matrix; T is a q-dimensional vector. For j = 1...q, the column W[:, j] and the value T[j] are the weight vector and threshold of a classifier. E = evaluate2(D,L,W,T) returns a 3 x q matrix, where, for j = 1...q, E[1, j], E[2, j] and E[3, j] are respectively the accuracy, precision, and recall of the jth classifier.

For example, let D and L be as in part (A). Let q = 2 and let

    W = [...]
    T = [9, 2]

Then evaluate2(D,L,W,T) returns

    0.5 0.3333 0.5 0.5 0.6667 0.5

Midterm Exam - 100 points. CSCI-GA 3033.1180 Mathematical Techniques for Computer Science. Instructor: Parijat Dube. July 8, 2022.
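Part (B) can be handled without a loop, since D * W computes the scores of all q classifiers at once. The following is a minimal sketch under the same assumptions as before (0/1 labels, NaN where a ratio is undefined); repmat is used instead of implicit expansion so that the sketch does not depend on a recent MATLAB release.

```matlab
function E = evaluate2(D, L, W, T)
% EVALUATE2  Accuracy, precision and recall for q linear classifiers.
%   D: m-by-n features, L: m-element 0/1 labels,
%   W: n-by-q weights (one classifier per column), T: q-element thresholds.
%   E: 3-by-q; rows are accuracy, precision, recall.
    m  = size(D, 1);
    q  = size(W, 2);
    L  = logical(L(:));
    Tm = repmat(T(:)', m, 1);       % threshold of classifier j down column j
    R  = (D * W) > Tm;              % R(i,j): classifier j accepts entity i
    Lm = repmat(L, 1, q);           % labels copied once per classifier
    Q  = R & Lm;                    % accepted and actually in the category
    E  = [sum(R == Lm, 1) / m;      % accuracy
          sum(Q, 1) ./ sum(R, 1);   % precision (NaN if nothing accepted)
          sum(Q, 1) / sum(L)];      % recall (NaN if the category is empty)
end
```

A possible call is shown below. The weight matrix is hypothetical and is not the one from the original example; its first column was chosen so that, with T[1] = 9, the first column of E reproduces the values stated in part (A).

```matlab
D = [1 1 0 4; 2 0 1 1; 2 3 0 0; 0 2 3 1; 4 0 2 0; 3 0 1 3];
L = [1; 1; 0; 0; 0; 0];            % implied by the example in part (A)
W = [1 0; 2 0; 2 0; 2 1];          % hypothetical weights, not from the source
E = evaluate2(D, L, W, [9, 2]);    % E(:,1) is [0.5; 1/3; 0.5]
```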