QUESTION 1 20 points Assume that we use cosine similarity as the similarity measure. In the hierarchical agglomerative c

Post by **answerhappygod** » Tue Jul 12, 2022 8:05 am

QUESTION 1 (20 points)

Assume that we use cosine similarity as the similarity measure. In hierarchical agglomerative clustering (HAC), we need a good way to measure the similarity of two clusters. One common choice is the complete-link similarity between two clusters. Formally, for two clusters $C_i$ and $C_j$, we define

$$\mathrm{sim}(C_i, C_j) = \min_{x \in C_i,\; y \in C_j} s(x, y).$$

Here, $s(x, y)$ is the cosine similarity between $x$ and $y$. Given a list of clusters $C_1, C_2, \ldots, C_m$, assume that their pairwise similarities are saved in a two-dimensional array of size $m^2$. Given three clusters $C_i$, $C_j$, and $C_k$, show that there is a way to compute $\mathrm{sim}(C_i \cup C_j, C_k)$ in constant time. Note that we ignore the dimensionality in the time complexity.
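
Every pair $(x, y)$ with $x \in C_i \cup C_j$ and $y \in C_k$ has $x$ in $C_i$ or in $C_j$, so the minimum over the union splits into two minima that are already stored in the array: $\mathrm{sim}(C_i \cup C_j, C_k) = \min(\mathrm{sim}(C_i, C_k), \mathrm{sim}(C_j, C_k))$, i.e. two lookups and one comparison. Below is a minimal Python sketch of this, assuming the array is a list of lists named `sim` (a hypothetical name) and using toy 2-D vectors as hypothetical data; `complete_link` is a brute-force helper included only to cross-check the constant-time lookup.

```python
import math
from itertools import product

def cosine(x, y):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm

def complete_link(ci, cj):
    """Brute-force complete-link similarity: min over all cross-cluster pairs."""
    return min(cosine(x, y) for x, y in product(ci, cj))

def merged_complete_link(sim, i, j, k):
    """O(1): sim(C_i U C_j, C_k) = min(sim[i][k], sim[j][k]).

    `sim` is a hypothetical m x m table of precomputed cluster similarities.
    """
    return min(sim[i][k], sim[j][k])

# Toy clusters (hypothetical data, for illustration only).
clusters = [
    [(1.0, 0.0), (0.9, 0.1)],
    [(0.0, 1.0), (0.2, 0.8)],
    [(0.7, 0.7)],
]
m = len(clusters)
sim = [[complete_link(clusters[a], clusters[b]) for b in range(m)] for a in range(m)]

fast = merged_complete_link(sim, 0, 1, 2)
slow = complete_link(clusters[0] + clusters[1], clusters[2])
print(math.isclose(fast, slow))  # True
```

In an actual HAC run, the merged row would be written back into the array so that later merges can use the same constant-time rule.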

QUESTION 3 (20 points)

Below is the PageRank algorithm for computing page ranks for web pages. Let $S$ be the total set of pages, and let $N_q$ be the number of out-links of page $q$.

- Let $\forall p \in S$: $E(p) = \alpha / |S|$.
- Initialize $\forall p \in S$: $R(p) = 1 / |S|$.
- Until the ranks do not change much:
  - for each $p \in S$: $R'(p) = \sum_{q:\, q \to p} R(q)/N_q + E(p)$;
  - $c = 1 / \sum_{p \in S} R'(p)$;
  - for each $p \in S$: $R(p) = c \cdot R'(p)$.

Show that there is a concrete matrix such that the PageRank vector of all the web pages is the principal eigenvector of that matrix.
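
A minimal pure-Python sketch, assuming the iteration $R'(p) = \sum_{q \to p} R(q)/N_q + E(p)$ followed by the normalization $R(p) = c \cdot R'(p)$, a hypothetical four-page graph in which every page has at least one out-link, and $\alpha = 0.15$ (an assumed value). With $A_{pq} = 1/N_q$ if $q$ links to $p$ and 0 otherwise, one iteration is $R \leftarrow c(AR + E)$; because the entries of $R$ sum to 1, $E = E\,\mathbf{1}^T R$, so a converged $R$ satisfies $R = c(A + E\,\mathbf{1}^T)R$. The sketch builds $M = A + E\,\mathbf{1}^T$ with a hypothetical helper `matrix_M` and checks that $MR = R/c$. Since every entry of $M$ is positive, the Perron-Frobenius theorem implies that this positive eigenvector is the principal one.

```python
def pagerank(links, alpha=0.15, tol=1e-12, max_iter=1000):
    """Iterative PageRank following the algorithm above.

    links maps each page to the list of pages it links to (out-links).
    Assumes every page has at least one out-link (no dangling pages).
    """
    pages = sorted(links)
    n = len(pages)
    E = {p: alpha / n for p in pages}          # E(p) = alpha / |S|
    R = {p: 1.0 / n for p in pages}            # R(p) = 1 / |S|
    for _ in range(max_iter):
        # R'(p) = sum over q -> p of R(q) / N_q, plus E(p)
        R_new = dict(E)
        for q in pages:
            for p in links[q]:
                R_new[p] += R[q] / len(links[q])
        c = 1.0 / sum(R_new.values())          # normalization constant
        R_next = {p: c * R_new[p] for p in pages}
        delta = sum(abs(R_next[p] - R[p]) for p in pages)
        R = R_next
        if delta < tol:
            break
    return pages, R, c

def matrix_M(links, pages, alpha):
    """M = A + E * 1^T, where A[p][q] = 1 / N_q if q links to p."""
    n = len(pages)
    idx = {p: i for i, p in enumerate(pages)}
    M = [[alpha / n] * n for _ in range(n)]    # the E * 1^T part
    for q in pages:
        for p in links[q]:
            M[idx[p]][idx[q]] += 1.0 / len(links[q])
    return M

# Hypothetical four-page web graph.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
pages, R, c = pagerank(links)
M = matrix_M(links, pages, 0.15)
r = [R[p] for p in pages]
Mr = [sum(M[i][j] * r[j] for j in range(len(r))) for i in range(len(r))]
# M r should equal (1/c) r, i.e. r is an eigenvector of M with eigenvalue 1/c.
print(max(abs(Mr[i] - r[i] / c) for i in range(len(r))) < 1e-9)  # True
```

With no dangling pages, each column of $A$ sums to 1, so the eigenvalue $1/c$ comes out as $1 + \alpha$.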

QUESTION 4 (20 points)

Text categorization is a supervised learning process. Let $D$ be the document collection and $C$ be a set of categories. A training set $T$ of document-vector and category pairs is given, $T = \{(d_i, c(d_i)) \mid 1 \le i \le m\}$. The naive Bayesian algorithm needs to compute the conditional probability $P(c \mid x)$ for any document $x = (x_1, x_2, \ldots, x_n)$ and any $c \in C$. Assuming that all index terms are mutually independent, explain how you may be able to compute $P(c \mid x)$.
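
Under the independence assumption, Bayes' rule gives $P(c \mid x) = P(c)\,P(x \mid c) / P(x)$ with $P(x \mid c) = \prod_j P(x_j \mid c)$; $P(x)$ is the same for every $c$, so it can be dropped when ranking categories, or recovered by normalizing over $C$. A minimal Python sketch, assuming binary term-presence features (a Bernoulli model), add-one (Laplace) smoothing, and a tiny hypothetical training set; the feature model and the smoothing are assumptions, not part of the question.

```python
import math
from collections import Counter

def train(T):
    """Estimate P(c) and P(x_j = 1 | c) from T = [(d_i, c(d_i)), ...].

    Each d_i is a tuple of 0/1 term-presence values (assumed Bernoulli model).
    Add-one smoothing keeps every estimate strictly between 0 and 1.
    """
    n_terms = len(T[0][0])
    class_count = Counter(c for _, c in T)
    prior = {c: k / len(T) for c, k in class_count.items()}
    cond = {}
    for c, k in class_count.items():
        ones = [sum(d[j] for d, cd in T if cd == c) for j in range(n_terms)]
        cond[c] = [(ones[j] + 1) / (k + 2) for j in range(n_terms)]
    return prior, cond

def posterior(x, prior, cond):
    """P(c | x) for every c, using P(x | c) = prod_j P(x_j | c)."""
    log_joint = {}
    for c in prior:
        s = math.log(prior[c])
        for xj, pj in zip(x, cond[c]):
            s += math.log(pj if xj else 1.0 - pj)
        log_joint[c] = s
    # Normalize over C: equivalent to dividing by P(x) = sum_c P(c) P(x | c).
    top = max(log_joint.values())
    z = sum(math.exp(v - top) for v in log_joint.values())
    return {c: math.exp(v - top) / z for c, v in log_joint.items()}

# Hypothetical training set over 3 index terms.
T = [((1, 1, 0), "sports"), ((1, 0, 0), "sports"),
     ((0, 1, 1), "politics"), ((0, 0, 1), "politics")]
prior, cond = train(T)
post = posterior((1, 0, 0), prior, cond)
print(max(post, key=post.get))  # sports
```

Summing logarithms instead of multiplying probabilities avoids floating-point underflow when the vocabulary is large.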

QUESTION 5 (20 points)

We are given a collection $D$ of documents. For any keyword (or index term) $w$, the document frequency $df_w$ is the number of documents in $D$ that contain $w$. We sort all keywords in decreasing order of their document frequencies. Let $r_w$ denote the rank of $w$, i.e., its position in the sorted list. Assume that the following Zipf's Law holds:

$$df_w = \frac{A}{r_w}.$$

Here, $A$ is a constant. Suppose that there are $y$ distinct keywords. Under the above Zipf's Law, what is the size of the inverted indices for $D$? Note: you should estimate the total number of nodes in the inverted indices.
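
Counting one posting node per (keyword, document) pair, keyword $w$ contributes $df_w$ nodes, so the total is $\sum_{r=1}^{y} A/r = A \cdot H_y$, where $H_y$ is the $y$-th harmonic number. Since $H_y \approx \ln y + \gamma$ with $\gamma \approx 0.5772$ (the Euler-Mascheroni constant), the size is roughly $A(\ln y + 0.5772)$, i.e. $\Theta(A \log y)$; if one dictionary (header) node per keyword is also counted, add $y$. A minimal Python sketch comparing the exact sum with this approximation, with $A$ and $y$ set to hypothetical values:

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def exact_postings(A, y):
    """Sum of df_w = A / r_w over ranks r = 1..y (one node per posting)."""
    return sum(A / r for r in range(1, y + 1))

def approx_postings(A, y):
    """A * H_y, using H_y ~ ln(y) + gamma."""
    return A * (math.log(y) + EULER_GAMMA)

# Hypothetical values: A = 10_000, y = 50_000 distinct keywords.
A, y = 10_000, 50_000
exact = exact_postings(A, y)
approx = approx_postings(A, y)
print(abs(exact - approx) / exact < 1e-4)  # True
```

The gap $H_y - (\ln y + \gamma)$ shrinks roughly like $1/(2y)$, so the logarithmic estimate is already very tight for realistic vocabulary sizes.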