7. Consider a dictionary that contains n= 300,000 words. For simplicity, let us assume that each word can be represented
-
answerhappygod
- Site Admin
- Posts: 899604
- Joined: Mon Aug 02, 2021 8:13 am
7. Consider a dictionary that contains n= 300,000 words. For simplicity, let us assume that each word can be represented
7. Consider a dictionary that contains n= 300,000 words. For simplicity, let us assume that each word can be represented by a 20-byte character string. We are implementing a spell checker which, given a piece of text document, examines each word in the document and returns those that cannot be found in the dictionary. We consider two options: Scheme 1: Store all the 300,000 words in a hash table. Given a word v in a document, we search the hash table for v. If v is not found, we flag v as a misspelling. Scheme 2: Use a "signature file”. A signature file is an array T of m 0/1 bits. In our application, given a hash function h, the bit T[h(w)] is set to 'l' for each word w in the dictionary. All other entries in T are set to 0. Given a word v in a document, we look up T[h(u)]. If the bit is ‘O', we flag v as a misspelling. Answer the following questions: (a) (6%) For Scheme 1, assume open addressing is used and a hash table size of 400,031 entries, how much memory is needed? Also, assume that a simple uniform hash function is used, what is the average number of probes each (i) successful search and (ii) unsuccessful search requires? (b) (3%) Explain why Scheme 2 is not a correct implementation of a spell checker. (c) (5%) Assume that the signature file uses the same amount of memory that Scheme 1 with 400,031 hash table entries uses (i.e., your answer for part (a)). What is the value of m of the signature file? What is the probability that Scheme 2 makes an error? (Hint: you may assume that (1 – 2)" ~1 - nx for small x and large n.) (d) (2%) In your opinion, is Scheme 2 a reasonable option? Why?
Join a community of subject matter experts. Register for FREE to view solutions, replies, and use search function. Request answer by replying!