7. Consider a dictionary that contains n= 300,000 words. For simplicity, let us assume that each word can be represented

Business, Finance, Economics, Accounting, Operations Management, Computer Science, Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Algebra, Precalculus, Statistics and Probabilty, Advanced Math, Physics, Chemistry, Biology, Nursing, Psychology, Certifications, Tests, Prep, and more.
Post Reply
answerhappygod
Site Admin
Posts: 899604
Joined: Mon Aug 02, 2021 8:13 am

7. Consider a dictionary that contains n= 300,000 words. For simplicity, let us assume that each word can be represented

Post by answerhappygod »

7 Consider A Dictionary That Contains N 300 000 Words For Simplicity Let Us Assume That Each Word Can Be Represented 1
7 Consider A Dictionary That Contains N 300 000 Words For Simplicity Let Us Assume That Each Word Can Be Represented 1 (233.35 KiB) Viewed 45 times
7. Consider a dictionary that contains n= 300,000 words. For simplicity, let us assume that each word can be represented by a 20-byte character string. We are implementing a spell checker which, given a piece of text document, examines each word in the document and returns those that cannot be found in the dictionary. We consider two options: Scheme 1: Store all the 300,000 words in a hash table. Given a word v in a document, we search the hash table for v. If v is not found, we flag v as a misspelling. Scheme 2: Use a "signature file”. A signature file is an array T of m 0/1 bits. In our application, given a hash function h, the bit T[h(w)] is set to 'l' for each word w in the dictionary. All other entries in T are set to 0. Given a word v in a document, we look up T[h(u)]. If the bit is ‘O', we flag v as a misspelling. Answer the following questions: (a) (6%) For Scheme 1, assume open addressing is used and a hash table size of 400,031 entries, how much memory is needed? Also, assume that a simple uniform hash function is used, what is the average number of probes each (i) successful search and (ii) unsuccessful search requires? (b) (3%) Explain why Scheme 2 is not a correct implementation of a spell checker. (c) (5%) Assume that the signature file uses the same amount of memory that Scheme 1 with 400,031 hash table entries uses (i.e., your answer for part (a)). What is the value of m of the signature file? What is the probability that Scheme 2 makes an error? (Hint: you may assume that (1 – 2)" ~1 - nx for small x and large n.) (d) (2%) In your opinion, is Scheme 2 a reasonable option? Why?
Join a community of subject matter experts. Register for FREE to view solutions, replies, and use search function. Request answer by replying!
Post Reply