5. Consider a file of N records arranged according to the Zipf distribution, where the probability of choosing the recor
Posted: Thu May 05, 2022 9:08 pm
5. Consider a file of N records arranged according to the Zipf distribution, where the probability of choosing the record in position k is inversely proportional to its position: C Pk = k k = 1, 2, ..., N i.e. the probability that the record in position k being the required record in a search is given by pk, and C is a normalizing constant. Thus, the records at the beginning of the file has higher chance of being chosen. (a) Let Xy be the (random) number of comparisons to locate a given record present in the file of N records. Derive an expression for the average E(XN) for the Zipf distribution (the expression should not include the constant C). (5 marks) (b) Compare the search performance of the Zipf distribution against the uniform distribution (i.e. pk =1/N) for N = 10. (Note: since N is not large here, the exact formula should be used). (3 marks) (c) Determine the file size N* where E(Xy+) is the same for both the Zipf and the uniform distribution. (3 marks) (d) Consider now the file of N records are now arranged in ascending order of the primary key, and assuming the required record is not present in the file. What is the approximate average number of comparisons required to conclude the search? (4 marks) (e) Consider the file of N records are now arranged using the heap organization where records are not arranged in any particular order. Determine, under the uniform distribution, the average number of comparisons required to conclude the search for (i) when the required record is present in the file, and (ii) when the required record is not present in the file.