(a) For binary data, the L1 distance corresponds to the Hamming distance; that is, the number of bits that are different
- Site Admin
(a) For binary data, the L1 distance corresponds to the Hamming distance; that is, the number of bits that are different between two binary vectors. The Jaccard similarity is a measure of the similarity between two binary vectors. Compute the Hamming distance and the Jaccard similarity between the following two binary vectors.

x = 0101010001
y = 0000011001

(b) Which approach, Jaccard or Hamming distance, is more similar to the Simple Matching Coefficient, and which approach is more similar to the cosine measure? Explain. (Note: The Hamming measure is a distance, while the other three measures are similarities, but don't let this confuse you.)

(c) Suppose that you are comparing how similar two organisms of different species are in terms of the number of genes they share. Describe which measure, Hamming or Jaccard, you think would be more appropriate for comparing the genetic makeup of two organisms. Explain. (Assume that each animal is represented as a binary vector, where each attribute is 1 if a particular gene is present in the organism and 0 otherwise.)

(d) If you wanted to compare the genetic makeup of two organisms of the same species, e.g., two human beings, would you use the Hamming distance, the Jaccard coefficient, or a different measure of similarity or distance? Explain. (Note that two human beings share > 99.9% of the same genes.)
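
For anyone who wants to check the arithmetic in part (a), here is a minimal Python sketch. It is not part of the original question. The function names (`match_counts`, `hamming`, `jaccard`, `smc`, `cosine`) are made up for illustration, and the vectors are assumed to be given as strings of 0s and 1s of equal length. The simple matching coefficient and cosine are included only because part (b) asks about them.

```python
import math


def match_counts(x: str, y: str) -> tuple[int, int, int, int]:
    """Return (f11, f10, f01, f00) for two equal-length bit strings."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    f11 = sum(a == "1" and b == "1" for a, b in zip(x, y))
    f10 = sum(a == "1" and b == "0" for a, b in zip(x, y))
    f01 = sum(a == "0" and b == "1" for a, b in zip(x, y))
    f00 = sum(a == "0" and b == "0" for a, b in zip(x, y))
    return f11, f10, f01, f00


def hamming(x: str, y: str) -> int:
    """Number of positions where the bits differ."""
    _, f10, f01, _ = match_counts(x, y)
    return f10 + f01


def jaccard(x: str, y: str) -> float:
    """f11 / (f11 + f10 + f01); 0-0 matches are ignored."""
    f11, f10, f01, _ = match_counts(x, y)
    denom = f11 + f10 + f01
    # Assumption: return 0.0 when neither vector has any 1s.
    return f11 / denom if denom else 0.0


def smc(x: str, y: str) -> float:
    """Simple matching coefficient: (f11 + f00) / n."""
    f11, _, _, f00 = match_counts(x, y)
    return (f11 + f00) / len(x)


def cosine(x: str, y: str) -> float:
    """Cosine similarity; for bit vectors the dot product equals f11."""
    f11, _, _, _ = match_counts(x, y)
    norm = math.sqrt(x.count("1") * y.count("1"))
    # Assumption: return 0.0 when either vector is all zeros.
    return f11 / norm if norm else 0.0


x = "0101010001"
y = "0000011001"

print(hamming(x, y))           # 3
print(jaccard(x, y))           # 0.4
print(smc(x, y))               # 0.7
print(round(cosine(x, y), 4))  # 0.5774
```

For these vectors, SMC equals 1 minus the Hamming distance divided by the vector length (1 - 3/10 = 0.7), while Jaccard and cosine both leave the 0-0 matches out of the calculation. That observation may help with part (b).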