For every item i in a grocery store, a set si is used to
represent the IDs of transactions in which i is purchased. Assume
that the data set to be analyzed contains hundreds of thousands of
such transactions.
1. In order to analyze the proximity between any two of these
sets si and sj, which measure, Jaccard or Hamming, would be more
appropriate, and why?
2. In order to analyze the proximity between any two of these
sets si and sj for items i and j that are often bought together
(for example, milk and bread), which measure, Jaccard or Hamming,
would be more appropriate, and why?
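
To make the two measures in the question concrete, here is a minimal sketch of how each could be computed on sets of transaction IDs. The item names, the set contents, and the function names are all made up for illustration and are not part of the original problem. Hamming distance is taken here as the number of transactions that contain exactly one of the two items, which is the size of the symmetric difference of the sets.

```python
def jaccard_similarity(a: set, b: set) -> float:
    """Shared transactions divided by transactions containing either item."""
    union = a | b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


def hamming_distance(a: set, b: set) -> int:
    """Number of transactions that contain exactly one of the two items."""
    return len(a ^ b)


# Hypothetical data: transaction IDs in which each item was purchased.
milk = set(range(0, 1000))       # bought in transactions 0..999
bread = set(range(500, 1500))    # bought in transactions 500..1499
saffron = {1, 2}                 # rarely bought
caviar = {3, 4}                  # rarely bought, never together with saffron

print(f"milk/bread     Jaccard={jaccard_similarity(milk, bread):.3f} "
      f"Hamming={hamming_distance(milk, bread)}")
print(f"saffron/caviar Jaccard={jaccard_similarity(saffron, caviar):.3f} "
      f"Hamming={hamming_distance(saffron, caviar)}")
```

In this toy data, the two rare items have a Hamming distance of only 4 even though they never appear in the same transaction, while their Jaccard similarity is 0. The milk and bread sets share 500 transactions, giving a Jaccard similarity of 1/3 and a Hamming distance of 1000. With hundreds of thousands of transactions, most of which contain neither item, this difference in how the measures treat joint absences is worth keeping in mind when answering both parts.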