(10 points] Problem 2 - FP-Growth Implementation A sample dataset has been provided to you in the './data/dataset.pickle
Posted: Wed Apr 27, 2022 3:31 pm
(10 points] Problem 2 - FP-Growth Implementation A sample dataset has been provided to you in the './data/dataset.pickle' path. Here are the attributes for the dataset. Use this dataset to test your functions. • Dataset should load the transactions in the form of a python dictionary where each key holds the transaction id and the value is a python list of the items purchased in that transaction. An example transaction will have the following structure. If items A, C, D, F are purchased in transaction T3, this would appear as follows in the dictionary. transactions = { "13": ["A", "C", "D", "F"] }
def item_support(dataset, min_support): A helper function that returns the support count of each item in the dataset. The dictionary is further sorted based on maximum support of each item and pruned based on min_support. Input: 1. dataset - A python dictionary containing the transactions. 2. items - A python list representing all the items that are part of all the transactions. 3. min_support - A floating point variable representing the min support value for the set of transactions. Output: 1. support_dict - A dictionary representing the support count of each item in the dataset. len_transactions - len (dataset) support_dict = dict() for key, value in dataset.items(): # your code here sorted_support = dict(sorted (support_dict.items(), key=lambda item: item[1], reverse=True)) pruned_support = {key:val for key, val in sorted_support.items() if val/len_transactions >= min_support} return support_dict
def reorder_transactions (dataset, min_support): A helper function that reorders the transaction items based on maximum support count. It is important that you fini the code in the previous cells since this function makes use of the support count dictionary calculated above. Input: 1. dataset - A python dictionary containing the transactions. 2. items - A python list representing all the items that are part of all the transactions. 3. min_support - A floating point variable representing the min_support value for the set of transactions. Output: 1. updated_dataset - A dictionary representing the transaction items in sorted order of their support counts. pruned_support = item_support (dataset, min_support) updated_dataset = dict() # This loop sorts the transaction items based on the item support counts for key, value in dataset.items(): updated_dataset[key] = sorted(value, key=pruned_support.get, reverse=True) # Update the following loop to remove items that do not belong to the pruned_support dictionary for key, value in updated_dataset.items(): updated_values = list() for item in value: # your code here updated_dataset[key] = updated_values return updated_dataset
def build_fp_tree(updated_dataset): Input: 1. updated_dataset - A python dictionary containing the updated set of transactions based on the pruned support Output: 1. fp_tree - A dictionary representing the fp_tree. Each node should have a count and children attribute. HINT: 1. Loop over each transaction in the dataset and make an update to the fp_tree dictionary. 2. For each loop iteration store a pointer to the previously visited node and update it's children in the next 3. Update the root pointer when you start processing items in each transaction. 4. Reset the root pointer for each transaction. Sample Tree Output: {'Y': {'count': 3, 'children': {'V': {'count': 1, "children': {}}}}, 'X': {'count': 2, children': {'R': {'count': 1, 'children': {'F': {'count': 1, "children': {}}}}}}} fp_tree = dict() for key, value in updated_dataset.items(): # your code here return fp_tree
def item_support(dataset, min_support): A helper function that returns the support count of each item in the dataset. The dictionary is further sorted based on maximum support of each item and pruned based on min_support. Input: 1. dataset - A python dictionary containing the transactions. 2. items - A python list representing all the items that are part of all the transactions. 3. min_support - A floating point variable representing the min support value for the set of transactions. Output: 1. support_dict - A dictionary representing the support count of each item in the dataset. len_transactions - len (dataset) support_dict = dict() for key, value in dataset.items(): # your code here sorted_support = dict(sorted (support_dict.items(), key=lambda item: item[1], reverse=True)) pruned_support = {key:val for key, val in sorted_support.items() if val/len_transactions >= min_support} return support_dict
def reorder_transactions (dataset, min_support): A helper function that reorders the transaction items based on maximum support count. It is important that you fini the code in the previous cells since this function makes use of the support count dictionary calculated above. Input: 1. dataset - A python dictionary containing the transactions. 2. items - A python list representing all the items that are part of all the transactions. 3. min_support - A floating point variable representing the min_support value for the set of transactions. Output: 1. updated_dataset - A dictionary representing the transaction items in sorted order of their support counts. pruned_support = item_support (dataset, min_support) updated_dataset = dict() # This loop sorts the transaction items based on the item support counts for key, value in dataset.items(): updated_dataset[key] = sorted(value, key=pruned_support.get, reverse=True) # Update the following loop to remove items that do not belong to the pruned_support dictionary for key, value in updated_dataset.items(): updated_values = list() for item in value: # your code here updated_dataset[key] = updated_values return updated_dataset
def build_fp_tree(updated_dataset): Input: 1. updated_dataset - A python dictionary containing the updated set of transactions based on the pruned support Output: 1. fp_tree - A dictionary representing the fp_tree. Each node should have a count and children attribute. HINT: 1. Loop over each transaction in the dataset and make an update to the fp_tree dictionary. 2. For each loop iteration store a pointer to the previously visited node and update it's children in the next 3. Update the root pointer when you start processing items in each transaction. 4. Reset the root pointer for each transaction. Sample Tree Output: {'Y': {'count': 3, 'children': {'V': {'count': 1, "children': {}}}}, 'X': {'count': 2, children': {'R': {'count': 1, 'children': {'F': {'count': 1, "children': {}}}}}}} fp_tree = dict() for key, value in updated_dataset.items(): # your code here return fp_tree