PACE UNIVERSITY Summer 2022 - CS 673: Scalable Databases Assignment #3 Total Points: 100 Using the data set which has cu
Posted: Fri Jul 01, 2022 5:37 am
PACE UNIVERSITY
Summer 2022 - CS 673: Scalable Databases
Assignment #3
Total Points: 100

Using the data set containing customer data, perform the following tasks.

Points  Task
20      Count the total amount ordered by each customer in Scala using RDD.
        • Split each comma-delimited line into an RDD
        • Map each line to key/value pairs of cust_id and amount spent
        • Use reduceByKey to add up the amount for each customer
        • collect() the results and print them
10      List the customer who spent the highest amount.
10      List the Top 5 customers based on the amount they have spent, along with the customer ID and Product ID.
10      List the Bottom 5 customers based on the amount spent, along with the customer ID and Product ID.
10      Find the customers who spent an average amount of the total amount spent in the data.
10      Create a function to give rewards ($5) to all the customers who spent more than the average of the total amount.
10      Sort the customers based on the amount spent (high to low).
10      List the product IDs purchased by the Top 5 customers.
10      List the most sold product IDs.

CS673 - Summer 2022-Kaleema

Submission Details:
• Submit the Databricks link.
• Late submission up to one week will incur a penalty of 10% of the total points earned.
• Attach this file with the self-assessment.
• Plagiarism will be checked; a similarity score of up to 15% is acceptable.

NOTE: Extra credit for good documentation - 5 points.

Self-assessment
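
The steps listed for the first task (split each line, map to cust_id/amount pairs, reduce by key, collect and print) might be sketched in Scala roughly as follows. This is only a minimal sketch under stated assumptions: the assignment does not give the file layout, so the column order cust_id, product_id, amount is assumed, and the sample rows, the object name CustomerTotals and the helper parseLine are hypothetical. On Databricks a SparkContext named sc is normally already available, so the SparkSession setup shown here would likely be unnecessary there.

```scala
import org.apache.spark.sql.SparkSession

object CustomerTotals {

  // Split one comma-delimited line and keep (cust_id, amount).
  // ASSUMPTION: columns are cust_id, product_id, amount; the real file may differ.
  def parseLine(line: String): (Int, Double) = {
    val fields = line.split(",")
    (fields(0).trim.toInt, fields(2).trim.toDouble)
  }

  def main(args: Array[String]): Unit = {
    // Local session for a standalone run; not needed where `sc` already exists.
    val spark = SparkSession.builder()
      .appName("CustomerTotals")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical sample rows standing in for the real data set.
    // With a real file, sc.textFile(pathToFile) would produce the same kind of RDD.
    val lines = sc.parallelize(Seq(
      "44,8602,37.19",
      "35,5368,65.89",
      "44,3391,40.64",
      "2,6661,10.00"
    ))

    // Map each line to a (cust_id, amount) pair, then add up amounts per customer.
    val totals = lines.map(parseLine).reduceByKey(_ + _)

    // Bring the per-customer totals back to the driver and print them.
    totals.collect().foreach { case (custId, total) =>
      println(f"$custId%d\t$total%.2f")
    }

    // Sorting by total, high to low, gives a starting point for the ranking tasks.
    val sortedHighToLow = totals.sortBy(_._2, ascending = false)
    sortedHighToLow.take(5).foreach { case (custId, total) =>
      println(f"top: $custId%d\t$total%.2f")
    }

    spark.stop()
  }
}
```

Keeping the parsing in a small separate function makes it easy to adjust the column indices once the actual layout of the data set is known, without touching the reduceByKey pipeline.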