
Question 6 (3 marks) In this question we use the same file hightemp.txt as in Question 1 of the examination paper. A fil

Posted: Mon Jun 06, 2022 6:46 pm
by answerhappygod
Question 6 (3 marks)

In this question we use the same file hightemp.txt as in Question 1 of the examination paper.

The file hightemp.txt contains information about the highest temperatures recorded every day in a number of cities all over the world. It is a text file in which the highest temperature recorded on a given day in a given city is stored in a single row. Data items (date, temperature, and city name) are separated by a single blank. The file has been uploaded to HDFS at the location /bigdata/hightemp.

(1) Load the contents of the file hightemp.txt located in HDFS into a Resilient Distributed Dataset (RDD) and use the RDD to find the average temperature in Sydney in 2020. (1 mark)

(2) Load the contents of the file hightemp.txt located in HDFS into a Dataset and use the Dataset to find the total number of temperature measurements per city. (1 mark)

(3) Load the contents of the file hightemp.txt located in HDFS into a DataFrame and use SQL to find the average temperature per city, and per year and city. (1 mark)

Sample rows from hightemp.txt:

01-JAN-1991 25 Sydney
01-JAN-1991 30 Brisbane
01-JAN-1991 32 Singapore
02-JAN-1991 25 Sydney
02-JAN-1991 31 Brisbane
02-JAN-1991 35 Singapore
05-JUN-2022 15 Sydney
05-JUN-2022 20 Brisbane
05-JUN-2022 25 Singapore
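
Before writing the Spark code, it can help to pin down the record format and the three aggregations on a small local sample. The sketch below is plain Python, not a Spark solution: it parses the sample rows shown above and computes the same quantities the three parts ask for, so the output of an RDD, Dataset, or DataFrame job can be checked against it. The names SAMPLE, parse_line, average_for, count_per_city and average_by are hypothetical helpers introduced here for illustration. It assumes every row has exactly the form "DD-MON-YYYY temperature city" with single blanks, and that the temperature is numeric.

```python
from collections import defaultdict

# The nine sample rows from the question (assumed to be representative).
SAMPLE = """\
01-JAN-1991 25 Sydney
01-JAN-1991 30 Brisbane
01-JAN-1991 32 Singapore
02-JAN-1991 25 Sydney
02-JAN-1991 31 Brisbane
02-JAN-1991 35 Singapore
05-JUN-2022 15 Sydney
05-JUN-2022 20 Brisbane
05-JUN-2022 25 Singapore
"""


def parse_line(line):
    """Split one row into (year, temperature, city)."""
    # maxsplit=2 keeps a city name intact even if it contains a blank.
    date_str, temp, city = line.strip().split(" ", 2)
    year = int(date_str.rsplit("-", 1)[1])  # "01-JAN-1991" -> 1991
    return year, float(temp), city


def average_for(records, city, year):
    """Part (1): average temperature for one city in one year, or None."""
    temps = [t for (y, t, c) in records if c == city and y == year]
    return sum(temps) / len(temps) if temps else None


def count_per_city(records):
    """Part (2): number of measurements per city."""
    counts = defaultdict(int)
    for _, _, city in records:
        counts[city] += 1
    return dict(counts)


def average_by(records, key):
    """Part (3): average temperature grouped by key(year, city)."""
    sums = defaultdict(lambda: [0.0, 0])  # group -> [running sum, count]
    for year, temp, city in records:
        acc = sums[key(year, city)]
        acc[0] += temp
        acc[1] += 1
    return {group: total / n for group, (total, n) in sums.items()}


records = [parse_line(line) for line in SAMPLE.splitlines() if line.strip()]

if __name__ == "__main__":
    print(average_for(records, "Sydney", 2020))
    print(count_per_city(records))
    print(average_by(records, lambda year, city: city))
    print(average_by(records, lambda year, city: (city, year)))
```

Note that the sample rows contain no measurements from 2020, so the part (1) helper returns None on this sample; on the full file the same filter-then-average logic applies. In Spark, the same parsing step would typically sit in a map over the lines read from /bigdata/hightemp, followed by a filter or a group-by.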