Posted: Mon Jun 06, 2022 6:46 pm
Question 6 (3 marks)

In this question we use the same file hightemp.txt as in Question 1 of the examination paper. The file hightemp.txt contains the highest temperatures recorded every day in a number of cities all over the world. It is a text file in which the highest temperature recorded on a given day in a given city is stored in a single row. The data items (date, temperature and city name) are separated by a single blank. The file hightemp.txt has been uploaded to HDFS at the location /bigdata/hightemp.

(1) Load the contents of the file hightemp.txt located in HDFS into a Resilient Distributed Dataset (RDD) and use the RDD to find the average temperature in Sydney in 2020. (1 mark)

(2) Load the contents of the file hightemp.txt located in HDFS into a Dataset and use the Dataset to find the total number of temperature measurements per city. (1 mark)

(3) Load the contents of the file hightemp.txt located in HDFS into a DataFrame and use SQL to find the average temperature per city, and per year and city. (1 mark)
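A minimal plain-Python sketch of the per-row logic behind parts (1) and (2), assuming every row has exactly the form `DD-MON-YYYY TEMP CITY`. This is not Spark code, and the names `Reading`, `parse_line`, `avg_temp` and `count_per_city` are hypothetical. In Spark the same steps would roughly correspond to reading the HDFS location with `sc.textFile(...)`, then `map` (parse), `filter` (Sydney, 2020) and `mean()` on the RDD for part (1), and a group-by-city count for part (2).

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class Reading(NamedTuple):
    """One row of hightemp.txt: date, highest temperature, city."""
    day: str    # e.g. "01-JAN-1991"
    year: int   # last field of the DD-MON-YYYY date
    temp: float
    city: str


def parse_line(line: str) -> Reading:
    # Items are separated by single blanks; maxsplit=2 keeps everything
    # after the temperature together as the city name.
    day, temp, city = line.strip().split(" ", 2)
    return Reading(day, int(day.rsplit("-", 1)[1]), float(temp), city)


def avg_temp(lines: Iterable[str], city: str, year: int) -> Optional[float]:
    # Part (1): parse (map), keep one city and year (filter), then average.
    temps = [r.temp for r in map(parse_line, lines)
             if r.city == city and r.year == year]
    return sum(temps) / len(temps) if temps else None


def count_per_city(lines: Iterable[str]) -> Counter:
    # Part (2): number of measurements per city (group by city, count).
    return Counter(parse_line(line).city for line in lines)


sample = [
    "01-JAN-1991 25 Sydney",
    "01-JAN-1991 30 Brisbane",
    "02-JAN-1991 25 Sydney",
    "05-JUN-2022 15 Sydney",
]
print(avg_temp(sample, "Sydney", 1991))  # 25.0
print(avg_temp(sample, "Sydney", 2020))  # None (no 2020 rows in this sample)
print(dict(count_per_city(sample)))      # {'Sydney': 3, 'Brisbane': 1}
```

The sketch returns `None` when no rows match instead of dividing by zero; the sample rows only cover 1991 and 2022, so "Sydney in 2020" has no data there.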
Sample contents of hightemp.txt:

01-JAN-1991 25 Sydney
01-JAN-1991 30 Brisbane
01-JAN-1991 32 Singapore
02-JAN-1991 25 Sydney
02-JAN-1991 31 Brisbane
02-JAN-1991 35 Singapore
05-JUN-2022 15 Sydney
05-JUN-2022 20 Brisbane
05-JUN-2022 25 Singapore
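For part (3) the SQL itself is ordinary `GROUP BY` aggregation. The sketch below uses Python's built-in `sqlite3` as a stand-in for Spark SQL and loads the nine sample rows listed above; the table name `hightemp` and the derived `year` column are assumptions. In Spark one would instead register the DataFrame with `createOrReplaceTempView("hightemp")` and pass similar query text to `spark.sql(...)`.

```python
import sqlite3

ROWS = """01-JAN-1991 25 Sydney
01-JAN-1991 30 Brisbane
01-JAN-1991 32 Singapore
02-JAN-1991 25 Sydney
02-JAN-1991 31 Brisbane
02-JAN-1991 35 Singapore
05-JUN-2022 15 Sydney
05-JUN-2022 20 Brisbane
05-JUN-2022 25 Singapore"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hightemp (day TEXT, year INTEGER, temp REAL, city TEXT)")
for line in ROWS.splitlines():
    day, temp, city = line.split(" ", 2)
    # The year is the last four characters of DD-MON-YYYY.
    conn.execute("INSERT INTO hightemp VALUES (?, ?, ?, ?)",
                 (day, int(day[-4:]), float(temp), city))

# Average temperature per city.
per_city = conn.execute(
    "SELECT city, AVG(temp) FROM hightemp GROUP BY city ORDER BY city"
).fetchall()

# Average temperature per year and city.
per_year_city = conn.execute(
    "SELECT year, city, AVG(temp) FROM hightemp "
    "GROUP BY year, city ORDER BY year, city"
).fetchall()

print(per_city)
print(per_year_city)
```

Deriving `year` once when the rows are loaded keeps both queries plain `GROUP BY` statements; on the sample, Brisbane's overall average is 27.0 and its 1991 average is 30.5.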