This question is related to Big Data systems; please don't answer if it's not relevant to the subject.
You have a 928 MB file stored on HDFS as part of a Hadoop 2.x distribution. A data analytics program uses this file and runs in parallel across the cluster nodes. [6 marks]

a. The default block size and replication factor are used in the configuration. How many total blocks, including replicas, will be stored in the cluster? What are the unique HDFS block sizes you will find for this specific file?

b. The cluster has 64 cores to speed up the processing. If the program can at best achieve 60% parallelism in the code to exploit the multiple cores and the rest of it is sequential, what is the theoretical limit on the speed-up you can expect with 64 cores compared to a sequential version of the same program running on one core with the same file? How will this limit change if you doubled the compute power to 128 cores? You can simplify the system by assuming that cluster nodes and cores mean the same thing, and ignore overheads such as communication, which depend on the specific cluster configuration, scheduling, etc.

c. Suppose you could use a more scalable algorithm with 80% parallelism and a larger file as you move to a 128-core system. What would be the theoretical speed-up limit for 128 cores?
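For reference, here is a minimal sketch of the arithmetic the question is asking for. It assumes the stock Hadoop 2.x defaults (128 MB block size, replication factor 3) and Amdahl's law, S = 1 / ((1 - p) + p/n), for the speed-up limits; the names and printed values are illustrative, not an official model answer.

```python
import math

FILE_MB = 928
BLOCK_MB = 128          # assumed Hadoop 2.x default dfs.blocksize
REPLICATION = 3         # assumed default dfs.replication

# Part (a): number of blocks and unique block sizes for the file
num_blocks = math.ceil(FILE_MB / BLOCK_MB)                # 8 blocks
total_with_replicas = num_blocks * REPLICATION            # 24 blocks in the cluster
last_block_mb = FILE_MB - (num_blocks - 1) * BLOCK_MB     # 32 MB final (partial) block
print(num_blocks, total_with_replicas, {BLOCK_MB, last_block_mb})

# Parts (b) and (c): Amdahl's law speed-up limit for p parallel fraction, n cores
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

print(round(amdahl_speedup(0.6, 64), 2))    # ~2.44x: 60% parallel code, 64 cores
print(round(amdahl_speedup(0.6, 128), 2))   # ~2.47x: doubling cores barely helps
print(round(amdahl_speedup(0.8, 128), 2))   # ~4.85x: 80% parallel code, 128 cores
```

Note that with a fixed parallel fraction p, the speed-up is capped at 1 / (1 - p) no matter how many cores you add (2.5x for 60% parallelism, 5x for 80%), which is why adding cores alone gives diminishing returns.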