Question 1
Which command does Hadoop offer to discover missing or corrupt HDFS data?
A. hdfs fs -du
B. hdfs fsck
C. dskchk
D. The map-only checksum
E. Hadoop does not provide any tools to discover missing or corrupt data; there is no need because three replicas are kept for each data block
Answer : B
Reference:https://twiki.grid.iu.edu/bin/view/Stor ... opRecovery
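As an illustration, corruption reported by hdfs fsck can also be checked programmatically by scanning its summary output. The report text below is a made-up excerpt in roughly the shape fsck prints, not real cluster output:

```python
# Made-up fsck-style summary excerpt (illustration only, not real cluster output).
sample_report = """\
 Total blocks (validated):      152 (avg. block size 1342177 B)
 Minimally replicated blocks:   150 (98.7 %)
 Corrupt blocks:                2
 Missing replicas:              0 (0.0 %)"""

def corrupt_block_count(report):
    """Pull the 'Corrupt blocks' figure out of an fsck-style summary."""
    for line in report.splitlines():
        if line.strip().startswith("Corrupt blocks:"):
            return int(line.split(":", 1)[1].strip())
    return 0

print(corrupt_block_count(sample_report))  # 2
```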
Question 2
You are migrating a cluster from MapReduce version 1 (MRv1) to MapReduce version 2 (MRv2) on YARN. You want to maintain your MRv1 TaskTracker slot capacities when you migrate. What should you do?
A. Configure yarn.applicationmaster.resource.memory-mb and yarn.applicationmaster.resource.cpu-vcores so that ApplicationMaster container allocations match the capacity you require.
B. You don't need to configure or balance these properties in YARN, as YARN dynamically balances resource management capabilities on your cluster
C. Configure mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum in yarn-site.xml to match your cluster's capacity set by the yarn-scheduler.minimum-allocation
D. Configure yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores to match the capacity you require under YARN for each NodeManager
Answer : D
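To see why option D preserves slot-like capacity, here is a small sketch of how a NodeManager's advertised resources bound concurrent containers. The property names are real YARN/MapReduce settings, but the numeric values are arbitrary examples:

```python
# Example values only, not recommendations.
node_mem_mb = 24576   # yarn.nodemanager.resource.memory-mb on one NodeManager
node_vcores = 12      # yarn.nodemanager.resource.cpu-vcores on one NodeManager
task_mem_mb = 2048    # per-container request (e.g. mapreduce.map.memory.mb)

# Concurrent containers per node are bounded by both memory and vcores;
# this bound is the MRv2 analogue of MRv1 TaskTracker "slots".
max_containers = min(node_mem_mb // task_mem_mb, node_vcores)
print(max_containers)  # 12
```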
Question 3
Which YARN daemon or service negotiates map and reduce Containers from the Scheduler, tracking their status and monitoring progress?
A. NodeManager
B. ApplicationMaster
C. ApplicationManager
D. ResourceManager
Answer : B
Reference:http://www.devx.com/opensource/intro-to ... -yarn.html(See resource manager)
Question 4
You are running a Hadoop cluster with MapReduce version 2 (MRv2) on YARN. You consistently see that MapReduce map tasks on your cluster are running slowly because of excessive JVM garbage collection. How do you increase the JVM heap size to 3 GB to optimize performance?
A. yarn.application.child.java.opts=-Xsx3072m
B. yarn.application.child.java.opts=-Xmx3072m
C. mapreduce.map.java.opts=-Xms3072m
D. mapreduce.map.java.opts=-Xmx3072m
Answer : D
Reference:http://hortonworks.com/blog/how-to-plan ... n-hdp-2-0/
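A brief sketch of the usual sizing relationship: the -Xmx value set in mapreduce.map.java.opts should stay below the container size requested via mapreduce.map.memory.mb. The 3/4 headroom factor below is a common rule of thumb, not a fixed requirement:

```python
container_mb = 4096               # mapreduce.map.memory.mb (example value)
heap_mb = container_mb * 3 // 4   # leave headroom for non-heap JVM memory
java_opts = "-Xmx%dm" % heap_mb   # value for mapreduce.map.java.opts
print(java_opts)  # -Xmx3072m
```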
Question 5
You have a cluster running with a FIFO scheduler enabled. You submit a large job A to the cluster, which you expect to run for one hour. Then, you submit job B to the cluster, which you expect to run a couple of minutes only.
You submit both jobs with the same priority.
Which two best describe how the FIFO Scheduler arbitrates the cluster resources for the jobs and their tasks? (Choose two)
A. Because there is more than a single job on the cluster, the FIFO Scheduler will enforce a limit on the percentage of resources allocated to a particular job at any given time
B. Tasks are scheduled in the order of their job submission
C. The order of execution of jobs may vary
D. Given jobs A and B submitted in that order, all tasks from job A are guaranteed to finish before all tasks from job B
E. The FIFO Scheduler will give, on average, an equal share of the cluster resources over the job lifecycle
F. The FIFO Scheduler will pass an exception back to the client when job B is submitted, since all slots on the cluster are in use
Answer : B,D
Question 6
A slave node in your cluster has four 2 TB hard drives installed (4 x 2 TB). The DataNode is configured to store HDFS blocks on all disks. You set the value of the dfs.datanode.du.reserved parameter to 100 GB. How does this alter HDFS block storage?
A. 25 GB on each hard drive may not be used to store HDFS blocks
B. 100GB on each hard drive may not be used to store HDFS blocks
C. All hard drives may be used to store HDFS blocks as long as at least 100 GB in total is available on the node
D. A maximum of 100 GB on each hard drive may be used to store HDFS blocks
Answer : B
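The Hadoop documentation describes dfs.datanode.du.reserved as reserved space per volume. A quick sketch of the arithmetic under that reading, using the drive sizes from the question:

```python
disks = 4
disk_capacity_gb = 2048        # 2 TB per disk
reserved_per_volume_gb = 100   # dfs.datanode.du.reserved applies to each volume

# Each disk keeps 100 GB free for non-HDFS use; the rest can hold blocks.
usable_per_disk_gb = disk_capacity_gb - reserved_per_volume_gb
total_hdfs_gb = disks * usable_per_disk_gb
print(total_hdfs_gb)  # 7792
```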
Question 7
You want to understand more about how users browse your public website. For example, you want to know which pages they visit prior to placing an order. You have a server farm of 200 web servers hosting your website. Which is the most efficient process to gather these web server logs into your Hadoop cluster for analysis?
A. Sample the web server logs from the web servers and copy them into HDFS using curl
B. Ingest the server web logs into HDFS using Flume
C. Channel these clickstreams into Hadoop using Hadoop Streaming
D. Import all user clicks from your OLTP databases into Hadoop using Sqoop
E. Write a MapReduce job with the web servers for mappers and the Hadoop cluster nodes for reducers
Answer : B
Explanation: Apache Flume is a service for streaming logs into Hadoop.
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming data into the Hadoop Distributed File System (HDFS). It has a simple and flexible architecture based on streaming data flows, and is robust and fault-tolerant with tunable reliability mechanisms for failover and recovery.
Question 8
Your Hadoop cluster is configured with HDFS and MapReduce version 2 (MRv2) on YARN. Can you configure a worker node to run a NodeManager daemon but not a DataNode daemon and still have a functional cluster?
A. Yes. The daemon will receive data from the NameNode to run Map tasks
B. Yes. The daemon will get data from another (non-local) DataNode to run Map tasks
C. Yes. The daemon will receive Map tasks only
D. Yes. The daemon will receive Reducer tasks only
Answer : B
Question 9
You have recently converted your Hadoop cluster from a MapReduce 1 (MRv1) architecture to a MapReduce 2 (MRv2) on YARN architecture. Your developers are accustomed to specifying the number of map and reduce tasks (resource allocation) when they run jobs. A developer wants to know how to specify the number of reduce tasks when a specific job runs.
Which method should you tell the developers to implement?
A. MapReduce version 2 (MRv2) on YARN abstracts resource allocation away from the idea of tasks into memory and virtual cores, thus eliminating the need for a developer to specify the number of reduce tasks, and indeed preventing the developer from specifying the number of reduce tasks.
B. In YARN, resource allocation is a function of megabytes of memory in multiples of 1024 MB. Thus, they should specify the amount of memory resource they need by executing -D mapreduce.reduce.memory.mb=2048
C. In YARN, the ApplicationMaster is responsible for requesting the resources required for a specific launch. Thus, executing -D yarn.applicationmaster.reduce.tasks=2 will specify that the ApplicationMaster launch two task containers on the worker nodes.
D. Developers specify reduce tasks in the exact same way for both MapReduce version 1 (MRv1) and MapReduce version 2 (MRv2) on YARN. Thus, executing -D mapreduce.job.reduces=2 will specify two reduce tasks.
E. In YARN, resource allocation is a function of virtual cores specified by the ApplicationMaster making requests to the NodeManager, where a reduce task is handled by a single container (and thus a single virtual core). Thus, the developer needs to specify the number of virtual cores to the NodeManager by executing -D yarn.nodemanager.cpu-vcores=2
Answer : D
Question 10
You have a 20-node Hadoop cluster, with 18 slave nodes and 2 master nodes running HDFS High Availability (HA). You want to minimize the chance of data loss in your cluster.
What should you do?
A. Add another master node to increase the number of nodes running the JournalNode which increases the number of machines available to HA to create a quorum
B. Set an HDFS replication factor that provides data redundancy, protecting against node failure
C. Run a Secondary NameNode on a different master from the NameNode in order to provide automatic recovery from a NameNode failure.
D. Run the ResourceManager on a different master from the NameNode in order to load-share HDFS metadata processing
E. Configure the cluster's disk drives with an appropriate fault-tolerant RAID level
Answer : B
Question 11
On a cluster running MapReduce v2 (MRv2) on YARN, a MapReduce job is given a directory of 10 plain text files as its input directory. Each file is made up of 3 HDFS blocks.
How many Mappers will run?
A. We cannot say; the number of Mappers is determined by the ResourceManager
B. We cannot say; the number of Mappers is determined by the developer
C. 30
D. 3
E. 10
F. We cannot say; the number of mappers is determined by the ApplicationMaster
Answer : C
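A sketch of the usual reasoning: with the default TextInputFormat, each HDFS block of a splittable plain-text file becomes one input split, and each split gets one map task:

```python
files = 10
blocks_per_file = 3   # each plain-text file spans 3 HDFS blocks

# Default behavior for splittable text input: one split per block,
# one map task per split.
map_tasks = files * blocks_per_file
print(map_tasks)  # 30
```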
Question 12
Given:
You want to clean up this list by removing jobs where the State is KILLED. What command do you enter?
A. yarn application -refreshJobHistory
B. yarn application -kill application_1374638600275_0109
C. yarn rmadmin -refreshQueue
D. yarn rmadmin -kill application_1374638600275_0109
Answer : B
Reference:http://docs.hortonworks.com/HDPDocument ... /bk_using- apache-hadoop/content/common_mrv2_commands.html
Question 13
You are running a Hadoop cluster with a NameNode on host mynamenode. What are two ways to determine available HDFS space in your cluster?
A. Run hdfs fs -du / and locate the DFS Remaining value
B. Run hdfs dfsadmin -report and locate the DFS Remaining value
C. Run hdfs dfs / and subtract NDFS Used from Configured Capacity
D. Connect to http://mynamenode:50070/dfshealth.jsp and locate the DFS remaining value
Answer : B,D
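For instance, the DFS Remaining figure can be pulled out of hdfs dfsadmin -report output with a little parsing. The report text below is a made-up excerpt shaped like the real output, which contains many more fields:

```python
# Made-up dfsadmin-report-style excerpt (the real report lists per-DataNode detail too).
sample_report = """\
Configured Capacity: 107374182400 (100 GB)
DFS Used: 53687091200 (50 GB)
DFS Remaining: 53687091200 (50 GB)"""

def dfs_remaining_bytes(report):
    """Return the DFS Remaining value, in bytes, from a report-style text."""
    for line in report.splitlines():
        if line.startswith("DFS Remaining:"):
            return int(line.split(":")[1].split()[0])
    raise ValueError("DFS Remaining not found")

print(dfs_remaining_bytes(sample_report))  # 53687091200
```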
Question 14
In CDH4 and later, which file contains a serialized form of all the directory and file inodes in the filesystem, giving the NameNode a persistent checkpoint of the filesystem metadata?
A. fstime
B. VERSION
C. fsimage_N (where N reflects transactions up to transaction ID N)
D. edits_N-M (recording transactions between transaction ID N and transaction ID M)
Answer : C
Reference:http://mikepluta.com/tag/namenode/
Question 15
Assuming you're not running HDFS Federation, what is the maximum number of NameNode daemons you should run on your cluster in order to avoid a split-brain scenario with your NameNode when running HDFS High Availability (HA) using Quorum-based storage?
A. Two active NameNodes and two Standby NameNodes
B. One active NameNode and one Standby NameNode
C. Two active NameNodes and one Standby NameNode
D. Unlimited. HDFS High Availability (HA) is designed to overcome limitations on the number of NameNodes you can deploy
Answer : B
Question 16
Choose three reasons why you should run the HDFS balancer periodically. (Choose three)
A. To ensure that there is capacity in HDFS for additional data
B. To ensure that all blocks in the cluster are 128MB in size
C. To help HDFS deliver consistent performance under heavy loads
D. To ensure that there is consistent disk utilization across the DataNodes
E. To improve data locality for MapReduce
Answer : C,D,E
Explanation: http://www.quora.com/Apache-Hadoop/It-i ... u-run-the-HDFS-balancer-periodically-Why-Choose-3
Question 17
You are configuring your cluster to run HDFS and MapReduce v2 (MRv2) on YARN. Which two daemons need to be installed on your cluster's master nodes? (Choose two)
A. HMaster
B. ResourceManager
C. TaskManager
D. JobTracker
E. NameNode
F. DataNode
Answer : B,E
Question 18
You have a Hadoop cluster running HDFS, and a gateway machine external to the cluster from which clients submit jobs. What do you need to do in order to run Impala on the cluster and submit jobs from the command line of the gateway machine?
A. Install the impalad daemon, statestored daemon, and catalogd daemon on each machine in the cluster, and the impala shell on your gateway machine
B. Install the impalad daemon, the statestored daemon, the catalogd daemon, and the impala shell on your gateway machine
C. Install the impalad daemon and the impala shell on your gateway machine, and the statestored daemon and catalogd daemon on one of the nodes in the cluster
D. Install the impalad daemon on each machine in the cluster, the statestored daemon and catalogd daemon on one machine in the cluster, and the impala shell on your gateway machine
E. Install the impalad daemon, statestored daemon, and catalogd daemon on each machine in the cluster and on the gateway node
Answer : D
Question 19
You observe that the number of spilled records from map tasks far exceeds the number of map output records. Your child heap size is 1 GB and your io.sort.mb value is set to 1000 MB. How would you tune your io.sort.mb value to achieve the maximum memory-to-disk I/O ratio?
A. For a 1 GB child heap size, an io.sort.mb of 128 MB will always maximize memory-to-disk I/O
B. Increase the io.sort.mb to 1GB
C. Decrease the io.sort.mb value to 0
D. Tune the io.sort.mb value until you observe that the number of spilled records equals (or is as close as possible to) the number of map output records.
Answer : D
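A small sketch of the diagnostic: compare the Spilled Records and Map output records counters from the job's counter output (the numbers below are hypothetical):

```python
# Hypothetical counters from a job's "Map-Reduce Framework" counter group.
map_output_records = 1_000_000
spilled_records = 3_200_000   # well above output records: multi-pass spill/merge

spill_ratio = spilled_records / map_output_records
needs_tuning = spill_ratio > 1.0   # tune io.sort.mb until this ratio approaches 1.0
print(needs_tuning)  # True
```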
Question 20
You suspect that your NameNode is incorrectly configured and is swapping memory to disk. Which Linux commands help you to identify whether swapping is occurring? (Select all that apply)
A. free
B. df
C. memcat
D. top
E. jps
F. vmstat
G. swapinfo
Answer : A,D,F
Reference:http://www.cyberciti.biz/faq/linux-chec ... e-command/
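As an example, the Swap line of free output shows whether any swap is in use. The sample below is made-up text in the common procps column layout:

```python
# Made-up `free` output in the usual procps column layout (total / used / free ...).
sample = """\
              total        used        free      shared  buff/cache   available
Mem:       16384000     8000000     6000000      200000     2384000     7800000
Swap:       2097152      524288     1572864"""

swap_line = next(l for l in sample.splitlines() if l.startswith("Swap:"))
swap_used_kb = int(swap_line.split()[2])   # column 2 after the label is "used"
print(swap_used_kb > 0)  # True -> swap is in use; check vmstat's si/so for activity
```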
Question 21
Your cluster is running MapReduce version 2 (MRv2) on YARN. Your ResourceManager is configured to use the FairScheduler. Now you want to configure your scheduler such that a new user on the cluster can submit jobs into their own queue at application submission.
Which configuration should you set?
A. You can specify a new queue name when the user submits a job, and the new queue will be created dynamically if the property yarn.scheduler.fair.allow-undeclared-pools = true
B. yarn.scheduler.fair.user-as-default-queue = false and yarn.scheduler.fair.allow-undeclared-pools = true
C. You can specify a new queue name when the user submits a job, and the new queue can be created dynamically if yarn.scheduler.fair.user-as-default-queue = false
D. You can specify a new queue name per application in the allocations.xml file and have new jobs automatically assigned to the application queue
Answer : A
Question 22
Assume you have a file named foo.txt in your local directory. You issue the following three commands:
hadoop fs -mkdir input
hadoop fs -put foo.txt input/foo.txt
hadoop fs -put foo.txt input
What happens when you issue the third command?
A. The write succeeds, overwriting foo.txt in HDFS with no warning
B. The file is uploaded and stored as a plain file named input
C. You get a warning that foo.txt is being overwritten
D. You get an error message telling you that foo.txt already exists, and asking you if you would like to overwrite it.
E. You get an error message telling you that foo.txt already exists. The file is not written to HDFS
F. You get an error message telling you that input is not a directory
G. The write silently fails
Answer : E
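A toy model of the -put semantics involved here (an assumption-level sketch, not actual HDFS code) reproduces the error for the third command:

```python
# Minimal model: directories map to the set of file names they contain.
hdfs = {"input": {"foo.txt"}}   # state after the first two commands

def put(local_name, dest, fs):
    """Return an error string (as the CLI would print) or None on success."""
    if dest not in fs:
        raise NotImplementedError("only existing-directory destinations are modeled")
    if local_name in fs[dest]:   # -put refuses to overwrite an existing file
        return "put: `%s/%s': File exists" % (dest, local_name)
    fs[dest].add(local_name)
    return None

print(put("foo.txt", "input", hdfs))  # put: `input/foo.txt': File exists
```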
Question 23
Your cluster implements HDFS High Availability (HA). Your two NameNodes are named nn01 and nn02. What occurs when you execute the command: hdfs haadmin -failover nn01 nn02?
A. nn02 is fenced, and nn01 becomes the active NameNode
B. nn01 is fenced, and nn02 becomes the active NameNode
C. nn01 becomes the standby NameNode and nn02 becomes the active NameNode
D. nn02 becomes the standby NameNode and nn01 becomes the active NameNode
Answer : B
Explanation:
failover: initiate a failover between two NameNodes.
This subcommand causes a failover from the first provided NameNode to the second. If the first NameNode is in the Standby state, this command simply transitions the second to the Active state without error. If the first NameNode is in the Active state, an attempt will be made to gracefully transition it to the Standby state. If this fails, the fencing methods (as configured by dfs.ha.fencing.methods) will be attempted in order until one succeeds. Only after this process will the second NameNode be transitioned to the Active state. If no fencing method succeeds, the second NameNode will not be transitioned to the Active state, and an error will be returned.
Question 24
Which process instantiates user code, and executes map and reduce tasks on a cluster running MapReduce v2 (MRv2) on YARN?
A. NodeManager
B. ApplicationMaster
C. TaskTracker
D. JobTracker
E. NameNode
F. DataNode
G. ResourceManager
Answer : A
Question 25
Which scheduler would you deploy to ensure that your cluster allows short jobs to finish within a reasonable time without starving long-running jobs?
A. Complexity Fair Scheduler (CFS)
B. Capacity Scheduler
C. Fair Scheduler
D. FIFO Scheduler
Answer : C
Reference:http://hadoop.apache.org/docs/r1.2.1/fa ... duler.html
Cloudera Administrator for Apache Hadoop Questions + Answers