Daily depletion reports from the field are received by a major food distributor in the form of gzip archives or CSV file

Posted: Thu Jul 21, 2022 10:00 pm
by answerhappygod
A major food distributor receives daily depletion reports from the field as gzip archives of CSV files uploaded to Amazon S3. The files range from 500 MB to 5 GB in size and are processed each day by an EMR job.

Recently, it has been noted that the file sizes vary and the EMR jobs take too long. The distributor must tune and optimize the data processing workflow, with only this limited information, to improve the EMR job's performance.

Which suggestion is appropriate for an administrator to make?

A. Reduce the HDFS block size to increase the number of task processors.
B. Use bzip2 or Snappy rather than gzip for the archives.
C. Decompress the gzip archives and store the data as CSV files.
D. Use Avro rather than gzip for the archives.
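
As background for weighing the options: a plain gzip file cannot be split by Hadoop input formats, so each archive is read by a single task regardless of its size, whereas a splittable format such as bzip2 can be divided across many tasks. The rough sketch below estimates the task count for the smallest and largest files in the question. The 128 MB split size and the function name `estimated_splits` are assumptions for illustration, and real split calculation on EMR depends on the input format and cluster configuration.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3


def estimated_splits(file_size_bytes, split_size_bytes=128 * MB, splittable=True):
    """Rough estimate of how many input splits (parallel tasks) one file yields.

    split_size_bytes is an assumed value (128 MB), not taken from the question.
    """
    if file_size_bytes <= 0:
        return 0
    if not splittable:
        # A non-splittable file (e.g. plain gzip) is read end to end by one task.
        return 1
    return math.ceil(file_size_bytes / split_size_bytes)


for label, size in (("500 MB", 500 * MB), ("5 GB", 5 * GB)):
    gzip_tasks = estimated_splits(size, splittable=False)
    splittable_tasks = estimated_splits(size, splittable=True)
    print(f"{label}: non-splittable={gzip_tasks} task, splittable={splittable_tasks} tasks")
```

Under these assumptions, a 5 GB gzip archive is handled by one task while the same data in a splittable format could be spread over roughly 40, which would explain why job time grows with file size.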