A company uses a custom MapReduce program to generate monthly reports from a large number of small data files stored in an Amazon S3 bucket. Various business units submit the data on a frequent but unpredictable schedule. As the dataset grows, it becomes increasingly difficult to process all of the data in one day. The company has already scaled up its Amazon EMR cluster, but further optimization may improve performance.
The company must improve performance with minimal changes to its existing processes and systems.
What action should the company take?
A. Use Amazon S3 Event Notifications and AWS Lambda to create a quick search file index in DynamoDB.
B. Add Spark to the Amazon EMR cluster and utilize Resilient Distributed Datasets in-memory.
C. Use Amazon S3 Event Notifications and AWS Lambda to index each file into an Amazon Elasticsearch Service cluster.
D. Schedule a daily AWS Data Pipeline process that aggregates content into larger files using S3DistCp.
E. Have business units submit data via Amazon Kinesis Firehose to aggregate data hourly into Amazon S3.
answerhappygod