A client uses Amazon Kinesis to gather clickstream data and groups the events by IP address into 5-minute chunks saved i



Post by answerhappygod »

A client uses Amazon Kinesis to gather clickstream data and groups the events by IP address into 5-minute chunks saved in Amazon S3.

Many analysts in the company query this data using Hive on Amazon EMR. Their queries always filter on a specific IP address, so the data must be organized so that Hive on Amazon EMR can query it efficiently by IP address.

What is the most efficient way to organize the data for these Hive queries?

A. Store an index of the files by IP address in the Amazon DynamoDB metadata store for EMRFS.
B. Store the Amazon S3 objects with the following naming scheme: bucket_name/source=ip_address/year=yy/month=mm/day=dd/hour=hh/filename.
C. Store the data in an HBase table with the IP address as the row key.
D. Store the events for an IP address as a single file in Amazon S3 and add metadata with keys: Hive_Partitioned_IPAddress.
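
To make the key layout described in option B concrete, here is a minimal Python sketch that builds such an object key. The function name, the sample IP address, the timestamp, and the file name are all hypothetical, chosen only for illustration; the question does not specify how the keys are actually generated.

```python
from datetime import datetime, timezone


def partitioned_key(ip_address: str, event_time: datetime, filename: str) -> str:
    """Build an object key made of key=value path segments (hypothetical helper).

    The segments follow the layout in option B:
    source=<ip>/year=<yy>/month=<mm>/day=<dd>/hour=<hh>/<filename>
    """
    return (
        f"source={ip_address}/"
        f"year={event_time:%y}/month={event_time:%m}/"
        f"day={event_time:%d}/hour={event_time:%H}/"
        f"{filename}"
    )


# Sample values are assumptions, not taken from the question.
key = partitioned_key(
    "203.0.113.7",
    datetime(2021, 8, 2, 8, 13, tzinfo=timezone.utc),
    "events-0001.gz",
)
print(key)
```

Hive reads each key=value path segment as a partition value, so on a table partitioned by these columns a filter on source lets Hive skip every prefix that belongs to a different IP address.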