A client uses Amazon Kinesis to gather clickstream data and groups the events by IP address into 5-minute chunks saved i

Post by **answerhappygod** » Thu Jul 21, 2022 10:00 pm

A client uses Amazon Kinesis to gather clickstream data and groups the events by IP address into 5-minute chunks saved in Amazon S3.

Many analysts at the company query this data using Hive on Amazon EMR. Their queries always target a specific IP address, so the data must be optimized for Hive queries that filter by IP address.

What is the most effective way to optimize the data for these Hive queries?

A. Store an index of the files by IP address in the Amazon DynamoDB metadata store for EMRFS.
B. Store the Amazon S3 objects with the following naming scheme: bucket_name/source=ip_address/year=yy/month=mm/day=dd/hour=hh/filename.
C. Store the data in an HBase table with the IP address as the row key.
D. Store the events for an IP address as a single file in Amazon S3 and add metadata with keys: Hive_Partitioned_IPAddress.
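
For illustration, the naming scheme in option B follows the `key=value` directory convention that Hive uses for partitioned tables, so a filter on the partition column can be narrowed to a single key prefix. Below is a minimal Python sketch of that layout. The helper names `partition_key` and `keys_for_ip`, the sample IP addresses, and the filenames are all hypothetical, and the bucket name is left out of the key.

```python
from datetime import datetime


def partition_key(ip_address: str, ts: datetime, filename: str) -> str:
    """Build an object key in the source=/year=/month=/day=/hour= layout of option B.

    Hypothetical helper for illustration; the bucket name is not part of the key.
    """
    return (
        f"source={ip_address}/"
        f"year={ts:%y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}/"
        f"{filename}"
    )


def keys_for_ip(keys: list[str], ip_address: str) -> list[str]:
    """Return only the keys stored under the partition prefix for one IP address."""
    prefix = f"source={ip_address}/"
    return [k for k in keys if k.startswith(prefix)]


# Example data (made up): two chunks for one IP, one chunk for another.
ts = datetime(2022, 7, 21, 22, 0)
all_keys = [
    partition_key("203.0.113.7", ts, "events-0001.gz"),
    partition_key("203.0.113.7", ts, "events-0002.gz"),
    partition_key("198.51.100.4", ts, "events-0001.gz"),
]

# A lookup for a single IP only has to consider keys under its own prefix.
matching = keys_for_ip(all_keys, "203.0.113.7")
print(matching)
```

With this layout, a lookup for one IP address touches only the objects under `source=<ip_address>/` instead of every 5-minute chunk.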
