How do I run ad hoc SQL queries on flat files generated by Hive?

We parse our log data with Hive and store the aggregation results in daily partitioned text files on S3 (let's call them "rough" aggregations).

These aggregation results are pretty small (no more than a few MB per day), and we have a JavaScript dashboard that loads them and renders some aspects of the data (call them "fine-grained" aggregations).

We currently compute the "fine-grained" aggregations in JavaScript code. I would like to use SQL queries here instead, for simplicity. What best practices exist for this kind of problem?

A) We could generate all the "fine-grained" aggregations in Hive. However, working with such small datasets in Hive is time-consuming.

B) We could introduce a "quick access layer" between S3 and JavaScript that can run SQL queries. What query engine would you recommend?
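To make option B concrete, here is a minimal sketch of one possible shape for such a layer: load a day's delimited text into an in-memory SQLite database and run SQL against it. This is only an illustration under assumptions, not a recommendation. The table name `rough`, the columns `day`, `country`, `hits`, and the sample rows are all hypothetical, and the `\x01` delimiter assumes Hive's default text-file field separator. Fetching the files from S3 is left out.

```python
import sqlite3


def load_rows(conn, table, columns, lines, delimiter="\x01"):
    """Load delimited text lines into an SQLite table.

    The delimiter defaults to Ctrl-A, which is Hive's default field
    separator for text files (an assumption about how the files were written).
    """
    col_defs = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    rows = (line.rstrip("\n").split(delimiter) for line in lines)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)


def query(conn, sql, params=()):
    """Run an ad hoc SQL query and return all result rows."""
    return conn.execute(sql, params).fetchall()


# Hypothetical "rough" aggregation rows: day, country, hits.
sample = [
    "2013-01-01\x01US\x0110",
    "2013-01-01\x01DE\x014",
    "2013-01-02\x01US\x017",
]

conn = sqlite3.connect(":memory:")
load_rows(conn, "rough", ["day", "country", "hits"], sample)

# A "fine-grained" aggregation expressed in SQL instead of JavaScript.
# Values are loaded as text, so cast before summing.
result = query(
    conn,
    "SELECT country, SUM(CAST(hits AS INTEGER)) "
    "FROM rough GROUP BY country ORDER BY country",
)
print(result)
```

With only a few MB per day, rebuilding the in-memory table on each request or once per day should be cheap, and the query results could be returned to the JavaScript side as JSON.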
