Hive is a SQL-like abstraction over MapReduce. It lets you run
SQL-like queries over your data without having to write the MR job
yourself. Behind the scenes, however, it still converts each query
into a MapReduce job.
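
To give a feel for what "converting a query into a job" means, here is a rough sketch in plain Python. It is not how Hive actually compiles queries; the record layout, the query in the comment, and the names map_phase, reduce_phase and run_job are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical log records: (user_id, timestamp, message).
logs = [
    ("alice", 100, "login"),
    ("bob", 150, "login"),
    ("alice", 200, "search"),
    ("alice", 900, "logout"),
]

# Roughly the query being modelled (illustrative only):
#   SELECT user_id, COUNT(*) FROM logs
#   WHERE ts BETWEEN 100 AND 500 GROUP BY user_id


def map_phase(record, start_ts, end_ts):
    """Emit (user_id, 1) for records inside the time range (the WHERE part)."""
    user_id, ts, _ = record
    if start_ts <= ts <= end_ts:
        yield user_id, 1


def reduce_phase(user_id, counts):
    """Sum the emitted values for one key (the GROUP BY / COUNT part)."""
    return user_id, sum(counts)


def run_job(records, start_ts, end_ts):
    """Run map, group by key, then reduce, all in memory."""
    grouped = defaultdict(list)  # stands in for the shuffle step
    for record in records:
        for key, value in map_phase(record, start_ts, end_ts):
            grouped[key].append(value)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(run_job(logs, 100, 500))
```

The point is only that the filter ends up in the map step and the aggregation in the reduce step, which is the kind of job you would otherwise write by hand.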

HBase might be what you are looking for. You can put your logs into
HBase and query them, as well as run MR jobs over them...
On 10/1/09, Mayuran Yogarajah wrote:
ishwar ramani wrote:

I have a setup where logs are periodically bundled up and dumped into
Hadoop DFS as a large sequence file.

It works fine for all my map reduce jobs.

Now I need to handle ad hoc queries for pulling out logs based on user
and time range.

I really don't need a full indexer (like Lucene) for this purpose.

My first thought is to run a periodic MapReduce job to generate a large
text file sorted by user id.

The text file will have (sequence file name, offset) entries to
retrieve the logs.
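
A minimal sketch of how such a sorted index could be searched, in plain Python. The entry layout is an assumption: a timestamp is added next to the user id so that the time range can be handled too, and the file names, offsets and the lookup helper are hypothetical.

```python
import bisect

# Hypothetical index entries: (user_id, timestamp, sequence_file, offset),
# as a periodic MapReduce job might emit them. All values are made up.
entries = [
    ("bob", 1254400000, "logs-0002.seq", 4096),
    ("alice", 1254420000, "logs-0002.seq", 0),
    ("alice", 1254390000, "logs-0001.seq", 1024),
    ("carol", 1254410000, "logs-0001.seq", 2048),
]

# Sorting by (user_id, timestamp) keeps each user's entries together,
# in time order.
index = sorted(entries)


def lookup(index, user_id, start_ts, end_ts):
    """Return (sequence_file, offset) pairs for user_id with
    start_ts <= timestamp <= end_ts (integer timestamps assumed)."""
    # A 2-tuple sorts before any 4-tuple sharing the same first two fields,
    # so these keys bracket exactly the wanted range.
    lo = bisect.bisect_left(index, (user_id, start_ts))
    hi = bisect.bisect_left(index, (user_id, end_ts + 1))
    return [(seq_file, offset) for _, _, seq_file, offset in index[lo:hi]]


print(lookup(index, "alice", 1254380000, 1254400000))
```

Each returned pair says which sequence file to open and where to seek. With a real index on DFS the binary search would have to run over the file itself or over a sampled copy of its keys rather than an in-memory list.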

I am guessing many of you have run into similar requirements. Any
suggestions on doing this better?

Have you looked into Hive? It's perfect for ad hoc queries.



Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz

group: common-user @
posted: Oct 1, '09 at 5:49p
active: Oct 5, '09 at 9:32p