Thanks for all the suggestions.
From my understanding so far, as a primitive first cut I can use HBase
to index logs by client id:
client id <version as timestamp> -> log location

If I have future requirements for more complex queries, I can layer
Hive or Pig over HBase.
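
A minimal sketch of the lookup this layout would support, using a plain in-memory Python structure rather than the HBase client API. The names `LogIndex`, `put`, and `scan` are hypothetical and chosen only for illustration; the row key is the client id, the version is the timestamp, and the value is the log location.

```python
import bisect
from collections import defaultdict


class LogIndex:
    """Toy in-memory stand-in for the proposed table (not the HBase API).

    row key = client id, version = timestamp, value = log location.
    """

    def __init__(self):
        # client_id -> list of (timestamp, location), kept sorted by timestamp
        self._rows = defaultdict(list)

    def put(self, client_id, timestamp, location):
        # Insert while keeping the per-client cells ordered by timestamp.
        bisect.insort(self._rows[client_id], (timestamp, location))

    def scan(self, client_id, start_ts, end_ts):
        """Return locations with start_ts <= timestamp < end_ts."""
        cells = self._rows.get(client_id, [])
        # A 1-tuple sorts before any 2-tuple with the same first element,
        # so these bounds give a half-open [start_ts, end_ts) range.
        lo = bisect.bisect_left(cells, (start_ts,))
        hi = bisect.bisect_left(cells, (end_ts,))
        return [location for _, location in cells[lo:hi]]


idx = LogIndex()
idx.put("client-1", 100, "logs/a.seq:0")
idx.put("client-1", 200, "logs/a.seq:4096")
idx.put("client-1", 300, "logs/b.seq:0")
print(idx.scan("client-1", 100, 300))
```

One thing to check before relying on versions this way: HBase keeps only a configured maximum number of versions per cell, so that setting would need to be large enough for the number of log bundles per client.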

ishwar
On Sat, Oct 3, 2009 at 5:43 AM, Omer Trajman wrote:
You might consider loading logs to a parallel database for the ad-hoc queries (full disclosure, I work for a database company).

For repeated ad-hoc queries, a distributed database will give you the scalability of HDFS and will also structure the data to handle fast predicates and relational aggregates.

-Omer


-----Original Message-----
From: Amandeep Khurana <amansk@gmail.com>
Sent: Saturday, October 03, 2009 04:07
To: common-user@hadoop.apache.org <common-user@hadoop.apache.org>
Subject: Re: indexing log files for adhoc queries - suggestions?

HBase is built on HDFS, but you don't need MapReduce just to read
records from it, so it's possible to access it in real time. The 0.20
release is comparable to MySQL as far as random reads go...

I haven't heard of Hive talking to HBase yet, but that would be a good
feature to have for sure.
On 10/2/09, Otis Gospodnetic wrote:
My understanding is that *no* tools built on top of MapReduce (Hive, Pig,
Cascading, CloudBase...) can be real-time, where real-time means
processing the data and producing output in under 5 seconds or so.

I believe Hive can read HBase now, too.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR


----- Original Message ----
From: Amandeep Khurana <amansk@gmail.com>
To: common-user@hadoop.apache.org
Sent: Saturday, October 3, 2009 1:18:57 AM
Subject: Re: indexing log files for adhoc queries - suggestions?

There's another option: Cascading.

With Pig and Cascading you can use HBase as a backend, so that might
be something you can explore too... The choice will depend on what
kind of querying you want to do: real-time or batch-processed.
On 10/2/09, Otis Gospodnetic wrote:
Use Pig or Hive. Lots of overlap, some differences, but it looks like
both projects' future plans mean even more overlap, though I didn't hear
any mentions of convergence and merging.

Otis


----- Original Message ----
From: Amandeep Khurana
To: common-user@hadoop.apache.org
Sent: Friday, October 2, 2009 6:28:51 PM
Subject: Re: indexing log files for adhoc queries - suggestions?

Hive is an SQL-like abstraction over MapReduce. It lets you execute
SQL-like queries over data without actually having to write the MR job
yourself. Behind the scenes, however, it converts the query into an MR job.
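
To make the "query becomes a job" point concrete, here is a rough sketch in plain Python of how a query such as `SELECT user, COUNT(*) FROM logs GROUP BY user` breaks down into map, shuffle, and reduce steps. It only illustrates the general idea and is not how Hive actually plans or runs queries; the function names and the `user` field are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(records):
    # Map: emit one (user, 1) pair per log record.
    for record in records:
        yield record["user"], 1


def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reduce: collapse each key's values into a single count.
    return {key: sum(values) for key, values in groups.items()}


def count_by_user(records):
    return reduce_phase(shuffle(map_phase(records)))


logs = [{"user": "alice"}, {"user": "bob"}, {"user": "alice"}]
print(count_by_user(logs))
```

On a real cluster each phase runs as distributed tasks with job startup overhead, which is why this style of query is batch-oriented rather than interactive.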

HBase might be what you are looking for. You can put your logs into
HBase and query them, as well as run MR jobs over them...
On 10/1/09, Mayuran Yogarajah wrote:
ishwar ramani wrote:
Hi,

I have a setup where logs are periodically bundled up and dumped into
Hadoop DFS as large sequence files.

It works fine for all my map reduce jobs.

Now I need to handle ad hoc queries for pulling out logs based on user
and time range.

I really don't need a full indexer (like Lucene) for this purpose.

My first thought is to run a periodic MapReduce job to generate a large
text file sorted by user id.

The text file will have (sequence file name, offset) entries for
retrieving the logs
....
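
A rough sketch of how such a sorted index could be queried, assuming each entry also carries a timestamp so that time ranges can be resolved. The entry layout `(user_id, timestamp, seq_file, offset)` and the function names are hypothetical; the index is held in memory here purely for illustration, whereas the real file would be searched on disk.

```python
import bisect


def build_index(entries):
    """Sort (user_id, timestamp, seq_file, offset) rows by user, then time.

    Stands in for the output of the periodic job; the row layout is an
    assumption made for this sketch.
    """
    return sorted(entries)


def lookup(index, user_id, start_ts, end_ts):
    """Return (seq_file, offset) pairs for one user, start_ts <= ts < end_ts."""
    # Binary search works because rows are sorted by (user_id, timestamp).
    lo = bisect.bisect_left(index, (user_id, start_ts))
    hi = bisect.bisect_left(index, (user_id, end_ts))
    return [(seq_file, offset) for _, _, seq_file, offset in index[lo:hi]]


index = build_index([
    ("u2", 50, "part-0001.seq", 0),
    ("u1", 300, "part-0002.seq", 128),
    ("u1", 100, "part-0001.seq", 512),
    ("u1", 200, "part-0001.seq", 2048),
])
print(lookup(index, "u1", 100, 300))
```

Because the rows are ordered by user and then time, each query touches only a contiguous slice of the index instead of scanning every sequence file.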


I am guessing many of you ran into similar requirements... Any
suggestions on doing this better?

ishwar
Have you looked into Hive? It's perfect for ad hoc queries.

M

--


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


Discussion Overview
group: common-user @
categories: hadoop
posted: Oct 1, '09 at 5:49p
active: Oct 5, '09 at 9:32p
posts: 10
users: 6
website: hadoop.apache.org...
irc: #hadoop
