HBase is built on HDFS, but you don't need MapReduce just to read
records from it, so it's possible to access it in real time. The 0.20
release compares to MySQL as far as random reads go...

I haven't heard of Hive talking to HBase yet, but that would be a good
feature to have for sure.
On 10/2/09, Otis Gospodnetic wrote:
My understanding is that *no* tools built on top of MapReduce (Hive, Pig,
Cascading, CloudBase...) can be real-time, where real-time means something
that processes the data and produces output in under 5 seconds or so.

I believe Hive can read HBase now, too.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR


----- Original Message ----
From: Amandeep Khurana <amansk@gmail.com>
To: common-user@hadoop.apache.org
Sent: Saturday, October 3, 2009 1:18:57 AM
Subject: Re: indexing log files for adhoc queries - suggestions?

There's another option: Cascading.

With both Pig and Cascading you can use HBase as a backend, so that
might be something you can explore too... The choice will depend on
what kind of querying you want to do: real-time or batch-processed.
On 10/2/09, Otis Gospodnetic wrote:
Use Pig or Hive. Lots of overlap, some differences, but it looks like
both projects' future plans mean even more overlap, though I didn't
hear any mentions of convergence and merging.

Otis


----- Original Message ----
From: Amandeep Khurana
To: common-user@hadoop.apache.org
Sent: Friday, October 2, 2009 6:28:51 PM
Subject: Re: indexing log files for adhoc queries - suggestions?

Hive is an SQL-like abstraction over MapReduce. It just enables you
to execute SQL-like queries over data without actually having to write
the MR job; behind the scenes, it converts the query into a job.
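A minimal sketch (not Hive itself, and the query below is hypothetical) of what "converting a query into an MR job" means: a `SELECT user, COUNT(*) FROM logs GROUP BY user` becomes a map phase emitting (user, 1) pairs and a reduce phase summing per key.

```python
from collections import defaultdict

def map_phase(records):
    # Map side: emit a (key, 1) pair per log record, keyed by user.
    for rec in records:
        yield rec["user"], 1

def reduce_phase(pairs):
    # Reduce side: sum the counts for each key.
    counts = defaultdict(int)
    for user, n in pairs:
        counts[user] += n
    return dict(counts)

logs = [{"user": "a"}, {"user": "b"}, {"user": "a"}]
print(reduce_phase(map_phase(logs)))  # {'a': 2, 'b': 1}
```

Hive's planner does far more (joins, partitions, multiple stages), but every query ultimately bottoms out in map and reduce functions like these.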

HBase might be what you are looking for. You can put your logs into
HBase and query them, as well as run MR jobs over them...
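You don't need a running HBase cluster to prototype the row-key design this implies. A rough in-memory stand-in (none of this is HBase's actual API): key each log line as "user|timestamp" in a sorted structure, so a "scan" over a key range returns one user's logs for a time window, roughly what an HBase Scan with start/stop rows does.

```python
import bisect

rows = []  # sorted list of (row_key, log_line), standing in for an HBase table

def put(user, ts, line):
    # Zero-pad the timestamp so string keys sort chronologically per user.
    key = "%s|%013d" % (user, ts)
    bisect.insort(rows, (key, line))

def scan(user, ts_from, ts_to):
    # Return log lines for one user between ts_from and ts_to (inclusive).
    start = "%s|%013d" % (user, ts_from)
    stop = "%s|%013d" % (user, ts_to)
    lo = bisect.bisect_left(rows, (start, ""))
    return [line for key, line in rows[lo:] if key <= stop]

put("alice", 1000, "login")
put("bob", 1001, "error")
put("alice", 1500, "logout")
print(scan("alice", 900, 1600))  # ['login', 'logout']
```

The point of the sketch is the key layout: because user and time are encoded in the row key, a point or range read never touches other users' data, which is why no MR job is needed for this kind of query.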
On 10/1/09, Mayuran Yogarajah wrote:
ishwar ramani wrote:
Hi,

I have a setup where logs are periodically bundled up and dumped into
Hadoop DFS as large sequence files.

It works fine for all my MapReduce jobs.

Now I need to handle ad-hoc queries for pulling out logs based on user
and time range.

I really don't need a full indexer (like Lucene) for this purpose.

My first thought is to run a periodic MapReduce job to generate a large
text file sorted by user id.

The text file will have (sequence file name, offset) pairs to retrieve
the logs.
....


I am guessing many of you ran into similar requirements... Any
suggestions on doing this better?

ishwar
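The (sequence file name, offset) index described above can be sketched in memory, assuming integer timestamps; the file names below are hypothetical. A periodic job would emit sorted (user, timestamp, file, offset) entries, and an ad-hoc lookup then binary-searches that index instead of scanning the raw sequence files.

```python
import bisect

index = []  # sorted list of (user_id, timestamp, seqfile, offset)

def add_entry(user_id, ts, seqfile, offset):
    # In the real setup the periodic MR job would write these pre-sorted.
    bisect.insort(index, (user_id, ts, seqfile, offset))

def lookup(user_id, ts_from, ts_to):
    # Binary-search the sorted index; return (seqfile, offset) pointers
    # for one user within [ts_from, ts_to] inclusive.
    lo = bisect.bisect_left(index, (user_id, ts_from))
    hi = bisect.bisect_left(index, (user_id, ts_to + 1))
    return [(f, off) for _, _, f, off in index[lo:hi]]

add_entry("u1", 100, "logs-0001.seq", 0)
add_entry("u2", 110, "logs-0001.seq", 4096)
add_entry("u1", 120, "logs-0002.seq", 512)
print(lookup("u1", 90, 130))  # [('logs-0001.seq', 0), ('logs-0002.seq', 512)]
```

Each returned pointer is then used to seek into the named sequence file and read the record at that offset, so only the matching logs are ever touched.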
Have you looked into Hive? It's perfect for ad-hoc queries..

M

--


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


Discussion Overview
group: common-user
category: hadoop
posted: Oct 1, '09 at 5:49p
active: Oct 5, '09 at 9:32p
posts: 10
users: 6
website: hadoop.apache.org...
irc: #hadoop
