as well.
HBase can do "ok" at storing adjacency lists for large graphs, although
processing on the stored graph does not necessarily leverage data
locality, since the neighbors in a node's adjacency list could be stored on
different physical machines. You can intelligently partition your graph, though.
HBase offers the ability to work on large graphs since it can scale further
than other graph databases or graph processing engines. At some point we
were considering building an RDF triple store over HBase (there is still
some steam there, but not enough to take it up yet).
But as Jonathan said, if you are looking at a data set on the order of 10GB,
HBase isn't your best bet.
-Amandeep
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
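Amandeep's point about adjacency lists and partitioning can be sketched with a toy model of HBase's data layout: one row per vertex, one column per neighbor, rows kept in sorted key order. A TreeMap stands in for the table; this is not the HBase client API. The partition-id prefix on the row key is a hypothetical scheme showing one way to "intelligently partition" a graph: vertices sharing a prefix sort next to each other and, since HBase regions hold contiguous key ranges, would tend to be stored together.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Toy model of an HBase-style adjacency-list table: one row per vertex and one
 * column qualifier per neighbor. A TreeMap stands in for HBase's sorted row keys;
 * this is not the HBase client API.
 */
public class AdjacencyListSketch {
    // row key -> (neighbor vertex id -> edge value)
    private final NavigableMap<String, NavigableMap<String, String>> table = new TreeMap<>();

    // Hypothetical partitioning scheme: prefix row keys with a zero-padded partition id
    // so that vertices placed in the same partition sort next to each other.
    static String rowKey(int partition, String vertex) {
        return String.format("%04d|%s", partition, vertex);
    }

    void addEdge(String fromRow, String toVertex, String value) {
        table.computeIfAbsent(fromRow, k -> new TreeMap<>()).put(toVertex, value);
    }

    // Reading one row returns a vertex's whole adjacency list.
    Set<String> neighbors(String row) {
        NavigableMap<String, String> cols = table.get(row);
        return cols == null ? Collections.<String>emptySet() : cols.keySet();
    }

    // A range scan over one partition prefix touches only that partition's rows.
    SortedMap<String, NavigableMap<String, String>> scanPartition(int partition) {
        return table.subMap(String.format("%04d|", partition), String.format("%04d|", partition + 1));
    }

    public static void main(String[] args) {
        AdjacencyListSketch g = new AdjacencyListSketch();
        String alice = rowKey(1, "alice"), bob = rowKey(1, "bob"), carol = rowKey(2, "carol");
        g.addEdge(alice, "bob", "1");
        g.addEdge(alice, "carol", "1");
        g.addEdge(bob, "alice", "1");
        g.addEdge(carol, "alice", "1");
        System.out.println(g.neighbors(alice));          // [bob, carol]
        System.out.println(g.scanPartition(1).keySet()); // [0001|alice, 0001|bob]
    }
}
```

A neighbor lookup is a single-row read, while processing a whole partition is a bounded range scan. The locality caveat Amandeep raises remains: following an edge to a vertex in another partition means reading a row that may live on another machine.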
On Tue, Mar 9, 2010 at 4:12 PM, Andrew Purtell wrote:
I came to this discussion late.
Ryan and J-D's use case is clearly successful.
In addition to what others have said, I think another case where HBase
really excels is supporting analytics over Big Data (which I define as on
the order of petabytes). Some of the best performance numbers are put up by
scanners. There is tight integration with the Hadoop MapReduce framework,
not only in terms of API support but also with respect to efficient task
distribution over the cluster -- moving computation to data -- and there is
a favorable interaction with HDFS's location-aware data placement. Moving
computation to data like that is one major reason why analytics using the
MapReduce paradigm can put conventional RDBMS/data warehouses to shame at
substantially less cost. Since 0.20.0, results of analytic computations over
the data can be materialized and served out in real time in response to
queries. This is a complete solution.
- Andy
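The "moving computation to data" idea Andy describes can be illustrated with a toy scheduler: one scan task per region, placed on the server that already hosts that region, falling back to the least-loaded live server when the hosting server is unavailable. This is a hypothetical sketch of the principle, not how Hadoop or HBase actually schedule tasks; the region start keys and host names are made up.

```java
import java.util.*;

/**
 * Toy illustration of data-local task placement. Not Hadoop's or HBase's
 * scheduler; assumes at least one live host.
 */
public class LocalityScheduler {
    static final class Region {
        final String startKey;
        final String host; // server currently hosting this region's data
        Region(String startKey, String host) { this.startKey = startKey; this.host = host; }
    }

    /** Returns task name -> host, preferring the host that already stores the region. */
    static Map<String, String> assign(List<Region> regions, List<String> liveHosts) {
        Map<String, String> plan = new LinkedHashMap<>();
        Map<String, Integer> load = new LinkedHashMap<>();
        for (String h : liveHosts) load.put(h, 0);
        for (Region r : regions) {
            String target;
            if (load.containsKey(r.host)) {
                target = r.host; // data-local: the task reads from its own disks
            } else {
                // hosting server is down: least-loaded live host, data must cross the network
                target = Collections.min(load.entrySet(), Map.Entry.comparingByValue()).getKey();
            }
            load.merge(target, 1, Integer::sum);
            plan.put("scan:" + r.startKey, target);
        }
        return plan;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
            new Region("a", "rs1"), new Region("h", "rs2"),
            new Region("m", "rs2"), new Region("p", "rs3"));
        System.out.println(assign(regions, Arrays.asList("rs1", "rs2")));
        // {scan:a=rs1, scan:h=rs2, scan:m=rs2, scan:p=rs1}
    }
}
```

Three of the four tasks run where their data already lives; only the region whose server is down forces a remote read.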
----- Original Message ----
From: Ryan Rawson <ryanobjc@gmail.com>
To: hbase-user@hadoop.apache.org
Sent: Tue, March 9, 2010 3:34:55 PM
Subject: Re: Use cases of HBase
HBase operates more like a write-thru cache. Recent writes are in
memory (aka memstore). Older data is in the block cache (by default
20% of Xmx). While you can rely on OS buffering, you also want a
generous helping of block caching directly in HBase's regionserver.
We are seeing great performance, and our 95th percentiles seem to be
related to GC pauses.
So to answer your use case below, the answer is most decidedly 'yes'.
Recent values are in memory, so reads of them are served from memory as well.
-ryan
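A toy model of the read path Ryan outlines: writes go to an in-memory sorted "memstore", a flush moves them to "disk", and reads of flushed data pass through a bounded LRU "block cache" before touching disk. Everything here (class name, cache sized in entries, flush behavior) is a simplification assumed for illustration; real HBase caches HFile blocks in a cache sized as a fraction of the regionserver heap, and writes also go to a write-ahead log.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of a memstore + LRU block cache read path; not HBase code. */
public class ReadPathSketch {
    private final NavigableMap<String, String> memstore = new TreeMap<>(); // recent writes
    private final Map<String, String> disk = new HashMap<>();              // stands in for flushed files
    private final Map<String, String> blockCache;                          // bounded LRU cache
    int diskReads = 0;

    ReadPathSketch(final int cacheEntries) {
        // Access-ordered LinkedHashMap that evicts the least recently used entry.
        this.blockCache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > cacheEntries;
            }
        };
    }

    void put(String key, String value) {
        memstore.put(key, value); // write-through: the newest value is immediately in memory
    }

    /** Moves memstore contents to "disk", as a memstore flush would. */
    void flush() {
        disk.putAll(memstore);
        for (String k : memstore.keySet()) blockCache.remove(k); // drop stale cached copies
        memstore.clear();
    }

    String get(String key) {
        String v = memstore.get(key);  // 1. recently written: served from memory
        if (v != null) return v;
        v = blockCache.get(key);       // 2. older data that is still cached
        if (v != null) return v;
        v = disk.get(key);             // 3. cache miss: read from disk and populate the cache
        if (v != null) {
            diskReads++;
            blockCache.put(key, v);
        }
        return v;
    }

    public static void main(String[] args) {
        ReadPathSketch s = new ReadPathSketch(2);
        s.put("row1", "v1");
        s.get("row1");   // memstore hit
        s.flush();
        s.get("row1");   // disk read, now cached
        s.get("row1");   // block cache hit
        System.out.println(s.diskReads); // 1
    }
}
```

For Charles's workload (most reads on data written in the last few minutes), the first branch answers most reads before a flush, and the block cache absorbs repeat reads afterwards.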
On Tue, Mar 9, 2010 at 3:12 PM, Charles Woerner wrote:
Ryan, your confidence has me interested in exploring HBase a bit further for
some real-time functionality that we're building out. One question about the
mem-caching functionality in HBase... Is it write-through or write-back, such
that all frequently written items are likely in memory, or is it
pull-through via a client query? Or would I be relying on lower-level
caching features of the OS and underlying filesystem? In other words,
where there is a high number of both reads and writes, and where 90% of all
reads are on recently (5 minutes) written datums, would the HBase
architecture help ensure that the most recently written data is already in
the cache?
On Tue, Mar 9, 2010 at 2:29 PM, Ryan Rawson wrote:
One thing to note is that 10GB is half the memory of a reasonably
sized machine. In fact, I have seen 128 GB memcache boxes out there.
As for performance, I obviously feel HBase can be performant for real-time
queries. To get a consistent response you absolutely have to
have 95%+ caching in RAM. There is no way to achieve 1-2ms responses
from disk. If you throw enough RAM at the problem, I think HBase solves
this nicely and you won't have to maintain multiple architectures.
-ryan
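Ryan's point that 1-2 ms is unreachable from disk can be made concrete with a back-of-the-envelope expected-latency calculation. The figures are assumptions for illustration, not measurements from this thread: about 0.1 ms for a read served from RAM, about 10 ms for a random read that needs a disk seek, and h the fraction of reads served from memory.

```latex
\begin{aligned}
\mathbb{E}[t] &= h\,t_{\text{mem}} + (1-h)\,t_{\text{disk}} \\
h = 0.95:\quad \mathbb{E}[t] &= 0.95(0.1\,\text{ms}) + 0.05(10\,\text{ms}) = 0.595\,\text{ms} \\
h = 0.80:\quad \mathbb{E}[t] &= 0.80(0.1\,\text{ms}) + 0.20(10\,\text{ms}) = 2.08\,\text{ms}
\end{aligned}
```

Even at h = 0.95 the average fits a 2 ms budget, but under these assumptions the 5% of reads that miss still take around 10 ms each, so latency is only consistent when nearly every read is served from RAM.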
On Tue, Mar 9, 2010 at 2:08 PM, Jonathan Gray wrote:
Brian,
I would just reiterate what others have said. If your goal is a
consistent 1-2ms read latency and your dataset is on the order of
10GB... HBase is not a good match. It's more than what you need and you'll
take unnecessary performance hits.
I would look at some of the simpler KV-style stores out there like Tokyo
Cabinet, Memcached, or BerkeleyDB, or in-memory ones like Redis.
JG
-----Original Message-----
From: jaxzin
Sent: Tuesday, March 09, 2010 12:09 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Use cases of HBase
Gary, I looked at your presentation and it was very helpful. But I do have
a few unanswered questions from it, if you wouldn't mind answering them. How
big is/was your cluster that handled 3k req/sec? And what were the specs on
each node (RAM/CPU)?
When you say latency can be good, what do you mean? Is it even in the ballpark
of 1 ms? We already deal with GC and don't expect perfect
real-time behavior, so that might be okay with me.
P.S. I was at Hadoop World NYC and saw Ryan and Jonathan's presentation
there but somehow mentally blocked it. Thanks for the reminder.
Gary Helmling wrote:
Hey Brian,
We use HBase to complement MySQL in serving activity-stream type data here
at Meetup. It's handling real-time requests involved in 20-25% of our page
views, but our latency requirements aren't as strict as yours. For what
it's worth, I did a presentation on our setup which will hopefully fill in
some details: http://www.slideshare.net/ghelmling/hbase-at-meetup
There are also some great presentations by Ryan Rawson and Jonathan Gray on
how they've used HBase for realtime serving on their sites. See the
presentations wiki page:
http://wiki.apache.org/hadoop/HBase/HBasePresentations
Like Barney, I suspect where you'll hit some issues will be in your latency
requirements. Depending on how you lay out your data and configure your
column families, your average latency may be good, but you will hit some
pauses, as I believe reads block at times during region splits or compactions
and memstore flushes (unless you have a fairly static data set). Others
here should be able to fill in more details.
With a relatively small dataset, you may want to look at the "in memory"
configuration option for your column families.
What's your expected workload -- writes vs. reads? What types of reads will
you be doing: random access vs. sequential? There are a lot of
knowledgeable folks here to offer advice if you can give us some more
insight into what you're trying to build.
--gh
On Tue, Mar 9, 2010 at 11:21 AM, jaxzin wrote:
This is exactly the kind of feedback I'm looking for, thanks, Barney.
So it sounds like you cache the data you get from HBase in session-based
memory? Are you using a Java EE HttpSession? (I'm less familiar with the
django/rails equivalents, but I'm assuming they exist.) Or are you using a
memory cache provider like ehcache or memcache(d)?
Can you tell me more about your experience with latency and why you say
that?
Barney Frank wrote:
I am using HBase to store visitor-level clickstream-like data. At the
beginning of the visitor session I retrieve all the previous session data
from HBase, use it within my app server, massage it a little, and serve it
to the consumer via web services. Where I think you will run into the most
problems is your latency requirement.
Just my 2 cents from a user.
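Barney's pattern (read a visitor's history once at session start, then serve the rest of the session from app-server memory) can be sketched as a session-keyed cache in front of the store. The loader function is a hypothetical stand-in for an HBase row read, and the class and method names are made up; a real deployment might keep the data in an HttpSession attribute or a cache such as ehcache instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Session-scoped cache: one store read per session, then in-memory serving. */
public class SessionHistoryCache {
    private final Map<String, List<String>> bySession = new ConcurrentHashMap<>();
    private final Function<String, List<String>> loadHistory; // hypothetical: visitorId -> rows from the store
    final AtomicInteger storeReads = new AtomicInteger();

    SessionHistoryCache(Function<String, List<String>> loadHistory) {
        this.loadHistory = loadHistory;
    }

    // Called on every request; only the first call for a session reaches the store.
    List<String> history(String sessionId, String visitorId) {
        return bySession.computeIfAbsent(sessionId, s -> {
            storeReads.incrementAndGet();
            return Collections.unmodifiableList(new ArrayList<>(loadHistory.apply(visitorId)));
        });
    }

    // Called when the session ends so app-server memory does not grow without bound.
    void endSession(String sessionId) {
        bySession.remove(sessionId);
    }

    public static void main(String[] args) {
        SessionHistoryCache cache =
            new SessionHistoryCache(visitor -> Arrays.asList("viewed:/nfl", "viewed:/nba"));
        cache.history("session-1", "visitor-42");
        cache.history("session-1", "visitor-42"); // served from memory
        System.out.println(cache.storeReads.get()); // 1
    }
}
```

The design moves HBase's latency to the first request of each session; later requests in the session never wait on the store, which is why this pattern tolerates looser latency than Brian's per-request 1-2 ms target.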
On Tue, Mar 9, 2010 at 9:45 AM, jaxzin wrote:
Hi all, I've got a question about how everyone is using HBase. Is anyone
using it as an online data store to directly back a web service?
The text-book example of a weblink HBase table suggests there would be an
associated web front-end to display the information in that HBase table (ex.
search results page), but I'm having trouble finding evidence that anyone is
servicing web traffic backed directly by an HBase instance in practice.
I'm evaluating if HBase would be the right tool to provide a few things for
a large-scale web service we want to develop at ESPN, and I'd really like to
get opinions and experience from people who have already been down this
path. No need to reinvent the wheel, right?
I can tell you a little about the project goals if it helps give you an
idea of what I'm trying to design for:
1) Highly available (it would be a central service, and an outage would
take down everything)
2) Low latency (1-2 ms; less is better, more isn't acceptable)
3) High throughput (5-10k req/sec at worst-case peak)
4) Unstable traffic (ex. Sunday afternoons during football season)
5) Small data...for now (< 10 GB of total data currently, but HBase could
allow us to design differently and store more online)
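Goals 2) and 3) together determine how many requests the service has in flight at once. A quick sizing check with Little's law (L = λW), applied here as an illustration rather than anything proposed in the thread, using the figures from the list:

```latex
\begin{aligned}
L &= \lambda W \\
\text{worst case } (10{,}000\ \text{req/s},\ 2\ \text{ms}):\quad L &= 10{,}000 \times 0.002 = 20\ \text{requests in flight} \\
\text{best case } (5{,}000\ \text{req/s},\ 1\ \text{ms}):\quad L &= 5{,}000 \times 0.001 = 5\ \text{requests in flight}
\end{aligned}
```

The raw concurrency is modest, which suggests the per-request latency bound, rather than the number of simultaneous requests, is the demanding part of these requirements.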
The reason I'm looking at HBase is that we've solved many of our scaling
issues with the same basic concepts of HBase (sharding, flattening data to
fit in one row, throwing away ACID, etc.) but with home-grown software. I'd
like to adopt an active open-source project if it makes sense.
Alternatives I'm also looking at: RDBMS fronted with WebSphere eXtreme
Scale, RDBMS fronted with Hibernate/ehcache, or (the option I understand
the least right now) memcached.
Thanks,
Brian
--
View this message in context:
http://old.nabble.com/Use-cases-of-HBase-tp27837470p27837470.html
Sent from the HBase User mailing list archive at Nabble.com.
--
View this message in context:
http://old.nabble.com/Use-cases-of-HBase-tp27837470p27838006.html
Sent from the HBase User mailing list archive at Nabble.com.
--
View this message in context:
http://old.nabble.com/Use-cases-of-HBase-tp27837470p27841193.html
Sent from the HBase User mailing list archive at Nabble.com.
--
---
Thanks,
Charles Woerner
