While HBase uses HDFS to store a table's HFiles, it's still a different
storage manager as far as Impala is concerned. HBase provides a different
set of APIs to get at data; for example, you can get a single record back
very quickly in HBase.
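Here's a toy way to picture why single-record lookups are fast. HBase keeps a table's rows sorted by row key, so a lookup by key can jump straight to the row instead of reading everything. The sketch below is a plain-Python model of that idea, not the HBase client API; `SortedTable` and the sample `user#...` row keys are hypothetical, made up for illustration.

```python
from bisect import bisect_left


class SortedTable:
    """Toy model of an HBase-style table: rows kept sorted by row key.

    Hypothetical illustration only; not the real HBase client API.
    """

    def __init__(self, rows):
        # Sort once by row key, mimicking HBase's ordering of rows by key.
        items = sorted(rows.items())
        self.keys = [k for k, _ in items]
        self.values = [v for _, v in items]

    def get(self, row_key):
        """Point lookup: binary search on the sorted keys, no full scan."""
        i = bisect_left(self.keys, row_key)
        if i < len(self.keys) and self.keys[i] == row_key:
            return self.values[i]
        return None  # no row with this key


table = SortedTable({
    "user#003": {"name": "carol"},
    "user#001": {"name": "alice"},
    "user#002": {"name": "bob"},
})
print(table.get("user#002"))  # {'name': 'bob'}
print(table.get("user#999"))  # None
```

The real system adds a lot on top (regions, block indexes, caching), but the core reason a get is cheap is the same: the data is ordered by the key you're looking up.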
HBase also supports the notion of range scans, which Impala can take advantage
of to minimize the amount of data it has to analyze. Impala makes use of
HBase's APIs to get at the data you're asking for, allowing you to analyze
data within HBase using SQL. The HBase storage engine also allows for
updates and deletes, something not possible with regular HDFS files. Impala
will later have the capability to perform updates, deletes, and inserts
directly against HBase.
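To make the range-scan point concrete: because rows are sorted by row key, a SQL predicate on the key column (say `WHERE key >= 'user#002' AND key < 'user#004'`) can be turned into a start row and a stop row, so only the matching slice of the table is read. The sketch below models that in plain Python; it is not the HBase or Impala API, and `range_scan` plus the sample keys are hypothetical. It follows the convention of HBase's `Scan`, where the start row is inclusive and the stop row is exclusive.

```python
from bisect import bisect_left


def range_scan(sorted_keys, start_row, stop_row):
    """Return keys in [start_row, stop_row) from an already-sorted list.

    Hypothetical model of a range scan: binary search finds both ends,
    so only the rows inside the range are touched.
    """
    lo = bisect_left(sorted_keys, start_row)  # first key >= start_row
    hi = bisect_left(sorted_keys, stop_row)   # first key >= stop_row (excluded)
    return sorted_keys[lo:hi]


keys = ["user#001", "user#002", "user#003", "user#004", "user#005"]

# Roughly what a predicate like key >= 'user#002' AND key < 'user#004'
# narrows the read down to:
print(range_scan(keys, "user#002", "user#004"))  # ['user#002', 'user#003']
```

A predicate on a non-key column can't be narrowed this way, which is why the row-key design matters so much when querying HBase tables.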
If you have data that is added and modified rapidly, you should go with
HBase. If you have data that doesn't change often but is added
frequently, HDFS might be a better choice.
Here's some more about HBase: http://hbase.apache.org/
I hope this clears things up.
On Fri, Jul 5, 2013 at 4:13 AM, nancy jean wrote:
Hi,
According to my understanding, when we create a table in HBase, it is
stored in HDFS as HFiles. So HBase needs HDFS to store the data.
Here in Impala, how is HBase used as storage? I read somewhere that Impala
uses either HDFS or HBase for data storage.
Can anyone clear this up for me?
Thanks in advance.
--
Ricky Saltzer
Tools Developer
http://www.cloudera.com