FAQ
Hi Nancy -

While HBase uses HDFS to store a table's HFiles, it's still a separate
storage manager as far as Impala is concerned. HBase provides a different
set of APIs for getting at data; for example, you can retrieve a single
record very quickly in HBase.
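
As a rough illustration of how that separation shows up in practice (the
table, column family, and column names below are made up), an existing HBase
table is usually exposed to Impala by declaring it once in Hive with the
HBase storage handler; Impala then picks the definition up from the shared
metastore:

-- Hypothetical mapping, run in Hive; names are illustrative only.
CREATE EXTERNAL TABLE hbase_events (
  rowkey     STRING,   -- ":key" below maps this column to the HBase row key
  event_type STRING,
  payload    STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,cf:event_type,cf:payload"
)
TBLPROPERTIES ("hbase.table.name" = "events");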

HBase also supports range scans, which Impala can take advantage of to
minimize the amount of data it has to analyze. Impala uses HBase's APIs to
fetch the data you're asking for, letting you analyze data stored in HBase
using SQL. The HBase storage engine also allows updates and deletes,
something not possible with regular HDFS files. Impala will later gain the
ability to perform updates, deletes, and inserts directly against HBase.
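
For example, assuming the hbase_events mapping sketched above, an equality
predicate on the row-key column lets Impala fetch a single row, and a range
predicate on it can be pushed down as an HBase range scan instead of a
full-table read (the key values here are just illustrative):

-- Single-row lookup by row key.
SELECT event_type, payload
FROM hbase_events
WHERE rowkey = 'user123#2013-07-05';

-- Row-key range: only the matching slice of the HBase table is scanned.
SELECT COUNT(*)
FROM hbase_events
WHERE rowkey >= 'user123#2013-07-01'
  AND rowkey <  'user123#2013-08-01';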

If your data is added and modified rapidly, you should go with HBase. If
your data doesn't change often but is appended to frequently, HDFS might be
the better choice.

Here's some more about HBase: http://hbase.apache.org/

I hope this clears things up.


On Fri, Jul 5, 2013 at 4:13 AM, nancy jean wrote:

Hi,

According to my understanding, when we create any table in HBase, it is
stored in HDFS as an HFile. So HBase needs HDFS to store the data.
How is HBase used as storage in Impala, then? I read somewhere that Impala
uses either HDFS or HBase to store data.
Can anyone clear this up for me?

Thanks in advance.


--
Ricky Saltzer
Tools Developer
http://www.cloudera.com


  • Nancy jean at Jul 8, 2013 at 5:12 am
    So, if I use just HBase as storage in Impala, where does all the data
    actually get stored? In which part of HBase is it stored? Or does HBase
    still need HDFS to store the data?

  • Darren Lo at Jul 8, 2013 at 5:14 pm
    HBase uses HDFS to store its information.

    --
    Thanks,
    Darren
