Thanks for your reply, Alex.
On Fri, May 10, 2013 at 10:32 AM, Alex Behm wrote:

Correct, for a given query Impala uses a single thread per node to perform
an HBase scan.

So query performance for an HBase-backed table is inefficient? In Hive,
there is one task per region, and tasks can be executed concurrently. In
Impala, a single thread scans all the regions that reside on a region
server. The degradation will be huge, IMHO.

--
Anty Rao
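
The comparison in the post above (one thread walking every region in turn versus one task per region running concurrently) can be sketched with a toy model. This is only an illustration, not Impala or Hive code: the `REGIONS` dictionary, the function names, and the 0.05 s sleep that stands in for scan latency are all assumptions made for the example, and real Hive tasks are separate processes rather than threads in one pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the regions hosted on one region server.
REGIONS = {
    "region-1": ["row-a", "row-b"],
    "region-2": ["row-c"],
    "region-3": ["row-d", "row-e"],
    "region-4": ["row-f"],
}


def scan_region(name, delay=0.05):
    """Pretend to scan one region; the sleep stands in for I/O latency."""
    time.sleep(delay)
    return list(REGIONS[name])


def scan_single_thread(region_names):
    """One thread scans every region in turn (the Impala behaviour described)."""
    rows = []
    for name in region_names:
        rows.extend(scan_region(name))
    return rows


def scan_task_per_region(region_names):
    """One task per region, executed concurrently (the Hive behaviour described)."""
    workers = max(1, len(region_names))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_region = pool.map(scan_region, region_names)  # keeps input order
        return [row for rows in per_region for row in rows]


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    names = list(REGIONS)
    rows_seq, t_seq = timed(scan_single_thread, names)
    rows_par, t_par = timed(scan_task_per_region, names)
    print(f"single thread:   {len(rows_seq)} rows in {t_seq:.2f}s")
    print(f"task per region: {len(rows_par)} rows in {t_par:.2f}s")
```

Both functions return the same rows; only the elapsed time differs. The model covers scan latency alone, so it says nothing about end-to-end query time.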


  • Alex Behm at May 13, 2013 at 5:10 pm
    There is certainly room for improving the HBase scan performance, and we
    are planning to address that issue in the future.

    As for the end-to-end response time of SQL queries over HBase using Impala:
    Keep in mind that Impala uses a new runtime system built from scratch,
    including pipelining of results, hash joins/aggregation, etc. So it is still
    quite possible that Impala outperforms Hive, depending on your workload.
    That said, I don't believe in speculation. I'd give Impala a chance on your
    workload and let the numbers speak for themselves.

    Cheers,

    Alex


  • Anty Rao at May 16, 2013 at 8:28 am
    Thx Alex.

    I have a suggestion about the way Impala accesses HBase. Currently, the
    HBaseScanNode operator is dedicated to HBase only. Could we go further and
    extract a new operator, say JniScanNode, to facilitate reading records via
    JNI? HBaseScanNode could then be implemented by adding an HBase-specific
    JNI wrapper to JniScanNode. This way, you could easily add new file
    formats or data sources that are accessed via JNI.



  • Alex Behm at May 16, 2013 at 10:41 pm
    Anty, thanks for your suggestion.

    We'll consider adding abstractions once we know of a concrete use case. In
    general, we want to avoid going through JNI for data scans because of the
    additional cost.

    Cheers,

    Alex

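The operator split proposed in the thread (a source-agnostic scan node plus a source-specific wrapper) might look roughly like the sketch below. It is written in Python purely to show the shape of the idea; Impala's actual operators are not implemented this way, and `RecordSource`, `InMemoryHBaseSource`, `GenericScanNode`, and the batch interface are all hypothetical names invented for this example. The `calls` counter marks each trip across the wrapper boundary, which is where the JNI cost mentioned in the reply would be paid.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Optional


class RecordSource(ABC):
    """Hypothetical wrapper interface; each method models one boundary crossing."""

    @abstractmethod
    def open(self) -> None:
        """Prepare the source for scanning."""

    @abstractmethod
    def next_batch(self, max_rows: int) -> Optional[List[Dict]]:
        """Return up to max_rows records, or None when the source is exhausted."""

    @abstractmethod
    def close(self) -> None:
        """Release any resources held by the source."""


class InMemoryHBaseSource(RecordSource):
    """Stand-in for an HBase-specific wrapper; it serves rows from a list."""

    def __init__(self, rows: List[Dict]) -> None:
        self._rows = list(rows)
        self._pos = 0
        self.calls = 0  # number of next_batch crossings

    def open(self) -> None:
        self._pos = 0

    def next_batch(self, max_rows: int) -> Optional[List[Dict]]:
        self.calls += 1
        if self._pos >= len(self._rows):
            return None
        batch = self._rows[self._pos:self._pos + max_rows]
        self._pos += len(batch)
        return batch

    def close(self) -> None:
        pass


class GenericScanNode:
    """Source-agnostic scan operator, in the role of the proposed JniScanNode."""

    def __init__(self, source: RecordSource, batch_size: int = 2) -> None:
        self._source = source
        self._batch_size = batch_size

    def rows(self) -> Iterator[Dict]:
        self._source.open()
        try:
            while True:
                batch = self._source.next_batch(self._batch_size)
                if batch is None:
                    break
                yield from batch
        finally:
            self._source.close()


if __name__ == "__main__":
    source = InMemoryHBaseSource([{"key": "r1"}, {"key": "r2"}, {"key": "r3"}])
    node = GenericScanNode(source, batch_size=2)
    print(list(node.rows()), source.calls)
```

Supporting another data source would mean writing another `RecordSource` subclass and leaving `GenericScanNode` untouched. Fetching records in batches is an assumption of this sketch, chosen so that the number of boundary crossings is visible and adjustable.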

Discussion Overview
group: impala-user
categories: hadoop
posted: May 10, '13 at 3:21a
active: May 16, '13 at 10:41p
posts: 4
users: 2
website: cloudera.com
irc: #hadoop
