May 13, 2013 at 5:10 pm
There is certainly room for improving the HBase scan performance, and we
are planning to address that issue in the future.
As for the end-to-end response time of SQL queries over HBase using Impala:
Keep in mind that Impala uses a new runtime system built from scratch,
including pipelining of results, hash joins/aggregation, etc. So it is still
quite possible that Impala outperforms Hive, depending on your workload.
That said, I don't believe in speculation. I'd give Impala a chance on your
workload and let the numbers speak for themselves.
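
One rough way to collect those numbers is to time the same query through both command-line clients. The sketch below is only an illustration, not an official benchmark procedure: it assumes `impala-shell` and `hive` are on your PATH, the query and the table name `my_hbase_table` are placeholders, and `time_command` and `best_times` are helper names made up for this example.

```python
import subprocess
import time


def time_command(argv, runs=3):
    """Run argv `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the client exits non-zero.
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def best_times(engines, runs=3):
    """Map each engine name to its fastest run, in seconds."""
    return {name: min(time_command(argv, runs)) for name, argv in engines.items()}


# Placeholder query and table name (assumptions); substitute your own workload.
QUERY = "SELECT COUNT(*) FROM my_hbase_table"

# impala-shell takes the query with -q, the Hive CLI with -e.
ENGINES = {
    "impala": ["impala-shell", "-q", QUERY],
    "hive": ["hive", "-e", QUERY],
}

# Example call on a machine with both clients installed:
#   for name, seconds in best_times(ENGINES).items():
#       print(name, round(seconds, 2), "s")
```

These timings include client startup and are sensitive to caching, so it is worth running several rounds and comparing the best or median run rather than a single measurement.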
On Thu, May 9, 2013 at 8:21 PM, Anty Rao wrote:
Thanks for your reply Alex.
On Fri, May 10, 2013 at 10:32 AM, Alex Behm wrote:
Correct, for a given query Impala uses a single thread per node to
perform an HBase scan.
So, is query performance for an HBase-backed table inefficient? In
Hive, there is one task per region, and tasks can be executed concurrently.
In Impala, there is one thread scanning all the regions that reside on a
region server. The degradation will be huge IMHO.
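
To make the concern concrete, here is a toy model of the two scheduling schemes described in this thread: one task per region (Hive) versus one scan thread per node for a given query (Impala). It only counts scanners; it does not predict run time. The function name `scan_parallelism` and the region-to-server mapping are hypothetical, and the Hive count is an upper bound, since actual concurrency also depends on how many tasks the cluster can run at once.

```python
from collections import Counter


def scan_parallelism(region_to_server):
    """Count scanners for one table scan under each scheme.

    region_to_server maps a region name to the region server hosting it.
    Returns (hive_tasks, impala_threads, max_regions_per_thread).
    """
    regions_per_server = Counter(region_to_server.values())
    # Hive: one task per region (upper bound on concurrent scanners).
    hive_tasks = len(region_to_server)
    # Impala, per this thread: one scan thread per node for a given query.
    impala_threads = len(regions_per_server)
    # Most regions that any single Impala thread has to scan one after another.
    max_regions_per_thread = max(regions_per_server.values(), default=0)
    return hive_tasks, impala_threads, max_regions_per_thread


# Hypothetical layout: 12 regions spread evenly over 3 region servers.
layout = {f"region{i}": f"server{i % 3}" for i in range(12)}
result = scan_parallelism(layout)
```

For this made-up layout the model gives 12 Hive tasks against 3 Impala scan threads, each thread covering 4 regions in sequence. Whether that gap shows up in end-to-end query time is exactly what a measurement on a real workload has to answer.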