at Dec 14, 2012 at 2:29 am
We have a doc section on "Testing Impala for High Performance
Configuration" that describes how to confirm that you are operating with
the necessary performance configuration:
https://ccp.cloudera.com/display/IMPALA10BETADOC/Configuring+Impala+for+Performance+without+Cloudera+Manager
I'd also recommend running a simple SELECT COUNT(*) query on one of your
larger tables and confirming that you're seeing approximately 100 MB/s per
disk. If not, something is likely not configured optimally.
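As a rough way to turn that check into a number, here is a minimal sketch of the arithmetic. It assumes the query is scan-bound, that the data is spread evenly across all disks, and that you know the on-disk size of the table. The function name and the sample figures (table size, disks per node, elapsed time) are hypothetical, not taken from any Impala tool.

```python
def per_disk_throughput_mb_s(table_bytes, elapsed_s, nodes, disks_per_node):
    """Average scan rate per disk in MB/s (1 MB = 10**6 bytes).

    Assumes the table is spread evenly over every disk in the cluster
    and that the elapsed time is dominated by the scan.
    """
    if elapsed_s <= 0 or nodes <= 0 or disks_per_node <= 0:
        raise ValueError("elapsed time, node count and disk count must be positive")
    total_disks = nodes * disks_per_node
    return table_bytes / 1e6 / elapsed_s / total_disks


# Hypothetical numbers: a 400 GB table on 8 nodes with 10 disks each,
# where SELECT COUNT(*) took 50 seconds.
rate = per_disk_throughput_mb_s(400e9, 50.0, nodes=8, disks_per_node=10)
print(f"{rate:.0f} MB/s per disk")  # prints "100 MB/s per disk"
```

If the figure you get is far below 100 MB/s, that is a hint to revisit the configuration checks in the doc section above.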
The following blog has links to some external experiences others have
Really eager to see your results.
On Thu, Dec 13, 2012 at 2:19 PM, Ricky Saltzer wrote:
Hey Romit -
So far we've seen great results when comparing queries in Hive to Impala.
I've personally seen 2x to 40x speedups depending on the type of query,
which was pretty great. Hive queries that require multiple MapReduce
phases end up being very quick in Impala. We look forward to your results;
let us know if you have any questions during your testing :-).
On Thu, Dec 13, 2012 at 4:56 PM, Romit Singhai wrote:
I am working on comparing Impala with Hive on a cluster with 8 nodes,
mostly from a response-time perspective.
I would really appreciate it if people could share their experiences in this
regard. I will also publish my results to the group once I am done.