I've been running some tests on query throughput, and the results were
not what I expected. In short, even a few concurrent queries slow Impala
down considerably.
I have a test query that takes roughly 1 second to complete. If I run this
query from 10 different parallel processes 10 times each (for 100 total
queries), the whole thing takes about 80 seconds to run. That means it's
barely faster than the roughly 100 seconds these queries would take if run
sequentially. Furthermore, the per-query completion time spikes to about
10 seconds. My setup is a 4-node cluster, and all queries are being issued to
the same impalad daemon (though presumably the resulting fragments are
being run elsewhere). iostat shows there's plenty of headroom on the
disks, and top shows CPU usage peaking at about 20%.
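
For anyone who wants to reproduce this, here is a minimal sketch of the kind of harness described above. It is not my exact script: `run_query` is a hypothetical callable standing in for whatever issues the test query to impalad, and the sketch uses threads rather than separate processes to keep it short.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_benchmark(run_query, workers=10, runs_per_worker=10):
    """Call run_query from `workers` parallel threads, `runs_per_worker` times each.

    run_query is a hypothetical zero-argument callable that submits the test
    query and blocks until it completes. Returns (wall_seconds, latencies).
    """
    def worker(_):
        latencies = []
        for _ in range(runs_per_worker):
            start = time.perf_counter()
            run_query()
            latencies.append(time.perf_counter() - start)
        return latencies

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_worker = list(pool.map(worker, range(workers)))
    wall = time.perf_counter() - wall_start
    # Flatten the per-worker lists into one list of per-query latencies.
    return wall, [t for timings in per_worker for t in timings]


def summarize(wall, latencies, single_query_seconds):
    """Compare a concurrent run against running the same queries back to back."""
    total = len(latencies)
    return {
        "queries": total,
        "throughput_qps": total / wall,
        "mean_latency_s": sum(latencies) / total,
        "max_latency_s": max(latencies),
        # Estimated sequential time divided by measured wall time.
        "speedup_vs_sequential": (total * single_query_seconds) / wall,
    }


if __name__ == "__main__":
    # Stand-in for a real query: sleep for 10 ms instead of contacting impalad.
    wall, latencies = run_benchmark(lambda: time.sleep(0.01), workers=4, runs_per_worker=3)
    print(summarize(wall, latencies, single_query_seconds=0.01))
```

Plugging in the numbers above (100 queries, 80 seconds of wall time, about 1 second per query when run alone) gives 1.25 queries per second and a speedup of only 1.25x over sequential execution.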
Since Impala was built as a faster version of Hive, I'll understand if
multiple concurrent queries aren't really a case it's designed to handle.
But before I abandon Impala as unsuitable for my project, I want to make
sure this is expected behavior and not some sort of misconfiguration.