We did both, so the performance is confusing. Maybe we got something wrong
in the configuration?
conf.zip is our Impala configuration (/usr/lib/impala/conf; we installed Impala
with rpm), and log.txt is the log of the query.

On Thursday, January 24, 2013 at 11:54:09 AM UTC+8, Marcel Kornacker wrote:
Did you follow the setup instructions on


If so, what does this line in the info log
I0123 09:26:30.049484 29410 simple-scheduler.cc:174] SimpleScheduler
locality percentage 100% (148 out of 148)
look like for you? My first guess would be that you're doing remote scans.
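
One way to pull that scheduler line out of log.txt is sketched below. This is only a rough sketch: the pattern is inferred from the single sample line quoted above, so other Impala versions may format it differently, and `find_locality` is a hypothetical helper name, not part of Impala.

```python
import re

# Pattern inferred from the one sample line quoted in this thread;
# treat it as an assumption about the log format.
LOCALITY_RE = re.compile(
    r"SimpleScheduler\s+locality percentage\s+(\d+)%\s+\((\d+) out of (\d+)\)"
)

def find_locality(log_text):
    """Return a (percent, local, total) tuple for each scheduler line found."""
    # The line is wrapped in the message above, so collapse all whitespace
    # (including newlines) before matching.
    flat = " ".join(log_text.split())
    return [tuple(int(g) for g in m.groups()) for m in LOCALITY_RE.finditer(flat)]

sample = """I0123 09:26:30.049484 29410 simple-scheduler.cc:174] SimpleScheduler
locality percentage 100% (148 out of 148)"""

for percent, local, total in find_locality(sample):
    print(f"{percent}% local: {local} of {total}, {total - local} not local")
```

A percentage well below 100 would point toward remote scans.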


On Wed, Jan 23, 2013 at 7:07 PM, Feng Xu <wind...@gmail.com> wrote:
We ran a select count(*) query on a table with 15m records and 3.2g bytes. It
took 220s, while Hive ran the same query in only about 100s. We found that when
we run the query with Impala, every impalad node's network bandwidth (100m) is
used up. The network bandwidth used is 1/4 of a node's when we run the query with

We also ran the count query on a table with 10m records and 400m bytes. It took
25s, which is a little faster than Hive. The query used 1/4 of the network bandwidth of

I don't think the high network bandwidth use makes sense for a select count.
Each processing node should return very little data to the coordinator node,
so what is using the network bandwidth?

The cluster is 5 nodes connected by a 100m switch; each node has an AMD A6-3650,
16 GB RAM, and 4 x 1 TB disks.
node1: NameNode,ResourceManager,SecondaryNameNode,state_store
node2: DataNode,NodeManager,HMaster,HRegionServer,impalad
node3: DataNode,NodeManager,HRegionServer,impalad
node4: DataNode,NodeManager,HRegionServer,impalad
node5: DataNode,NodeManager,HRegionServer,hive,zookeeper,impalad
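
As a rough sanity check on the figures above, the effective scan rate of each query can be compared with what one node's network link can carry. The sketch below rests on assumptions the post does not state outright: that "100m" means 100 Mbit/s per node, that "3.2g" and "400m" mean 3.2e9 and 400e6 bytes, and that the whole table is read once per query.

```python
# Assumptions (not confirmed in the post): "100m" = 100 Mbit/s per node link,
# "3.2g" = 3.2e9 bytes, "400m" = 400e6 bytes.
LINK_BYTES_PER_SEC = 100e6 / 8  # 12.5 MB/s per node

def effective_throughput(table_bytes, seconds):
    """Aggregate bytes per second the query processed end to end."""
    return table_bytes / seconds

def links_saturated(table_bytes, seconds):
    """Aggregate throughput as a multiple of one node's link capacity."""
    return effective_throughput(table_bytes, seconds) / LINK_BYTES_PER_SEC

for name, size, secs in [("large table", 3.2e9, 220), ("small table", 400e6, 25)]:
    mb_s = effective_throughput(size, secs) / 1e6
    print(f"{name}: {mb_s:.1f} MB/s, {links_saturated(size, secs):.2f}x one link")
```

Under these assumptions both queries come out at roughly the capacity of a single 100 Mbit/s link in aggregate, which would be consistent with table data crossing the network rather than being read from local disks. It does not prove that on its own.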


Discussion Overview
group: impala-user @
posted: Jan 24, '13 at 6:03a
active: Jan 24, '13 at 6:03a
