On Fri, Dec 28, 2012 at 5:01 AM, lyrebird1999 wrote:
According to Google's paper about Dremel, Google has a server tree (root
server, level 1 servers, level 2 servers, ..., leaf servers). Each server
except the leaf servers rewrites the query for the servers in the level below
it, building a query execution tree.
But in Impala's architecture, each impalad contains the query planner,
coordinator, and execution engine components, so all impalads are at the
same level. How does Impala take advantage of a query execution tree like
Dremel's?
Is all the query rewrite work done by the impalad that receives the query
from the client? If so, how much does performance suffer?

In Impala, the impalad process that receives the client request
creates a sequence of plan fragments and coordinates the execution of
those fragments on the worker nodes. The coordination part itself is
very light-weight, and having this reside on a single node does not
create a scalability problem.

When you're running an aggregation query, the aggregation itself is
distributed into a pre-aggregation step, which runs on every impalad
that's participating in the execution of the query, and a merge
aggregation step that runs only in the coordinator fragment. In
other words, this constitutes a two-level tree. In a future release,
Impala will also be able to distribute the merge step by
repartitioning the output of the pre-aggregation step (which would
create a 3-level tree).
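To make the two shapes concrete, here is a minimal Python sketch that simulates them in memory. It is not Impala's code: the function names (pre_aggregate, merge, two_level, three_level) are hypothetical, each "node" is just a list of (group_key, value) rows, the aggregate is assumed to be SUM, and repartitioning by a hash of the group key is an assumed scheme for the future 3-level case, not a description of how Impala will implement it.

```python
from collections import defaultdict


def pre_aggregate(rows):
    """Pre-aggregation: runs on each participating node over its local rows only."""
    partial = defaultdict(int)
    for key, value in rows:
        partial[key] += value
    return dict(partial)


def merge(partials):
    """Merge aggregation: sums the partial results for each group key."""
    total = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)


def two_level(nodes):
    # Level 1: every node pre-aggregates its own rows.
    # Level 2: a single coordinator merges all partial results.
    return merge(pre_aggregate(rows) for rows in nodes)


def three_level(nodes, num_mergers):
    # Level 1: every node pre-aggregates, then splits its partial result
    # by a hash of the group key (hypothetical repartitioning scheme).
    inboxes = [[] for _ in range(num_mergers)]
    for rows in nodes:
        slices = [{} for _ in range(num_mergers)]
        for key, value in pre_aggregate(rows).items():
            slices[hash(key) % num_mergers][key] = value
        for inbox, piece in zip(inboxes, slices):
            inbox.append(piece)
    # Level 2: each merge node receives every partial for its keys, and no
    # other merge node sees those keys, so it can produce final totals alone.
    merged = [merge(inbox) for inbox in inboxes]
    # Level 3: the coordinator only unions the disjoint per-merger results.
    result = {}
    for part in merged:
        result.update(part)
    return result


nodes = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 5), ("c", 1)],
    [("a", 10), ("c", 4)],
]
print(sorted(two_level(nodes).items()))  # [('a', 14), ('b', 7), ('c', 5)]
print(sorted(three_level(nodes, 2).items()))  # [('a', 14), ('b', 7), ('c', 5)]
```

Both shapes give the same answer; the difference is where the merge work happens. In the two-level case the coordinator merges one partial result per node, while in the three-level case the merge is spread across several nodes and the coordinator only concatenates disjoint results. SUM is used here because partial sums combine directly; an aggregate like AVG would need to carry (sum, count) pairs through the pre-aggregation step.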