[MapReduce-dev] How is MRv2 fundamentally changed?

Jie Li
Jan 24, 2012 at 1:12 am
Hi Mahadev,

Thanks, both are very helpful for understanding the architecture of YARN.

What we are looking for is more about the differences at the task level.
Suppose a map task takes 10 minutes in Hadoop, then we have a model to
analyse what makes up the 10 minutes, e.g. reading from HDFS, invoking the
map function, writing to the buffer, partitioning, sorting and merging.
This model can be used to identify the bottleneck of the task execution and
suggest better configurations.
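
To make the idea of a per-phase model concrete, here is a minimal sketch. The phase names follow the list above; the timings are made-up numbers for illustration only (they are not measurements), and the helper names `phase_shares` and `bottleneck` are hypothetical.

```python
# Hypothetical per-phase timings, in seconds, for one 10-minute map task.
# These numbers are invented for illustration; they are not measurements.
PHASES = {
    "read_hdfs": 120.0,
    "map_function": 240.0,
    "write_buffer": 60.0,
    "partition": 30.0,
    "sort": 90.0,
    "merge": 60.0,
}


def phase_shares(timings):
    """Return each phase's fraction of the total task time."""
    total = sum(timings.values())
    return {name: seconds / total for name, seconds in timings.items()}


def bottleneck(timings):
    """Return the name of the phase that takes the most time."""
    return max(timings, key=timings.get)


if __name__ == "__main__":
    for name, share in phase_shares(PHASES).items():
        print(f"{name}: {share:.0%}")
    print("bottleneck:", bottleneck(PHASES))
```

With a breakdown like this, the largest share points at the phase whose configuration is worth tuning first.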

If we run MR jobs in YARN, can we use the same model to analyse the running
time of a task? One possible difference I've noticed so far is that
shuffling has become a service of the node manager. Are there any other
changes related to the map phase or the reduce phase?

On Mon, Jan 16, 2012 at 4:32 PM, Mahadev Konar wrote:

Hi Jie,
You might want to read through:


for more information on the architecture. It'll help you understand the
major differences between the two.

On Mon, Jan 16, 2012 at 11:41 AM, Jie Li wrote:
Hi all,

As we know, MRv2 (the MapReduce library in YARN) has changed.
We have a cost model built for MapReduce in Hadoop and are going to
migrate it to MRv2. Can anyone give us a pointer to the fundamental
differences between them? Below is my current understanding; feel free
to correct me.

1. The JT has been replaced by a central RM and a per-application AM.
2. The TT has been replaced by the NM, and the task slots have been replaced
by containers. Containers can be allocated dynamically, so both the
number and the memory size of the containers can vary on demand.
3. The shuffle service has become independent of the map task.
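
As a rough illustration of point 2, the sketch below shows how the number of containers on a node can vary with the requested memory size, where a slot-based node would have a fixed count. It is a simplification under stated assumptions: it considers memory only, ignores scheduler rounding and other resources, and the node size and function name are hypothetical.

```python
def containers_that_fit(node_memory_mb, container_memory_mb):
    """Number of equally sized containers that fit in a node's memory.

    Simplifying assumption: memory is the only resource considered.
    """
    if container_memory_mb <= 0:
        raise ValueError("container_memory_mb must be positive")
    return node_memory_mb // container_memory_mb


if __name__ == "__main__":
    node_memory_mb = 8192  # hypothetical node size
    for request_mb in (1024, 2048, 3072):
        count = containers_that_fit(node_memory_mb, request_mb)
        print(f"{request_mb} MB containers: {count}")
```

For a cost model, this means the degree of task parallelism per node is no longer a fixed configuration value but depends on what each application requests.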

