Thanks for the reply. I am currently trying to move all the blocks, and their replicas, of the input files only to a specified location. That is, just before job start-up, I want to check where the input files' blocks are located and move those blocks and their replicas to the desired (highly efficient) data nodes, thereby making sure that only these nodes execute the job. (I am assuming this because I believe each block will be operated on only by the nearest available mapping process.)
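As a rough illustration of the planning step described above, here is a minimal sketch in plain Java. It is a hypothetical model, not Hadoop's API: the class BlockPlacementPlanner, the Move type, and the block and host names are all made up for illustration. It assumes the current replica hosts of each input block are already known and only computes which replicas would have to be moved; it does not move anything.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical planning helper; none of these names come from Hadoop. */
final class BlockPlacementPlanner {

    /** One replica that should be relocated away from a non-preferred host. */
    static final class Move {
        final String blockId;
        final String fromHost;
        final String toHost;

        Move(String blockId, String fromHost, String toHost) {
            this.blockId = blockId;
            this.fromHost = fromHost;
            this.toHost = toHost;
        }

        @Override
        public String toString() {
            return blockId + ": " + fromHost + " -> " + toHost;
        }
    }

    /**
     * For every replica stored on a host outside preferredHosts, propose moving it
     * to the first preferred host that does not already hold a replica of the same
     * block. Replicas for which no such host exists are left where they are.
     * Load balancing across the preferred hosts is deliberately ignored.
     */
    static List<Move> planMoves(Map<String, List<String>> replicasByBlock,
                                List<String> preferredHosts) {
        List<Move> moves = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : replicasByBlock.entrySet()) {
            // Hosts that would hold this block after the moves planned so far.
            Set<String> occupied = new HashSet<>(entry.getValue());
            for (String host : entry.getValue()) {
                if (preferredHosts.contains(host)) {
                    continue; // Replica is already on a preferred node.
                }
                for (String target : preferredHosts) {
                    if (!occupied.contains(target)) {
                        moves.add(new Move(entry.getKey(), host, target));
                        occupied.remove(host);
                        occupied.add(target);
                        break;
                    }
                }
            }
        }
        return moves;
    }
}
```

Whether the scheduler would then run the map work only on those nodes is the assumption stated above; the sketch does not verify it, and it does not account for the cost of moving the data.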

In your reply you mentioned that some of the work might be initiated from the client. Is it the JobClient class you are talking about?


--- On Fri, 2/19/10, Wang Xu wrote:

From: Wang Xu <gnawux@gmail.com>
Subject: Re: Question on job scheduling
To: common-dev@hadoop.apache.org
Date: Friday, February 19, 2010, 7:25 AM
On Thu, Feb 18, 2010 at 12:00 AM, arun kumar wrote:
> My questions are:
> 1. Will such a change improve the performance, considering the overhead caused by moving the data blocks?

In some special cases it might improve the performance, but it depends
on your application.

> 2. I believe I will have to start from the NameNode to move the blocks. If anyone can give me a brief explanation of the process to implement this, or even sources of information on it, it would be very helpful.

I think some of the work might be initiated from the client. Could you
describe what you want to do in detail?
1. Do you want to specify the datanodes that store particular blocks, or do
you only want some blocks to be located together?
2. Do you want to specify the location of all the replicas of a block,
or only the location of one of the replicas?

Wang Xu
Stephen Leacock  - "I detest life-insurance agents: they always argue
that I shall some day die, which is not so." -

Discussion Overview
group: common-dev @
posted: Feb 17, '10 at 6:28a
active: Feb 24, '10 at 2:28p


