Harsh J at Sep 21, 2011 at 1:10 pm
Praveenesh,
TaskTrackers run your jobs' tasks for you, not DataNodes directly. So
you can statically control the load on nodes by removing
TaskTrackers from your cluster.
That is, if you run "service hadoop-0.20-tasktracker stop" or
"hadoop-daemon.sh stop tasktracker" on the specific nodes, jobs won't
run there anymore.
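For instance, something along these lines (a rough sketch; the
excluded-nodes.txt node list and passwordless SSH to each host are
assumptions, not from the thread):

  # Stop the TaskTracker daemon on each node to exclude; the DataNode
  # process is left running, so no blocks become unavailable.
  for host in $(cat excluded-nodes.txt); do
    ssh "$host" 'hadoop-daemon.sh stop tasktracker'
  done
  # Then confirm on the JobTracker web UI (port 50030 by default)
  # that the stopped trackers have dropped off the active list.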
Is this what you're looking for?
(There are also ways to achieve the exclusion dynamically, e.g. by
writing a custom scheduler, but it's hard to say more without knowing
what you need specifically, and why you require it.)
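As a static middle ground between stopping daemons and writing a
scheduler (a rough sketch, not part of the original reply; the
hostname and file path below are placeholders, and applying the
change without a JobTracker restart depends on your version), there
is also the mapred.hosts.exclude mechanism:

  # List the hosts that should not run tasks in an exclude file, and
  # point the mapred.hosts.exclude property in mapred-site.xml at it.
  echo "node07.example.com" >> /etc/hadoop/conf/mapred.exclude
  # The JobTracker honours the file at startup; some later releases
  # can also apply it to a running cluster:
  hadoop mradmin -refreshNodes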
On Wed, Sep 21, 2011 at 6:32 PM, praveenesh kumar wrote:
Is there any way that we can run a particular job in Hadoop on a subset of
the datanodes?
My problem is that I don't want to use all the nodes to run some jobs;
I am trying to plot a job-completion-time vs. number-of-nodes graph for a
particular job.
One way to do this is to remove datanodes and then see how much time the
job takes.
Just out of curiosity, I want to know whether there is any other way to do
this without removing datanodes.
I am afraid that if I remove datanodes, I could lose some data blocks that
reside on those machines, as I have some files with replication = 1.
Thanks,
Praveenesh
--
Harsh J