Dec 16, 2011 at 12:09 am
Hi Steve, there is no simple way to limit the number of nodes a job uses,
because doing so would involve moving the data. You want all 3 replicas on
the 5, 10, or 20 nodes, correct?
You could potentially just stop the TaskTrackers (TTs) on the extra nodes, but
your job(s) will then likely have to fetch data from remote nodes and will run
slower than they would on a real cluster of that size. Shutting down the
DataNodes (DNs) will trigger unnecessary re-replication and redistribution of
data (unless your data are small and you can afford to reload them or to
reformat HDFS each time).
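As a rough sketch of the first option (stopping only the TaskTrackers, or TTs, and leaving the DataNodes up), the Python below just builds the commands one might run; it does not execute anything. It assumes a Hadoop 1.x style install where `hadoop-daemon.sh stop tasktracker` is available. The install path `/usr/lib/hadoop` and the host names `node01` to `node50` are made up for illustration.

```python
# Sketch: build (but do not run) the commands that would stop TaskTrackers
# on every node beyond the first `keep` hosts. Host names and the install
# path are hypothetical.
from typing import List

HADOOP_HOME = "/usr/lib/hadoop"  # assumption: adjust to the local install


def tasktracker_stop_commands(hosts: List[str], keep: int) -> List[str]:
    """Return one ssh command per host whose TaskTracker should be stopped."""
    if not 0 < keep <= len(hosts):
        raise ValueError("keep must be between 1 and the number of hosts")
    extra = hosts[keep:]  # DataNodes on these hosts are left running
    return [
        f"ssh {host} {HADOOP_HOME}/bin/hadoop-daemon.sh stop tasktracker"
        for host in extra
    ]


if __name__ == "__main__":
    hosts = [f"node{i:02d}" for i in range(1, 51)]  # hypothetical 50-node cluster
    for size in (5, 10, 20):
        cmds = tasktracker_stop_commands(hosts, size)
        print(f"{size} active TaskTrackers, {len(cmds)} to stop")
```

Keep in mind that the data stay spread over all 50 DataNodes, so the remaining TaskTrackers will still read many blocks remotely, which is the slowdown described above.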
Moving the computation to the data is a big part of MR, so restricting the
job to a subset of nodes is likely to skew the results.
Dr. Alex Kozlov
On Thu, Dec 15, 2011 at 2:03 PM, Steve Lewis wrote:
I am reporting on the performance of a Hadoop task on a cluster with about 50
nodes. I would like to be able to report performance on clusters of 5, 10, and 20
nodes without changing the current cluster. Is there a way to limit the number
of nodes used by a job, and if so, how?
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033