FAQ
Dear all, I have a problem here.


HOD is good, and can manage a large virtual cluster on a huge physical
cluster. The problem is that it doesn't request more than one core on each
machine, and I have already received a complaint from our admin!

Since Hadoop often starts more than one process on each machine, I
believe this feature is essential for many Hadoop programs, and I guessed
HOD would already have it, but I can't find it.

Can anyone provide any ideas?

Song Liu

  • Song Liu at Apr 15, 2010 at 3:19 pm
    Here is my configuration file:

    [hod]
    stream = True
    java-home = /gpfs/cluster/cosc/sl9885/jre1.6.0_19/
    cluster = ALL
    cluster-factor = 1.8
    xrs-port-range = 32768-65536
    debug = 3
    allocate-wait-time = 3600
    temp-dir = /local/hod

    [ringmaster]
    register = True
    stream = False
    temp-dir = /local/sl9885
    http-port-range = 8000-9000
    work-dirs = /local/sl9885/1,/local/sl9885/2
    xrs-port-range = 32768-65536
    debug = 3

    [hodring]
    stream = False
    temp-dir = /local/sl9885
    register = True
    java-home = /gpfs/cluster/cosc/sl9885/jre1.6.0_19/
    http-port-range = 8000-9000
    xrs-port-range = 32768-65536
    debug = 3

    [resource_manager]
    queue = short
    batch-home = /cvos/shared/apps/torque/2.3.3/
    id = torque
    env-vars = HOD_PYTHON_HOME=/gpfs/cluster/cosc/sl9885/python/bin/python

    [gridservice-mapred]
    external = False
    pkgs = /gpfs/cluster/cosc/sl9885/hadoop-0.20.2
    tracker_port = 8030
    info_port = 50080

    [gridservice-hdfs]
    external = False
    pkgs = /gpfs/cluster/cosc/sl9885/hadoop-0.20.2
    fs_port = 8020
    info_port = 50070
    server-params = mapred.child.java.opts=-Xmx1024m

  • Hemanth Yamijala at Apr 15, 2010 at 6:01 pm
    Song,
    HOD is good, and can manage a large virtual cluster on a huge physical
    cluster. The problem is that it doesn't request more than one core on each
    machine, and I have already received a complaint from our admin!
    I assume what you want is for the Map/Reduce cluster started by HOD
    to use more than one core on each machine. You can configure this in
    the gridservice-mapred section by setting the property server-params.
    For example, if you want to configure 4 map and 2 reduce slots per
    node, you can say:

    [gridservice-mapred]
    server-params = mapred.tasktracker.map.tasks.maximum=4,mapred.tasktracker.reduce.tasks.maximum=2

    That said, since you have not specified any values for these
    parameters, Hadoop's defaults should be picked up, and they default to
    2 map and 2 reduce slots. Hence, it should already be using more than
    one core. Are you seeing that the JobTracker administration page is
    not showing multiple map and reduce slots per node?
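
    For reference, those server-params translate into ordinary Hadoop
    configuration properties; the sketch below shows the roughly equivalent
    mapred-site.xml entries on each TaskTracker node (standard Hadoop 0.20
    property names, not output generated by HOD):

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>4</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>2</value>
    </property>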
  • Song Liu at Apr 15, 2010 at 8:55 pm
    Hi, Thanks for the answer.

    I know that is the way to set the capacity of each node. However, I want
    to know how we can tell the Torque manager that we will run more than one
    mapred task on each machine. If we don't do this, Torque will assign the
    other cores on the machine to other jobs, which may cause competition for
    cores.

    Do you know how to solve this?

    Thanks.
  • Hemanth Yamijala at Apr 16, 2010 at 10:51 am
    Song,
    I know that is the way to set the capacity of each node. However, I want
    to know how we can tell the Torque manager that we will run more than one
    mapred task on each machine. If we don't do this, Torque will assign the
    other cores on the machine to other jobs, which may cause competition for
    cores.

    Do you know how to solve this?
    If I understand correctly, what you want is that when a physical node is
    allocated via HOD by the Torque resource manager, that node should not be
    shared with other jobs. Is that correct?

    Looking on the web, I found that schedulers like Maui / Moab, which are
    typically used with Torque, allow for this. In particular, this link may
    be useful:
    https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2009-May/039949.html
    It talks about a NODEACCESSPOLICY configuration in Maui, described here:
    http://www.clusterresources.com/products/maui/docs/5.3nodeaccess.shtml
    Setting this policy to SINGLEJOB seems to solve your problem.
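
    If Maui is the scheduler in use, that would be a one-line change in
    maui.cfg (a sketch based on the Maui documentation linked above; the
    file location varies by installation):

    # maui.cfg: give each job exclusive access to the nodes it is allocated
    NODEACCESSPOLICY SINGLEJOB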

    Can you check if this meets your requirement?
  • Song Liu at Apr 21, 2010 at 4:47 pm
    Hi, Thanks Hemanth!

    I guess you are very close to my point. What I mean is whether we can
    find a way to set the qsub parameter "ppn".

    ppn controls how many processors per node are allocated to a job. For
    example, a normal Torque submission is executed like:

    qsub -l nodes=3:ppn=4

    However, all the Torque jobs submitted by HOD look like this:

    qsub -l nodes=3

    Here are the qstat results:

    qstat -f 179245

    snip ----------
    Resource_List.nodect = 3
    Resource_List.nodes = 3
    Resource_List.walltime = 05:00:00
    snip ----------

    For a normal job:

    Resource_List.nodect = 3
    Resource_List.nodes = 3:ppn=4
    Resource_List.walltime = 280:00:00

    I guess we lose the ppn parameter when submitting the job, and I believe
    it is quite important in most job configurations.

    Song
  • Hemanth Yamijala at Apr 21, 2010 at 6:05 pm
    Song,
    I guess you are very close to my point. What I mean is whether we can
    find a way to set the qsub parameter "ppn".
    From what I could see in the HOD code, it appears you cannot override
    the ppn value with HOD. You could look at
    src/contrib/hod/hodlib/NodePools/torque.py, specifically the method
    process_qsub_attributes. In this method, the nodes parameter is set to
    the value defined by the -n parameter passed to HOD. Unless I am missing
    something, this seems to be the final value that can be specified for
    the nodes parameter to the qsub command.

    The method I suggested earlier seems like a workaround for this
    limitation. The Maui documentation indicates that the policy can also be
    set per Torque job, and HOD has an option for passing such additional
    submission parameters: the key resource_manager.attrs.
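
    For example, something along these lines in the hodrc might pass the
    per-job policy through (a hypothetical sketch; I have not verified the
    exact attribute name that Maui expects via qsub):

    [resource_manager]
    attrs = naccesspolicy=singlejob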

    I know this is not an ideal answer for you, but at the moment this is
    all I can think of.

    Thanks
    Hemanth
  • Song Liu at Apr 21, 2010 at 7:04 pm
    Thanks Hemanth!
    As you suggested, I made a slight change in torque.py at line 41:

    # Change the ppn value in the qsub argument list
    for index, item in enumerate(argList):
        if item.startswith("nodes"):
            argList[index] = argList[index] + ":ppn=4"
            print argList[index]

    and it works fine now.
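
    A slightly more general variant (an untested sketch; HOD_PPN is a
    made-up variable name) would read the value from an environment
    variable instead of hardcoding 4:

    import os

    # Take ppn from an environment variable so the patched file does not
    # need editing for each cluster; skip the change if it is unset.
    ppn = os.environ.get("HOD_PPN")
    for index, item in enumerate(argList):
        if item.startswith("nodes") and ppn:
            argList[index] = argList[index] + ":ppn=" + ppn
            print argList[index]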

    But I don't think this solves the problem elegantly, and I really think
    someone should make a patch for this issue.

    Many Thanks.

    Song Liu
