[sorry for the double posting (to general), but I think this list is
the appropriate place for this message]

Hello,

I'm trying to set up Hadoop on Demand (HOD) on my cluster. I'm
currently unable to "allocate cluster". I'm starting hod with the
following command:

/usr/local/hadoop-0.20.2/hod/bin/hod -c
/usr/local/hadoop-0.20.2/hod/conf/hodrc -t
/b/01/vanw/hod/hadoop-0.20.2.tar.gz -o "allocate ~/hod 3"
--ringmaster.log-dir=/tmp -b 4

The job starts on the nodes and I see the ringmaster running on the
MotherSuperior. The ringmaster-main.log file is created and contains:

[2010-04-06 11:18:29,036] DEBUG/10 ringMaster:487 - getServiceAddr
service: <hodlib.GridServices.mapred.MapReduce instance at 0x12b42518>
[2010-04-06 11:18:29,038] DEBUG/10 ringMaster:504 - getServiceAddr
addr mapred: not found
[2010-04-06 10:47:43,183] DEBUG/10 ringMaster:479 - getServiceAddr name: hdfs
[2010-04-06 10:47:43,184] DEBUG/10 ringMaster:487 - getServiceAddr
service: <hodlib.GridServices.hdfs.Hdfs instance at 0x122d24d0>
[2010-04-06 10:47:43,186] DEBUG/10 ringMaster:504 - getServiceAddr
addr hdfs: not found

I don't see any associated processes running on the other 2 nodes in
the job.

The critical errors are as follows:

[2010-04-06 10:34:13,630] CRITICAL/50 hadoop:298 - Failed to retrieve
'hdfs' service address.
[2010-04-06 10:34:13,631] DEBUG/10 hadoop:631 - Cleaning up cluster id
238366.jman, as cluster could not be allocated.
[2010-04-06 10:34:13,632] DEBUG/10 hadoop:635 - Calling rm.stop()
[2010-04-06 10:34:13,639] DEBUG/10 hadoop:637 - Returning from rm.stop()
[2010-04-06 10:34:13,639] CRITICAL/50 hod:401 - Cannot allocate
cluster /b/01/vanw/hod
[2010-04-06 10:34:14,149] DEBUG/10 hod:597 - return code: 7

The contents of the hodrc file are:

[hod]
stream = True
java-home = /usr/local/jdk1.6.0_02
cluster = orange
cluster-factor = 1.8
xrs-port-range = 32768-65536
debug = 4
allocate-wait-time = 3600
temp-dir = /tmp/hod

[ringmaster]
register = True
stream = False
temp-dir = /tmp/hod
http-port-range = 8000-9000
work-dirs = /tmp/hod/1,/tmp/hod/2
xrs-port-range = 32768-65536
debug = 4

[hodring]
stream = False
temp-dir = /tmp/hod
register = True
java-home = /usr/local/jdk1.6.0_02
http-port-range = 8000-9000
xrs-port-range = 32768-65536
debug = 4

[resource_manager]
queue = dque
batch-home = /usr/local/torque-2.3.7
id = torque
env-vars = HOD_PYTHON_HOME=/usr/local/python-2.5.5/bin/python

[gridservice-mapred]
external = False
tracker_port = 8030
info_port = 50080

[gridservice-hdfs]
external = False
fs_port = 8020
info_port = 50070


Some other useful information:
Linux 2.6.18-128.7.1.el5
Python 2.5.5
Twisted 10.0.0
zope 3.3.0
java version "1.6.0_02"
hadoop version 0.20.2



--
Kevin Van Workum, PhD
Sabalcore Computing Inc.
Run your code on 500 processors.
Sign up for a free trial account.
www.sabalcore.com
877-492-8027 ext. 11


  • Boyu Zhang at Apr 8, 2010 at 6:24 pm
    Hi Kevin,

    I am running into the same problem, but my critical error is:

    [2010-04-08 13:47:25,304] CRITICAL/50 hadoop:303 - Cluster could not be
    allocated because of the following errors.
    Hodring at n0 failed with following errors:
    JobTracker failed to initialise

    Have you solved this? Thanks!

    Boyu

    On Tue, Apr 6, 2010 at 11:32 AM, Kevin Van Workum wrote:
    [...]
  • Kevin Van Workum at Apr 8, 2010 at 8:59 pm

    On Thu, Apr 8, 2010 at 2:23 PM, Boyu Zhang wrote:
    Hi Kevin,

    I am having the same error, but my critical error is:

    [2010-04-08 13:47:25,304] CRITICAL/50 hadoop:303 - Cluster could not be
    allocated because of the following errors.
    Hodring at n0 failed with following errors:
    JobTracker failed to initialise

    Have you solved this? Thanks!
    Yes, I was about to post my solution. In my case the issue was that
    the default log-dir is the "log" directory under the HOD
    installation. Since I didn't have permission to write to that
    directory, HDFS couldn't initialize. Setting "log-dir = logs" for
    [hod], [ringmaster], [hodring], [gridservice-mapred], and
    [gridservice-hdfs] in hodrc fixed the problem by writing the logs to
    the "logs" directory under the CWD.

    Also, I have managed to get HOD to use the hod.cluster setting from
    hodrc to set the node properties for the qsub command. I'm going to
    clean up my modifications and post them in the next day or two.
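
    To illustrate the intent (this is just Torque's standard
    node-property syntax, not the actual patch): with cluster = orange in
    [hod], an allocation of 3 nodes should translate into a resource
    request along the lines of

    qsub -l nodes=3:orange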

    Kevin


    --
    Kevin Van Workum, PhD
    Sabalcore Computing Inc.
    Run your code on 500 processors.
    Sign up for a free trial account.
    www.sabalcore.com
    877-492-8027 ext. 11
  • Boyu Zhang at Apr 8, 2010 at 9:39 pm
    Thanks for the reply. I checked my logs further and found that
    sometimes the hdfs address is resolved correctly.

    But in the jobtracker log, there is an error:

    file /data/mapredsys/zhang~~~/xxxx.info can only be replicated on 0 nodes
    instead of 1
    ...........................
    DFS is not ready...


    And when I check for the file, the whole directory is not there. Also, do you
    know how to check the namenode/datanode logs? I can't find them anywhere. Thanks a lot!

    Boyu
    On Thu, Apr 8, 2010 at 4:58 PM, Kevin Van Workum wrote:
    [...]
