Hi all,

I am configuring a cluster to use YARN (using Cloudera Manager Free v4.5.2)
and I am trying to run the Pi example, with no luck so far.
I am running the example with the following command:
sudo -u hdfs hadoop jar
/opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
pi -jt mydevnamenode:8032 -fs hdfs://mynamenode:8020 10 10

I am getting the following error:

Number of Maps = 10
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
13/05/10 15:07:28 ERROR security.UserGroupInformation:
PriviledgedActionException as:hdfs (auth:SIMPLE)
cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException): Unknown
rpc kind RPC_WRITABLE
org.apache.hadoop.ipc.RemoteException(java.io.IOException): Unknown rpc
kind RPC_WRITABLE


When I try the same example with MRv1 it works correctly.
mapred-site.xml has mapreduce.framework.name set to yarn, and
yarn-site.xml has the ResourceManager properties set. It was all set up
automatically by Cloudera Manager and I haven't changed anything.
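
For reference, I verified this with something like the following (assuming
the active client config dir is /etc/hadoop/conf):

grep -A1 mapreduce.framework.name /etc/hadoop/conf/mapred-site.xml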

Any hints what could be causing the issue?
Thanks in advance,

Robert


  • Herman Chen at May 10, 2013 at 10:50 pm
    Hi Robert,

The default client configs under /etc/hadoop/conf use MR1. So your job
failed because you were asking the MR1 client to talk to the YARN port.

The YARN client configs are also deployed via alternatives. You can list
all deployed configs with:
    # alternatives --display hadoop-conf

    The output should be something like:
    hadoop-conf - status is auto.
      link currently points to /etc/hadoop/conf.cloudera.MAPREDUCE-1
    /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/etc/hadoop/conf.empty -
    priority 10
    /etc/hadoop/conf.cloudera.HDFS-1 - priority 90
    /etc/hadoop/conf.cloudera.MAPREDUCE-1 - priority 92
    /etc/hadoop/conf.cloudera.YARN-1 - priority 91
    Current `best' version is /etc/hadoop/conf.cloudera.MAPREDUCE-1.

And you can manually override it to YARN with:
    # alternatives --set hadoop-conf /etc/hadoop/conf.cloudera.YARN-1

    Then your command (no need for -jt -fs) should be able to run with YARN.
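
With the YARN client config active, the same job without the -jt/-fs flags
would look something like:

sudo -u hdfs hadoop jar
/opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
pi 10 10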

Lastly, if you want to use YARN on a regular basis, the priority 91 can be
raised in CM to something higher than MAPREDUCE-1's 92. Then you don't need
to set alternatives manually. Hope this helps!
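
If you later want to undo a manual override and return to the
priority-based selection, alternatives also supports:
# alternatives --auto hadoop-conf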

    Herman


  • Robert at May 13, 2013 at 8:12 am
    Hi Herman,

This is the output I get when running the command you suggested:

    # alternatives --display hadoop-conf
    hadoop-conf - status is auto.
      link currently points to
    /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/etc/hadoop/conf.empty
    /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/etc/hadoop/conf.empty -
    priority 10
    Current `best' version is
    /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/etc/hadoop/conf.empty.

I installed everything using Cloudera Manager and I am surprised YARN is
not showing up in the output of the above command.
The YARN service is installed in the cluster and shows good health. I am
also able to configure Oozie to use YARN, although I still haven't managed
to get it to run properly.
    Any hints?
    Cheers,

    Robert

  • Herman Chen at May 13, 2013 at 2:57 pm
    Hi Robert,

It looks like you need to run Deploy Client Configuration, one of the
cluster-level actions. The wizard would have run this command automatically
if you used it to set up the cluster.

    Herman

  • Robert at May 13, 2013 at 3:15 pm
    Hi Herman,

I've just done that from the main page of Cloudera Manager, and I also
restarted the cluster by selecting the corresponding option.
The output of the command "alternatives --display hadoop-conf" is still
the same, i.e.:

    hadoop-conf - status is auto.
      link currently points to
    /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/etc/hadoop/conf.empty
    /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/etc/hadoop/conf.empty -
    priority 10
    Current `best' version is
    /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/etc/hadoop/conf.empty.

I did use the wizard to set up the cluster, so I am stuck: I really don't
understand what is going wrong.
    Cheers,

    Robert
  • Herman Chen at May 13, 2013 at 5:07 pm
    Hi Robert,

Deploy Client Configuration will deploy configs for a particular service
to a host as long as that host contains any role belonging to the service.
Does the host that you're issuing the command from have any MapReduce or
YARN roles? If you prefer using a host without such roles, you can add a
Gateway role for that service to the host, which will ensure that client
configurations are deployed there. Hope this helps.
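
After deploying, you can double-check on that host that the YARN config
has been registered, e.g.:
# alternatives --display hadoop-conf
# readlink -f /etc/hadoop/conf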

    Herman

  • Robert at May 14, 2013 at 8:10 am
    Hi Herman,

I think I've now got one step further. Basically, the FrontEnd machine I am
running the command from had no roles for the YARN service. I added it as a
gateway and redeployed the client configuration.
Now the command "alternatives --display hadoop-conf" gives me the following:

    hadoop-conf - status is auto.
      link currently points to /etc/hadoop/conf.cloudera.yarn1
    /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/etc/hadoop/conf.empty -
    priority 10
    /etc/hadoop/conf.cloudera.yarn1 - priority 91
    Current `best' version is /etc/hadoop/conf.cloudera.yarn1.

However, when running the Pi example I still cannot get it to complete:

    sudo -u hdfs hadoop jar
    /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
    pi 10 10

    Number of Maps = 10
    Samples per Map = 10
    Wrote input for Map #0
    Wrote input for Map #1
    Wrote input for Map #2
    Wrote input for Map #3
    Wrote input for Map #4
    Wrote input for Map #5
    Wrote input for Map #6
    Wrote input for Map #7
    Wrote input for Map #8
    Wrote input for Map #9
    Starting Job
    13/05/14 10:02:27 INFO service.AbstractService:
    Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
    13/05/14 10:02:27 INFO service.AbstractService:
    Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
    13/05/14 10:02:27 INFO input.FileInputFormat: Total input paths to process
    : 10
    13/05/14 10:02:27 INFO mapreduce.JobSubmitter: number of splits:10
    13/05/14 10:02:27 WARN conf.Configuration: mapred.jar is deprecated.
    Instead, use mapreduce.job.jar
    13/05/14 10:02:27 WARN conf.Configuration:
    mapred.map.tasks.speculative.execution is deprecated. Instead, use
    mapreduce.map.speculative
    13/05/14 10:02:27 WARN conf.Configuration: mapred.reduce.tasks is
    deprecated. Instead, use mapreduce.job.reduces
    13/05/14 10:02:27 WARN conf.Configuration: mapred.output.value.class is
    deprecated. Instead, use mapreduce.job.output.value.class
    13/05/14 10:02:27 WARN conf.Configuration:
    mapred.reduce.tasks.speculative.execution is deprecated. Instead, use
    mapreduce.reduce.speculative
    13/05/14 10:02:27 WARN conf.Configuration: mapreduce.map.class is
    deprecated. Instead, use mapreduce.job.map.class
    13/05/14 10:02:27 WARN conf.Configuration: mapred.job.name is deprecated.
    Instead, use mapreduce.job.name
    13/05/14 10:02:27 WARN conf.Configuration: mapreduce.reduce.class is
    deprecated. Instead, use mapreduce.job.reduce.class
    13/05/14 10:02:27 WARN conf.Configuration: mapreduce.inputformat.class is
    deprecated. Instead, use mapreduce.job.inputformat.class
    13/05/14 10:02:27 WARN conf.Configuration: mapred.input.dir is deprecated.
    Instead, use mapreduce.input.fileinputformat.inputdir
    13/05/14 10:02:27 WARN conf.Configuration: mapred.output.dir is deprecated.
    Instead, use mapreduce.output.fileoutputformat.outputdir
    13/05/14 10:02:27 WARN conf.Configuration: mapreduce.outputformat.class is
    deprecated. Instead, use mapreduce.job.outputformat.class
    13/05/14 10:02:27 WARN conf.Configuration: mapred.map.tasks is deprecated.
    Instead, use mapreduce.job.maps
    13/05/14 10:02:27 WARN conf.Configuration: mapred.output.key.class is
    deprecated. Instead, use mapreduce.job.output.key.class
    13/05/14 10:02:27 WARN conf.Configuration: mapred.working.dir is
    deprecated. Instead, use mapreduce.job.working.dir
    13/05/14 10:02:27 INFO mapreduce.JobSubmitter: Submitting tokens for job:
    job_1368517819509_0003
    13/05/14 10:02:27 INFO client.YarnClientImpl: Submitted application
    application_1368517819509_0003 to ResourceManager at
    bi4td-devnamenode01.hi.inet/10.95.108.251:8032
    13/05/14 10:02:27 INFO mapreduce.Job: The url to track the job:
    http://mynamenode01:8088/proxy/application_1368517819509_0003/
    13/05/14 10:02:27 INFO mapreduce.Job: Running job: job_1368517819509_0003

There is no sign of completion (the job sits there showing the last line
with the job id).
If I go to the URL to track the job, I get the following error:

    The requested application does not appear to be running yet, and has not
    set a tracking URL.
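
For what it's worth, one way to inspect the submitted application from the
shell (assuming the yarn CLI shipped with the same parcel) should be:

sudo -u hdfs yarn application -list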

It would be great if you could suggest what else is missing in my
configuration. My end goal is to have an Oozie job running, but I took one
step back since debugging the Pi example is probably less tricky.
The cluster I am setting up consists of 9 machines: 1 FrontEnd (where we
have the Oozie client, Hue server, etc.), 1 namenode (which is also the
ResourceManager), 6 datanodes, and 1 host with the DBs. Should I add all
the machines as gateways for a particular MapReduce service? Note that the
Oozie commands and MapReduce examples are always initiated from the
FrontEnd machine.
    Thanks in advance.
    Cheers,

    Robert

  • Robert at May 14, 2013 at 1:41 pm
    Hi again,

The last three lines of the log are actually the following:

    13/05/14 10:02:27 INFO client.YarnClientImpl: Submitted application
    application_1368517819509_0003 to ResourceManager
    at mynamenode/10.95.108.251:8032
    13/05/14 10:02:27 INFO mapreduce.Job: The url to track the job:
    http://mynamenode:8088/proxy/application_1368517819509_0003/
    13/05/14 10:02:27 INFO mapreduce.Job: Running job: job_1368517819509_0003

I am wondering if the log line that says "Submitted application
application_1368517819509_0003 to ResourceManager at
mynamenode/10.95.108.251:8032" is actually showing where the error is. I
mean, is showing both the hostname and the IP purely informative, or does
it try to use something like "mynamenode/10.95.108.251:8032" as the
ResourceManager address?
    Thanks in advance,

    Robert
  • Herman Chen at May 15, 2013 at 11:47 pm
    Hi Robert,

    Sounds like you should add to your FrontEnd host the Gateway role for each
    service. That way you will have the client configurations available for
    whichever service you need to work with.

So it looks like you have successfully submitted the job to YARN, but the
job isn't making progress for some reason. The message "mynamenode/
10.95.108.251:8032" is not the issue, since I see that on my successful job
runs also. To debug, you would have to dig into the YARN logs or web UI for
useful bits of information. One cause of job stalling I have experienced
is having insufficient memory to allocate even the ApplicationMaster,
which by default requires 1.5GB. Good luck!
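
A quick sanity check, assuming the client configs are deployed under
/etc/hadoop/conf, is to compare the AM request with what each NodeManager
offers:

grep -A1 yarn.app.mapreduce.am.resource.mb /etc/hadoop/conf/mapred-site.xml
grep -A1 yarn.nodemanager.resource.memory-mb /etc/hadoop/conf/yarn-site.xml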

    Herman

  • Robert at May 16, 2013 at 9:45 am
    Hi Herman,

I've added the YARN Gateway role to the FE machine and also to all the
other nodes (1 ResourceManager and 6 NodeManagers).
Then I first ran the Pi example with MR1 and it works fine. Then I changed
the priority for YARN to 93 so that the example runs with YARN.
I've checked yarn.app.mapreduce.am.resource.mb and it is set to 1.5GB.

The output of the ResourceManager log in /var/log/hadoop-yarn/ is:

    2013-05-16 10:46:28,123 INFO
    org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated
    new applicationId: 1
    2013-05-16 10:46:28,786 INFO
    org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application
    with id 1 submitted by user hdfs
    2013-05-16 10:46:28,789 INFO
    org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hdfs
    IP=10.95.108.245 OPERATION=Submit Application Request
      TARGET=ClientRMService RESULT=SUCCESS
      APPID=application_1368693860919_0001
    2013-05-16 10:46:28,817 INFO
    org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
    application_1368693860919_0001 State change from NEW to SUBMITTED
    2013-05-16 10:46:28,818 INFO
    org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
    Registering appattempt_1368693860919_0001_000001
    2013-05-16 10:46:28,819 INFO
    org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
    appattempt_1368693860919_0001_000001 State change from NEW to SUBMITTED
    2013-05-16 10:46:28,836 INFO
    org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler:
    Application Submission: application_1368693860919_0001 from hdfs, currently
    active: 1
    2013-05-16 10:46:28,839 INFO
    org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
    appattempt_1368693860919_0001_000001 State change from SUBMITTED to
    SCHEDULED
    2013-05-16 10:46:28,839 INFO
    org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
    application_1368693860919_0001 State change from SUBMITTED to ACCEPTED

    Is there any other log I should look at that could give me a hint about
    what is going wrong? The tracking URL still shows "The requested
    application does not appear to be running yet, and has not set a tracking
    URL." and the job sits there forever...

    The web UI of the resource manager (screenshot attached) does not seem to
    give more info about what is going on either.
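
    From the command line the application also just sits in the ACCEPTED
    state; I checked with something like this (the exact output columns may
    differ by version):

    sudo -u hdfs yarn application -list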
    Thanks again for your help.
    Kind regards,

    Robert



    On Thursday, 16 May 2013 01:46:55 UTC+2, Herman Chen wrote:

    Hi Robert,

    Sounds like you should add to your FrontEnd host the Gateway role for each
    service. That way you will have the client configurations available for
    whichever service you need to work with.

    So it looks like you have successfully submitted the job to YARN, but the
    job isn't making progress for some reason. The message "mynamenode/
    10.95.108.251:8032" is not the issue, since I see that on my successful
    job runs also. To debug you would have to dig into YARN logs or web UI for
    useful bits of information. One cause I have experienced for job stalling
    is having insufficient memory to allocate even the ApplicationMaster,
    which by default requires 1.5GB. Good luck!
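
    If you want to see what the client will request for the AM, you can grep
    the deployed config; a rough check (the property may simply be absent, in
    which case the built-in default applies):

    grep -A1 'yarn.app.mapreduce.am.resource.mb' /etc/hadoop/conf/mapred-site.xml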

    Herman


    On Tue, May 14, 2013 at 6:41 AM, Robert wrote:
    Hi again,

    the last 3 lines of the log are actually the following:

    13/05/14 10:02:27 INFO client.YarnClientImpl: Submitted application
    application_1368517819509_0003 to ResourceManager at mynamenode/
    10.95.108.251:8032
    13/05/14 10:02:27 INFO mapreduce.Job: The url to track the job:
    http://mynamenode:8088/proxy/application_1368517819509_0003/
    13/05/14 10:02:27 INFO mapreduce.Job: Running job: job_1368517819509_0003

    I am wondering if the log line that says "Submitted application
    application_1368517819509_0003 to ResourceManager at mynamenode/
    10.95.108.251:8032" is actually showing where the error is. I mean, is
    showing both the FQDN and the IP purely informative, or does it actually
    try to use something like "mynamenode/10.95.108.251:8032" as the
    resource manager address?
    Thanks in advance,

    Robert
    On Tuesday, 14 May 2013 10:10:18 UTC+2, Robert wrote:

    Hi Herman,

    I think I got one step further now. Basically, the FrontEnd machine I am
    running the command from had no roles for the YARN service. I added it as
    a gateway and deployed the client configuration again.
    Now the command # alternatives --display hadoop-conf gives me the
    following:

    hadoop-conf - status is auto.
    link currently points to /etc/hadoop/conf.cloudera.yarn1
    /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/etc/hadoop/conf.empty - priority 10
    /etc/hadoop/conf.cloudera.yarn1 - priority 91
    Current `best' version is /etc/hadoop/conf.cloudera.yarn1.

    Now when running the Pi example I am still not getting it to complete:

    sudo -u hdfs hadoop jar \
    /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
    pi 10 10

    Number of Maps = 10
    Samples per Map = 10
    Wrote input for Map #0
    Wrote input for Map #1
    Wrote input for Map #2
    Wrote input for Map #3
    Wrote input for Map #4
    Wrote input for Map #5
    Wrote input for Map #6
    Wrote input for Map #7
    Wrote input for Map #8
    Wrote input for Map #9
    Starting Job
    13/05/14 10:02:27 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
    13/05/14 10:02:27 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
    13/05/14 10:02:27 INFO input.FileInputFormat: Total input paths to process : 10
    13/05/14 10:02:27 INFO mapreduce.JobSubmitter: number of splits:10
    13/05/14 10:02:27 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
    13/05/14 10:02:27 WARN conf.Configuration: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
    13/05/14 10:02:27 WARN conf.Configuration: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
    13/05/14 10:02:27 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
    13/05/14 10:02:27 WARN conf.Configuration: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
    13/05/14 10:02:27 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
    13/05/14 10:02:27 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
    13/05/14 10:02:27 WARN conf.Configuration: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
    13/05/14 10:02:27 WARN conf.Configuration: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
    13/05/14 10:02:27 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
    13/05/14 10:02:27 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
    13/05/14 10:02:27 WARN conf.Configuration: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
    13/05/14 10:02:27 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
    13/05/14 10:02:27 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
    13/05/14 10:02:27 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
    13/05/14 10:02:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1368517819509_0003
    13/05/14 10:02:27 INFO client.YarnClientImpl: Submitted application application_1368517819509_0003 to ResourceManager at bi4td-devnamenode01.hi.inet/10.95.108.251:8032
    13/05/14 10:02:27 INFO mapreduce.Job: The url to track the job: http://mynamenode01:8088/proxy/application_1368517819509_0003/
    13/05/14 10:02:27 INFO mapreduce.Job: Running job: job_1368517819509_0003

    There is no sign of completion (the system sits there showing the last
    line with the job_id).
    If I go to the URL to track the job, I get the following error:

    The requested application does not appear to be running yet, and has not
    set a tracking URL.

    It would be great if you could suggest what else is missing in my
    configuration. My end goal is to have an Oozie job running but I took one
    step back as debugging the Pi example is probably less tricky.
    The cluster I am setting up consists of 9 machines: 1 FrontEnd
    (where we have oozie client, hue server, etc...), 1 namenode (which is also
    the resourcemanager), 6 datanodes and 1 host with the DBs. Should I add all
    the machines as gateways for a particular MapReduce service? Note that the
    oozie commands or mapreduce examples are always initiated from the FrontEnd
    machine.
    Thanks in advance.
    Cheers,

    Robert

    On Monday, 13 May 2013 19:06:39 UTC+2, Herman Chen wrote:

    Hi Robert,

    Deploy Client Configurations will deploy configs for a particular
    service to a host as long as that host contains any role belonging to the
    service. Does the host that you're issuing the command from contain any
    MapReduce or YARN roles? If you prefer using a host without the roles, you
    can add to the host a Gateway role for that service, which will ensure that
    client configurations are deployed. Hope this helps.
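
    After deploying, a quick way to confirm the host picked up the configs is
    to list the alternatives again:

    # alternatives --display hadoop-conf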

    Herman

    On Mon, May 13, 2013 at 8:15 AM, Robert wrote:

    Hi Herman,

    I've just done that on the main page of CDM and I also restarted the
    cluster by selecting the option in CDM.
    The output of the command "alternatives --display hadoop-conf" is
    still the same, i.e.:

    hadoop-conf - status is auto.
    link currently points to /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/etc/hadoop/conf.empty
    /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/etc/hadoop/conf.empty - priority 10
    Current `best' version is /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/etc/hadoop/conf.empty.

    I did use the wizard to set up the cluster, so I am kind of stuck as I
    really don't understand what is going wrong.
    Cheers,

    Robert
    On Monday, 13 May 2013 16:57:31 UTC+2, Herman Chen wrote:

    Hi Robert,

    It looks like you need to Deploy Client Configuration, one of the
    cluster-level actions. The wizard would have run this command
    automatically if you had used it to set up the cluster.

    Herman

    On Mon, May 13, 2013 at 1:12 AM, Robert wrote:

    Hi Herman,

    this is the output I get when running the command you suggested:

    # alternatives --display hadoop-conf
    hadoop-conf - status is auto.
    link currently points to /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/etc/hadoop/conf.empty
    /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/etc/hadoop/conf.empty - priority 10
    Current `best' version is /opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/etc/hadoop/conf.empty.

    I installed everything using CDM and I am surprised YARN is not
    showing up in the output of the above command.
    The YARN service is installed in the cluster and it shows good
    health. I am also able to set Oozie to use YARN, although I still haven't
    managed to get it to run properly.
    Any hints?
    Cheers,

    Robert

  • Robert at May 16, 2013 at 11:06 am
    Hi again,

    Your suggestion about the memory issue did the trick. I was quite
    surprised that there is no log which can give a hint about a memory
    issue.
    For some reason the values that Cloudera Manager was using were quite
    low, so I did the following (the check after the list shows how I
    verified it):

    - Set the Container Memory (yarn.nodemanager.resource.memory-mb) to 4GB
    (my nodes have only 4GB of RAM, so there is no point in using the default
    8GB)
    - Set the Java Heap Size of ResourceManager to 1GB
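
    In case it helps anyone else, this is roughly how I sanity-checked the
    deployed value afterwards (the grep may come up empty if CM keeps the
    server-side copy elsewhere, so treat it as a hint rather than proof):

    grep -A1 'yarn.nodemanager.resource.memory-mb' /etc/hadoop/conf/yarn-site.xml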

    Now the example is running fine; you saved me quite some time with your
    last suggestion.
    Thanks a lot for your help!
    Cheers,

    Robert

