Hi all,

I am working on an 11-node CDH3 cluster installed using Cloudera Manager.
In order to tune my cluster, I was trying to change certain Hadoop
configuration (mapred-site.xml, hdfs-site.xml, core-site.xml) property
values from CM. While doing that, I found that many properties are not
present in CM (mapred.job.shuffle.input.buffer.percent,
mapred.job.shuffle.merge.percent, mapred.job.reduce.input.buffer.percent,
mapred.inmem.merge.threshold, to name a few).
However, I can see those properties in job.xml after running any job.

I was working on the configuration pages inside the "MapReduce" and "HDFS"
service sections for mapred-site.xml and hdfs-site.xml properties,
respectively.

My first question: where is Hadoop getting those values from when they are
present neither in CM nor in $HADOOP_HOME/conf? I have tried exploring
the filesystem to find files containing all the default values, but
unfortunately could not find any.

Secondly, if I want to add those properties to CM myself, where exactly
on the configuration page can I add them?

Thirdly, the MapReduce configuration page gives me the mapred-site.xml
properties and the HDFS configuration page gives me the hdfs-site.xml
properties. But where can I see the core-site.xml properties?

Could someone please help me understand the above?

Thanks,
Gaurav Dasgupta


  • Joey Echeverria at Sep 6, 2012 at 7:59 pm
    1) If any configs are not set, they come from the appropriate
    *-default.xml file, or in some cases the default is hardcoded into the
    source.

    2) Anything you want to set that isn't exposed in the UI you can add
    in the safety valve:

    https://ccp.cloudera.com/display/ENT4DOC/Changing+Service+Configurations#ChangingServiceConfigurations-UsingaConfigurationSafetyValve

    Note that each role has its own safety valve. There is also a
    service-wide safety valve for anything that applies to all roles in a
    service and a client safety valve for anything that needs to be set
    for clients. In some cases, you need to set a config for both the
    service and client. In those cases, you have to set it in two places in
    the current UI.

    3) Even though it's labeled as core-site.xml, most things that end up
    there apply to only HDFS, MR, or clients. If you really think
    everything needs it, you'll need to copy it into the service-wide
    safety valve for both MR and HDFS as well as the client safety valve
    for both.
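
To make point 1 concrete, here is a minimal Python sketch of how layered
Hadoop-style configuration resolves: values from a *-default.xml layer are
used unless a later layer (for example, whatever ends up in mapred-site.xml)
overrides them. The function names, the two inline XML documents, and the
values in them are illustrative assumptions, not Hadoop or Cloudera Manager
code, and "final" properties are not modeled. The *-default.xml files are
usually packaged inside the Hadoop jar rather than under conf, which would
explain why a filesystem search does not turn them up.

```python
import xml.etree.ElementTree as ET


def parse_hadoop_conf(xml_text):
    """Return {name: value} from a Hadoop-style <configuration> document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def effective_conf(*layers):
    """Merge layers in order; later layers override earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(parse_hadoop_conf(layer))
    return merged


# Hypothetical stand-in for mapred-default.xml (values are illustrative).
MAPRED_DEFAULT = """<configuration>
  <property><name>mapred.inmem.merge.threshold</name><value>1000</value></property>
  <property><name>mapred.job.shuffle.merge.percent</name><value>0.66</value></property>
</configuration>"""

# Hypothetical stand-in for overrides that reach mapred-site.xml.
SITE_OVERRIDES = """<configuration>
  <property><name>mapred.job.shuffle.merge.percent</name><value>0.80</value></property>
</configuration>"""

conf = effective_conf(MAPRED_DEFAULT, SITE_OVERRIDES)
for name in sorted(conf):
    print(name, "=", conf[name])
```

A property that is never overridden still shows up with its default value,
which matches what the original poster saw in job.xml.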

    I hope that helps.

    -Joey
    --
    Joey Echeverria
    Principal Solutions Architect
    Cloudera, Inc.
  • Philip Zeyliger at Sep 6, 2012 at 8:26 pm
    mapred.job.shuffle.input.buffer.percent, mapred.job.shuffle.merge.percent,
    mapred.job.reduce.input.buffer.percent, mapred.inmem.merge.threshold

    I'll also point out that many of the parameters you mentioned are settable
    "per job." As such, they're read from the configs on the submitting
    client, and end up in the job.xml file that way. You might not see changes
    to them unless the "client configuration" matches appropriately.
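
As a toy illustration of the point above, the Python sketch below models
job.xml as a snapshot of the submitting client's configuration plus any
per-job overrides. The function name and the dictionaries are hypothetical
and only model the behaviour described; this is not how Hadoop is actually
implemented.

```python
def build_job_conf(client_conf, job_overrides=None):
    """Model job.xml as the client's config plus per-job overrides."""
    job_conf = dict(client_conf)          # snapshot of the client's view
    job_conf.update(job_overrides or {})  # per-job settings win
    return job_conf


# Assumed scenario: the value was changed on the cluster side only,
# while the client configuration still carries the old value.
cluster_side_conf = {"mapred.inmem.merge.threshold": "2000"}
client_conf = {"mapred.inmem.merge.threshold": "1000"}

job_xml = build_job_conf(client_conf)
tuned_job_xml = build_job_conf(
    client_conf, {"mapred.inmem.merge.threshold": "500"}
)

print(job_xml["mapred.inmem.merge.threshold"])
print(tuned_job_xml["mapred.inmem.merge.threshold"])
```

In this model the cluster-side change never reaches job.xml, which is why
the client configuration has to be updated as well.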

    -- Philip

Discussion Overview
group: cm-users
category: hadoop
posted: Sep 6, '12 at 6:23p
active: Sep 6, '12 at 8:26p
posts: 3
users: 3
website: cloudera.com
irc: #hadoop