FAQ
Hi,

Where are the Hadoop configuration files (hdfs-*.xml and mapred-*.xml) read
in the code? In the org.apache.hadoop.conf.Configuration class, the
following code is in the static block.

if (cL.getResource("hadoop-site.xml") != null) {
  LOG.warn("DEPRECATED: hadoop-site.xml found in the classpath. "
      + "Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, "
      + "mapred-site.xml and hdfs-site.xml to override properties of "
      + "core-default.xml, mapred-default.xml and hdfs-default.xml "
      + "respectively");
}
addDefaultResource("core-default.xml");
addDefaultResource("core-site.xml");

Thanks,
Praveen


  • Harsh J at Sep 24, 2011 at 2:49 pm
    There are specific derivatives of the Configuration class that each
    read certain *-site.xml files. This is because the XML files are
    service-specific.

    Class 'JobConf' reads the mapred-site.xml, and class
    'HdfsConfiguration' reads the hdfs-site.xml.

    For the new MapReduce framework, class 'YarnConfiguration' reads the
    yarn-site.xml.
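
    Each of those classes registers its files through a static block, so
    they get picked up as soon as the class loads. Roughly like this (a
    sketch of the pattern, not the verbatim source; resource names can
    differ across versions):

    import org.apache.hadoop.conf.Configuration;

    // Sketch of the pattern in HdfsConfiguration (hadoop-hdfs);
    // YarnConfiguration does the same with yarn-default.xml/yarn-site.xml.
    public class HdfsConfiguration extends Configuration {
      static {
        Configuration.addDefaultResource("hdfs-default.xml");
        Configuration.addDefaultResource("hdfs-site.xml");
      }
    }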

    --
    Harsh J
  • Steve Loughran at Sep 26, 2011 at 1:33 pm

    On 24/09/11 15:48, Harsh J wrote:
    There are specific derivatives of the Configuration class that each
    read certain *-site.xml files. This is because the XML files are
    service-specific.
    I'm confused now.

    My belief is that when a default configuration file is pushed to the
    list via Configuration.addDefaultResource(), then all Configuration
    instances that are created after that get the config, whether they are
    Configuration instances or subclasses thereof.

    For example, JobConf explicitly adds the MR files

    static {
      Configuration.addDefaultResource("mapred-default.xml");
      Configuration.addDefaultResource("mapred-site.xml");
    }


    If the resource hasn't been loaded already, adding it triggers a
    reload of all existing configurations that have the loadDefaults
    flag set:

    /* in org.apache.hadoop.conf.Configuration */

    public static synchronized void addDefaultResource(String name) {
      if (!defaultResources.contains(name)) {
        defaultResources.add(name);
        for (Configuration conf : REGISTRY.keySet()) {
          if (conf.loadDefaults) {
            conf.reloadConfiguration();
          }
        }
      }
    }

    Configuration.loadDefaults is true unless you construct an instance
    with new Configuration(false); the state propagates when you create a
    new Configuration instance off another.
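
    A minimal sketch of that behaviour (standard Configuration
    constructors; fs.default.name is just a sample key from the
    core-default.xml of this era):

    import org.apache.hadoop.conf.Configuration;

    public class LoadDefaultsDemo {
      public static void main(String[] args) {
        Configuration withDefaults = new Configuration();  // loadDefaults = true
        Configuration bare = new Configuration(false);     // skips all default resources
        Configuration copy = new Configuration(bare);      // copy constructor carries the flag

        System.out.println(withDefaults.get("fs.default.name")); // value from core-default/core-site
        System.out.println(bare.get("fs.default.name"));         // null - defaults never loaded
        System.out.println(copy.get("fs.default.name"));         // null as well
      }
    }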

    The way the constructor adds all Configuration instances to the static
    (weak-ref) REGISTRY map is inefficient, as the loadDefaults flag is only
    ever set at construction time in the ASF codebase; it would be better to
    make that flag static and only register instances with loadDefaults = true.

    Now, for some extra fun, Configuration.reloadConfiguration() is not
    final, which allows subclasses to override it and have it invoked
    before even their construction/initialisation is fully complete. I
    know this because I have done it, and I would not recommend it to
    anyone. You can end up in that weird world of class-initialisation-time
    stack traces.
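
    To illustrate the trap with a hypothetical subclass (CmClient is an
    invented stand-in for the CM hookup, not a real API): the base
    constructor registers 'this' in REGISTRY, so the override below can
    fire - via addDefaultResource() on another thread - while the
    subclass field is still null.

    import org.apache.hadoop.conf.Configuration;

    public class CmBackedConfiguration extends Configuration {
      private final CmClient cm = new CmClient(); // assigned only after super() returns

      @Override
      public synchronized void reloadConfiguration() {
        super.reloadConfiguration();
        cm.refresh(); // NullPointerException if invoked mid-construction
      }

      static class CmClient {
        void refresh() { /* pull live values from the CM service */ }
      }
    }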

    To clean up Configuration, then, I would:
    -make reloadConfiguration final
    -make loadDefaults static
    -only add confs to the REGISTRY keySet if loadDefaults = true (a
    sketch follows this list)
    -add some debug strings to see what's going on/wrong
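
    A sketch of that third point (a hypothetical rework, not the ASF
    code):

    // Hypothetical rework of the registering constructor: only track
    // instances that addDefaultResource() would actually need to reload.
    Configuration(boolean loadDefaults) {
      this.loadDefaults = loadDefaults;
      if (loadDefaults) {
        synchronized (Configuration.class) {
          REGISTRY.put(this, null);
        }
      }
    }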

    This would break my code, but that's OK. What I did was not something
    I'd recommend to anyone else, and that class of mine is now marked as
    @Deprecated in my own codebase, as it was more trouble than it was
    worth. What was it trying to do? Get a live config from a Configuration
    Management service, and retain that bonding to the CM infrastructure
    even when cloned. This stops working once you start
    serializing/deserializing them, so it's not worth the hassle.
