Unclear precedence of config files and property definitions
-----------------------------------------------------------

Key: HADOOP-127
URL: http://issues.apache.org/jira/browse/HADOOP-127
Project: Hadoop
Type: Bug

Components: conf
Environment: Hadoop 0.1.1, Nutch 0.8-dev
Reporter: Andrzej Bialecki


The order in which configuration resources are read is not sufficiently documented, and there are no mechanisms to prevent harmful redefinition of certain properties when they are put in the wrong config files.
From reading the code in Hadoop's Configuration.java and JobConf.java, and in Nutch's NutchConfiguration.java, I _think_ this is what's happening.
There are two groups of resources: default resources, loaded first, and final resources, loaded at the end. Any property (re)defined in a file loaded later overrides all previous definitions:

* default resources: loaded in the order in which they are added. The following files are added here, in order:

1. hadoop-default.xml (Configuration)
2. nutch-default.xml (NutchConfiguration)
3. mapred-default.xml (JobConf)
4. job_xx_xxx.xml (JobConf, in JobConf(File config))

* final resources: these always come after the default resources, i.e. any value defined here will always override those set in default resources (NOTE: including per-job settings!!!). The following files are added here, in reverse order:

2. hadoop-site.xml (Configuration)
1. nutch-site.xml (NutchConfiguration)

(i.e. hadoop-site.xml will take precedence over anything else defined in any other config file).
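
The load order described above can be sketched as a toy model. This is not Hadoop code: the file names come from the issue, while the `effective_config` helper and the `example.*` property names and values are hypothetical, chosen only to show which definition survives.

```python
# Toy model of the load order described above; not Hadoop code.
# File names come from the issue; property names and values are hypothetical.

def effective_config(default_resources, final_resources):
    """Merge resources in load order: a later definition overrides an earlier one."""
    merged = {}
    for _name, props in default_resources + final_resources:
        merged.update(props)
    return merged

# Default resources, in the order they are loaded.
default_resources = [
    ("hadoop-default.xml", {"example.tasks": "1", "example.buffer": "100"}),
    ("nutch-default.xml", {"example.agent": ""}),
    ("mapred-default.xml", {"example.tasks": "4"}),
    ("job_xx_xxx.xml", {"example.tasks": "16"}),
]

# Final resources, listed in the order they are applied:
# nutch-site.xml first, hadoop-site.xml last.
final_resources = [
    ("nutch-site.xml", {"example.agent": "my-crawler"}),
    ("hadoop-site.xml", {"example.tasks": "2"}),
]

config = effective_config(default_resources, final_resources)
print(config["example.tasks"])  # the hadoop-site.xml value, not the job's
```

Under these assumptions the per-job value of `example.tasks` is silently replaced by the one from hadoop-site.xml, which is the surprise the issue describes.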

I would appreciate confirmation that this is indeed the case, and suggestions on how to ensure that you cannot so easily shoot yourself in the foot by defining the wrong properties in hadoop-site or nutch-site ...

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira


  • Doug Cutting (JIRA) at Apr 10, 2006 at 11:28 pm
    [ http://issues.apache.org/jira/browse/HADOOP-127?page=comments#action_12373950 ]

    Doug Cutting commented on HADOOP-127:
    -------------------------------------

    I think you have it right. Some guidelines:

    Folks should only define things in the -site files if they want to force them for all code.

    Folks should not edit the -default files.

    Non-default settings that may be overridden by application code should be put in mapred-default.xml.

    Application settings are set in the job.

    Strictly speaking, it doesn't much matter whether you put something in a nutch- or hadoop- file, although the intent is to keep things that are specific to Hadoop in hadoop- files and things specific to Nutch in nutch- files.
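
A minimal sketch of how these guidelines play out, assuming the load order described in the issue (mapred-default.xml, then the job, then the -site files). The `resolve` helper and the `example.tasks` property are hypothetical and only model the precedence; they are not part of Hadoop.

```python
# Toy model of the guidelines above; not Hadoop code.
# "example.tasks" is a hypothetical property name.

def resolve(key, mapred_default, job, site):
    """Return the value from the last-loaded source that defines key."""
    # Checked from highest to lowest precedence: -site, job, mapred-default.xml.
    for source in (site, job, mapred_default):
        if key in source:
            return source[key]
    return None

mapred_default = {"example.tasks": "4"}   # non-default setting, overridable
job = {"example.tasks": "16"}             # application setting, set in the job

# Nothing in the -site file: the job's value is used.
overridable = resolve("example.tasks", mapred_default, job, site={})

# The same property in the -site file: forced for all code, the job loses.
forced = resolve("example.tasks", mapred_default, job, site={"example.tasks": "2"})

print(overridable, forced)
```

In this model, a value placed in mapred-default.xml stays a suggestion that a job can change, while the same value placed in a -site file becomes a cluster-wide constraint.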



  • Frédéric Bertin (JIRA) at Sep 1, 2006 at 2:47 pm
    [ http://issues.apache.org/jira/browse/HADOOP-127?page=comments#action_12432153 ]

    Frédéric Bertin commented on HADOOP-127:
    ----------------------------------------

    <quote>Folks should only define things in the -site files if they want to force them for all code. </quote>

    I should have read this earlier, it would have saved me some time.

    Actually, the fact that properties defined in hadoop-site.xml override EVERYTHING, including properties defined in job config files, is something very important that should be well documented, because it's not the intuitively expected behaviour, which, to me, was:
    - hadoop-default.xml, mapred-default.xml overridden by
    - hadoop-site.xml, overridden by
    - job config files

    I've searched the wiki (afterwards, unfortunately) and it's very well documented there. However, the comments in hadoop-default.xml and the other shipped config files are not clear about this. Maybe they should be more detailed, or just link to the wiki page.


  • Owen O'Malley (JIRA) at Sep 27, 2007 at 10:02 pm
    [ https://issues.apache.org/jira/browse/HADOOP-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Owen O'Malley resolved HADOOP-127.
    ----------------------------------

    Resolution: Duplicate

    I believe this was fixed by HADOOP-785.

    Assignee: Doug Cutting

Discussion Overview
group: common-dev
category: hadoop
posted: Apr 10, '06 at 9:48a
active: Sep 27, '07 at 10:02p
posts: 4
users: 1
website: hadoop.apache.org...
irc: #hadoop
