variable expansion in Configuration
-----------------------------------

Key: HADOOP-463
URL: http://issues.apache.org/jira/browse/HADOOP-463
Project: Hadoop
Issue Type: Improvement
Components: conf
Reporter: Michel Tourn


Add variable expansion to Configuration class.
=================

This is necessary for shared, client-side configurations:

A Job submitter (an HDFS client) requires:
<name>dfs.data.dir</name><value>/tmp/${user.name}/dfs</value>

A local-mode mapreduce requires:
<name>mapred.temp.dir</name><value>/tmp/${user.name}/mapred/tmp</value>

Why this is necessary:
=================

Currently we use shared directories like:
<name>dfs.data.dir</name><value>/tmp/dfs</value>
This superficially seems to work: after all, different JVM clients create their own private subdirectories (map_xxxx), so they will not conflict.

What really happens:

1. /tmp/ is world-writable, as it is supposed to be.
2. Hadoop creates any missing subdirectories. Because this is done from Java, a directory such as /tmp/system ends up writable only by the JVM process user.
3. On a shared client machine, the next user's JVM finds /tmp/system owned by somebody else, and creating a directory within /tmp/system fails.

Implementation of variable expansion
=============
In class Configuration, the backing Properties store raw values such as put("banner", "hello ${user.name}");
In public String get(String name), postprocess the returned value:
1. Use a regexp to find the pattern ${xxxx}.
2. Look up xxxx as a system property.
3. If found, replace ${xxxx} with the system property value.
4. Otherwise leave it as-is: an unexpanded ${xxxx} is a hint that the variable name is invalid.
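The steps above can be sketched as follows. This is a minimal illustration, not the committed patch: the class and method names are invented, and only System properties are consulted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VarExpand {
    // Matches ${xxxx}; group(1) is the variable name.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^\\}\\$]+)\\}");

    public static String substitute(String value) {
        if (value == null) return null;
        Matcher m = VAR.matcher(value);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String repl = System.getProperty(m.group(1));
            // Leave ${xxxx} as-is when the property is undefined,
            // hinting that the variable name is invalid.
            if (repl == null) repl = m.group(0);
            m.appendReplacement(sb, Matcher.quoteReplacement(repl));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(substitute("/tmp/${user.name}/dfs"));
        // Unchanged: no.such.prop is not a defined System property.
        System.out.println(substitute("/tmp/${no.such.prop}/dfs"));
    }
}
```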


Other workarounds
===============
The other proposed workarounds are not as elegant as variable expansion.

Workaround 1:
Have an installation script which does:
mkdir /tmp/dfs
chmod ugo+rw /tmp/dfs
This must be repeated for ALL configured subdirectories at ANY nesting level.
The script must be kept in sync with changes to the Hadoop properties and supported on non-Unix platforms.
The installation script must run before Hadoop runs for the first time.
If users change the permissions on, or delete, any of the shared directories, it breaks again.

Workaround 2:
Do the chmod operations from within the Hadoop code.
In pure Java 1.4 or 1.5 this is not possible.
It requires the Hadoop client process to have chmod privilege (rather than just mkdir privilege).
It requires special-casing the directory creation code.
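For reference, Workaround 2 under Java 1.4/1.5 would have to shell out to the platform chmod command, since the File API has no permission calls. A hedged sketch (the class and method names are invented for illustration):

```java
import java.io.File;
import java.io.IOException;

public class ChmodWorkaround {
    // Shell out to chmod: Unix-only, and the caller must own the file.
    public static int chmodWorldWritable(File dir) throws Exception {
        Process p = Runtime.getRuntime()
                .exec(new String[] {"chmod", "ugo+rw", dir.getPath()});
        return p.waitFor(); // non-zero exit if the caller lacks ownership
    }

    public static void main(String[] args) throws Exception {
        File dir = new File(System.getProperty("java.io.tmpdir"), "chmod-demo");
        dir.mkdirs();
        System.out.println("exit=" + chmodWorldWritable(dir));
    }
}
```

This is exactly the special-casing the issue argues against: it adds a platform dependency and a privilege requirement that variable expansion avoids entirely.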




--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira


  • Michel Tourn (JIRA) at Aug 18, 2006 at 10:14 pm
    [ http://issues.apache.org/jira/browse/HADOOP-463?page=all ]

    Michel Tourn updated HADOOP-463:
    --------------------------------

    Description:
    (Updated the Workaround 1 wording to refer to the Hadoop XML configuration files; otherwise identical to the issue description above.)
    --
    This message is automatically generated by JIRA.
    -
    If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
    -
    For more information on JIRA, see: http://www.atlassian.com/software/jira
  • Michel Tourn (JIRA) at Aug 28, 2006 at 10:02 pm
    [ http://issues.apache.org/jira/browse/HADOOP-463?page=all ]

    Michel Tourn updated HADOOP-463:
    --------------------------------

    Attachment: confvar.patch

    The patch implements the variable expansion described in this issue.
    There is also a junit test.

    On my shared client machine I use it like this
    (${user.name} expands to the value of the user.name System property, "michel" in my case):

    <property>
    <name>tmp.base</name>
    <value>/tmp/${user.name}</value>
    </property>

    <property>
    <name>dfs.name.dir</name>
    <value>${tmp.base}/hadoop/dfs/name</value>
    </property>

    etc.
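The tmp.base example implies that a variable may name another configuration key, not just a System property, and that the result must be expanded again. A hypothetical sketch of that chained lookup (the names and the depth limit are illustrative, not taken from confvar.patch):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChainedExpand {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^\\}\\$]+)\\}");
    // Bound the number of substitutions to stop circular definitions.
    private static final int MAX_DEPTH = 20;

    public static String get(Map<String, String> conf, String key) {
        String value = conf.get(key);
        for (int depth = 0; value != null && depth < MAX_DEPTH; depth++) {
            Matcher m = VAR.matcher(value);
            if (!m.find()) break;
            String name = m.group(1);
            // Prefer a System property, then fall back to another config key.
            String repl = System.getProperty(name);
            if (repl == null) repl = conf.get(name);
            if (repl == null) break; // leave unexpanded as a hint
            value = value.substring(0, m.start()) + repl + value.substring(m.end());
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("tmp.base", "/tmp/${user.name}");
        conf.put("dfs.name.dir", "${tmp.base}/hadoop/dfs/name");
        System.out.println(get(conf, "dfs.name.dir"));
    }
}
```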
  • Doug Cutting (JIRA) at Aug 28, 2006 at 10:50 pm
    [ http://issues.apache.org/jira/browse/HADOOP-463?page=all ]

    Doug Cutting resolved HADOOP-463.
    ---------------------------------

    Fix Version/s: 0.6.0
    Resolution: Fixed

    I just committed this. I also changed hadoop-default.xml to use this so that temporary directories now contain the user's name. Thanks, Michel!

Discussion Overview
group: common-dev @ hadoop
posted: Aug 18, '06 at 10:12p
active: Aug 28, '06 at 10:50p
posts: 4
website: hadoop.apache.org
irc: #hadoop
