FAQ
New lines and leading spaces are not trimmed of a value when configuration is read
----------------------------------------------------------------------------------

Key: HADOOP-4212
URL: https://issues.apache.org/jira/browse/HADOOP-4212
Project: Hadoop Core
Issue Type: Bug
Components: conf
Affects Versions: 0.18.1
Environment: Generic
Reporter: Sreekanth Ramakrishnan
Assignee: Sreekanth Ramakrishnan
Priority: Minor


While configuration value is read the leading and trailing spaces and new line characters are taken into account.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Search Discussions

  • Sreekanth Ramakrishnan (JIRA) at Sep 19, 2008 at 5:05 am
    [ https://issues.apache.org/jira/browse/HADOOP-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632511#action_12632511 ]

    Sreekanth Ramakrishnan commented on HADOOP-4212:
    ------------------------------------------------

    Create following property in configuration file :
    {noformat}
    <property>
    <name>test.property.name1</name>
    <value>
    100
    </value>
    </property>
    {noformat}


    Try to programatically get the value of property _test.property.name1_ it would not be equal to _"100"_ but would be equal to _"\n100\n"_
    New lines and leading spaces are not trimmed of a value when configuration is read
    ----------------------------------------------------------------------------------

    Key: HADOOP-4212
    URL: https://issues.apache.org/jira/browse/HADOOP-4212
    Project: Hadoop Core
    Issue Type: Bug
    Components: conf
    Affects Versions: 0.18.1
    Environment: Generic
    Reporter: Sreekanth Ramakrishnan
    Assignee: Sreekanth Ramakrishnan
    Priority: Minor

    While configuration value is read the leading and trailing spaces and new line characters are taken into account.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Sreekanth Ramakrishnan (JIRA) at Sep 19, 2008 at 5:07 am
    [ https://issues.apache.org/jira/browse/HADOOP-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Sreekanth Ramakrishnan updated HADOOP-4212:
    -------------------------------------------

    Attachment: HADOOP-4212-TESTCASE.patch

    Attaching test case used for testing this bug
    New lines and leading spaces are not trimmed of a value when configuration is read
    ----------------------------------------------------------------------------------

    Key: HADOOP-4212
    URL: https://issues.apache.org/jira/browse/HADOOP-4212
    Project: Hadoop Core
    Issue Type: Bug
    Components: conf
    Affects Versions: 0.18.1
    Environment: Generic
    Reporter: Sreekanth Ramakrishnan
    Assignee: Sreekanth Ramakrishnan
    Priority: Minor
    Attachments: HADOOP-4212-TESTCASE.patch


    While configuration value is read the leading and trailing spaces and new line characters are taken into account.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Sreekanth Ramakrishnan (JIRA) at Sep 19, 2008 at 5:25 am
    [ https://issues.apache.org/jira/browse/HADOOP-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Sreekanth Ramakrishnan updated HADOOP-4212:
    -------------------------------------------

    Attachment: HADOOP-4212-1.patch

    Attaching patch with updated test case.

    - Introducing an attribute for the value called _preserve-whitespace_ . if the value of the attribute is set to true. Then value is read as is with leading and trailing whitespaces. Example :
    {noformat}
    <property>
    <name>test.key.withoutwhitespace</name>
    <value>
    Test Value
    </value>
    </property>
    <property>
    <name>test.key.withwhitespace</name>
    <value preserve-whitespace="true">
    Test Value
    </value>
    </property>
    {noformat}

    The value of the _test.key.withoutwhitespace_ would be "Test Value" and _test.key.withwhitespace_ is "\nTest Value\n".

    New lines and leading spaces are not trimmed of a value when configuration is read
    ----------------------------------------------------------------------------------

    Key: HADOOP-4212
    URL: https://issues.apache.org/jira/browse/HADOOP-4212
    Project: Hadoop Core
    Issue Type: Bug
    Components: conf
    Affects Versions: 0.18.1
    Environment: Generic
    Reporter: Sreekanth Ramakrishnan
    Assignee: Sreekanth Ramakrishnan
    Priority: Minor
    Attachments: HADOOP-4212-1.patch, HADOOP-4212-TESTCASE.patch


    While configuration value is read the leading and trailing spaces and new line characters are taken into account.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Tom White (JIRA) at Sep 19, 2008 at 9:57 am
    [ https://issues.apache.org/jira/browse/HADOOP-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632606#action_12632606 ]

    Tom White commented on HADOOP-4212:
    -----------------------------------

    How about using XML's xml:space="preserve" instead (http://www.w3.org/TR/REC-xml/#sec-white-space)?
    New lines and leading spaces are not trimmed of a value when configuration is read
    ----------------------------------------------------------------------------------

    Key: HADOOP-4212
    URL: https://issues.apache.org/jira/browse/HADOOP-4212
    Project: Hadoop Core
    Issue Type: Bug
    Components: conf
    Affects Versions: 0.18.1
    Environment: Generic
    Reporter: Sreekanth Ramakrishnan
    Assignee: Sreekanth Ramakrishnan
    Priority: Minor
    Attachments: HADOOP-4212-1.patch, HADOOP-4212-TESTCASE.patch


    While configuration value is read the leading and trailing spaces and new line characters are taken into account.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Steve Loughran (JIRA) at Sep 22, 2008 at 11:05 am
    [ https://issues.apache.org/jira/browse/HADOOP-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633212#action_12633212 ]

    Steve Loughran commented on HADOOP-4212:
    ----------------------------------------

    Are there any places where preserving leading/trailing whitespace is depended on?
    New lines and leading spaces are not trimmed of a value when configuration is read
    ----------------------------------------------------------------------------------

    Key: HADOOP-4212
    URL: https://issues.apache.org/jira/browse/HADOOP-4212
    Project: Hadoop Core
    Issue Type: Bug
    Components: conf
    Affects Versions: 0.18.1
    Environment: Generic
    Reporter: Sreekanth Ramakrishnan
    Assignee: Sreekanth Ramakrishnan
    Priority: Minor
    Attachments: HADOOP-4212-1.patch, HADOOP-4212-TESTCASE.patch


    While configuration value is read the leading and trailing spaces and new line characters are taken into account.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Sreekanth Ramakrishnan (JIRA) at Sep 22, 2008 at 11:17 am
    [ https://issues.apache.org/jira/browse/HADOOP-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633215#action_12633215 ]

    Sreekanth Ramakrishnan commented on HADOOP-4212:
    ------------------------------------------------

    Only place it would and should be dependent on is Filesystem path's but does trailing space make sense with respect to them? Won't it be problematic when looking at windows file system?

    With respect to xml:space attribute it would be use only if the parser which we use is a XML validating parser. I dont see that we are using XML validating parser when parsing for the configuration. So should we now parse all configuration using validation turned on?

    Moreover, there is a similar issue reported here [HADOOP-2366|https://issues.apache.org/jira/browse/HADOOP-2366]
    New lines and leading spaces are not trimmed of a value when configuration is read
    ----------------------------------------------------------------------------------

    Key: HADOOP-4212
    URL: https://issues.apache.org/jira/browse/HADOOP-4212
    Project: Hadoop Core
    Issue Type: Bug
    Components: conf
    Affects Versions: 0.18.1
    Environment: Generic
    Reporter: Sreekanth Ramakrishnan
    Assignee: Sreekanth Ramakrishnan
    Priority: Minor
    Attachments: HADOOP-4212-1.patch, HADOOP-4212-TESTCASE.patch


    While configuration value is read the leading and trailing spaces and new line characters are taken into account.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Steve Loughran (JIRA) at Sep 23, 2008 at 9:54 am
    [ https://issues.apache.org/jira/browse/HADOOP-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633665#action_12633665 ]

    Steve Loughran commented on HADOOP-4212:
    ----------------------------------------

    -You can look for the xml:space attribute on any element and act on it; when working with XSD-schema'd docs I think xerces behaves differently when it hits it, but I forget these things.

    -yes, it would cause windows to behave differently and not allow filenames with trailing spaces, or other strings. But I dont see that filenames with trailing spaces and carriage returns do actually make sense, even on windows. Spaces mid-path, maybe, but leading or trailing? Danger.

    FWIW, I'm not using the XML format for our configurations; we use our own configuration format

    http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/components/hadoop/src/org/smartfrog/services/hadoop/components/hadoopconfiguration.sf?view=markup

    Looking at the current declarations, there's nowhere where white space is useful, and there are places (in comma separated lists), where it may already be harmful and need filtering. There may be some inconsistency between filenames (HADOOP-2366) and user group information, where spaces between words are allowed in hadoop.job.ugi. I would propose

    -consistent filtering of spaces wherever lists are taken (strip leading, trailing),
    -trim leading, tailing whitespace

    What may make sense is to allow quoted whitespace, so you could have a list of directories, those in quotes would be passed down as is:

    <name>dfs.data.dir</name>
    <value>/mnt/hstore2/hdfs , "/home/user2/temp hadoop dir"</value>

    This would resolve to a list with two entries ["/mnt/hstore2/hdfs","/home/user2/temp hadoop dir"]




    New lines and leading spaces are not trimmed of a value when configuration is read
    ----------------------------------------------------------------------------------

    Key: HADOOP-4212
    URL: https://issues.apache.org/jira/browse/HADOOP-4212
    Project: Hadoop Core
    Issue Type: Bug
    Components: conf
    Affects Versions: 0.18.1
    Environment: Generic
    Reporter: Sreekanth Ramakrishnan
    Assignee: Sreekanth Ramakrishnan
    Priority: Minor
    Attachments: HADOOP-4212-1.patch, HADOOP-4212-TESTCASE.patch


    While configuration value is read the leading and trailing spaces and new line characters are taken into account.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Tsz Wo (Nicholas), SZE (JIRA) at Sep 26, 2008 at 12:12 am
    [ https://issues.apache.org/jira/browse/HADOOP-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634704#action_12634704 ]

    Tsz Wo (Nicholas), SZE commented on HADOOP-4212:
    ------------------------------------------------
    Won't it be problematic when looking at windows file system?
    Configuration is not a part of file system API. It should support generic usage.

    When I worked on HADOOP-2461, I think that the property names should be trimmed but not the values. Otherwise, it forbids the potential use of leading and trailing spaces. If there is a need, the codes using the conf values should do the trimming.
    New lines and leading spaces are not trimmed of a value when configuration is read
    ----------------------------------------------------------------------------------

    Key: HADOOP-4212
    URL: https://issues.apache.org/jira/browse/HADOOP-4212
    Project: Hadoop Core
    Issue Type: Bug
    Components: conf
    Affects Versions: 0.18.1
    Environment: Generic
    Reporter: Sreekanth Ramakrishnan
    Assignee: Sreekanth Ramakrishnan
    Priority: Minor
    Attachments: HADOOP-4212-1.patch, HADOOP-4212-TESTCASE.patch


    While configuration value is read the leading and trailing spaces and new line characters are taken into account.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Sreekanth Ramakrishnan (JIRA) at Oct 24, 2008 at 8:00 am
    [ https://issues.apache.org/jira/browse/HADOOP-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642404#action_12642404 ]

    Sreekanth Ramakrishnan commented on HADOOP-4212:
    ------------------------------------------------

    The most of the use cases, except the filename's don't require a trailing whitespace for instance [HADOOP-4416|https://issues.apache.org/jira/browse/HADOOP-4416], So shouldn't we just treat file paths as special case and allow users to mention path which have trailing and leading spaces in special way? That way we can make sure that the user intends to use them instead of adding a space by accident?
    New lines and leading spaces are not trimmed of a value when configuration is read
    ----------------------------------------------------------------------------------

    Key: HADOOP-4212
    URL: https://issues.apache.org/jira/browse/HADOOP-4212
    Project: Hadoop Core
    Issue Type: Bug
    Components: conf
    Affects Versions: 0.18.1
    Environment: Generic
    Reporter: Sreekanth Ramakrishnan
    Assignee: Sreekanth Ramakrishnan
    Priority: Minor
    Attachments: HADOOP-4212-1.patch, HADOOP-4212-TESTCASE.patch


    While configuration value is read the leading and trailing spaces and new line characters are taken into account.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Chris Douglas (JIRA) at Oct 24, 2008 at 8:46 am
    [ https://issues.apache.org/jira/browse/HADOOP-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642416#action_12642416 ]

    Chris Douglas commented on HADOOP-4212:
    ---------------------------------------

    bq. The most of the use cases, except the filename's don't require a trailing whitespace for instance HADOOP-4416, So shouldn't we just treat file paths as special case and allow users to mention path which have trailing and leading spaces in special way? That way we can make sure that the user intends to use them instead of adding a space by accident?

    I disagree. Configuration should assume the semantics of the value Strings are known to the user defining the property. Where the user _instructs_ the Configuration to interpret the value, the semantics are explicit and sanitizing input is reasonable. In addition to resolving class names, it would be reasonable to trim values interpreted in getInt, getLong, etc., though it would still be an incompatible change (IIRC, the default value is returned for invalid input). It's certainly an incompatible change in the general case. I think we should close this issue as "Won't fix" and consider broadening the scope of HADOOP-4416.
    New lines and leading spaces are not trimmed of a value when configuration is read
    ----------------------------------------------------------------------------------

    Key: HADOOP-4212
    URL: https://issues.apache.org/jira/browse/HADOOP-4212
    Project: Hadoop Core
    Issue Type: Bug
    Components: conf
    Affects Versions: 0.18.1
    Environment: Generic
    Reporter: Sreekanth Ramakrishnan
    Assignee: Sreekanth Ramakrishnan
    Priority: Minor
    Attachments: HADOOP-4212-1.patch, HADOOP-4212-TESTCASE.patch


    While configuration value is read the leading and trailing spaces and new line characters are taken into account.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Sreekanth Ramakrishnan (JIRA) at Oct 24, 2008 at 10:46 am
    [ https://issues.apache.org/jira/browse/HADOOP-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642433#action_12642433 ]

    Sreekanth Ramakrishnan commented on HADOOP-4212:
    ------------------------------------------------

    I think we should have a generic way of reading values than to expect each implementer of new getXXX() method in Configuration or a class which subclasses configuration to remember he has to deal with a leading or a trailing space.

    bq. In addition to resolving class names, it would be reasonable to trim values interpreted in getInt, getLong, etc., though it would still be an incompatible change
    As per the suggestion all that remains untouched in the Configuration getXXX methods are getLocalPath() and getFile(). I still feel that getString() method should be trimmed and be passed to the user. For I have one use case which I am mentioning here:


    An HADOOP administrator configures a job queue: with name A and accidentially adds a space at the end.

    User looks at the ./hadoop queue list and finds out there is a job queue A, and does not notice the extra space which is hidden in output in console and in web. And mentions in his jobconf to submit to job queue A, the system checks for queue information in job sees that there is no queue called A(without space i.e.) and submits the job to the default queue. Which is wrong.

    You might argue that as an implementer, I should do checking with trimming the space, but then again this can cause bugs to due accidental mis-configuration. I would lean in towards trimming the space unless explicitly mentioned not to for getString method.
    New lines and leading spaces are not trimmed of a value when configuration is read
    ----------------------------------------------------------------------------------

    Key: HADOOP-4212
    URL: https://issues.apache.org/jira/browse/HADOOP-4212
    Project: Hadoop Core
    Issue Type: Bug
    Components: conf
    Affects Versions: 0.18.1
    Environment: Generic
    Reporter: Sreekanth Ramakrishnan
    Assignee: Sreekanth Ramakrishnan
    Priority: Minor
    Attachments: HADOOP-4212-1.patch, HADOOP-4212-TESTCASE.patch


    While configuration value is read the leading and trailing spaces and new line characters are taken into account.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Chris Douglas (JIRA) at Oct 24, 2008 at 6:38 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642508#action_12642508 ]

    Chris Douglas commented on HADOOP-4212:
    ---------------------------------------

    bq. User looks at the ./hadoop queue list and finds out there is a job queue A, and does not notice the extra space which is hidden in output in console and in web. And mentions in his jobconf to submit to job queue A, the system checks for queue information in job sees that there is no queue called A(without space i.e.) and submits the job to the default queue. Which is wrong.
    bq. You might argue that as an implementer, I should do checking with trimming the space, but then again this can cause bugs to due accidental mis-configuration.

    I don't follow. Configuration should call trim before returning the queue name, but if the callee(s) were to call trim on the same String, it "can cause bugs [due to] accidental misconfiguration?"
    New lines and leading spaces are not trimmed of a value when configuration is read
    ----------------------------------------------------------------------------------

    Key: HADOOP-4212
    URL: https://issues.apache.org/jira/browse/HADOOP-4212
    Project: Hadoop Core
    Issue Type: Bug
    Components: conf
    Affects Versions: 0.18.1
    Environment: Generic
    Reporter: Sreekanth Ramakrishnan
    Assignee: Sreekanth Ramakrishnan
    Priority: Minor
    Attachments: HADOOP-4212-1.patch, HADOOP-4212-TESTCASE.patch


    While configuration value is read the leading and trailing spaces and new line characters are taken into account.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Tsz Wo (Nicholas), SZE (JIRA) at Jun 30, 2009 at 9:49 pm
    [ https://issues.apache.org/jira/browse/HADOOP-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Tsz Wo (Nicholas), SZE resolved HADOOP-4212.
    --------------------------------------------

    Resolution: Duplicate

    This duplicate HADOOP-2366.
    New lines and leading spaces are not trimmed of a value when configuration is read
    ----------------------------------------------------------------------------------

    Key: HADOOP-4212
    URL: https://issues.apache.org/jira/browse/HADOOP-4212
    Project: Hadoop Common
    Issue Type: Bug
    Components: conf
    Affects Versions: 0.18.1
    Environment: Generic
    Reporter: Sreekanth Ramakrishnan
    Assignee: Sreekanth Ramakrishnan
    Priority: Minor
    Attachments: HADOOP-4212-1.patch, HADOOP-4212-TESTCASE.patch


    While configuration value is read the leading and trailing spaces and new line characters are taken into account.
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
  • Rafael Turk at Jul 2, 2009 at 11:33 pm
    unsubscribe

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupcommon-dev @
categorieshadoop
postedSep 19, '08 at 5:03a
activeJul 2, '09 at 11:33p
posts15
users2
websitehadoop.apache.org...
irc#hadoop

People

Translate

site design / logo © 2022 Grokbase