Configuration should provide a way to write only properties that have been set
------------------------------------------------------------------------------

Key: HADOOP-5708
URL: https://issues.apache.org/jira/browse/HADOOP-5708
Project: Hadoop Core
Issue Type: Improvement
Components: conf
Affects Versions: 0.19.1
Reporter: Topher ZiCornell
Priority: Minor


The Configuration.write and .writeXml methods always output all properties, whether they came from a default source, a loaded resource file, or an "overlay" set call. There should be a way to write only the properties that were set, leaving out the properties that came from a default source.

Why? Suppose I build a configuration on a machine that is not associated with a grid, write it out to XML, then try to load it on a grid gateway. The configuration would contain all of the defaults picked up from my non-grid machine, and would completely overwrite all the defaults on that grid.

I propose to add methods to write out only the overlay values in Object and XML formats.

I see two options for implementing this:
1) Either completely new methods could be crafted (writeOverlay(DataOutput) and writeOverlayXml(OutputStream)), or
2) The existing write() and writeXml() methods could be adjusted to take an additional parameter indicating whether the full properties or overlay properties should be written. (Of course, the existing write() and writeXml() methods would remain, defaulting to the current behavior.)

Option 1 has less impact to existing code. Option 2 is a cleaner implementation with less code-duplication involved. I would much prefer to do option 2.
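
A minimal sketch of option 2, under stated assumptions: LayeredConfig is a hypothetical stand-in, not Hadoop's Configuration class, and the addDefault method, the overlayOnly parameter, and the property names are illustrative only. It shows the existing no-argument method keeping its current behavior while an extra parameter restricts output to explicitly set values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a layered configuration; NOT Hadoop's Configuration. */
class LayeredConfig {
    // Values that came from a default source (assumed to be tracked separately).
    private final Map<String, String> defaults = new LinkedHashMap<>();
    // Values set explicitly by the caller (the "overlay").
    private final Map<String, String> overlay = new LinkedHashMap<>();

    void addDefault(String name, String value) { defaults.put(name, value); }

    void set(String name, String value) { overlay.put(name, value); }

    /** Existing signature: still writes every property, defaults included. */
    String writeXml() { return writeXml(false); }

    /** Option 2: an extra parameter selects overlay-only output. */
    String writeXml(boolean overlayOnly) {
        Map<String, String> out = new LinkedHashMap<>();
        if (!overlayOnly) {
            out.putAll(defaults);
        }
        out.putAll(overlay); // explicitly set values win over defaults
        StringBuilder sb = new StringBuilder("<configuration>\n");
        for (Map.Entry<String, String> e : out.entrySet()) {
            sb.append("  <property><name>").append(escape(e.getKey()))
              .append("</name><value>").append(escape(e.getValue()))
              .append("</value></property>\n");
        }
        return sb.append("</configuration>\n").toString();
    }

    // Minimal XML escaping for element text.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        LayeredConfig conf = new LayeredConfig();
        conf.addDefault("io.sort.mb", "100");  // illustrative default
        conf.set("mapred.job.name", "demo");   // illustrative user setting
        System.out.print(conf.writeXml(true)); // overlay only
    }
}
```

With this shape, callers of the no-argument writeXml() see no change, and only callers that pass true get the reduced output.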

Oh, and in case it's not clear, I'm offering to make this change and submit it.

Thoughts?

. Topher

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

  • Doug Cutting (JIRA) at Apr 20, 2009 at 9:37 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700952#action_12700952 ]

    Doug Cutting commented on HADOOP-5708:
    --------------------------------------
    > The configuration would contain all of the defaults picked up from my non-grid machine, and would completely overwrite all the defaults on that grid.
    The defaults come from the jar file, and the jar must currently have the same version of the code in the cluster, so in practice we overwrite things with the same values.

    Long-term, if we permit users with one version of Hadoop installed to submit jobs to a cluster running a different version, then sending the defaults could perhaps cause problems. But anything that should not be overridden should be declared final in the cluster's configuration, and otherwise the user's configuration, including defaults, should be observed, no?

  • Topher ZiCornell (JIRA) at Apr 20, 2009 at 10:09 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700963#action_12700963 ]

    Topher ZiCornell commented on HADOOP-5708:
    ------------------------------------------
    > The defaults come from the jar file, and the jar must currently have the same version of the code in the cluster, so in practice we overwrite things with the same values.
    I think you might be making the assumption that the defaults you package with the jar and ship out are the only defaults that could possibly be loaded. That's not true.

    The defaults are loaded from the classpath. There are many ways defaults can be introduced for specific environments. Hod itself took advantage of this by writing a hadoop-site.xml file in a directory, which then gets added by the hadoop script to the front of the classpath so that it's the first instance of that file encountered. Even extending that example a bit, hod pulls _its_ defaults from a default configuration directory, which may or may not be what was packaged with Hadoop.

    In short, the product team doesn't (and shouldn't) need to be aware of what the operations team is setting as the defaults.
    > anything that should not be overridden should be declared final in the cluster's configuration, and otherwise the user's configuration, including defaults, should be observed, no?
    Actually, that's not the issue. You're looking at the scenario where I hand-craft my job XML files with only the settings I want to set.

    Let me clarify a bit: I'm lazy. I make my computer do that work for me. It builds the job for me (well, for my team, but never mind that). If I write that job's Configuration out, it includes all the settings of whatever the defaults are on the computer I'm currently on. When that XML then gets loaded, all those defaults are treated as if they are user-overrides, when in fact they are not.

    In a nutshell: There is currently no way to write an XML just of my settings so that it can be loaded in again.

    . Topher

  • Doug Cutting (JIRA) at Apr 20, 2009 at 10:25 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700969#action_12700969 ]

    Doug Cutting commented on HADOOP-5708:
    --------------------------------------
    > In a nutshell: There is currently no way to write an XML just of my settings so that it can be loaded in again.
    Sure there is, if you're willing to manipulate the classpath, as you've already stated you are: just put empty {core,hdfs,mapred}-default.xml files on it.
    > Hod itself took advantage of this by writing a hadoop-site.xml file in a directory [ ... ]
    Those are not defaults, but site-specific configurations.
    > When that XML then gets loaded, all those defaults are treated as if they are user-overrides, when in fact they are not.
    If you've changed your local configuration, overriding some things so that they differ from the distribution, then your local settings should in general be seen on the server when it runs your code, no matter which file they come from.

    I'm not sure what you're trying to fix. If there is a particular configuration property that your local settings are overriding on the server, then perhaps the server's -site config file should be changed to mark that property as final, so that it is not overridden by user code. Or is there some other specific problem you are having?
  • Topher ZiCornell (JIRA) at Apr 20, 2009 at 10:37 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700975#action_12700975 ]

    Topher ZiCornell commented on HADOOP-5708:
    ------------------------------------------
    > Sure there is, if you're willing to manipulate the classpath, as you've already stated you are: just put empty {core,hdfs,mapred}-default.xml files on it.
    That's not an option if I'm already in a running JVM. Unless I want to spawn a child-JVM, but that seems a bit heavy-handed.
    > Those are not defaults, but site-specific configurations.
    Yes. Does that impact my point somehow?
    > I'm not sure what you're trying to fix.
    It does feel like we're missing the connection here.

    I'll send you a direct internal email describing more details around this. (I don't want to get into internal stuff on a public forum.) Perhaps you have some other suggestions, then. ;)

    . Topher
  • Topher ZiCornell (JIRA) at Apr 20, 2009 at 10:54 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700978#action_12700978 ]

    Topher ZiCornell commented on HADOOP-5708:
    ------------------------------------------

    And, just to add: I'm not trying to "fix" anything. I don't think the current implementation is broken. In fact, I think most of the time it's exactly what is needed and desired.

    I'm only trying to provide an extension that would be useful in certain, admittedly extreme, situations. I happen to be bumping into a situation in which this addition would make things much nicer.

    . Topher
  • Doug Cutting (JIRA) at Apr 20, 2009 at 11:09 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700980#action_12700980 ]

    Doug Cutting commented on HADOOP-5708:
    --------------------------------------
    > I happen to be bumping into a situation in which this addition would make things much nicer.
    Can you describe your situation in more detail?
  • Sharad Agarwal (JIRA) at Apr 21, 2009 at 6:30 am
    [ https://issues.apache.org/jira/browse/HADOOP-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701071#action_12701071 ]

    Sharad Agarwal commented on HADOOP-5708:
    ----------------------------------------

    An alternative would be to instantiate Configuration with the loadDefaults flag set to false. That way writeXml won't write default/site properties.
  • Topher ZiCornell (JIRA) at Apr 21, 2009 at 11:30 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701320#action_12701320 ]

    Topher ZiCornell commented on HADOOP-5708:
    ------------------------------------------

    That looks like a viable solution once we get to 0.20. Thank you. The constructor that sets that setting doesn't exist in 0.18, so the current hack I have in place will have to live for a bit longer.

    . Topher
  • Topher ZiCornell (JIRA) at Apr 21, 2009 at 11:31 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Topher ZiCornell resolved HADOOP-5708.
    --------------------------------------

    Resolution: Invalid

    Another solution exists to get the same results as this proposal.
  • Shevek (JIRA) at Apr 23, 2009 at 5:04 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702009#action_12702009 ]

    Shevek commented on HADOOP-5708:
    --------------------------------

    I am working on this ticket.
  • Shevek (JIRA) at Apr 23, 2009 at 5:50 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Shevek updated HADOOP-5708:
    ---------------------------

    Attachment: 01_configuration.patch

    Here's a fairly good stab. It passes all tests in conf/*. I started documenting the structure more clearly, I'll finish when I get back to the office. It has some advantages:

    * There is no longer a REGISTRY of Configurations - memory management is better.
    * Configurations may be arbitrarily layered.
    * Adding a global default does not cause everyone to reload themselves.

    The semantics of overlays are a bit weird, and may be unnecessary now that we have layers, but it made sense to conform exactly to the existing APIs.

    I know the Java memory model extremely well, and in my opinion, if this object _is_ accessed from more than one thread, you're up the creek without a paddle. If you need it thread-safe without losing performance, please say so and I will do an optimal implementation.
  • Doug Cutting (JIRA) at Apr 23, 2009 at 6:15 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702038#action_12702038 ]

    Doug Cutting commented on HADOOP-5708:
    --------------------------------------
    I am working on this ticket.
    Shevek, this issue was closed, since the current implementation already supports the requested feature.

    Your re-implementation of Configuration should be submitted in a new issue.

    A few comments on your patch:
    - it includes a lot of whitespace-only changes that make it hard to read
    - the formatting is not in accord with Hadoop conventions
    - while writeXml()'s documentation says that default values are not written, in fact they are. This permits, for example, the values from one's core-site.xml to be transmitted with a job, as they should be. So the documentation is in error here, not the implementation. If defaults are inherited, then, when writing, we should first write non-default values, then write each value from the defaults only if it has not already been written.
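
The write order described in the last point can be sketched as below. This is a minimal illustration, assuming the non-default and default properties are available as two separate maps; `WriteOrder` and `toWrite` are hypothetical names, not part of Hadoop or of the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper showing the proposed write order. */
class WriteOrder {
    /**
     * Returns the properties in the order they would be written:
     * every non-default value first, then each default whose key
     * has not already been written.
     */
    static Map<String, String> toWrite(Map<String, String> nonDefaults,
                                       Map<String, String> defaults) {
        // LinkedHashMap keeps insertion order, so non-defaults come first.
        Map<String, String> out = new LinkedHashMap<>(nonDefaults);
        for (Map.Entry<String, String> e : defaults.entrySet()) {
            // Skip any default that a non-default value already covers.
            out.putIfAbsent(e.getKey(), e.getValue());
        }
        return out;
    }
}
```

With this order a key is written exactly once, and a value set explicitly always wins over an inherited default.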
  • Shevek (JIRA) at Apr 23, 2009 at 6:37 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702050#action_12702050 ]

    Shevek commented on HADOOP-5708:
    --------------------------------

    Thanks for the comments, this was a feet-wet patch, so I'll fix as you suggest.

    Does it make sense to have a three-layered default, where core-site is transmitted, but core-default is not? It's dead easy now that Configuration points to a parent.

    Any comments on the threading? Does it need fixing for concurrent access?

    The documentation on how it works and what it _ought_ to do still needs fixing. I only started on that, and haven't done a full read-through.

    Also, the assertions in the test suite are still backwards, etc. :-(
  • Doug Cutting (JIRA) at Apr 23, 2009 at 7:05 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702058#action_12702058 ]

    Doug Cutting commented on HADOOP-5708:
    --------------------------------------
    > Does it make sense to have a three-layered default, where core-site is transmitted, but core-default is not?

    It's tempting to try to optimize things this way, but I fear the complexity adds confusion. For example, if/when we enable cross-version job submission, I think the right behavior is to use the defaults from the client as much as possible. When it's not possible, the server should ignore them, but it can only have this option if they're transmitted.

    > Any comments on the threading? Does it need fixing for concurrent access?

    Configuration should indeed be thread-safe. I did not review your patch with that in mind, though.

    Please open a new issue for this. Thanks!
  • Shevek (JIRA) at Apr 24, 2009 at 7:36 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702498#action_12702498 ]

    Shevek commented on HADOOP-5708:
    --------------------------------

    New issue 5743. All interested parties ...
  • Shevek (JIRA) at Apr 24, 2009 at 8:14 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702522#action_12702522 ]

    Shevek commented on HADOOP-5708:
    --------------------------------

    Change to wire format would also allow preservation of finalParameters, which is no longer broken with this patch.
  • Shevek (JIRA) at Apr 24, 2009 at 8:48 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Shevek updated HADOOP-5708:
    ---------------------------

    Attachment: 02_configuration.patch

    Kinda needs review. Lots of questions in the code. A better synchronization structure rather requires a slightly different data model, where finalParameters is stored within the Map.
  • Shevek (JIRA) at Apr 24, 2009 at 8:48 pm
    [ https://issues.apache.org/jira/browse/HADOOP-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702544#action_12702544 ]

    Shevek commented on HADOOP-5708:
    --------------------------------

    Comments on wrong bug, sorry. Transferring.