Topher ZiCornell commented on HADOOP-5708:
> The defaults come from the jar file, and the jar must currently have the same version of the code in the cluster, so in practice we overwrite things with the same values.
I think you might be making the assumption that the defaults you package with the jar and ship out are the only defaults that could possibly be loaded. That's not true.
The defaults are loaded from the classpath, and there are many ways to introduce defaults for specific environments. HOD itself took advantage of this: it wrote a hadoop-site.xml file into a directory, which the hadoop script then added to the front of the classpath so that it was the first copy of that file encountered. Extending that example a bit further, HOD pulls _its_ defaults from a default configuration directory, which may or may not be what was packaged with Hadoop.
In short, the product team doesn't (and shouldn't) need to be aware of what the operations team is setting as the defaults.
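The "first copy on the classpath wins" rule that HOD relies on is ordinary Java class-loader resource lookup. The sketch below illustrates only that rule and uses only the JDK. It creates two hypothetical configuration directories, each holding its own `hadoop-site.xml`, and asks a `URLClassLoader` which copy it finds. It does not call Hadoop's `Configuration`, and it does not reproduce how the `hadoop` script actually assembles its classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClasspathOrderDemo {

    // Creates a temp directory holding a hadoop-site.xml whose body is an XML comment naming its source.
    // (Hypothetical stand-in for an ops-provided or packaged configuration directory.)
    static Path dirWithSiteFile(String source) {
        try {
            Path dir = Files.createTempDirectory("conf-");
            Files.write(dir.resolve("hadoop-site.xml"),
                    ("<!-- " + source + " -->").getBytes(StandardCharsets.UTF_8));
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a classpath from the given directories, in order, and returns the contents
    // of whichever hadoop-site.xml the class loader finds first.
    static String firstSiteFile(Path... dirs) {
        URL[] urls = new URL[dirs.length];
        try {
            for (int i = 0; i < dirs.length; i++) {
                urls[i] = dirs[i].toFile().toURI().toURL(); // directory URIs end in '/'
            }
            // A null parent means only the bootstrap loader is consulted before these directories.
            try (URLClassLoader loader = new URLClassLoader(urls, null);
                 InputStream in = loader.getResourceAsStream("hadoop-site.xml")) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path opsDir = dirWithSiteFile("ops team");
        Path packagedDir = dirWithSiteFile("packaged with Hadoop");
        // The directory listed first shadows every later copy of the same file name.
        System.out.println(firstSiteFile(opsDir, packagedDir)); // <!-- ops team -->
    }
}
```

Swapping the argument order makes the packaged copy win. Nothing in the jar decides which defaults a given environment sees.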
> Anything that should not be overridden should be declared final in the cluster's configuration; otherwise, the user's configuration, including defaults, should be observed, no?
Actually, that's not the issue. You're looking at the scenario where I hand-craft my job XML files with only the settings I want to set.
Let me clarify a bit: I'm lazy. I make my computer do that work for me. It builds the job for me (well, for my team, but never mind that). If I write that job's Configuration out, the file includes every default setting from whatever computer I'm currently on. When that XML is loaded again, all those defaults are treated as if they were user overrides, when in fact they are not.
In a nutshell: there is currently no way to write out an XML file containing just my settings so that it can be loaded in again.
Configuration should provide a way to write only properties that have been set
Project: Hadoop Core
Issue Type: Improvement
Affects Versions: 0.19.1
Reporter: Topher ZiCornell
The Configuration.write and .writeXml methods always output all properties, whether they came from a default source, a loaded resource file, or an "overlay" set call. There should be a way to write only the properties that were set, leaving out the properties that came from a default source.
Why? Suppose I build a configuration on a machine that is not associated with a grid, write it out to XML, and then load it on a grid gateway. The configuration would contain all of the defaults picked up from my non-grid machine, and they would completely override all the defaults on that grid.
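The scenario can be modelled with two plain maps. The sketch below is hypothetical: a local "defaults" layer and a "user-set" layer are merged as they would be when the whole configuration is written out. It does not use Hadoop's classes. `fs.default.name` and `mapred.reduce.tasks` are real property names of that era, but the values, including `hdfs://namenode:9000`, are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlattenedConfDemo {
    // Effective configuration: defaults first, then user settings override them.
    // Once flattened like this, nothing records which entries were defaults.
    static Map<String, String> effective(Map<String, String> defaults, Map<String, String> userSet) {
        Map<String, String> all = new LinkedHashMap<>(defaults);
        all.putAll(userSet);
        return all;
    }

    // Loading a written-out file elsewhere: every written key overrides the target's defaults.
    static Map<String, String> loadOnto(Map<String, String> targetDefaults, Map<String, String> written) {
        return effective(targetDefaults, written);
    }

    public static void main(String[] args) {
        Map<String, String> laptopDefaults = Map.of("fs.default.name", "file:///");
        Map<String, String> gridDefaults = Map.of("fs.default.name", "hdfs://namenode:9000");
        Map<String, String> mine = Map.of("mapred.reduce.tasks", "10");

        // Writing everything: the laptop's default leaks onto the grid.
        Map<String, String> full = effective(laptopDefaults, mine);
        System.out.println(loadOnto(gridDefaults, full).get("fs.default.name")); // file:///

        // Writing only what the user set: the grid's default survives.
        System.out.println(loadOnto(gridDefaults, mine).get("fs.default.name")); // hdfs://namenode:9000
    }
}
```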
I propose to add methods to write out only the overlay values in Object and XML formats.
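The JDK's own `java.util.Properties` already draws this line between a defaults table and values set directly on the object. `new Properties(defaults)` falls back to `defaults` on lookup, but `storeToXML` emits only the entries held in the table itself. The sketch below uses that behaviour as an analogy for an overlay-only write. It is not Hadoop's `Configuration`, and the XML it produces is the `Properties` format, not Hadoop's `<configuration>` format.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class OverlayOnlyDemo {
    // Serializes only the properties set directly on 'props'; entries reachable
    // only through its defaults table are not written by storeToXML.
    static String overlayXml(Properties props) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            props.storeToXML(out, null, "UTF-8");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("fs.default.name", "file:///");

        Properties conf = new Properties(defaults); // lookups fall back to defaults
        conf.setProperty("mapred.reduce.tasks", "10");

        System.out.println(conf.getProperty("fs.default.name")); // file:/// (visible via defaults)
        System.out.println(overlayXml(conf)); // lists mapred.reduce.tasks but not fs.default.name
    }
}
```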
I see two options for implementing this:
1) Completely new methods could be added (writeOverlay(DataOutput) and writeOverlayXml(OutputStream)), or
2) The existing write() and writeXml() methods could be adjusted to take an additional parameter indicating whether the full properties or overlay properties should be written. (Of course, the existing write() and writeXml() methods would remain, defaulting to the current behavior.)
Option 1 has less impact on existing code. Option 2 is a cleaner implementation with less code duplication. I would much prefer to do option 2.
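Option 2 could look roughly like the sketch below, assuming a configuration object that keeps its default-sourced and explicitly set values in separate maps. `SketchConfiguration`, `addDefault` and `toXml` are hypothetical names, and this is not Hadoop's actual `Configuration` code. Only the XML path is shown; a `write(DataOutput)` variant would take the same flag. The element layout loosely follows the `<configuration>`/`<property>` structure of Hadoop's XML files.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a configuration class; NOT Hadoop's implementation.
public class SketchConfiguration {
    private final Map<String, String> defaults = new TreeMap<>(); // from default resources
    private final Map<String, String> overlay = new TreeMap<>();  // from set() calls

    public void addDefault(String name, String value) { defaults.put(name, value); }
    public void set(String name, String value) { overlay.put(name, value); }

    // Existing signature keeps the current behaviour: write everything.
    public void writeXml(OutputStream out) throws IOException {
        writeXml(out, false);
    }

    // Option 2: a single code path, with a flag choosing full vs overlay-only output.
    public void writeXml(OutputStream out, boolean overlayOnly) throws IOException {
        Map<String, String> props = new TreeMap<>();
        if (!overlayOnly) {
            props.putAll(defaults);
        }
        props.putAll(overlay); // explicitly set values win over defaults
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        w.write("<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            w.write("<property><name>" + escape(e.getKey()) + "</name><value>"
                    + escape(e.getValue()) + "</value></property>\n");
        }
        w.write("</configuration>\n");
        w.flush(); // flush but do not close the caller's stream
    }

    // Convenience for demos and tests: render to a String.
    public String toXml(boolean overlayOnly) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            writeXml(out, overlayOnly);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Minimal escaping of XML text content.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        SketchConfiguration conf = new SketchConfiguration();
        conf.addDefault("fs.default.name", "file:///");
        conf.set("mapred.reduce.tasks", "10");
        System.out.print(conf.toXml(false)); // both properties
        System.out.print(conf.toXml(true));  // only mapred.reduce.tasks
    }
}
```

With this shape, an option-1-style `writeOverlayXml(out)` would reduce to a one-line wrapper around `writeXml(out, true)`, so the serialization loop exists only once.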
Oh, and in case it's not clear, I'm offering to make this change and submit it.