On Oct 16, 2010, at 5:54 AM, Mag Gam wrote:
I am trying to trim down my configuration files to avoid confusion.
If I were to set up something like this: 1 name node (192.168.0.1), 1
secondary name node (192.168.0.2), and 10 clients. The name node and
secondary name node will not serve as data nodes.
The clients will access the configuration through a shared NFS mount; they only need core-site.xml.
The data nodes will only have dfs.data.dir in hdfs-site.xml.
The name node will have dfs.name.dir in hdfs-site.xml.
I am really not sure what I must do for my secondary name node. What
should I place there?
The idea is I want to reduce my configuration files. Any advice?
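Concretely, the split described above might look something like this (0.20-era property names; the paths are placeholders, not recommendations):

```xml
<!-- hdfs-site.xml on the name node only (path is a placeholder) -->
<property>
  <name>dfs.name.dir</name>
  <value>/srv/hadoop/name</value>
</property>

<!-- hdfs-site.xml on each data node only (path is a placeholder) -->
<property>
  <name>dfs.data.dir</name>
  <value>/srv/hadoop/data</value>
</property>
```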
You are fighting a losing battle. Throw in the towel now and plan on pushing huge balls of configs. Work on a strategy for clients that are out of sync.
a) Any production Hadoop deployment should expect to have somewhere around 3-4 different sets of configuration files if you configure even some of the intermediate features. [Hello separate audit logs!]
b) You should expect to have all the config files for all the different services. The devs are just now starting to separate what needs what, so hopefully by 0.22 we won't have to worry about whether we have all of our bases covered. Remember: if it isn't defined, it will use a default and therefore still work...just not the way you intend.
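To make that "default and still work" failure mode concrete: dfs.replication defaults to 3, so if a node's hdfs-site.xml never mentions it, writes quietly go out with three replicas whether you wanted that or not. Spelling it out is the only way to be sure (snippet is illustrative):

```xml
<!-- hdfs-site.xml: if this property is omitted, the compiled-in
     default (3) is silently used -- the cluster still works, just
     not necessarily the way you intended. -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
```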
c) Don't use NFS for pushing configuration files unless it is an HA-NFS configuration, full/redundant connectivity to every switch, etc. The first time the NFS server falls out from underneath the grid, you're going to be in a world of hurt.
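One common alternative to NFS is simply pushing the config directory to every node with rsync over ssh. A minimal sketch (the host names and conf path are assumptions for your environment, and the command is only echoed so you can review it before running it for real):

```shell
#!/bin/sh
# Push the Hadoop config dir to each node instead of sharing it over NFS.
# CONF_DIR and the host list are placeholders for your environment.
CONF_DIR=/etc/hadoop/conf
for h in dn01 dn02 dn03; do
  # Echoed rather than executed so the sketch is safe to run as-is;
  # remove the leading "echo" to actually push.
  echo rsync -a --delete "$CONF_DIR/" "$h:$CONF_DIR/"
done
```

A plain loop like this also gives you a natural place to log which hosts got which config revision, which helps with the out-of-sync-clients problem above.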
We really need https://issues.apache.org/jira/browse/HADOOP-5670
to make this whole mess go away. Until it is nailed down which components need which config vars, that fix is unlikely either to a) happen or b) work in a way that is actually operable for any reasonable production deployment.