Hi,

A few questions and observations. Cloudera Manager 4. I would
like to set replication value to 2.

1. After I set replication to 2 in the GUI, and restart the service,
I notice that /etc/hadoop/conf/hdfs-site.xml still has a value, and
that value is set to 3.

<property>
<name>dfs.replication</name>
<value>3</value>
</property>

How did that get there? Should it be there? If it is there,
would it affect anything? Does this file override other values?

2. Anyways, I remove it from /etc/hadoop/conf/hdfs-site.xml and
restart the service.

When putting a file into hdfs "hdfs dfs -put test.txt /tmp/" I
then see that it's replicated 3 times. So my setting of 2 didn't
work. The value of 2 does appear in
/var/run/cloudera-scm-agent/process/91-hdfs-NAMENODE/hdfs-site.xml

3. Does this setting ( dfs.replication ) matter for NAMENODE or
DATANODE, or both? Seems like a NAMENODE sort of setting. Why is
it going into the DATANODE files?

4. Is there a way to dump all the settings ( such as
dfs.replication ) regarding a running process, from the command
line? For example, if I have a running namenode, I could type "hdfs
dfsadmin -listsettings" or similar sort of command. Maybe it already
exists? I would like to see what the running process thinks its own
dfs.replication value is supposed to be.
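
The mismatch described in points 1 and 2 can be checked by reading the property out of both files. The following is a minimal sketch, assuming both files use the standard Hadoop `<configuration>`/`<property>` layout shown above. The helper names `read_property` and `read_property_from_file` are hypothetical, and the inline XML strings are stand-ins for the real files. The fallback of "3" is the default Hadoop ships for dfs.replication.

```python
import xml.etree.ElementTree as ET


def read_property(xml_text, name, default=None):
    """Return the value of `name` from Hadoop-style configuration XML.

    Falls back to `default` when the property is missing or has no value.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else default
    return default


def read_property_from_file(path, name, default=None):
    """Same as read_property, but reads the XML from a file on disk."""
    with open(path, encoding="utf-8") as fh:
        return read_property(fh.read(), name, default)


# Inline stand-ins for the two files discussed above (illustrative only).
CLIENT_XML = """<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>"""

NAMENODE_XML = """<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>"""

client = read_property(CLIENT_XML, "dfs.replication", default="3")
namenode = read_property(NAMENODE_XML, "dfs.replication", default="3")
if client != namenode:
    print(f"mismatch: client={client}, namenode={namenode}")
```

On a real host, `read_property_from_file` could be pointed at /etc/hadoop/conf/hdfs-site.xml and at the hdfs-site.xml under the NameNode's process directory to make the same comparison.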


  • Sam Darwin at Jun 28, 2012 at 3:19 pm
    # hdfs fsck -blocks
    ...
    Default replication factor: 2
    ...

    However, the file gets put onto three nodes:

    Total number of blocks: 1
    -8524410614713809938:
        10.4.58.112:50010
        10.242.65.36:50010
        10.243.94.60:50010
  • Harsh J at Jun 28, 2012 at 3:22 pm
    Hey Sam,
    On Thu, Jun 28, 2012 at 8:29 PM, Sam Darwin wrote:
    > Hi,
    >
    > A few questions and observations. Cloudera Manager 4. I would
    > like to set replication value to 2.
    >
    > 1. After I set replication to 2 in the GUI, and restart the service,
    > I notice that /etc/hadoop/conf/hdfs-site.xml still has a value, and
    > that value is set to 3.
    >
    > <property>
    > <name>dfs.replication</name>
    > <value>3</value>
    > </property>
    >
    > How did that get there? Should it be there? If it is there,
    > would it affect anything? Does this file override other values?

    This file is what clients use. After updating dfs.replication
    in the CM GUI, ensure you update your client configs via
    https://ccp.cloudera.com/display/FREE4DOC/Deploying+Client+Configuration+Files

    > 2. Anyways, I remove it from /etc/hadoop/conf/hdfs-site.xml and
    > restart the service.
    >
    > When putting a file into hdfs "hdfs dfs -put test.txt /tmp/" I
    > then see that it's replicated 3 times. So my setting of 2 didn't
    > work. The value of 2 does appear in
    > /var/run/cloudera-scm-agent/process/91-hdfs-NAMENODE/hdfs-site.xml

    The default is 3, so if you haven't specified anything on the client
    side, it uses 3. New files will hence get 3 as their replication
    factor. The /var/run/ config is for the NameNode service alone, and
    clients do not use it. For clients, please follow my instructions above.

    > 3. Does this setting ( dfs.replication ) matter for NAMENODE or
    > DATANODE, or both? Seems like a NAMENODE sort of setting. Why is
    > it going into the DATANODE files?

    It is relevant to clients, but yes, the NameNode requires it too.
    DataNodes shouldn't require it, but there's no harm in specifying it
    there. We'll investigate whether it can be removed from the DN
    configs, just to keep the configs a proper set. In any case, this isn't
    your issue.

    > 4. Is there a way to dump all the settings ( such as
    > dfs.replication ) regarding a running process, from the command
    > line? For example, if I have a running namenode, I could type "hdfs
    > dfsadmin -listsettings" or similar sort of command. Maybe it already
    > exists? I would like to see what the running process thinks its own
    > dfs.replication value is supposed to be.

    You can get a service's configuration via, say for the NameNode,
    http://NNHOST:50070/conf. The same works for the DN/SNN, and for the
    HBase services too.

    However, for your issue, follow
    https://ccp.cloudera.com/display/FREE4DOC/Deploying+Client+Configuration+Files
    for a solution, since your client configs have gone stale. CM4 provides
    a way to auto-deploy configs at gateway nodes (from the UI itself,
    without requiring an extract). So you can choose to go that way too.

    --
    Harsh J
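
The /conf endpoint mentioned in the reply can also be queried from a script. The following is a minimal sketch, assuming the endpoint returns XML in the same `<configuration>`/`<property>` layout as hdfs-site.xml. The function names `parse_conf` and `fetch_conf` are hypothetical, NNHOST is the placeholder from the reply, and the sample document is made up for illustration rather than captured from a real NameNode.

```python
import urllib.request
import xml.etree.ElementTree as ET


def parse_conf(xml_data):
    """Turn Hadoop-style configuration XML (str or bytes) into a dict."""
    root = ET.fromstring(xml_data)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


def fetch_conf(base_url, timeout=10):
    """Fetch <base_url>/conf from a running service and parse it.

    base_url is something like "http://NNHOST:50070" (placeholder host).
    """
    url = base_url.rstrip("/") + "/conf"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_conf(resp.read())


# Made-up sample standing in for a real /conf response.
SAMPLE = """<configuration>
  <property><name>dfs.replication</name><value>2</value></property>
  <property><name>dfs.blocksize</name><value>134217728</value></property>
</configuration>"""

conf = parse_conf(SAMPLE)
print(conf.get("dfs.replication", "(not set)"))
```

Against a live cluster, `fetch_conf("http://NNHOST:50070")["dfs.replication"]` would show the value the NameNode itself loaded, which can then be compared with what the client-side hdfs-site.xml says.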

Discussion Overview
groups: cm-users @
categories: hadoop
posted: Jun 28, '12 at 2:59p
active: Jun 28, '12 at 3:22p
posts: 3
users: 2
website: cloudera.com
irc: #hadoop
