Hi

I am facing an issue where I tried to set the "dfs.replication" factor to 1,
from the default value of 3, on a 5-node CDH4 cluster and restarted the
services afterwards, but the change is not reflected when I generate files
using MapReduce: it still makes 3 copies of the generated data.

While trying to run an HBase job (putting data into an HBase table using a
MapReduce job) it throws a ZooKeeper connection exception. The suggested
remedy was to raise the maximum number of client connections above the
default value of 30. I made the change and restarted the services, but the
error still appears while running the program, so I suspect this property is
not taking effect either.

Please help in this regard. Thanks in advance!

  • Serega Sheypak at May 30, 2013 at 10:34 am
    1. "it still makes 3 copies of the generated data."
    What do you mean by that? What command do you use to get the replication
    factor of the job output? Maybe you are talking about 3 files in the
    output? (A quick way to check is sketched below, after the quoted
    documentation.)

    2. Existing data should be re-replicated: if you had replication factor 3
    and then set it to 1, the NameNode should schedule deletion of the
    over-replicated blocks. Perhaps the configuration was put in the wrong
    place. Can you tell how you changed the replication factor using Cloudera
    Manager?
    See it here:
    http://hadoop.apache.org/docs/r1.0.4/hdfs_design.html#Data+Replication

    Decrease Replication Factor

    When the replication factor of a file is reduced, the NameNode selects
    excess replicas that can be deleted. The next Heartbeat transfers this
    information to the DataNode. The DataNode then removes the corresponding
    blocks and the corresponding free space appears in the cluster. Once again,
    there might be a time delay between the completion of the setReplication API
    call and the appearance of free space in the cluster.
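
    As a quick check (the paths below are placeholders, not from this thread),
    the replication factor of the job output can be read with the standard fs
    shell; on most Hadoop versions, -stat with "%r" prints just that number:

    $ hadoop fs -ls /user/ajit/job-output
    # the second column of each file entry is its replication factor
    $ hadoop fs -stat "%r" /user/ajit/job-output/part-00000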



  • Philip Zeyliger at May 30, 2013 at 4:09 pm
    Hi Ajit,

    The replication factor is a per-file property, so existing files need to be
    updated manually with "hadoop fs -setrep" if you want to change their
    replication. Furthermore, this is a client property, so you also need to
    "redeploy client configuration."

    As for the zookeeper issue, the obvious thing to check is whether or not
    zookeeper has been restarted. You can see how many open connections it has
    with "lsof" (or by connecting to it and using the amusingly named "four
    letter words"
    http://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_zkCommands like
    'stat').
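
    A rough sketch of both checks (host name and port are assumptions; adjust
    for your cluster):

    $ echo stat | nc zk-host.example.com 2181   # four-letter word: server stats, incl. connection count
    $ echo cons | nc zk-host.example.com 2181   # per-connection details
    $ sudo lsof -i :2181 | wc -l                # rough count of sockets open on the ZooKeeper port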

    Cheers,

    -- Philip


  • Ajit kumar at May 31, 2013 at 11:37 am
    Hi Philip,
              I am not trying to change the replication factor of existing
    files. I am using Cloudera Manager to administer my 5-node cluster, and I
    no longer want newly created files to have replication factor 3, so I
    changed the replication factor property to 1 for the entire cluster in
    Cloudera Manager and restarted the HDFS service. However, newly created
    files still get replication factor 3 instead of 1. What could be the
    possible reason behind that? Please help regarding this.

    Thanks

  • Tony Li Xu at May 31, 2013 at 1:59 pm
    Hi Ajit:

    I assume you created your new file from the command line? Have you tried
    creating a file from CM (for example, from "Hue Web UI" -> "File Browser" ->
    "Upload") after you updated "dfs.replication" and restarted the HDFS service?

    --
    Tony


  • Tony Li Xu at May 31, 2013 at 2:44 pm
    Hi Ajit:

    I should be more specific. Here is what I found:

    1. If you update the "dfs.replication" parameter in CM, you need to
    restart the HDFS service and do a "Deploy Client Configuration". After the
    restart, files created through CM (as I mentioned in my last email, via
    "Hue Web UI" -> "File Browser" -> "Upload") get the new replication
    factor, but any files created from the command line still get the old
    replication factor.

    2. If you want to create files in HDFS from the command line, you still
    need to modify the "hdfs-site.xml" file in /etc/hadoop/conf (which is a
    symbolic link to /etc/hadoop/conf.cloudera.mapreduce1 in my case) manually.
    If you open that "hdfs-site.xml" you will see the "dfs.replication" value
    is still 3. Make sure you update it on all nodes that run the HDFS service.
    A restart of the HDFS service is not required.
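
    For example, a quick way to see what the client configuration on a node
    currently contains (the conf path matches the setup described above; yours
    may point elsewhere):

    $ grep -A 1 'dfs.replication' /etc/hadoop/conf/hdfs-site.xml
      <name>dfs.replication</name>
      <value>3</value>
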
    ==================================
    Before I manually update "dfs.replication" in hdfs-site.xml on this
    datanode:
    $ hadoop fs -copyFromLocal ./test-file2 /user/tony/
    $ hadoop fs -ls /user/tony/
    Found 1 items
    -rw-r--r--   3 tony tony   21 2013-05-31 10:38 /user/tony/test-file2

    After I manually update "dfs.replication" to 2 in hdfs-site.xml on this
    datanode:
    $ hadoop fs -rm /user/tony/test-file2
    $ hadoop fs -copyFromLocal ./test-file2 /user/tony/
    $ hadoop fs -ls /user/tony/
    Found 1 items
    -rw-r--r--   2 tony tony   21 2013-05-31 10:39 /user/tony/test-file2

    The replication factor is 2 now.
    ===============================
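
    To double-check at the block level, fsck reports the replication it sees
    for each file (path reused from the example above):

    $ hdfs fsck /user/tony/test-file2 -files -blocks
    # look for "repl=..." on the block lines and the replication figures in the summary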

    Hope this helps.

    --
    Tony

  • Darren Lo at May 31, 2013 at 4:06 pm
    /etc/hadoop/conf is updated whenever you run the command "Deploy Client
    Configuration" in CM. It is updated for every host that has an HDFS role.
    If your host does not have an HDFS role, you can add a Gateway to that host
    for the HDFS service. You probably also want a MapReduce gateway on the
    same host.

    Did you run Deploy Client Configuration?


    --
    Thanks,
    Darren
  • Tony Li Xu at May 31, 2013 at 5:09 pm
    In my case, since "/etc/hadoop/conf" is linked to
    "conf.cloudera.mapreduce1", adding gateways to the "mapreduce" service and
    then running "Deploy Client Configuration" in CM fixed this issue.

    Thanks Darren.

    --
    Tony

  • Darren Lo at May 31, 2013 at 5:17 pm
    Glad you solved the problem!

    I forgot to mention that HDFS, YARN, and MR all use the same conf dir, and
    by default MR wins (controlled by alternatives priority, which you can
    configure for each service under Gateway settings).
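
    To see which client configuration currently "wins" on a host, you can
    inspect the alternatives link (these commands assume an RHEL-style system;
    on Debian/Ubuntu use update-alternatives):

    $ ls -l /etc/hadoop/conf                     # typically a symlink into /etc/alternatives
    $ sudo alternatives --display hadoop-conf    # lists candidate conf dirs and their priorities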

    As you found, gateways can be used to control where configs get deployed.

    Thanks,
    Darren

  • Ajit kumar at Jun 6, 2013 at 6:55 am
    This helped me resolve the problem with Cloudera Manager. Thanks a lot,
    guys.
  • Hari Sekhon at May 31, 2013 at 2:02 pm
    Did you "Deploy Client Configuration"?



Discussion Overview
Group: cm-users
Category: hadoop
Posted: May 30, 2013 at 10:25am
Active: Jun 6, 2013 at 6:55am
Posts: 12
Users: 6
Website: cloudera.com
IRC: #hadoop
