FAQ
Currently I am using the default block size of 64MB. I would like to change
it for my cluster to 256 megabytes since I deal with large files (over
2GB). What is the best way to do this?

What file do I have to make the change in? Does it have to be applied on the
namenode or on each individual data node? What has to get restarted: the
namenode, the datanodes, or both?



--
--- Get your facts first, then you can distort them as you please.--


  • Ayon Sinha at Feb 3, 2011 at 4:46 pm
    conf/hdfs-site.xml

    restart dfs. I believe it should be sufficient to restart the namenode only, but
    others can confirm.
    -Ayon
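    For concreteness, a minimal hdfs-site.xml entry for a 256 MB default block
    size could look like the sketch below (dfs.block.size is the property name
    used later in this thread; later Hadoop releases rename it dfs.blocksize):

        <property>
          <name>dfs.block.size</name>
          <value>268435456</value>  <!-- 256 MB, expressed in bytes -->
        </property>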



  • Allen Wittenauer at Feb 3, 2011 at 10:39 pm

    On Feb 3, 2011, at 8:45 AM, Ayon Sinha wrote:

    conf/hdfs-site.xml

    restart dfs. I believe it should be sufficient to restart the namenode only, but
    others can confirm.

    I'd recommend doing it on all nodes, including any clients that connect to HDFS but aren't part of it.
  • Rita at Feb 6, 2011 at 4:50 pm
    Neither one worked.

    Is there anything I can do? I always have problems like this in HDFS. It
    seems even the experts are guessing at the answers :-/


    --
    --- Get your facts first, then you can distort them as you please.--
  • Ayon Sinha at Feb 6, 2011 at 5:15 pm
    The block size change will not affect the current files. It will only be used
    when storing new files in HDFS. The block size is ultimately a property of the
    file; the HDFS config file only specifies a default block size for files that
    are created without an explicit block size. If you want it to affect the
    current files you will have to write a script to copy them to a temporary
    location and back. I know of the shell command that sets the replication
    factor (hadoop fs -setrep) but I don't know of an equivalent for block size.
    It should be easy to write a script or DFS client code, though.

    For example, the API FileSystem.create(Path f, boolean overwrite, int bufferSize,
    short replication, long blockSize) lets you specify the block size when you
    create a file.
    -Ayon
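    As a hedged illustration of the per-file behaviour: when writing a new copy
    of a file, the block size can be supplied on the command line instead of
    relying on the configured default (the file names here are hypothetical):

        # store a new file with a 256 MB block size, overriding the client default
        hadoop fs -D dfs.block.size=268435456 -put bigfile.dat /data/bigfile.dat

    Files that are already in HDFS keep whatever block size they were written
    with, which is why a copy-and-replace script is needed for existing data.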





  • Bharath Mundlapudi at Feb 6, 2011 at 7:26 pm
    Can you tell us how you are verifying that it's not working?

    Edit dfs.block.size in conf/hdfs-site.xml and restart the cluster.

    -Bharath




  • Rita at Feb 6, 2011 at 10:24 pm
    Bharath,
    So, I have to restart the entire cluster? That is, I need to stop the namenode
    and then run start-dfs.sh?

    Ayon,
    What I did was decommission a node, remove all of its data (rm -rf on the
    data.dir) and stop the hdfs process on it. Then I made the change to
    conf/hdfs-site.xml on that data node and restarted the datanode. I then ran
    the balancer so the change would take effect, and I am still getting 64MB
    blocks instead of 128MB. :-/







    --
    --- Get your facts first, then you can distort them as you please.--
  • Ayon Sinha at Feb 6, 2011 at 10:31 pm
    Do this test: do a copyFromLocal to create a new file in HDFS, then check the
    block size of that new file. It should be 128 MB if your changes took effect.

    Sent from my iPhone
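    One way to run that test from the shell (the paths below are only
    placeholders):

        # copy a local file into HDFS, then ask for the block size it was stored with
        hadoop fs -copyFromLocal bigfile.dat /tmp/blocksize-test.dat
        hadoop fs -stat %o /tmp/blocksize-test.dat           # prints the block size in bytes
        hadoop fsck /tmp/blocksize-test.dat -files -blocks   # lists the individual blocks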
  • Rita at Feb 6, 2011 at 10:36 pm
    I will try it tomorrow when I go to class.

    Thanks for the quick response! :-)


    --
    --- Get your facts first, then you can distort them as you please.--
  • Ayon Sinha at Feb 6, 2011 at 10:34 pm
    The split into blocks is handled by the namenode; datanodes merely store blocks and hand back what is asked for. Also, the block split is done during file creation, not during replication or balancing.

    Sent from my iPhone
  • Allen Wittenauer at Feb 8, 2011 at 12:15 am

    On Feb 6, 2011, at 2:24 PM, Rita wrote:
    So, what I did was decommission a node, remove all of its data (rm -rf
    data.dir) and stopped the hdfs process on it. Then I made the change to
    conf/hdfs-site.xml on the data node and then I restarted the datanode. I
    then ran a balancer to take effect and I am still getting 64MB files instead
    of 128MB. :-/

    Right.

    As previously mentioned, changing the block size does not change the blocks of the previously written files. In other words, changing the block size does not act as a merging function at the datanode level. In order to change pre-existing files, you'll need to copy the files to a new location, delete the old ones, then mv the new versions back.
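    A rough sketch of that rewrite, assuming the new default block size is
    already in effect for the client and /data/bigfile.dat is a hypothetical
    existing file:

        # re-write the file so it is stored with the new block size, then swap it into place
        hadoop fs -cp /data/bigfile.dat /tmp/bigfile.dat.rewrite
        hadoop fs -rm /data/bigfile.dat
        hadoop fs -mv /tmp/bigfile.dat.rewrite /data/bigfile.dat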
  • Bharath Mundlapudi at Feb 7, 2011 at 12:46 am
    The answer depends on what you are trying to achieve. Assuming you are trying
    to store a file in HDFS using put or copyFromLocal, you do not need to restart
    the entire cluster; restarting just the Namenode is sufficient.

    hadoop-daemon.sh stop namenode
    hadoop-daemon.sh start namenode

    -Bharath





  • Bharath Mundlapudi at Feb 7, 2011 at 12:53 am
    Edit conf/hdfs-site.xml for the block size on the Namenode. But the clean way is to copy this file across the cluster; the new value then becomes the cluster default.

    -Bharath
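    A minimal sketch of pushing the edited file out, assuming a conf/slaves file
    that lists the datanodes and a hypothetical install path of /opt/hadoop:

        # copy the updated hdfs-site.xml to every node listed in conf/slaves
        for host in $(cat conf/slaves); do
          scp conf/hdfs-site.xml "$host":/opt/hadoop/conf/hdfs-site.xml
        done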




Discussion Overview
group: hdfs-user@hadoop.apache.org
categories: hadoop
posted: Feb 3, '11 at 12:35p
active: Feb 8, '11 at 12:15a
posts: 13
users: 4
website: hadoop.apache.org...
irc: #hadoop
