I am trying to configure a large install and I have a question about
the configuration of DataNodes. Each DataNode has multiple drives,
each 1 TB in size. In hdfs-site.xml, I can specify multiple
directories (each one a mounted drive) as shown below:

<property>
<name>dfs.data.dir</name>
<value>/mount1,/mount2,/mount3,....</value>
<final>true</final>
</property>
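The property above packs every data directory into one comma-separated value. As a minimal sketch (assuming Python 3.9+ and the `<configuration>` root element that Hadoop site files use), the list can be read back out like this. `/mount1` to `/mount3` are hypothetical stand-ins for the elided mounts, and `data_dirs` is an illustrative helper, not part of Hadoop.

```python
import xml.etree.ElementTree as ET

# Hypothetical hdfs-site.xml content; three mounts stand in for the "...." list.
SAMPLE = """<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/mount1,/mount2,/mount3</value>
    <final>true</final>
  </property>
</configuration>"""


def data_dirs(xml_text: str, key: str = "dfs.data.dir") -> list[str]:
    """Return the comma-separated directory list stored under `key`."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == key:
            value = prop.findtext("value") or ""
            # Split on commas; this sketch also trims whitespace and drops empties.
            return [d.strip() for d in value.split(",") if d.strip()]
    return []  # property not present


print(data_dirs(SAMPLE))  # ['/mount1', '/mount2', '/mount3']
```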

On the drive that holds the OS, only 100 GB will be used for the OS. Is
it good practice to put a partition on that drive in dfs.data.dir?
Will this slow things down? Will the difference in space available to
each directory be a problem? Also, if it is not a good idea to use the
OS drive for data, how about pointing logs to that drive?

andrew


  • Alex Loddengaard at Jul 7, 2010 at 6:23 pm
    I would recommend not putting / in dfs.data.dir. You'll want that space for
    logs, which will grow very large in heavily used clusters (userlogs in
    particular).

    / for OS and logs
    /mount* for mapred.local.dir and dfs.data.dir
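The split Alex describes can be expressed as a small helper that builds both properties from the same set of data mounts while refusing `/`. This is a hypothetical sketch: `layout`, `to_property` and the `dfs/data` and `mapred/local` subdirectory names are assumptions chosen for illustration, not Hadoop defaults. In 0.20-era releases dfs.data.dir belongs in hdfs-site.xml and mapred.local.dir in mapred-site.xml.

```python
def layout(mounts: list[str]) -> dict[str, str]:
    """Give each data mount one HDFS directory and one MapReduce scratch directory."""
    if any(m.rstrip("/") == "" for m in mounts):
        # "/" (or "//") collapses to "" once trailing slashes are stripped.
        raise ValueError("keep / for the OS and logs, not for data directories")
    clean = [m.rstrip("/") for m in mounts]
    return {
        "dfs.data.dir": ",".join(f"{m}/dfs/data" for m in clean),  # hdfs-site.xml
        "mapred.local.dir": ",".join(f"{m}/mapred/local" for m in clean),  # mapred-site.xml
    }


def to_property(name: str, value: str) -> str:
    """Render one <property> element in the style of the Hadoop site files."""
    return f"<property>\n  <name>{name}</name>\n  <value>{value}</value>\n</property>"


if __name__ == "__main__":
    for name, value in layout(["/mount1", "/mount2", "/mount3"]).items():
        print(to_property(name, value))
```

Sharing each mount between HDFS and MapReduce scratch space keeps every spindle busy for both workloads, while / stays free for the OS and logs.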

    Hope this helps.

    Alex
    (In reply to A Levine's message of Wed, Jul 7, 2010 at 10:38 AM.)
  • Allen Wittenauer at Jul 7, 2010 at 7:18 pm

    On Jul 7, 2010, at 10:38 AM, A Levine wrote:
    > For the drive that has the OS, only 100G will be used for the OS. Is
    > it good practice to have a partition on the drive that has the OS used
    > for the dfs.data.dir?

    I've always partitioned out the root drive so that there is a dedicated file system for Hadoop. In other words, the root disk has two (or more) mount points. Just don't use / directly in the Hadoop configs. That's asking for trouble.
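Allen's rule (a dedicated file system for Hadoop, never / itself) can be checked on a node by comparing device IDs: a directory on a separate partition, even one carved out of the root disk, reports a different `st_dev` than /. This is a minimal sketch, assuming a POSIX-like system and that every listed directory already exists (`os.stat` raises `FileNotFoundError` otherwise); `on_root_fs` and `check_data_dirs` are hypothetical helpers.

```python
import os


def on_root_fs(path: str) -> bool:
    """True if `path` lives on the same filesystem (device ID) as /."""
    return os.stat(path).st_dev == os.stat("/").st_dev


def check_data_dirs(dirs: list[str]) -> list[str]:
    """Return the configured data directories that share the root filesystem."""
    return [d for d in dirs if on_root_fs(d)]


# Example on a DataNode (mount names are hypothetical):
# check_data_dirs(["/mount0", "/mount1", "/mount2"]) should return [] when
# /mount0 is its own partition on the OS disk.
```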
    > Will this slow things down?

    Another spindle = more happiness.

    No. It will speed things up, unless you do a *lot* of heavy streaming.
    > Will the size difference available to each directory be a problem?

    It shouldn't be. Most OS partitions are barely a blip. The system will just think you are using more MapReduce space there. :)
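One way to see why an uneven partition is harmless is a toy model of how a DataNode spreads blocks over its dfs.data.dir volumes. The sketch below assumes round-robin selection that skips volumes without room, which is roughly how 0.20-era DataNodes behaved; other releases may use different policies, so treat it as illustrative only. `place_blocks` is hypothetical, and sizes are counted in abstract 1 GB "blocks" rather than real HDFS block sizes.

```python
def place_blocks(capacities: list[int], n_blocks: int, block_size: int = 1) -> list[int]:
    """Toy model: round-robin block placement across volumes of unequal size.

    Volumes are tried in turn, starting after the last one used; a volume
    without room for the block is skipped. Returns blocks stored per volume.
    """
    free = list(capacities)
    counts = [0] * len(free)
    cur = 0
    for _ in range(n_blocks):
        for step in range(len(free)):
            i = (cur + step) % len(free)
            if free[i] >= block_size:
                free[i] -= block_size
                counts[i] += 1
                cur = (i + 1) % len(free)  # next search starts after this volume
                break
        else:
            # Every volume was skipped: the node is out of space.
            raise RuntimeError("all volumes are full")
    return counts


# A 900 GB partition on the OS disk (1 TB minus 100 GB) plus three full 1 TB drives.
sizes = [900, 1000, 1000, 1000]
print(place_blocks(sizes, 2000))  # [500, 500, 500, 500]
print(place_blocks(sizes, 3900))  # [900, 1000, 1000, 1000]
```

Until the smaller volume fills, every drive takes an equal share; after that, new blocks simply go to the remaining drives.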
    > Also, if it is not a good idea to use the OS drive, then how about pointing logs to
    > that drive?

    That would work too, but for the most part, compute node logs are fairly useless until you need to do deep debugging. So it is kind of a waste of space.

Discussion Overview
group: common-user @
category: hadoop
posted: Jul 7, '10 at 5:46p
active: Jul 7, '10 at 7:18p
posts: 3
users: 3
website: hadoop.apache.org...
irc: #hadoop
