Hi,
Does anyone have data on the maximum possible number of files in HDFS?

My second question: I created up to one lakh (100,000) small files with a small block size and read them back from HDFS. Read performance remained almost unaffected as the number of files increased.

The possible reasons I can think of are:

1. One lakh isn't a large enough number to affect HDFS performance (I used 1 namenode and 4 datanodes).

2. Reads go directly to the datanodes after an initial interaction with the namenode, so reading from different nodes doesn't affect performance.

If someone could add to or correct any of this, it would be highly appreciated.

Cheers,
Wasim


  • Brian Bockelman at Jun 5, 2009 at 4:56 pm

    On Jun 5, 2009, at 11:51 AM, Wasim Bari wrote:

    > Hi,
    > Does anyone have data on the maximum possible number of files
    > in HDFS?

    Hey Wasim,

    I don't think there is a hard maximum. Remember:
    1) Fewer is better. HDFS is optimized for big files.
    2) The amount of memory the HDFS namenode needs is a function of the
    number of files. If you have a huge number of files, you get a huge
    memory requirement.

    1-2 million files is fairly safe if you have a normal-looking namenode
    server (8-16 GB RAM). I know some of our UCSD colleagues just ran a
    test where they were able to put more than 0.5 million files in a
    single directory and still have a usable file system.

    Brian
  • Konstantin Shvachko at Jun 5, 2009 at 5:43 pm
    There are some name-node memory estimates in this JIRA issue:
    http://issues.apache.org/jira/browse/HADOOP-1687

    With 16 GB you can normally have 60 million objects (files
    + blocks) on the name-node. The number of files depends
    on the file-to-block ratio.

    --Konstantin
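
The arithmetic behind the "files + blocks" estimate above can be sketched in a few lines of Python. This is only a back-of-envelope illustration, not a Hadoop API: the function names are hypothetical, and it assumes that the 60-million-objects-per-16-GB figure scales linearly with heap size and that directories can be ignored.

```python
# Back-of-envelope sketch; all names here are hypothetical, not Hadoop APIs.
# Assumption: the "60 million objects per 16 GB" figure from the thread
# scales linearly with heap size, and directories are ignored.
OBJECTS_PER_GB = 60_000_000 / 16  # about 3.75 million objects per GB


def namespace_objects(num_files: int, blocks_per_file: float = 1.0) -> int:
    """Count namenode objects: one per file plus one per block."""
    return int(num_files * (1 + blocks_per_file))


def estimate_max_files(heap_gb: float, blocks_per_file: float = 1.0) -> int:
    """Estimate how many files fit for a given heap and blocks-per-file ratio."""
    if heap_gb <= 0 or blocks_per_file < 0:
        raise ValueError("heap_gb must be positive and blocks_per_file non-negative")
    return int(heap_gb * OBJECTS_PER_GB / (1 + blocks_per_file))


if __name__ == "__main__":
    print(estimate_max_files(16, blocks_per_file=1))  # 30000000
    print(estimate_max_files(16, blocks_per_file=2))  # 20000000
    # The 100,000-file test from the question, at one block per file,
    # expressed as a fraction of the estimated 16 GB capacity:
    print(namespace_objects(100_000) / (16 * OBJECTS_PER_GB))
```

Under these assumptions, the 100,000-file test from the question creates about 200,000 objects, well under 1% of what a 16 GB name-node is estimated to hold, which is consistent with the first reason suggested in the question.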



Discussion Overview
group: common-user @
categories: hadoop
posted: Jun 5, '09 at 4:51p
active: Jun 5, '09 at 5:43p
posts: 3
users: 3
website: hadoop.apache.org...
irc: #hadoop
