I have a datanode with a ~900GB hard drive in it:

Filesystem      Size  Used  Avail  Use%  Mounted on
/dev/hda1       878G  384G   450G   47%  /

But the NameNode GUI shows 2.57TB:

Node: hadoopnode2
Admin State: In Service
Configured Capacity (TB): 2.57
Used (TB): 0.37
Non DFS Used (TB): 0.88
Remaining (TB): 1.32
Used (%): 14.44
Remaining (%): 51.19
Blocks: 7582

I have three other nodes that are identical to this one, but they all
report the correct size. Does anyone know what would cause this? I'm
assuming eventually HDFS will attempt to put too much data on this node, and
things will go Very Badly.

--
Tim Ellis
Data Architect, Riot Games

PS - Another mystery: the other three nodes have 0.56TB of data each,
but this one has only 0.37TB.


  • Joey Echeverria at Jun 13, 2011 at 7:15 pm
    By any chance, do you have 3 directories set in dfs.data.dir all of which
    are on /dev/hda1?
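
    A misconfiguration of that shape would look roughly like this in
    hdfs-site.xml (the paths here are hypothetical; the point is that all
    three would sit on the same /dev/hda1 filesystem):

        <property>
          <name>dfs.data.dir</name>
          <!-- comma-separated list; each entry is counted toward capacity -->
          <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
        </property>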

    -Joey

    --
    Joseph Echeverria
    Cloudera, Inc.
    443.305.9434
  • Aaron Eng at Jun 13, 2011 at 7:26 pm
    Hey Tim,

    In my experience, if you define multiple data dirs, HDFS does something
    similar to running "df <dir>" for each dir. If those dirs happen to be on
    the same partition, it effectively adds up the size of the partition times
    the number of dirs you listed. So if you have a ~0.9TB drive and HDFS
    shows ~2.6TB of capacity, I'd imagine you have three DFS data dirs defined.
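
    As a sanity check, the arithmetic fits: 878 GB x 3 = 2634 GB, or about
    2.57 TB, which is exactly what the GUI reports. A rough sketch of that
    per-directory accounting in Python (os.statvfs stands in for the
    datanode's "df"-style probe; the directory list is hypothetical):

        import os

        # Hypothetical dfs.data.dir entries, all living on /dev/hda1.
        data_dirs = ["/data/1/dfs/dn", "/data/2/dfs/dn", "/data/3/dfs/dn"]

        total = 0
        for d in data_dirs:
            st = os.statvfs(d)                  # per-directory filesystem stats
            total += st.f_frsize * st.f_blocks  # whole partition, counted once per dir

        # Three dirs on one ~878 GB partition => ~2.57 TB "configured capacity".
        print("configured capacity: %.2f TB" % (total / 1024.0 ** 4))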

    On another note:
    I'm assuming eventually HDFS will attempt to put too much data on this
    node, and things will go Very Badly.

    Don't use space reported in the GUI as an indicator of cluster health. The
    situation you are referencing can happen even when the correct capacity is
    reported for a node. You have to keep in mind that balancing load/data
    between nodes is more of a manual process (via running the balancer). So
    just because the namenode knows how much space is on each node, that doesn't
    mean that data will be evenly distributed.

    So even if what's reported in the GUI is right, you should still be
    monitoring things at a finer-grained level than what is shown there.
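
    For reference, rebalancing is kicked off by hand, roughly like so (the
    threshold is the percent deviation from average utilization that the
    balancer will tolerate; 10 is the default):

        $ hadoop balancer -threshold 10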
  • Time Less at Jun 16, 2011 at 11:34 pm
    Thanks for the replies, all. I think I might at one point have set multiple
    directories all backed by the same physical hardware (while trying to
    salvage the data that had been stored in /tmp).

    I can't find that now, but perhaps I fixed the config and haven't yet
    bounced the datanode. I'll check on that at some point in the near future.
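
    A quick way to check would be to pull dfs.data.dir out of the config and
    df each entry (the paths and config location below are hypothetical); if
    several entries land on the same filesystem, that's the triple-counting:

        $ grep -A1 dfs.data.dir /etc/hadoop/conf/hdfs-site.xml
        $ df -h /data/1/dfs/dn /data/2/dfs/dn /data/3/dfs/dn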

    On another note: Don't use space reported in the GUI as an indicator of
    cluster health. The situation you are referencing can happen even when the
    correct capacity is reported for a node. You have to keep in mind that
    balancing load/data between nodes is more of a manual process (via running
    the balancer). So just because the namenode knows how much space is on each
    node, that doesn't mean that data will be evenly distributed.

    Aaah. Interesting. Okay, I won't trust the GUI from this point on.

    --
    Tim

Discussion Overview
group: hdfs-user
categories: hadoop
posted: Jun 13, '11 at 7:02p
active: Jun 16, '11 at 11:34p
posts: 4
users: 3
website: hadoop.apache.org...
irc: #hadoop
