FAQ
I'm running 0.19.0 on a 10 node cluster (8 core, 16GB RAM, 4x1.5TB). The
current status of my FS is approximately 1 million files and directories,
950k blocks, and heap size of 7GB (16GB reserved). Average block replication
is 3.8. I'm concerned that the heap size is steadily climbing... a 7GB heap
is substantially higher per file than I have on a similar 0.18.2 cluster,
which has closer to a 1GB heap.
My typical usage model is 1) write a number of small files into HDFS (tens
or hundreds of thousands at a time), 2) archive those files, 3) delete the
originals. I've tried dropping the replication factor of the _index and
_masterindex files without much effect on overall heap size. While I had
trash enabled at one point, I've since disabled it and deleted the .Trash
folders.
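
For reference, a minimal sketch of dropping the replication factor of a Hadoop archive's index files through the FileSystem API (the archive path and target replication here are hypothetical; the hadoop fs -setrep shell command does the same thing):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DropIndexReplication {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // Hypothetical archive location; a .har directory keeps its
            // metadata in the _index and _masterindex files.
            Path archive = new Path("/archives/data.har");
            fs.setReplication(new Path(archive, "_index"), (short) 2);
            fs.setReplication(new Path(archive, "_masterindex"), (short) 2);
            fs.close();
        }
    }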

On namenode startup, I get a massive number of the following lines in my log
file:
2009-01-31 21:41:23,283 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.processReport: block blk_-2389330910609345428_7332878 on
172.16.129.33:50010 size 798080 does not belong to any file.
2009-01-31 21:41:23,283 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addToInvalidates: blk_-2389330910609345428 is added to invalidSet
of 172.16.129.33:50010

I suspect the original files may be left behind and causing the heap size
bloat. Is there any accounting mechanism to determine what is contributing
to my heap size?

Thanks,
Sean

  • Jason hadoop at Feb 2, 2009 at 12:01 am
    If your datanodes are pausing and falling out of the cluster, the namenode
    builds up a large workload of blocks to re-replicate, and when the paused
    datanode comes back, a large workload of blocks to delete.
    These lists are stored in memory on the namenode.
    The startup messages lead me to wonder whether your datanodes are
    periodically pausing or are otherwise dropping in and out of the cluster.
  • Sean Knapp at Feb 2, 2009 at 12:12 am
    Jason,
    Thanks for the response. By falling out, do you mean a longer time since
    last contact (100s+), or fully timed out where it is dropped into dead
    nodes? The former happens fairly often, the latter only under serious load
    but not in the last day. Also, my namenode is now up to 10GB with less than
    700k files after some additional archiving.

    Thanks,
    Sean
  • Jason hadoop at Feb 2, 2009 at 2:00 am
    When a node drops out into dead status, it creates that workload and
    memory load on the namenode.
    I don't know whether the 100+ second case does so as well.
  • Sean Knapp at Feb 2, 2009 at 2:58 am
    Jason,
    Thanks again for the response. Is there a way to inspect these lists to
    verify?

    Regards,
    Sean
  • Brian Bockelman at Feb 2, 2009 at 2:07 am
    Hey Sean,

    Dumb question: how much memory is used after a garbage collection cycle?

    Look at the graph "jvm.metrics.memHeapUsedM":

    http://rcf.unl.edu/ganglia/?m=network_report&r=hour&s=descending&c=red&h=hadoop-name&sh=1&hc=4&z=small

    If you tell the JVM it has 16GB of memory to play with, it will often
    use a significant portion of that before it does a thorough GC. At
    our site, it actually only needs ~500MB, but sometimes it will hit
    1GB before a GC is triggered. One of the vagaries of Java, eh?

    Trigger a GC and see how much is actually used.

    Brian
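
    For context, the gap Brian is describing is between the heap the JVM has
    reserved (committed) and what objects actually occupy (used). A minimal
    sketch of reading those numbers with the standard java.lang.management
    API, run inside the JVM you care about (the class name is illustrative):

        import java.lang.management.ManagementFactory;
        import java.lang.management.MemoryUsage;

        public class HeapUsage {
            public static void main(String[] args) {
                MemoryUsage heap =
                    ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
                long mb = 1024 * 1024;
                // "used" includes garbage not yet collected; "committed" is what
                // the JVM has reserved from the OS; "max" corresponds to -Xmx.
                System.out.println("used:      " + heap.getUsed() / mb + " MB");
                System.out.println("committed: " + heap.getCommitted() / mb + " MB");
                System.out.println("max:       " + heap.getMax() / mb + " MB");
            }
        }
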
  • Sean Knapp at Feb 2, 2009 at 3:00 am
    Brian,
    Thanks for jumping in as well. Is there a recommended way of manually
    triggering GC?

    Thanks,
    Sean
  • Brian Bockelman at Feb 2, 2009 at 3:04 am
    Hey Sean,

    I use JMX monitoring, which allows me to trigger GC via jconsole.
    There's decent documentation out there on making it work, but you'd
    have to restart the namenode to do it ... let the list know if you
    can't figure it out.

    Brian
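
    For a programmatic alternative to jconsole, something along these lines
    should work once remote JMX is enabled on the namenode's JVM (which is
    why the restart is needed); the host and port below are assumptions:

        import java.lang.management.ManagementFactory;
        import java.lang.management.MemoryMXBean;
        import javax.management.MBeanServerConnection;
        import javax.management.remote.JMXConnector;
        import javax.management.remote.JMXConnectorFactory;
        import javax.management.remote.JMXServiceURL;

        public class RemoteGc {
            public static void main(String[] args) throws Exception {
                // Assumed endpoint, e.g. a namenode started with
                // -Dcom.sun.management.jmxremote.port=8004 (plus the usual
                // jmxremote authenticate/ssl settings for your environment).
                JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://namenode.example.com:8004/jmxrmi");
                JMXConnector connector = JMXConnectorFactory.connect(url);
                try {
                    MBeanServerConnection server = connector.getMBeanServerConnection();
                    MemoryMXBean mem = ManagementFactory.newPlatformMXBeanProxy(
                        server, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
                    System.out.println("used before GC: "
                        + mem.getHeapMemoryUsage().getUsed() / (1024 * 1024) + " MB");
                    mem.gc();  // asks the remote JVM to run a full collection
                    System.out.println("used after GC:  "
                        + mem.getHeapMemoryUsage().getUsed() / (1024 * 1024) + " MB");
                } finally {
                    connector.close();
                }
            }
        }
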
  • Jason hadoop at Feb 2, 2009 at 6:48 am
    If you set up your namenode for remote debugging, you could attach with
    Eclipse or the debugger of your choice.

    Look at the objects in org.apache.hadoop.hdfs.server.namenode.FSNamesystem:

        private UnderReplicatedBlocks neededReplications = new UnderReplicatedBlocks();
        private PendingReplicationBlocks pendingReplications;

        //
        // Keeps a Collection for every named machine containing
        // blocks that have recently been invalidated and are thought to live
        // on the machine in question.
        // Mapping: StorageID -> ArrayList<Block>
        //
        private Map<String, Collection<Block>> recentInvalidateSets =
            new TreeMap<String, Collection<Block>>();

        //
        // Keeps a TreeSet for every named node. Each treeset contains
        // a list of the blocks that are "extra" at that location. We'll
        // eventually remove these extras.
        // Mapping: StorageID -> TreeSet<Block>
        //
        Map<String, Collection<Block>> excessReplicateMap =
            new TreeMap<String, Collection<Block>>();

    Much of this is run out of a thread, ReplicationMonitor.

    In our case we had datanodes with 2 million blocks dropping off and on again,
    and this was thrashing these queues with the 2 million blocks on those
    datanodes, re-replicating the blocks and then invalidating them all when
    the datanode came back.
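
    As a toy illustration only (not the actual Hadoop code), the memory effect
    of a bouncing datanode on a per-StorageID map like recentInvalidateSets
    looks roughly like this; the class and method names are made up:

        import java.util.ArrayList;
        import java.util.Collection;
        import java.util.Map;
        import java.util.TreeMap;

        public class InvalidateSetModel {
            // Mapping: StorageID -> block IDs scheduled for deletion on that node
            private final Map<String, Collection<Long>> recentInvalidateSets =
                new TreeMap<String, Collection<Long>>();

            void addToInvalidates(String storageId, long blockId) {
                Collection<Long> blocks = recentInvalidateSets.get(storageId);
                if (blocks == null) {
                    blocks = new ArrayList<Long>();
                    recentInvalidateSets.put(storageId, blocks);
                }
                blocks.add(blockId);
            }

            public static void main(String[] args) {
                InvalidateSetModel model = new InvalidateSetModel();
                // A datanode re-registering with 2 million stale blocks queues
                // 2 million entries here, all held on the namenode heap until
                // the deletions are acknowledged.
                for (long blk = 0; blk < 2000000L; blk++) {
                    model.addToInvalidates("DS-datanode-1", blk);
                }
                System.out.println("blocks queued for deletion: "
                    + model.recentInvalidateSets.get("DS-datanode-1").size());
            }
        }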

  • Sean Knapp at Feb 4, 2009 at 6:32 pm
    Brian, Jason,
    Thanks again for your help. Just to close the thread: while following your
    suggestions I found I had an incredibly large number of files on my data
    nodes that were being marked for invalidation at startup. I believe they
    were left behind from an old mass-delete that was followed by a shutdown
    before the deletes were performed. I've cleaned out those files and we're
    humming along with <1GB heap size.

    Thanks,
    Sean
Discussion Overview
group: common-user
category: hadoop
posted: Jan 31, '09 at 10:20p
active: Feb 4, '09 at 6:32p
posts: 10
users: 3
website: hadoop.apache.org...
irc: #hadoop
