Can someone tell me whether a file will occupy one or more blocks? For
example, the default block size is 64MB, and if I save a 4K file to HDFS,
will the 4K file occupy the whole 64MB block alone? In that case, do I
need to configure the block size to 10K if most of my files are smaller than
10K?

thanks,

  • Jason hadoop at Apr 3, 2009 at 2:30 am
    HDFS only allocates as much physical disk space as is required for a block, up
    to the block size for the file (plus some header data).
    So if you write a 4K file, the single block for that file will be around 4K.

    If you write a 65M file, there will be two blocks, one of roughly 64M, and
    one of roughly 1M.
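
    As a rough illustration of that arithmetic, here is a minimal Python sketch. It assumes the 64MB default block size mentioned above and ignores the per-block header data; block_sizes is a made-up helper name for this example, not a Hadoop API.

```python
def block_sizes(file_size, block_size=64 * 1024 * 1024):
    """Return the sizes in bytes of the blocks a file would be split into."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    # Every block is full except possibly the last one, which only
    # holds the bytes that are left over.
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes


KB = 1024
MB = 1024 * KB

print(block_sizes(4 * KB))   # a single block of 4096 bytes
print(block_sizes(65 * MB))  # one 64MB block plus one 1MB block
```

    The point is that the last (or only) block is just as large as the data it holds, so lowering the block size to 10K should not be needed to save disk space on the datanodes.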

    You can verify this yourself by running the following on a datanode:

    find ${dfs.data.dir} -iname 'blk*' -type f -ls

    Note: the above command will only work as expected if a single directory is
    defined for dfs block storage, and ${dfs.data.dir} is replaced with the
    effective value of the configuration parameter dfs.data.dir from your
    hadoop configuration.
    dfs.data.dir is commonly defined as ${hadoop.tmp.dir}/dfs/data.
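
    If dfs.data.dir names several directories, one option is to walk each of them in turn. The sketch below is a Python stand-in for the find command, assuming the value is a comma-separated list of local paths; list_block_files is a hypothetical helper for this example, not part of Hadoop.

```python
import fnmatch
import os


def list_block_files(dfs_data_dir):
    """Yield (path, size_in_bytes) for every blk* file under each directory.

    dfs_data_dir is the raw configuration value, which may name several
    directories separated by commas.
    """
    for root in (d.strip() for d in dfs_data_dir.split(",")):
        if not root:
            continue
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                # Case-insensitive match, like find's -iname 'blk*'.
                if fnmatch.fnmatch(name.lower(), "blk*"):
                    path = os.path.join(dirpath, name)
                    yield path, os.path.getsize(path)
```

    Calling list_block_files with the effective dfs.data.dir value and printing each (path, size) pair gives roughly the same information as the find listing, one line per block file.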

    The following rather insane bash shell command will print out the value of
    dfs.data.dir on the local machine.
    It must be run from the hadoop installation directory, and it creates two
    temporary paths, /tmp/f.PID.input and /tmp/f.PID.output.
    This little ditty relies on the fact that the configuration parameters are
    pushed into the process environment for streaming jobs.

    Streaming Rocks!

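    B=/tmp/f.$$;
    date > ${B}.input;
    # Remove an empty leftover output directory, if any; the job fails if the output path exists.
    rmdir ${B}.output;
    # Local, map-only streaming job whose mapper is env, so it dumps the task environment.
    bin/hadoop jar contrib/streaming/hadoop-*-streaming.jar \
        -D fs.default.name=file:/// -jt local \
        -input ${B}.input -output ${B}.output -numReduceTasks 0 -mapper env;
    grep dfs.data.dir ${B}.output/part-00000;
    rm ${B}.input;
    rm -rf ${B}.output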

    --
    Alpha Chapters of my book on Hadoop are available
    http://www.apress.com/book/view/9781430219422
  • David Kellogg at Apr 6, 2009 at 10:18 pm
    I am running Hadoop streaming. After around 42 jobs on an 18-node
    cluster, the jobtracker stops responding. This happens with code that
    normally works. Here are the symptoms.

    1. A job is running, but it pauses with reduce stuck at XX%
    2. "hadoop job -list" hangs or takes a very long time to return
    3. In the Ganglia metrics on the Jobtracker node:
       a. jvm.metrics__JobTracker__gcTimeMillis rises above 20 k (20 seconds) before failure
       b. jvm.metrics__JobTracker__memHeapUsedM rises above 600 before failure
       c. jvm.metrics__JobTracker__gcCount rises above 1 k before failure


    The ticker looks like this.

    09/04/06 03:06:28 INFO streaming.StreamJob: map 24% reduce 7%
    09/04/06 03:13:44 INFO streaming.StreamJob: map 25% reduce 7%
    After the 03:13:44 line, it hangs for more than 15 minutes.

    In the jobtracker log, I see this.

    2009-04-04 04:19:13,563 WARN org.apache.hadoop.hdfs.DFSClient: Error
    Recovery for block blk_-8143535428142072268_95993 failed because
    recovery from primary datanode 10.1.0.156:50010 failed 4 times. Will
    retry...

    After restarting both dfs and mapreduce on all nodes, the problem
    goes away, and the formerly non-working job proceeds without failure.

    Does anyone else see this problem?

    David Kellogg
  • Amar Kamat at Apr 7, 2009 at 4:43 am

    David,
    What version are you using?
    This can be because of:
    1) The number of tasks in the jobtracker's memory might exceed its limits. What
    is the total number of tasks in the jobtracker's memory? What is the
    jobtracker's heap size? Try increasing the heap size and also try
    setting the mapred.jobtracker.completeuserjobs.maximum parameter to some
    low value.
    2) Sometimes a slow/bad datanode causes the jobtracker to get stuck.
    As you have mentioned, this might be the cause. Can you let us know the
    output of 'kill -3' on the jobtracker process?
  • David Kellogg at Apr 7, 2009 at 5:49 pm
    Thanks.

    It is Hadoop 0.19.0. It looks like a peak of 90 concurrent map tasks
    and 54 concurrent reduce tasks, not necessarily at the same time. You
    asked how many are held in memory. I think the number is 100, the
    default.

    It looks like 130 MB peak heap size used on the jobtracker. The
    default of 1000 MB is currently allocated.

    I set mapred.jobtracker.completeuserjobs.maximum=5
    I increased HADOOP_HEAPSIZE to 2000 in hadoop-env.sh.
    I included the "kill -3" dump from the jobtracker, far below.

    I'll check back in 3 days, if and when the jobtracker fails again.

    Dave



    Full thread dump Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed
    mode):

    "LeaseChecker" daemon prio=10 tid=0x00007f0c3042a800 nid=0x3ef3
    waiting on condition [0x0000000040bf7000..0x0000000040bf7b90]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run
    (DFSClient.java:979)
    at java.lang.Thread.run(Thread.java:619)

    "IPC Server handler 9 on 54311" daemon prio=10 tid=0x00007f0c307f0800
    nid=0x3dd9 waiting on condition [0x000000004313f000..0x000000004313fd10]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00007f0c3ca963a8> (a
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park
    (LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer
    $ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take
    (LinkedBlockingQueue.java:358)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:878)

    "IPC Server handler 8 on 54311" daemon prio=10 tid=0x00007f0c307ef400
    nid=0x3dd8 waiting on condition [0x000000004303e000..0x000000004303ed90]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00007f0c3ca963a8> (a
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park
    (LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer
    $ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take
    (LinkedBlockingQueue.java:358)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:878)

    "IPC Server handler 7 on 54311" daemon prio=10 tid=0x00007f0c307b0400
    nid=0x3dd7 waiting on condition [0x0000000042f3d000..0x0000000042f3da10]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00007f0c3ca963a8> (a
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park
    (LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer
    $ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take
    (LinkedBlockingQueue.java:358)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:878)

    "IPC Server handler 6 on 54311" daemon prio=10 tid=0x00007f0c307af000
    nid=0x3dd6 waiting on condition [0x0000000042e3c000..0x0000000042e3ca90]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00007f0c3ca963a8> (a
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park
    (LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer
    $ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take
    (LinkedBlockingQueue.java:358)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:878)

    "IPC Server handler 5 on 54311" daemon prio=10 tid=0x00007f0c30296800
    nid=0x3dd5 waiting on condition [0x0000000042d3b000..0x0000000042d3bb10]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00007f0c3ca963a8> (a
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park
    (LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer
    $ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take
    (LinkedBlockingQueue.java:358)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:878)

    "IPC Server handler 4 on 54311" daemon prio=10 tid=0x00007f0c302e7c00
    nid=0x3dd4 waiting on condition [0x0000000042c3a000..0x0000000042c3ab90]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00007f0c3ca963a8> (a
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park
    (LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer
    $ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take
    (LinkedBlockingQueue.java:358)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:878)

    "IPC Server handler 3 on 54311" daemon prio=10 tid=0x00007f0c302e6800
    nid=0x3dd3 waiting on condition [0x0000000042b39000..0x0000000042b39c10]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00007f0c3ca963a8> (a
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park
    (LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer
    $ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take
    (LinkedBlockingQueue.java:358)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:878)

    "IPC Server handler 2 on 54311" daemon prio=10 tid=0x00007f0c30337400
    nid=0x3dd2 waiting on condition [0x0000000042a38000..0x0000000042a38c90]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00007f0c3ca963a8> (a
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park
    (LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer
    $ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take
    (LinkedBlockingQueue.java:358)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:878)

    "IPC Server handler 1 on 54311" daemon prio=10 tid=0x00007f0c30336000
    nid=0x3dd1 waiting on condition [0x0000000042937000..0x0000000042937d10]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00007f0c3ca963a8> (a
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park
    (LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer
    $ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take
    (LinkedBlockingQueue.java:358)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:878)

    "IPC Server handler 0 on 54311" daemon prio=10 tid=0x00007f0c302a6000
    nid=0x3dd0 waiting on condition [0x0000000042836000..0x0000000042836d90]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00007f0c3ca963a8> (a
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park
    (LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer
    $ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take
    (LinkedBlockingQueue.java:358)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:878)

    "IPC Server listener on 54311" daemon prio=10 tid=0x00007f0c302b6c00
    nid=0x3dcf runnable [0x0000000042735000..0x0000000042735a10]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:
    215)
    at sun.nio.ch.EPollSelectorImpl.doSelect
    (EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:
    69)
    - locked <0x00007f0c3ca96ac8> (a sun.nio.ch.Util$1)
    - locked <0x00007f0c3ca96ab0> (a java.util.Collections
    $UnmodifiableSet)
    - locked <0x00007f0c3ca96850> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
    at org.apache.hadoop.ipc.Server$Listener.run(Server.java:297)

    "IPC Server Responder" daemon prio=10 tid=0x00007f0c302b5800
    nid=0x3dce runnable [0x0000000042634000..0x0000000042634a90]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:
    215)
    at sun.nio.ch.EPollSelectorImpl.doSelect
    (EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:
    69)
    - locked <0x00007f0c3cba4810> (a sun.nio.ch.Util$1)
    - locked <0x00007f0c3cba47f8> (a java.util.Collections
    $UnmodifiableSet)
    - locked <0x00007f0c3cba4520> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.hadoop.ipc.Server$Responder.run(Server.java:456)

    "expireLaunchingTasks" prio=10 tid=0x00007f0c308ba400 nid=0x3dcd
    waiting on condition [0x0000000042533000..0x0000000042533b10]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.mapred.JobTracker
    $ExpireLaunchingTasks.run(JobTracker.java:198)
    at java.lang.Thread.run(Thread.java:619)

    "retireJobs" prio=10 tid=0x00007f0c303a8800 nid=0x3dcc waiting on
    condition [0x0000000042432000..0x0000000042432b90]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.mapred.JobTracker$RetireJobs.run
    (JobTracker.java:351)
    at java.lang.Thread.run(Thread.java:619)

    "expireTrackers" prio=10 tid=0x00007f0c36940c00 nid=0x3dcb waiting on
    condition [0x0000000041c7f000..0x0000000041c7fc10]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.mapred.JobTracker$ExpireTrackers.run
    (JobTracker.java:281)
    at java.lang.Thread.run(Thread.java:619)

    "initJobs" prio=10 tid=0x00007f0c3693fc00 nid=0x3dca in Object.wait()
    [0x0000000041b7e000..0x0000000041b7ec90]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00007f0c3ca91d90> (a java.util.ArrayList)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.mapred.EagerTaskInitializationListener
    $JobInitThread.run(EagerTaskInitializationListener.java:51)
    - locked <0x00007f0c3ca91d90> (a java.util.ArrayList)
    at java.lang.Thread.run(Thread.java:619)

    "Thread-14" prio=10 tid=0x00007f0c3602f400 nid=0x3dc9 waiting on
    condition [0x0000000041a7d000..0x0000000041a7dd10]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00007f0c3cf03938> (a
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park
    (LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer
    $ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.DelayQueue.take(DelayQueue.java:160)
    at java.util.concurrent.DelayQueue.take(DelayQueue.java:39)
    at org.apache.hadoop.mapred.JobEndNotifier$1.run
    (JobEndNotifier.java:50)
    at java.lang.Thread.run(Thread.java:619)

    "Timer thread for monitoring mapred" daemon prio=10
    tid=0x00007f0c36d51000 nid=0x3dc4 in Object.wait()
    [0x000000004197c000..0x000000004197cb10]
    java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.util.TimerThread.mainLoop(Timer.java:509)
    - locked <0x00007f0c3ceabad8> (a java.util.TaskQueue)
    at java.util.TimerThread.run(Timer.java:462)

    "Timer thread for monitoring jvm" daemon prio=10
    tid=0x00007f0c36d86000 nid=0x3dc3 in Object.wait()
    [0x00000000407b3000..0x00000000407b3b90]
    java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.util.TimerThread.mainLoop(Timer.java:509)
    - locked <0x00007f0c3ceaeaf8> (a java.util.TaskQueue)
    at java.util.TimerThread.run(Timer.java:462)

    "SocketListener0-1" prio=10 tid=0x00007f0c36a4f000 nid=0x3dc2 in
    Object.wait() [0x000000004187b000..0x000000004187bc10]
    java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at org.mortbay.util.ThreadPool$PoolThread.run
    (ThreadPool.java:522)
    - locked <0x00007f0c3ceb1760> (a org.mortbay.util.ThreadPool
    $PoolThread)

    "SocketListener0-0" prio=10 tid=0x00007f0c36d69400 nid=0x3dc1 in
    Object.wait() [0x000000004177a000..0x000000004177ac90]
    java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at org.mortbay.util.ThreadPool$PoolThread.run
    (ThreadPool.java:522)
    - locked <0x00007f0c3ceb1a40> (a org.mortbay.util.ThreadPool
    $PoolThread)

    "Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=50030]"
    prio=10 tid=0x00007f0c36b6a800 nid=0x3dc0 runnable
    [0x0000000041679000..0x0000000041679d10]
    java.lang.Thread.State: RUNNABLE
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384)
    - locked <0x00007f0c3cc3b5c8> (a java.net.SocksSocketImpl)
    at java.net.ServerSocket.implAccept(ServerSocket.java:453)
    at java.net.ServerSocket.accept(ServerSocket.java:421)
    at org.mortbay.util.ThreadedServer.acceptSocket
    (ThreadedServer.java:432)
    at org.mortbay.util.ThreadedServer$Acceptor.run
    (ThreadedServer.java:631)

    "SessionScavenger" daemon prio=10 tid=0x00007f0c303cc000 nid=0x3dbf
    waiting on condition [0x0000000040e0b000..0x0000000040e0bd90]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.mortbay.jetty.servlet.AbstractSessionManager
    $SessionScavenger.run(AbstractSessionManager.java:587)

    "SessionScavenger" daemon prio=10 tid=0x00007f0c30346800 nid=0x3dbe
    waiting on condition [0x0000000041de8000..0x0000000041de8a10]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.mortbay.jetty.servlet.AbstractSessionManager
    $SessionScavenger.run(AbstractSessionManager.java:587)

    "SessionScavenger" daemon prio=10 tid=0x00007f0c3031e400 nid=0x3dbd
    waiting on condition [0x0000000041578000..0x0000000041578a90]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.mortbay.jetty.servlet.AbstractSessionManager
    $SessionScavenger.run(AbstractSessionManager.java:587)

    "Low Memory Detector" daemon prio=10 tid=0x00000000401c9400
    nid=0x3d7c runnable [0x0000000000000000..0x0000000000000000]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread1" daemon prio=10 tid=0x00000000401c6000 nid=0x3d7b
    waiting on condition [0x0000000000000000..0x00000000423302b0]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread0" daemon prio=10 tid=0x00000000401c4000 nid=0x3d7a
    waiting on condition [0x0000000000000000..0x000000004222f330]
    java.lang.Thread.State: RUNNABLE

    "Signal Dispatcher" daemon prio=10 tid=0x00000000401c2800 nid=0x3d79
    waiting on condition [0x0000000000000000..0x0000000000000000]
    java.lang.Thread.State: RUNNABLE

    "Finalizer" daemon prio=10 tid=0x000000004019d800 nid=0x3d78 in
    Object.wait() [0x000000004202e000..0x000000004202eb90]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
    - locked <0x00007f0c3cbfe4e0> (a java.lang.ref.ReferenceQueue
    $Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
    at java.lang.ref.Finalizer$FinalizerThread.run
    (Finalizer.java:159)

    "Reference Handler" daemon prio=10 tid=0x0000000040196c00 nid=0x3d77
    in Object.wait() [0x0000000041f2d000..0x0000000041f2dc10]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run
    (Reference.java:116)
    - locked <0x00007f0c3caf1408> (a java.lang.ref.Reference$Lock)

    "main" prio=10 tid=0x0000000040111c00 nid=0x3d73 in Object.wait()
    [0x0000000041477000..0x0000000041477ed0]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00007f0c3cba4020> (a
    org.apache.hadoop.ipc.RPC$Server)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.ipc.Server.join(Server.java:1021)
    - locked <0x00007f0c3cba4020> (a org.apache.hadoop.ipc.RPC
    $Server)
    at org.apache.hadoop.mapred.JobTracker.offerService
    (JobTracker.java:1309)
    at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:
    2731)

    "VM Thread" prio=10 tid=0x0000000040191c00 nid=0x3d76 runnable

    "GC task thread#0 (ParallelGC)" prio=10 tid=0x000000004011c400
    nid=0x3d74 runnable

    "GC task thread#1 (ParallelGC)" prio=10 tid=0x000000004011d800
    nid=0x3d75 runnable

    "VM Periodic Task Thread" prio=10 tid=0x00000000401cb000 nid=0x3d7d
    waiting on condition

    JNI global references: 964

    Heap
    PSYoungGen total 198528K, used 68487K [0x00007f0c66540000, 0x00007f0c77830000, 0x00007f0c7b290000)
      eden space 142144K, 48% used [0x00007f0c66540000,0x00007f0c6a821f70,0x00007f0c6f010000)
      from space 56384K, 0% used [0x00007f0c6f010000,0x00007f0c6f010000,0x00007f0c72720000)
      to space 57600K, 0% used [0x00007f0c73ff0000,0x00007f0c73ff0000,0x00007f0c77830000)
    PSOldGen total 682688K, used 672276K [0x00007f0c3ca90000, 0x00007f0c66540000, 0x00007f0c66540000)
      object space 682688K, 98% used [0x00007f0c3ca90000,0x00007f0c65b15068,0x00007f0c66540000)
    PSPermGen total 21248K, used 16555K [0x00007f0c37690000, 0x00007f0c38b50000, 0x00007f0c3ca90000)
      object space 21248K, 77% used [0x00007f0c37690000,0x00007f0c386bae60,0x00007f0c38b50000)
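
    To read the heap summary at a glance, the generation lines can be reduced to used/total percentages. This is a minimal Python sketch with the three summary lines copied from the dump above (address ranges dropped); generation_usage is a hypothetical helper, and the regex only assumes the "total NK, used NK" layout shown here.

```python
import re

# Generation summary lines copied from the Heap section of the dump.
HEAP_LINES = """\
PSYoungGen total 198528K, used 68487K
PSOldGen total 682688K, used 672276K
PSPermGen total 21248K, used 16555K
"""

GEN_RE = re.compile(r"^\s*(PS\w+)\s+total\s+(\d+)K,\s+used\s+(\d+)K", re.MULTILINE)


def generation_usage(text):
    """Map each generation name to (used_kb, total_kb, percent_used)."""
    usage = {}
    for name, total, used in GEN_RE.findall(text):
        total_kb, used_kb = int(total), int(used)
        usage[name] = (used_kb, total_kb, 100.0 * used_kb / total_kb)
    return usage


for name, (used_kb, total_kb, pct) in generation_usage(HEAP_LINES).items():
    print("%-11s %7dK of %7dK (%.1f%% used)" % (name, used_kb, total_kb, pct))
```

    For these figures PSOldGen comes out at roughly 98% used, which matches the "object space" line in the dump and fits a jobtracker that spends most of its time in garbage collection.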
  • David Kellogg at May 7, 2009 at 1:11 am
    I still see the memory leak in the JobTracker (version 0.19.0,
    streaming, Java version 1.6). Doubling the heap size simply doubled
    the time-to-failure. I ran hprof against the jobtracker process.

    It appears the Counters objects are instantiated many times. The
    stack traces often point back to Counters again.

    Can anyone guess what the problem is here?

    Dave

    SITES BEGIN (ordered by live bytes) Wed May 6 12:33:48 2009
                percent              live           alloc'ed   stack class
    rank   self   accum     bytes    objs      bytes    objs   trace name
       1  3.51%   3.51%  25115888   46858  255055064  475849  332154 java.lang.Object[]
       2  3.49%   7.00%  24970632   46587   24970632   46587  332642 java.lang.Object[]
       3  3.35%  10.35%  23959352  379880   43381832  677145  333210 char[]
       4  3.19%  13.53%  22805816  379875   41290952  677145  333204 char[]
       5  2.48%  16.02%  17782896  370477   22768176  474337  333211 java.util.HashMap$Entry
       6  2.12%  18.14%  15195200  379880   27085800  677145  333209 java.lang.String
       7  2.12%  20.26%  15195000  379875   27085800  677145  333203 java.lang.String
       8  2.07%  22.33%  14819080  370477   18973480  474337  333198 org.apache.hadoop.mapred.Counters$Counter
       9  1.97%  24.30%  14083560   92655   18559200  122100  333168 java.util.HashMap$Entry[]
      10  1.54%  25.84%  11006400   45860   11006400   45860  331020 org.apache.hadoop.mapred.TaskInProgress
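
    One way to check how much of this table points back to Counters is to add up the live bytes of the sites whose trace contains a Counters frame. The Python sketch below does that with the numbers from the SITES table and one frame per trace taken from the TRACE listings that follow; live_bytes_through is a hypothetical helper, trace 331020 is not listed in this part of the thread so it is left out, and the ten sites only cover about 26% of all live bytes.

```python
# (rank, live_bytes, trace_id) for the ten sites in the SITES table.
SITES = [
    (1, 25115888, 332154),
    (2, 24970632, 332642),
    (3, 23959352, 333210),
    (4, 22805816, 333204),
    (5, 17782896, 333211),
    (6, 15195200, 333209),
    (7, 15195000, 333203),
    (8, 14819080, 333198),
    (9, 14083560, 333168),
    (10, 11006400, 331020),
]

# One frame per trace, copied from the TRACE listings.
# Trace 331020 is not shown there, so it has no entry.
TRACE_FRAMES = {
    332154: "org.apache.hadoop.mapred.Counters.<init>",
    332642: "org.apache.hadoop.mapred.Counters.<init>",
    333210: "org.apache.hadoop.mapred.Counters$Counter.readFields",
    333204: "org.apache.hadoop.mapred.Counters$Counter.readFields",
    333211: "org.apache.hadoop.mapred.Counters$Group.readFields",
    333209: "org.apache.hadoop.mapred.Counters$Counter.readFields",
    333203: "org.apache.hadoop.mapred.Counters$Counter.readFields",
    333198: "org.apache.hadoop.mapred.Counters$Counter.<init>",
    333168: "org.apache.hadoop.mapred.Counters$Group.<init>",
}


def live_bytes_through(marker, sites, trace_frames):
    """Sum the live bytes of sites whose recorded frame contains marker."""
    return sum(
        live for _rank, live, trace in sites
        if marker in trace_frames.get(trace, "")
    )


top10 = sum(live for _rank, live, _trace in SITES)
via_counters = live_bytes_through("mapred.Counters", SITES, TRACE_FRAMES)
print("top-10 live bytes:      %d" % top10)
print("of which via Counters:  %d" % via_counters)
```

    This only summarizes what hprof already reports; it does not say which object is holding on to the Counters instances.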


    TRACE 332154:
    java.util.IdentityHashMap.init(IdentityHashMap.java:244)
    java.util.IdentityHashMap.<init>(IdentityHashMap.java:190)
    org.apache.hadoop.mapred.Counters.<init>(Counters.java:389)
    org.apache.hadoop.mapred.TaskStatus.readFields(TaskStatus.java:351)
    org.apache.hadoop.mapred.TaskStatus.readTaskStatus(TaskStatus.java:393)
    org.apache.hadoop.mapred.TaskTrackerStatus.readFields(TaskTrackerStatus.java:297)
    org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:237)
    org.apache.hadoop.ipc.RPC$Invocation.readFields(RPC.java:100)
    org.apache.hadoop.ipc.Server$Connection.processData(Server.java:845)
    org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:812)

    TRACE 332642:
    java.util.IdentityHashMap.init(IdentityHashMap.java:244)
    java.util.IdentityHashMap.<init>(IdentityHashMap.java:190)
    org.apache.hadoop.mapred.Counters.<init>(Counters.java:389)
    org.apache.hadoop.mapred.Task.<init>(Task.java:323)
    org.apache.hadoop.mapred.MapTask.<init>(MapTask.java:91)
    org.apache.hadoop.mapred.TaskInProgress.addRunningTask(TaskInProgress.java:857)
    org.apache.hadoop.mapred.TaskInProgress.getTaskToRun(TaskInProgress.java:844)
    org.apache.hadoop.mapred.JobInProgress.obtainNewMapTask(JobInProgress.java:889)
    org.apache.hadoop.mapred.JobQueueTaskScheduler.assignTasks(JobQueueTaskScheduler.java:150)
    org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:1887)

    TRACE 333210:
    java.util.Arrays.copyOfRange(Arrays.java:3209)
    java.lang.String.<init>(String.java:216)
    java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542)
    java.nio.CharBuffer.toString(CharBuffer.java:1157)
    org.apache.hadoop.io.Text.decode(Text.java:350)
    org.apache.hadoop.io.Text.decode(Text.java:322)
    org.apache.hadoop.io.Text.readString(Text.java:403)
    org.apache.hadoop.mapred.Counters$Counter.readFields(Counters.java:90)
    org.apache.hadoop.mapred.Counters$Group.readFields(Counters.java:371)
    org.apache.hadoop.mapred.Counters.readFields(Counters.java:554)

    TRACE 333204:
    java.util.Arrays.copyOfRange(Arrays.java:3209)
    java.lang.String.<init>(String.java:216)
    java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542)
    java.nio.CharBuffer.toString(CharBuffer.java:1157)
    org.apache.hadoop.io.Text.decode(Text.java:350)
    org.apache.hadoop.io.Text.decode(Text.java:322)
    org.apache.hadoop.io.Text.readString(Text.java:403)
    org.apache.hadoop.mapred.Counters$Counter.readFields(Counters.java:88)
    org.apache.hadoop.mapred.Counters$Group.readFields(Counters.java:371)
    org.apache.hadoop.mapred.Counters.readFields(Counters.java:554)

    TRACE 333211:
    java.util.HashMap$Entry.<init>(HashMap.java:683)
    java.util.HashMap.addEntry(HashMap.java:753)
    java.util.HashMap.put(HashMap.java:385)
    org.apache.hadoop.mapred.Counters$Group.readFields(Counters.java:372)
    org.apache.hadoop.mapred.Counters.readFields(Counters.java:554)
    org.apache.hadoop.mapred.TaskStatus.readFields(TaskStatus.java:355)
    org.apache.hadoop.mapred.TaskStatus.readTaskStatus(TaskStatus.java:393)
    org.apache.hadoop.mapred.TaskTrackerStatus.readFields(TaskTrackerStatus.java:297)
    org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:237)
    org.apache.hadoop.ipc.RPC$Invocation.readFields(RPC.java:100)

    TRACE 333209:
    java.lang.String.<init>(String.java:203)
    java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542)
    java.nio.CharBuffer.toString(CharBuffer.java:1157)
    org.apache.hadoop.io.Text.decode(Text.java:350)
    org.apache.hadoop.io.Text.decode(Text.java:322)
    org.apache.hadoop.io.Text.readString(Text.java:403)
    org.apache.hadoop.mapred.Counters$Counter.readFields(Counters.java:90)
    org.apache.hadoop.mapred.Counters$Group.readFields(Counters.java:371)
    org.apache.hadoop.mapred.Counters.readFields(Counters.java:554)
    org.apache.hadoop.mapred.TaskStatus.readFields(TaskStatus.java:355)

    TRACE 333203:
    java.lang.String.<init>(String.java:203)
    java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542)
    java.nio.CharBuffer.toString(CharBuffer.java:1157)
    org.apache.hadoop.io.Text.decode(Text.java:350)
    org.apache.hadoop.io.Text.decode(Text.java:322)
    org.apache.hadoop.io.Text.readString(Text.java:403)
    org.apache.hadoop.mapred.Counters$Counter.readFields(Counters.java:88)
    org.apache.hadoop.mapred.Counters$Group.readFields(Counters.java:371)
    org.apache.hadoop.mapred.Counters.readFields(Counters.java:554)
    org.apache.hadoop.mapred.TaskStatus.readFields(TaskStatus.java:355)

    TRACE 333198:
    org.apache.hadoop.mapred.Counters$Counter.<init>(Counters.java:74)
    org.apache.hadoop.mapred.Counters$Group.readFields(Counters.java:370)
    org.apache.hadoop.mapred.Counters.readFields(Counters.java:554)
    org.apache.hadoop.mapred.TaskStatus.readFields(TaskStatus.java:355)
    org.apache.hadoop.mapred.TaskStatus.readTaskStatus(TaskStatus.java:393)
    org.apache.hadoop.mapred.TaskTrackerStatus.readFields(TaskTrackerStatus.java:297)
    org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:237)
    org.apache.hadoop.ipc.RPC$Invocation.readFields(RPC.java:100)
    org.apache.hadoop.ipc.Server$Connection.processData(Server.java:845)
    org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:812)

    TRACE 333168:
    java.util.HashMap.<init>(HashMap.java:209)
    org.apache.hadoop.mapred.Counters$Group.<init>(Counters.java:195)
    org.apache.hadoop.mapred.Counters.readFields(Counters.java:553)
    org.apache.hadoop.mapred.TaskStatus.readFields(TaskStatus.java:355)
    org.apache.hadoop.mapred.TaskStatus.readTaskStatus
    (TaskStatus.java:393)
    org.apache.hadoop.mapred.TaskTrackerStatus.readFields
    (TaskTrackerStatus.java:297)
    org.apache.hadoop.io.ObjectWritable.readObject
    (ObjectWritable.java:237)
    org.apache.hadoop.ipc.RPC$Invocation.readFields(RPC.java:100)
    org.apache.hadoop.ipc.Server$Connection.processData(Server.java:
    845)
    org.apache.hadoop.ipc.Server$Connection.readAndProcess
    (Server.java:812)

    TRACE 331020:
    org.apache.hadoop.mapred.TaskInProgress.<init>
    (TaskInProgress.java:127)
    org.apache.hadoop.mapred.JobInProgress.initTasks
    (JobInProgress.java:389)
    org.apache.hadoop.mapred.EagerTaskInitializationListener
    $JobInitThread.run(EagerTaskInitializationListener.java:55)
    java.lang.Thread.run(Thread.java:619)
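
Many frames in the traces above are split across two lines by mail wrapping, which makes them hard to compare by eye. The following is a minimal Python sketch, not part of the original post, that re-joins wrapped frames and counts how many traces pass through each method. It assumes each trace starts with a `TRACE <id>:` line, ends at a blank line, and that every complete frame ends with `)`. The names `parse_traces` and `method_counts` are hypothetical helpers, and the sample is an excerpt of two of the traces above.

```python
from collections import Counter

# Excerpt copied from two of the traces above, including the mail wrapping.
SAMPLE = """
TRACE 333211:
java.util.HashMap$Entry.<init>(HashMap.java:683)
java.util.HashMap.addEntry(HashMap.java:753)
java.util.HashMap.put(HashMap.java:385)
org.apache.hadoop.mapred.Counters$Group.readFields(Counters.java:
372)
org.apache.hadoop.mapred.Counters.readFields(Counters.java:554)

TRACE 331020:
org.apache.hadoop.mapred.TaskInProgress.<init>
(TaskInProgress.java:127)
org.apache.hadoop.mapred.JobInProgress.initTasks
(JobInProgress.java:389)
org.apache.hadoop.mapred.EagerTaskInitializationListener
$JobInitThread.run(EagerTaskInitializationListener.java:55)
java.lang.Thread.run(Thread.java:619)
"""


def parse_traces(text):
    """Return {trace_id: [frame, ...]}, re-joining frames split by wrapping."""
    traces = {}
    current = None   # id of the trace being read, or None between traces
    pending = ""     # pieces of a frame collected so far
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("TRACE "):
            current = line[len("TRACE "):].rstrip(":")
            traces[current] = []
            pending = ""
        elif not line:
            current = None
            pending = ""
        elif current is not None:
            pending += line
            if pending.endswith(")"):  # assumption: a full frame ends with ")"
                traces[current].append(pending)
                pending = ""
    return traces


def method_counts(traces):
    """Count, per method (source location dropped), how many traces contain it."""
    counts = Counter()
    for frames in traces.values():
        counts.update({frame.split("(")[0] for frame in frames})
    return counts


traces = parse_traces(SAMPLE)
for method, n in method_counts(traces).most_common(3):
    print(n, method)
```

Run over the full list of traces, a tally like this makes it easier to see which call paths, such as the `Counters.readFields` path, recur most often.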




    On Apr 7, 2009, at 10:50 AM, David Kellogg wrote:

    Thanks.

    It is Hadoop 0.19.0. It looks like a peak of 90 concurrent map
    tasks and 54 concurrent reduce tasks, not necessarily at the same
    time. You asked how many are held in memory. I think the number is
    100, the default.

    It looks like the peak heap size used on the jobtracker is 130 MB.
    The default of 1000 MB is currently allocated.

    I set mapred.jobtracker.completeuserjobs.maximum=5
    I increased HADOOP_HEAPSIZE to 2000 in hadoop-env.sh.
    I included the "kill -3" dump from the jobtracker, far below.

    I'll check back in 3 days, if and when the jobtracker fails again.

    Dave



    On Apr 6, 2009, at 9:42 PM, Amar Kamat wrote:

    David Kellogg wrote:
    I am running Hadoop streaming. After around 42 jobs on an 18-node
    cluster, the jobtracker stops responding. This happens on
    normally-working code. Here are the symptoms.

    1. A job is running, but it pauses with reduce stuck at XX%
    2. "hadoop job -list" hangs or takes a very long time to return
    3. In the Ganglia metrics on the Jobtracker node:
    a. jvm.metrics__JobTracker__gcTimeMillis rises above 20 k
    (20 seconds) before failure
    b. jvm.metrics__JobTracker__memHeapUsedM rises above 600
    before failure
    c. jvm.metrics__JobTracker__gcCount rises above 1 k before
    failure


    The ticker looks like this.

    09/04/06 03:06:28 INFO streaming.StreamJob: map 24% reduce 7%
    09/04/06 03:13:44 INFO streaming.StreamJob: map 25% reduce 7%
    After the 03:13:44 line, it hangs for more than 15 minutes.

    In the jobtracker log, I see this.

    2009-04-04 04:19:13,563 WARN org.apache.hadoop.hdfs.DFSClient:
    Error Recovery for block blk_-8143535428142072268_95993 failed
    because recovery from primary datanode 10.1.0.156:50010 failed 4
    times. Will retry...

    After restarting both dfs and mapreduce on all nodes, the problem
    goes away, and the formerly non-working job proceeds without
    failure.
    David,
    What version are you using?
    This can happen because of:
    1) The number of tasks in the jobtracker's memory might exceed its
    limits. What is the total number of tasks in the jobtracker's
    memory? What is the jobtracker's heap size? Try increasing the heap
    size and also try setting the
    mapred.jobtracker.completeuserjobs.maximum parameter to some low
    value.
    2) Sometimes a slow or bad datanode causes the jobtracker to get
    stuck. As you have mentioned, this might be the cause. Can you let
    us know the output of 'kill -3' on the jobtracker process?
    Does anyone else see this problem?

    David Kellogg
    Full thread dump Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed
    mode):

    "LeaseChecker" daemon prio=10 tid=0x00007f0c3042a800 nid=0x3ef3
    waiting on condition [0x0000000040bf7000..0x0000000040bf7b90]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run
    (DFSClient.java:979)
    at java.lang.Thread.run(Thread.java:619)

    "IPC Server handler 9 on 54311" daemon prio=10
    tid=0x00007f0c307f0800 nid=0x3dd9 waiting on condition
    [0x000000004313f000..0x000000004313fd10]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00007f0c3ca963a8> (a
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park
    (LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer
    $ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take
    (LinkedBlockingQueue.java:358)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:878)

    "IPC Server handler 8 on 54311" daemon prio=10
    tid=0x00007f0c307ef400 nid=0x3dd8 waiting on condition
    [0x000000004303e000..0x000000004303ed90]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00007f0c3ca963a8> (a
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park
    (LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer
    $ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take
    (LinkedBlockingQueue.java:358)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:878)

    "IPC Server handler 7 on 54311" daemon prio=10
    tid=0x00007f0c307b0400 nid=0x3dd7 waiting on condition
    [0x0000000042f3d000..0x0000000042f3da10]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00007f0c3ca963a8> (a
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park
    (LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer
    $ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take
    (LinkedBlockingQueue.java:358)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:878)

    "IPC Server handler 6 on 54311" daemon prio=10
    tid=0x00007f0c307af000 nid=0x3dd6 waiting on condition
    [0x0000000042e3c000..0x0000000042e3ca90]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00007f0c3ca963a8> (a
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park
    (LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer
    $ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take
    (LinkedBlockingQueue.java:358)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:878)

    "IPC Server handler 5 on 54311" daemon prio=10
    tid=0x00007f0c30296800 nid=0x3dd5 waiting on condition
    [0x0000000042d3b000..0x0000000042d3bb10]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00007f0c3ca963a8> (a
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park
    (LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer
    $ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take
    (LinkedBlockingQueue.java:358)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:878)

    "IPC Server handler 4 on 54311" daemon prio=10
    tid=0x00007f0c302e7c00 nid=0x3dd4 waiting on condition
    [0x0000000042c3a000..0x0000000042c3ab90]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00007f0c3ca963a8> (a
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park
    (LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer
    $ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take
    (LinkedBlockingQueue.java:358)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:878)

    "IPC Server handler 3 on 54311" daemon prio=10
    tid=0x00007f0c302e6800 nid=0x3dd3 waiting on condition
    [0x0000000042b39000..0x0000000042b39c10]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00007f0c3ca963a8> (a
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park
    (LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer
    $ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take
    (LinkedBlockingQueue.java:358)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:878)

    "IPC Server handler 2 on 54311" daemon prio=10
    tid=0x00007f0c30337400 nid=0x3dd2 waiting on condition
    [0x0000000042a38000..0x0000000042a38c90]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00007f0c3ca963a8> (a
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park
    (LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer
    $ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take
    (LinkedBlockingQueue.java:358)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:878)

    "IPC Server handler 1 on 54311" daemon prio=10
    tid=0x00007f0c30336000 nid=0x3dd1 waiting on condition
    [0x0000000042937000..0x0000000042937d10]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00007f0c3ca963a8> (a
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park
    (LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer
    $ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take
    (LinkedBlockingQueue.java:358)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:878)

    "IPC Server handler 0 on 54311" daemon prio=10
    tid=0x00007f0c302a6000 nid=0x3dd0 waiting on condition
    [0x0000000042836000..0x0000000042836d90]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00007f0c3ca963a8> (a
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park
    (LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer
    $ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take
    (LinkedBlockingQueue.java:358)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:878)

    "IPC Server listener on 54311" daemon prio=10
    tid=0x00007f0c302b6c00 nid=0x3dcf runnable
    [0x0000000042735000..0x0000000042735a10]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:
    215)
    at sun.nio.ch.EPollSelectorImpl.doSelect
    (EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect
    (SelectorImpl.java:69)
    - locked <0x00007f0c3ca96ac8> (a sun.nio.ch.Util$1)
    - locked <0x00007f0c3ca96ab0> (a java.util.Collections
    $UnmodifiableSet)
    - locked <0x00007f0c3ca96850> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
    at org.apache.hadoop.ipc.Server$Listener.run(Server.java:297)

    "IPC Server Responder" daemon prio=10 tid=0x00007f0c302b5800
    nid=0x3dce runnable [0x0000000042634000..0x0000000042634a90]
    java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:
    215)
    at sun.nio.ch.EPollSelectorImpl.doSelect
    (EPollSelectorImpl.java:65)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect
    (SelectorImpl.java:69)
    - locked <0x00007f0c3cba4810> (a sun.nio.ch.Util$1)
    - locked <0x00007f0c3cba47f8> (a java.util.Collections
    $UnmodifiableSet)
    - locked <0x00007f0c3cba4520> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.apache.hadoop.ipc.Server$Responder.run(Server.java:456)

    "expireLaunchingTasks" prio=10 tid=0x00007f0c308ba400 nid=0x3dcd
    waiting on condition [0x0000000042533000..0x0000000042533b10]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.mapred.JobTracker
    $ExpireLaunchingTasks.run(JobTracker.java:198)
    at java.lang.Thread.run(Thread.java:619)

    "retireJobs" prio=10 tid=0x00007f0c303a8800 nid=0x3dcc waiting on
    condition [0x0000000042432000..0x0000000042432b90]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.mapred.JobTracker$RetireJobs.run
    (JobTracker.java:351)
    at java.lang.Thread.run(Thread.java:619)

    "expireTrackers" prio=10 tid=0x00007f0c36940c00 nid=0x3dcb waiting
    on condition [0x0000000041c7f000..0x0000000041c7fc10]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.mapred.JobTracker$ExpireTrackers.run
    (JobTracker.java:281)
    at java.lang.Thread.run(Thread.java:619)

    "initJobs" prio=10 tid=0x00007f0c3693fc00 nid=0x3dca in Object.wait
    () [0x0000000041b7e000..0x0000000041b7ec90]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00007f0c3ca91d90> (a java.util.ArrayList)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.mapred.EagerTaskInitializationListener
    $JobInitThread.run(EagerTaskInitializationListener.java:51)
    - locked <0x00007f0c3ca91d90> (a java.util.ArrayList)
    at java.lang.Thread.run(Thread.java:619)

    "Thread-14" prio=10 tid=0x00007f0c3602f400 nid=0x3dc9 waiting on
    condition [0x0000000041a7d000..0x0000000041a7dd10]
    java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00007f0c3cf03938> (a
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park
    (LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer
    $ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.DelayQueue.take(DelayQueue.java:160)
    at java.util.concurrent.DelayQueue.take(DelayQueue.java:39)
    at org.apache.hadoop.mapred.JobEndNotifier$1.run
    (JobEndNotifier.java:50)
    at java.lang.Thread.run(Thread.java:619)

    "Timer thread for monitoring mapred" daemon prio=10
    tid=0x00007f0c36d51000 nid=0x3dc4 in Object.wait()
    [0x000000004197c000..0x000000004197cb10]
    java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.util.TimerThread.mainLoop(Timer.java:509)
    - locked <0x00007f0c3ceabad8> (a java.util.TaskQueue)
    at java.util.TimerThread.run(Timer.java:462)

    "Timer thread for monitoring jvm" daemon prio=10
    tid=0x00007f0c36d86000 nid=0x3dc3 in Object.wait()
    [0x00000000407b3000..0x00000000407b3b90]
    java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.util.TimerThread.mainLoop(Timer.java:509)
    - locked <0x00007f0c3ceaeaf8> (a java.util.TaskQueue)
    at java.util.TimerThread.run(Timer.java:462)

    "SocketListener0-1" prio=10 tid=0x00007f0c36a4f000 nid=0x3dc2 in
    Object.wait() [0x000000004187b000..0x000000004187bc10]
    java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at org.mortbay.util.ThreadPool$PoolThread.run
    (ThreadPool.java:522)
    - locked <0x00007f0c3ceb1760> (a org.mortbay.util.ThreadPool
    $PoolThread)

    "SocketListener0-0" prio=10 tid=0x00007f0c36d69400 nid=0x3dc1 in
    Object.wait() [0x000000004177a000..0x000000004177ac90]
    java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at org.mortbay.util.ThreadPool$PoolThread.run
    (ThreadPool.java:522)
    - locked <0x00007f0c3ceb1a40> (a org.mortbay.util.ThreadPool
    $PoolThread)

    "Acceptor ServerSocket
    [addr=0.0.0.0/0.0.0.0,port=0,localport=50030]" prio=10
    tid=0x00007f0c36b6a800 nid=0x3dc0 runnable
    [0x0000000041679000..0x0000000041679d10]
    java.lang.Thread.State: RUNNABLE
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384)
    - locked <0x00007f0c3cc3b5c8> (a java.net.SocksSocketImpl)
    at java.net.ServerSocket.implAccept(ServerSocket.java:453)
    at java.net.ServerSocket.accept(ServerSocket.java:421)
    at org.mortbay.util.ThreadedServer.acceptSocket
    (ThreadedServer.java:432)
    at org.mortbay.util.ThreadedServer$Acceptor.run
    (ThreadedServer.java:631)

    "SessionScavenger" daemon prio=10 tid=0x00007f0c303cc000 nid=0x3dbf
    waiting on condition [0x0000000040e0b000..0x0000000040e0bd90]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.mortbay.jetty.servlet.AbstractSessionManager
    $SessionScavenger.run(AbstractSessionManager.java:587)

    "SessionScavenger" daemon prio=10 tid=0x00007f0c30346800 nid=0x3dbe
    waiting on condition [0x0000000041de8000..0x0000000041de8a10]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.mortbay.jetty.servlet.AbstractSessionManager
    $SessionScavenger.run(AbstractSessionManager.java:587)

    "SessionScavenger" daemon prio=10 tid=0x00007f0c3031e400 nid=0x3dbd
    waiting on condition [0x0000000041578000..0x0000000041578a90]
    java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.mortbay.jetty.servlet.AbstractSessionManager
    $SessionScavenger.run(AbstractSessionManager.java:587)

    "Low Memory Detector" daemon prio=10 tid=0x00000000401c9400
    nid=0x3d7c runnable [0x0000000000000000..0x0000000000000000]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread1" daemon prio=10 tid=0x00000000401c6000 nid=0x3d7b
    waiting on condition [0x0000000000000000..0x00000000423302b0]
    java.lang.Thread.State: RUNNABLE

    "CompilerThread0" daemon prio=10 tid=0x00000000401c4000 nid=0x3d7a
    waiting on condition [0x0000000000000000..0x000000004222f330]
    java.lang.Thread.State: RUNNABLE

    "Signal Dispatcher" daemon prio=10 tid=0x00000000401c2800
    nid=0x3d79 waiting on condition
    [0x0000000000000000..0x0000000000000000]
    java.lang.Thread.State: RUNNABLE

    "Finalizer" daemon prio=10 tid=0x000000004019d800 nid=0x3d78 in
    Object.wait() [0x000000004202e000..0x000000004202eb90]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:
    116)
    - locked <0x00007f0c3cbfe4e0> (a
    java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:
    132)
    at java.lang.ref.Finalizer$FinalizerThread.run
    (Finalizer.java:159)

    "Reference Handler" daemon prio=10 tid=0x0000000040196c00
    nid=0x3d77 in Object.wait() [0x0000000041f2d000..0x0000000041f2dc10]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run
    (Reference.java:116)
    - locked <0x00007f0c3caf1408> (a java.lang.ref.Reference$Lock)

    "main" prio=10 tid=0x0000000040111c00 nid=0x3d73 in Object.wait()
    [0x0000000041477000..0x0000000041477ed0]
    java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00007f0c3cba4020> (a
    org.apache.hadoop.ipc.RPC$Server)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.ipc.Server.join(Server.java:1021)
    - locked <0x00007f0c3cba4020> (a org.apache.hadoop.ipc.RPC
    $Server)
    at org.apache.hadoop.mapred.JobTracker.offerService
    (JobTracker.java:1309)
    at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:
    2731)

    "VM Thread" prio=10 tid=0x0000000040191c00 nid=0x3d76 runnable

    "GC task thread#0 (ParallelGC)" prio=10 tid=0x000000004011c400
    nid=0x3d74 runnable

    "GC task thread#1 (ParallelGC)" prio=10 tid=0x000000004011d800
    nid=0x3d75 runnable

    "VM Periodic Task Thread" prio=10 tid=0x00000000401cb000 nid=0x3d7d
    waiting on condition

    JNI global references: 964

    Heap
    PSYoungGen total 198528K, used 68487K [0x00007f0c66540000,
    0x00007f0c77830000, 0x00007f0c7b290000)
    eden space 142144K, 48% used
    [0x00007f0c66540000,0x00007f0c6a821f70,0x00007f0c6f010000)
    from space 56384K, 0% used
    [0x00007f0c6f010000,0x00007f0c6f010000,0x00007f0c72720000)
    to space 57600K, 0% used
    [0x00007f0c73ff0000,0x00007f0c73ff0000,0x00007f0c77830000)
    PSOldGen total 682688K, used 672276K [0x00007f0c3ca90000,
    0x00007f0c66540000, 0x00007f0c66540000)
    object space 682688K, 98% used
    [0x00007f0c3ca90000,0x00007f0c65b15068,0x00007f0c66540000)
    PSPermGen total 21248K, used 16555K [0x00007f0c37690000,
    0x00007f0c38b50000, 0x00007f0c3ca90000)
    object space 21248K, 77% used
    [0x00007f0c37690000,0x00007f0c386bae60,0x00007f0c38b50000)
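
The heap summary at the end of this dump can be reduced to one occupancy figure per generation. The following is a minimal Python sketch, not part of the original post, that pulls the `total` and `used` values out of the three generation lines above and computes the percentage used. The regular expression and the name `heap_occupancy` are assumptions made for illustration, and only the `total ...K, used ...K` part of each line is parsed.

```python
import re

# Generation lines copied from the heap summary above (address ranges dropped).
HEAP_SUMMARY = """
PSYoungGen total 198528K, used 68487K
PSOldGen total 682688K, used 672276K
PSPermGen total 21248K, used 16555K
"""

# Matches e.g. "PSOldGen total 682688K, used 672276K".
GEN_LINE = re.compile(r"(PS\w+)\s+total\s+(\d+)K,\s+used\s+(\d+)K")


def heap_occupancy(summary):
    """Map each generation name to (used_kb, total_kb, percent_used)."""
    result = {}
    for name, total, used in GEN_LINE.findall(summary):
        total_kb, used_kb = int(total), int(used)
        result[name] = (used_kb, total_kb, 100.0 * used_kb / total_kb)
    return result


occupancy = heap_occupancy(HEAP_SUMMARY)
for name, (used_kb, total_kb, pct) in occupancy.items():
    print(f"{name}: {used_kb}K of {total_kb}K ({pct:.1f}% used)")
```

The old generation comes out at roughly 98% used, matching the `98% used` line in the dump. That is consistent with the rising GC time and GC count reported earlier in the thread, although this dump alone does not prove the cause.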
  • Owen O'Malley at Apr 3, 2009 at 2:36 am
    The last block of an HDFS file only occupies the required space. So a
    4k file only consumes 4k on disk.

    -- Owen
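
The arithmetic behind this answer can be sketched in a few lines of Python. This is an illustration only, not an HDFS API: `block_lengths` is a hypothetical helper, the 64 MB default is the block size quoted in the question, and the small amount of per-block header or checksum data mentioned earlier in the thread is ignored. Treating an empty file as having no blocks is also an assumption of this sketch.

```python
MB = 1024 * 1024


def block_lengths(file_size, block_size=64 * MB):
    """Split a file length in bytes into the data length of each block.

    Every block except the last is full; the last block holds only the
    remaining bytes, so a small file does not consume a whole block.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, tail = divmod(file_size, block_size)
    lengths = [block_size] * full_blocks
    if tail:
        lengths.append(tail)
    return lengths


small = block_lengths(4 * 1024)   # the 4k file from the question
large = block_lengths(65 * MB)    # the 65M example from the first reply
print(small)
print([n // MB for n in large])
```

For the 4k file this gives a single 4096-byte block, and for the 65M file one 64M block followed by one 1M block, so lowering the block size to 10k is not needed to save disk space.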
    On Apr 2, 2009, at 18:44, javateck javateck wrote:

    Can someone tell whether a file will occupy one or more blocks? For
    example, the default block size is 64MB, and if I save a 4k file to
    HDFS, will the 4K file occupy the whole 64MB block alone? So in this
    case, do I need to configure the block size to 10k if most of my
    files are less than 10K?

    thanks,

Discussion Overview
group: common-user @
categories: hadoop
posted: Apr 3, '09 at 1:45a
active: May 7, '09 at 1:11a
posts: 7
users: 5
website: hadoop.apache.org...
irc: #hadoop
