Hi All,

I am getting the following error when running a job on about 12 TB of data.
It happens before any mappers or reducers are launched, and the job starts
fine if I reduce the amount of input data. Any ideas as to what might be
causing this error?

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOf(Arrays.java:2786)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:71)
at java.io.DataOutputStream.writeByte(DataOutputStream.java:136)
at org.apache.hadoop.io.UTF8.writeChars(UTF8.java:278)
at org.apache.hadoop.io.UTF8.writeString(UTF8.java:250)
at org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:131)
at org.apache.hadoop.ipc.RPC$Invocation.write(RPC.java:111)
at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:741)
at org.apache.hadoop.ipc.Client.call(Client.java:1011)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
at $Proxy6.getBlockLocations(Unknown Source)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy6.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:359)
at org.apache.hadoop.hdfs.DFSClient.getBlockLocations(DFSClient.java:380)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:178)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:234)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:946)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:938)
at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:854)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:807)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:807)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:781)
at org.apache.hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java:876)

Gagan Bansal


  • Harsh J at Jul 24, 2011 at 9:28 am
    Try a higher heap size. The issue may be that too many splits are being
    generated on the client side, filling up the heap (IIRC the default heap
    is used for RunJar operations unless you pass HADOOP_CLIENT_OPTS=-Xmx512m
    or so to raise it).
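    For illustration, raising the client heap for a streaming submission might
    look like this (the 2 GB value, the paths, and the streaming-jar location
    are only examples; adjust them for your install):

        # Raise the heap of the job-submission JVM only; cluster daemons are unaffected.
        export HADOOP_CLIENT_OPTS="-Xmx2048m"
        # Submit the streaming job as usual; split computation now runs with the larger heap.
        hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
            -input /path/to/12tb/input \
            -output /path/to/output \
            -mapper my_mapper.sh \
            -reducer my_reducer.sh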


    --
    Harsh J
  • Joey Echeverria at Jul 24, 2011 at 9:35 am
    You're running out of memory trying to generate the splits. You need to set
    a bigger heap for your driver program. Assuming you're using the hadoop jar
    command to launch your job, you can do this by setting HADOOP_HEAPSIZE to a
    larger value in $HADOOP_HOME/conf/hadoop-env.sh.
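    For example, the relevant line in hadoop-env.sh might end up looking like
    this (the 2000 MB figure is only illustrative):

        # $HADOOP_HOME/conf/hadoop-env.sh
        # Maximum heap, in MB, for JVMs started through the hadoop scripts.
        export HADOOP_HEAPSIZE=2000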

    -Joey
  • Arun C Murthy at Jul 25, 2011 at 6:35 am

    On Jul 24, 2011, at 2:34 AM, Joey Echeverria wrote:

    You're running out of memory trying to generate the splits. You need to set a bigger heap for your driver program. Assuming you're using the hadoop jar command to launch your job, you can do this by setting HADOOP_HEAPSIZE to a larger value in $HADOOP_HOME/conf/hadoop-env.sh
    As Harsh pointed out, please use HADOOP_CLIENT_OPTS and not HADOOP_HEAPSIZE for the job-client.
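    Roughly, the difference is that HADOOP_HEAPSIZE in hadoop-env.sh applies to
    every JVM started through the hadoop scripts, daemons included, while
    HADOOP_CLIENT_OPTS is appended only for client-side commands, so it is the
    narrower knob for the submission JVM. A sketch of the two settings (values
    are illustrative):

        # hadoop-env.sh: heap (in MB) for every JVM launched via bin/hadoop, daemons too.
        export HADOOP_HEAPSIZE=2000
        # Picked up only by client commands such as "hadoop jar" and "hadoop fs".
        export HADOOP_CLIENT_OPTS="-Xmx2048m"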

    Arun
  • Gagan Bansal at Jul 25, 2011 at 5:45 pm
    Thanks everyone.

    After setting HADOOP_CLIENT_OPTS, the error changed: the job was now trying
    to launch more than 100,000 tasks, which I believe is the maximum configured
    on my cluster. This was because my job had more than 100,000 input files. I
    merged some of the files so that the total number stayed under 100,000,
    which got me past this error.
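    For reference, one sketch of that kind of compaction is an identity
    streaming job with a capped reducer count, so the output directory holds
    only that many files (paths, the reducer count, and the streaming-jar
    location are illustrative, and streaming's tab-based key/value splitting
    may need checking against your data):

        # Compact many small input files into roughly 500 larger ones.
        hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
            -D mapred.reduce.tasks=500 \
            -input /path/to/many-small-files \
            -output /path/to/merged \
            -mapper /bin/cat \
            -reducer /bin/cat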

    Gagan Bansal


Discussion Overview
group: mapreduce-user
categories: hadoop
posted: Jul 24, 2011 at 9:07 am
active: Jul 25, 2011 at 5:45 pm
posts: 5
users: 4
website: hadoop.apache.org...
irc: #hadoop
