FAQ
HI all,
I have written a MyCombineFileInputFormat extends from
CombineFileInputFormat , it can put multi files together into the same
inputsplit, it works fine for just a small amount of files. But if I try to
process 100,000 small files, the CombineFileInputFormat ran out of memory
while splitting input files:

The stack trace is:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2245)
at java.util.Arrays.copyOf(Arrays.java:2219)
at java.util.ArrayList.grow(ArrayList.java:213)
at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:187)
at java.util.ArrayList.addAll(ArrayList.java:532)
at
org.apache.hadoop.mapred.lib.CombineFileInputFormat.getHosts(CombineFileInputFormat.java:568)
at
org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:410)
at
org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:217)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:989)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:981)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1261)
at ac.ict.mapreduce.test.MR.main(MR.java:114)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

It seems that the rack-nodes mapping is too big? .... How to solve this
problem? thanks!

Search Discussions

  • Felix.徐 at Jul 16, 2012 at 2:24 pm
    My hadoop version is 1.0.1 and I didn't specify any parameter.

    2012/7/16 Felix.徐 <ygnhzeus@gmail.com>
    HI all,
    I have written a MyCombineFileInputFormat extends from
    CombineFileInputFormat , it can put multi files together into the same
    inputsplit, it works fine for just a small amount of files. But if I try to
    process 100,000 small files, the CombineFileInputFormat ran out of memory
    while splitting input files:

    The stack trace is:

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2245)
    at java.util.Arrays.copyOf(Arrays.java:2219)
    at java.util.ArrayList.grow(ArrayList.java:213)
    at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:187)
    at java.util.ArrayList.addAll(ArrayList.java:532)
    at
    org.apache.hadoop.mapred.lib.CombineFileInputFormat.getHosts(CombineFileInputFormat.java:568)
    at
    org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:410)
    at
    org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:217)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:989)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:981)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at
    org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1261)
    at ac.ict.mapreduce.test.MR.main(MR.java:114)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
    sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

    It seems that the rack-nodes mapping is too big? .... How to solve this
    problem? thanks!

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupmapreduce-user @
categorieshadoop
postedJul 16, '12 at 2:22p
activeJul 16, '12 at 2:24p
posts2
users1
websitehadoop.apache.org...
irc#hadoop

1 user in discussion

Felix.徐: 2 posts

People

Translate

site design / logo © 2021 Grokbase