Hi,

I'm trying to run Terrier-3.0 on hadoop-0.18.3 with standard configuration
settings. My Hadoop cluster runs on 3 nodes (1 master, 3 slaves). If I run
Terrier's Basic Single Pass Indexing (with the default configuration) on a
very small dataset (~1 GB), it works fine, but for a larger dataset (~10 GB)
I get this error:

attempt_201010272120_0001_m_000002_0: java.lang.OutOfMemoryError: GC overhead limit exceeded
attempt_201010272120_0001_m_000002_0: at org.terrier.structures.indexing.singlepass.hadoop.SplitEmittedTerm.createNewTerm(SplitEmittedTerm.java:64)
attempt_201010272120_0001_m_000002_0: at org.terrier.structures.indexing.singlepass.hadoop.HadoopRunWriter.writeTerm(HadoopRunWriter.java:84)
attempt_201010272120_0001_m_000002_0: at org.terrier.structures.indexing.singlepass.MemoryPostings.writeToWriter(MemoryPostings.java:151)
attempt_201010272120_0001_m_000002_0: at org.terrier.structures.indexing.singlepass.MemoryPostings.finish(MemoryPostings.java:112)
attempt_201010272120_0001_m_000002_0: at org.terrier.indexing.hadoop.Hadoop_BasicSinglePassIndexer.forceFlush(Hadoop_BasicSinglePassIndexer.java:308)
attempt_201010272120_0001_m_000002_0: at org.terrier.indexing.hadoop.Hadoop_BasicSinglePassIndexer.closeMap(Hadoop_BasicSinglePassIndexer.java:419)
attempt_201010272120_0001_m_000002_0: at org.terrier.indexing.hadoop.Hadoop_BasicSinglePassIndexer.close(Hadoop_BasicSinglePassIndexer.java:236)
attempt_201010272120_0001_m_000002_0: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
attempt_201010272120_0001_m_000002_0: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
attempt_201010272120_0001_m_000002_0: at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2198)


I also tried running Mahout-0.3 on hadoop-0.20.2. It works fine on small
datasets (< 1 MB), but even slightly larger datasets (~30 MB) produce this
error:

Error: java.lang.OutOfMemoryError: Java heap space
at org.apache.mahout.fpm.pfpgrowth.TransactionTree.resize(TransactionTree.java:446)
at org.apache.mahout.fpm.pfpgrowth.TransactionTree.createNode(TransactionTree.java:409)
at org.apache.mahout.fpm.pfpgrowth.TransactionTree.addPattern(TransactionTree.java:202)
at org.apache.mahout.fpm.pfpgrowth.TransactionTree.getCompressedTree(TransactionTree.java:285)
at org.apache.mahout.fpm.pfpgrowth.ParallelFPGrowthCombiner.reduce(ParallelFPGrowthCombiner.java:51)
at org.apache.mahout.fpm.pfpgrowth.ParallelFPGrowthCombiner.reduce(ParallelFPGrowthCombiner.java:33)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1222)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1265)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:686)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1173)


I'm absolutely stuck. I've tried increasing the Java heap size in
hadoop-env.sh, and I've tried using the parallel GC. Nothing seems to work.
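
For reference, the kind of change I tried in hadoop-env.sh looks roughly
like this (exact values are illustrative):

    # conf/hadoop-env.sh
    # HADOOP_HEAPSIZE sets the maximum heap, in MB, for the Hadoop daemons
    export HADOOP_HEAPSIZE=2000
    # enable the parallel collector in the daemon JVMs
    export HADOOP_OPTS="-XX:+UseParallelGC $HADOOP_OPTS"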

Can anyone help me please?

Thanks.

Regards,
Geet

--
Geet Garg
Final Year Dual Degree Student
Department of Computer Science and Engineering
Indian Institute of Technology Kharagpur
INDIA
Phone: +91 97344 26187
e-Mail: garggeetus@gmail.com

  • Allen Wittenauer at Oct 29, 2010 at 1:58 pm

    On Oct 27, 2010, at 10:44 AM, Geet Garg wrote:

    I'm absolutely stuck. I've tried increasing the Java heap size in
    hadoop-env.sh, and I've tried using the parallel GC. Nothing seems to work.

    Can anyone help me please?

    hadoop-env.sh is for the daemons. You need to increase the heap in mapred.child.java.opts.
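
    For example, a minimal sketch of that setting (the -Xmx value is
    illustrative; the property goes in hadoop-site.xml on 0.18.x, or
    mapred-site.xml on 0.20.x):

        <property>
          <name>mapred.child.java.opts</name>
          <value>-Xmx1024m</value>
        </property>

    Jobs submitted through ToolRunner/GenericOptionsParser can also override
    it per run with -Dmapred.child.java.opts=-Xmx1024m.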
