So, the problem is that we have a crap ton of small files and a limited-size cluster (only 4 nodes, just up from 2, yay!), as we are just starting to use Hadoop. With our current hardware we have 32 map slots and more than 1500 files. The task startup time is, frankly, killing us, and right now we can't easily concatenate them all into a single file: we are still receiving them, and we want to run some analysis on them while they are inbound.

Several months ago we played around with JVM re-use, but if I recall correctly, a task JVM stays keyed to an individual MR job until it hits its TTL, and only then does that slot become available for another job. Is there a way to adjust this TTL? Or is there a way to re-use the JVM for a different job?

This is all with 0.21.0.


--Aaron
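To put the numbers in the question side by side, here is an idealized back-of-the-envelope model. It assumes one map task per small file, tasks spread as evenly as possible over the 32 map slots, and a per-job "tasks per JVM" knob modeled loosely on Hadoop's JVM reuse setting of that era (`mapred.job.reuse.jvm.num.tasks`, where -1 means no limit). The function names `map_waves` and `jvm_launches` are hypothetical, and the model deliberately does not cover cross-job reuse, which is exactly what the question asks about.

```python
import math


def map_waves(num_files: int, map_slots: int) -> int:
    """Number of 'waves' of map tasks, assuming one map task per small file."""
    return math.ceil(num_files / map_slots)


def jvm_launches(num_tasks: int, map_slots: int, tasks_per_jvm: int) -> int:
    """Idealized count of task JVMs started for ONE job (hypothetical model).

    tasks_per_jvm mirrors a per-job JVM reuse setting:
      1  -> no reuse (a fresh JVM per task)
      k  -> each JVM runs up to k tasks of this job
      -1 -> unlimited reuse, but still only within this one job
    Tasks are assumed to be spread as evenly as possible over the slots.
    """
    if tasks_per_jvm == 0 or tasks_per_jvm < -1:
        raise ValueError("tasks_per_jvm must be -1 or a positive integer")
    base, extra = divmod(num_tasks, map_slots)
    launches = 0
    for slot in range(map_slots):
        tasks_here = base + (1 if slot < extra else 0)
        if tasks_here == 0:
            continue  # an idle slot never starts a JVM
        if tasks_per_jvm == -1:
            launches += 1  # one long-lived JVM per slot for the whole job
        else:
            launches += math.ceil(tasks_here / tasks_per_jvm)
    return launches


if __name__ == "__main__":
    files, slots = 1500, 32
    print("map waves:", map_waves(files, slots))                         # map waves: 47
    print("JVMs, no reuse:", jvm_launches(files, slots, 1))              # JVMs, no reuse: 1500
    print("JVMs, unlimited reuse:", jvm_launches(files, slots, -1))      # JVMs, unlimited reuse: 32
```

In this model, per-job reuse cuts JVM launches from one per file to one per slot, but every new job (for example, each batch of newly arrived files) still pays the JVM startup cost again, which is the cross-job limitation being asked about.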

Discussion Overview
group: common-user @
categories: hadoop
posted: Mar 2, '11 at 5:34p
active: Mar 2, '11 at 5:34p
posts: 1
users: 1
website: hadoop.apache.org...
irc: #hadoop

1 user in discussion

Aaron Baff: 1 post
