So, the problem is that we have a huge number of small files and a small cluster (only 4 nodes, just up from 2, yay!), as we are just starting to use Hadoop. With our current hardware, we have 32 Map slots and more than 1,500 files. The Task startup time is, frankly, killing us. At this time we can't easily concatenate them all into a single file, because the files are still arriving and we want to run some analysis on them while they are inbound. Several months ago we played around with JVM reuse, but if I recall correctly, a Task stays keyed to an individual MR Job until it hits its TTL, and only then does that slot become available for another Job. Is there a way to adjust this TTL? Or to reuse the JVM for a different Job? This is all with 0.21.0.
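For a rough sense of scale, here is a back-of-the-envelope sketch in plain Java (no Hadoop dependency). It assumes one map task per file, and the 2-second per-task startup cost is a made-up placeholder, not a measurement; the class and method names are hypothetical as well.

```java
public class MapWaveEstimate {

    // Number of scheduling "waves" needed if every file becomes one map task
    // and at most `slots` tasks run at the same time (ceiling division).
    static int waves(int files, int slots) {
        return (files + slots - 1) / slots;
    }

    // Startup time paid back-to-back on a single slot across all waves.
    static double startupOverheadSeconds(int files, int slots, double startupSecondsPerTask) {
        return waves(files, slots) * startupSecondsPerTask;
    }

    public static void main(String[] args) {
        int files = 1500;          // lower bound from the post (">1500 files")
        int slots = 32;            // Map slots from the post
        double startupSecs = 2.0;  // ASSUMPTION: hypothetical per-task startup cost

        System.out.println("waves = " + waves(files, slots));
        System.out.println("startup overhead per slot (s) = "
                + startupOverheadSeconds(files, slots, startupSecs));
    }
}
```

With these numbers every slot has to start a fresh task dozens of times in a row, which is why the per-task startup cost dominates when the files themselves are small.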


group: common-user @
posted: Mar 2, '11 at 5:34p
active: Mar 2, '11 at 5:34p

1 user in discussion

Aaron Baff: 1 post


