You can specify the number of tasks per node in your config. I don't
think the second thing you mention is possible.

Hadoop + EC2 has worked very well for me. Good luck.

derek

On 9/8/07, Devajyoti Sarkar wrote:
Hi All,

I am new to hadoop and I seem to be having a problem setting the number of
map tasks per node. I have an application that needs to load a significant
amount of data (about 1 GB) in memory to use in mapping data read from
files. I store this in a singleton and access it from my mapper. In order to
do this, I need to have exactly one map task run on a node at any one time, or
the memory requirements will far exceed my RAM. I am generating my own
splits using an InputFormat class. This gives me roughly 10 splits per node,
and I need the corresponding map tasks to run sequentially in the same
child JVM so that each map run does not have to reinitialize the data.

I have tried the following in a single-node configuration with 2 splits:
- calling setNumMapTasks(1) on the JobConf, but Hadoop still seems to create 2
map tasks
- setting the mapred.tasktracker.tasks.maximum property to 1: same result, 2 map
tasks
- setting the mapred.map.tasks property to 1: same result, 2 map tasks

I have yet to try it in a multiple-node configuration. My target will be
20 AWS EC2 instances.

Can you please let me know what I should be doing or looking at to make sure
that I have at most 1 map task per node? Also, how can I have multiple
splits mapped within the same child JVM by different map tasks in
sequence?

Thanks in advance,
Dev
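
For reference, the per-node limit mentioned in the reply can be sketched as a config fragment. This is only an illustration, not a confirmed fix: it uses the property name from the question (mapred.tasktracker.tasks.maximum) and assumes it is placed in the site configuration file (conf/hadoop-site.xml in Hadoop releases of that period) on every node, with the tasktracker restarted afterwards. The file location and the restart step are assumptions, not something stated in the thread.

```xml
<!-- Sketch only. Assumed location: conf/hadoop-site.xml on each node. -->
<!-- The property name is taken from the question above. -->
<configuration>
  <property>
    <name>mapred.tasktracker.tasks.maximum</name>
    <!-- Cap on tasks running at the same time on one tasktracker -->
    <value>1</value>
  </property>
</configuration>
```

Note that this limits how many tasks run concurrently on a node. It does not change the total number of map tasks, which follows the number of splits returned by the InputFormat, so a job with 2 splits would still show 2 map tasks.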

Discussion Overview
group: common-user @
categories: hadoop
posted: Sep 8, '07 at 4:35p
active: Sep 11, '07 at 4:29p
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop