Hi All,

I am new to Hadoop and I seem to be having a problem setting the number of
map tasks per node. I have an application that needs to load a significant
amount of data (about 1 GB) into memory to use when mapping data read from
files. I store this in a singleton and access it from my mapper. To do
this, I need exactly one map task running on a node at any one time, or
the memory requirements will far exceed my RAM. I am generating my own
splits using an InputFormat class. This gives me roughly 10 splits per node,
and I need each corresponding map task to run sequentially in the same
child JVM so that each map run does not have to reinitialize the data.
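In case it helps, here is a minimal sketch of the singleton pattern I mean (DataCache, getLookup, and the map contents are placeholder names for illustration; my real code loads the ~1 GB from disk in the constructor):

```java
import java.util.HashMap;
import java.util.Map;

// Lazily-initialized singleton holding the lookup data. The expensive
// load happens once per JVM, so it is only amortized across map tasks
// if those tasks run inside the same child JVM.
public class DataCache {
    private static DataCache instance;
    private final Map<String, String> lookup;

    private DataCache() {
        // In the real code this loads about 1 GB from disk;
        // a stub map stands in for it here.
        lookup = new HashMap<String, String>();
        lookup.put("example-key", "example-value");
    }

    // Synchronized lazy accessor so the load happens at most once.
    public static synchronized DataCache get() {
        if (instance == null) {
            instance = new DataCache();
        }
        return instance;
    }

    public Map<String, String> getLookup() {
        return lookup;
    }
}
```

My mapper just calls DataCache.get() at the top of map(), so the cost is paid only the first time a task runs in a given child JVM.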

I have tried the following in a single-node configuration with 2 splits:
- setting setNumMapTasks in the JobConf to 1, but Hadoop still creates 2
map tasks
- setting the mapred.tasktracker.tasks.maximum property to 1 - same result,
2 map tasks
- setting the mapred.map.tasks property to 1 - same result, 2 map tasks
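For the second attempt, this is roughly the entry I used (a sketch; I set it in the node's hadoop-site.xml, with the value as shown):

```xml
<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>1</value>
</property>
```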

I have yet to try it in a multiple-node configuration. My target will be
20 AWS EC2 instances.

Can you please let me know what I should be doing or looking at to make sure
that I have at most 1 map task per node? Also, how can I have multiple
splits mapped within the same child JVM by different map tasks?

Thanks in advance,

Discussion Overview
group: common-user
posted: Sep 8, '07 at 4:35p
active: Sep 11, '07 at 4:29p