Hi All,

I am new to Hadoop and I seem to be having a problem setting the number of
map tasks per node. I have an application that needs to load a significant
amount of data (about 1 GB) into memory for use when mapping data read from
files. I store this data in a singleton and access it from my mapper. To do
this, I need exactly one map task running on a node at any one time, or the
memory requirements will far exceed my RAM. I am generating my own splits
using an InputFormat class. This gives me roughly 10 splits per node, and I
need the corresponding map tasks to run sequentially in the same child JVM
so that each map run does not have to reinitialize the data.
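For what it's worth, a minimal sketch of the singleton approach described above (class and method names here are hypothetical, and the 1 GB load is stubbed out with a tiny table; in a real job the load would happen on first access from the mapper):

```java
// Sketch of a lazily-initialized singleton holding reference data,
// shared by all map tasks that run inside the same child JVM.
// Names (ReferenceData, lookup) are illustrative, not from Hadoop.
public class ReferenceData {
    private static ReferenceData instance;
    private final java.util.Map<String, String> table =
            new java.util.HashMap<String, String>();

    private ReferenceData() {
        // In the real job this constructor would read ~1 GB of
        // reference data from local disk; stubbed here.
        table.put("example-key", "example-value");
    }

    // Initialized once per JVM; synchronized so that if two map
    // tasks did run in the same child JVM, they would still share
    // a single copy rather than loading the data twice.
    public static synchronized ReferenceData get() {
        if (instance == null) {
            instance = new ReferenceData();
        }
        return instance;
    }

    public String lookup(String key) {
        return table.get(key);
    }

    public static void main(String[] args) {
        // Both calls return the same shared instance.
        ReferenceData a = ReferenceData.get();
        ReferenceData b = ReferenceData.get();
        System.out.println(a == b);                 // true
        System.out.println(a.lookup("example-key")); // example-value
    }
}
```

In a mapper, get() would typically be called from configure() so the data is ready before the first map() call.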

I have tried the following in a single-node configuration with 2 splits:
- setting setNumMapTasks in the JobConf to 1, but Hadoop still creates 2
map tasks
- setting the mapred.tasktracker.tasks.maximum property to 1 - same result,
2 map tasks
- setting the mapred.map.tasks property to 1 - same result, 2 map tasks
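One likely explanation for the second result: mapred.tasktracker.tasks.maximum is read by the TaskTracker daemon at startup, not from the per-job configuration, so setting it in the JobConf has no effect. A sketch of the hadoop-site.xml entry that would go on each worker node before the TaskTracker is started (property name as in the Hadoop releases of this era; later versions split it into separate map and reduce maximums):

```xml
<!-- hadoop-site.xml on each TaskTracker node: caps concurrent
     tasks per node at 1. Daemon-side setting; requires a
     TaskTracker restart to take effect. -->
<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>1</value>
</property>
```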

I have yet to try it in a multi-node configuration. My target will be using
20 AWS EC2 instances.

Can you please let me know what I should be doing or looking at to make sure
that I have at most one map task per node? Also, how can I have multiple
splits mapped within the same child JVM by different map tasks run in
sequence?

Thanks in advance,
Dev

Discussion Overview
group: common-user @ hadoop
posted: Sep 8, '07 at 4:35p
active: Sep 11, '07 at 4:29p
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop
