On Wed, May 6, 2009 at 1:46 PM, Foss User wrote:
Thanks for your response. I got a few more questions regarding
1. Do Hadoop clients locally cache the data they last requested?
I don't know the DFS read path very well, but I don't believe there is any
built-in cache here. There is an undocumented configuration variable,
dfs.read.prefetch.size which affects DFSClient's prefetching of data ahead
of the current file position, but I don't want to give any answer I'm not
certain of. Hopefully someone else will chime in here.
I can say that there is no *large* cache of data kept locally. HDFS is
optimized for sequential reads, where a cache is generally useless if not
2. Is the metadata for file blocks on the datanodes kept in the
underlying OS's file system on the namenode, or is it kept in RAM of the
The block locations are kept in the RAM of the namenode, and are updated
whenever a datanode sends a "block report". They are not persisted on the
namenode's disk, which is why the namenode stays in "safe mode" at startup
until it has received block locations for some configurable percentage of
blocks from the datanodes.
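To make the block report and safe mode relationship concrete, here is a toy
model in Python. It is only a sketch of the idea, not the namenode's actual
code: the class ToyNameNode, its method names, and the threshold value are
all made up for illustration.

```python
from collections import defaultdict


class ToyNameNode:
    """Toy model: block locations live only in memory and are rebuilt from reports."""

    def __init__(self, known_blocks, threshold=0.999):
        # Block IDs are known from the persisted namespace; locations are not.
        self.known_blocks = set(known_blocks)
        # Hypothetical threshold: fraction of blocks that must have a known location.
        self.threshold = threshold
        # block ID -> set of datanodes that reported a replica
        self.locations = defaultdict(set)

    def block_report(self, datanode, block_ids):
        # A new report replaces whatever this datanode reported before.
        for holders in self.locations.values():
            holders.discard(datanode)
        for block_id in block_ids:
            if block_id in self.known_blocks:
                self.locations[block_id].add(datanode)

    def reported_fraction(self):
        if not self.known_blocks:
            return 1.0
        located = sum(1 for b in self.known_blocks if self.locations.get(b))
        return located / len(self.known_blocks)

    def can_leave_safe_mode(self):
        return self.reported_fraction() >= self.threshold


if __name__ == "__main__":
    nn = ToyNameNode(known_blocks=range(4), threshold=0.75)
    nn.block_report("dn1", [0, 1])
    print(nn.reported_fraction(), nn.can_leave_safe_mode())
    nn.block_report("dn2", [1, 2])
    print(nn.reported_fraction(), nn.can_leave_safe_mode())
```

Right after startup the locations map is empty, so nothing can be served
until enough datanodes have reported in.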
3. If no more mapper functions can be run on the node that
contains the data the mapper has to act on, is Hadoop
intelligent enough to run the new mappers on some machines within the
Yes, assuming you have configured a network topology script. Otherwise,
Hadoop has no magical knowledge of your network infrastructure, and it
treats the whole cluster as a single rack called /default-rack.
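A topology script is just an executable that takes one or more IP addresses
or hostnames as arguments and prints a rack path for each one. A minimal
sketch in Python follows. The address-to-rack table is entirely hypothetical,
and you should check the documentation for your release for the exact
property that points Hadoop at the script (I believe it is
topology.script.file.name in current releases).

```python
#!/usr/bin/env python3
import sys

DEFAULT_RACK = "/default-rack"

# Hypothetical host-to-rack table; replace it with your own inventory.
RACKS = {
    "10.0.1.11": "/rack1",
    "10.0.1.12": "/rack1",
    "10.0.2.21": "/rack2",
}


def resolve(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACKS.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Hadoop passes the hosts as arguments and reads the racks from stdout.
    print(" ".join(resolve(sys.argv[1:])))
```

Any host missing from the table falls back to /default-rack, which mirrors
what you get with no script configured at all.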
4. When can a case like the above happen? I mean when can it happen
that the configured maximum number of mappers for a tasktracker has
been reached but Hadoop still needs to start more mappers?
If you have a file with 100 blocks all on the same three nodes, but you have a
six-node cluster, it will schedule some tasks on nodes that do not contain
the blocks, since it would rather keep the cluster utilized than keep all
data access local.
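The 100-block example can be worked through with a toy greedy scheduler. This
is not how the JobTracker is actually implemented (real assignment is driven
by tasktracker heartbeats); it is a sketch that assumes a fixed number of map
slots per node, and assign_tasks and the node names are invented for the
illustration.

```python
def assign_tasks(block_hosts, nodes, slots_per_node):
    """Greedy toy scheduler: prefer a node holding the block, else any free node."""
    free = {node: slots_per_node for node in nodes}
    assignment = []
    for block, hosts in block_hosts.items():
        local = [n for n in hosts if free.get(n, 0) > 0]
        anywhere = [n for n in nodes if free[n] > 0]
        if local:
            node, is_local = local[0], True
        elif anywhere:
            node, is_local = anywhere[0], False
        else:
            break  # every slot is busy; remaining tasks wait for the next wave
        free[node] -= 1
        assignment.append((block, node, is_local))
    return assignment


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4", "n5", "n6"]
    # 100 blocks, all replicated on the same three nodes.
    block_hosts = {b: ["n1", "n2", "n3"] for b in range(100)}
    wave = assign_tasks(block_hosts, nodes, slots_per_node=2)
    local = sum(1 for _, _, is_local in wave if is_local)
    print(len(wave), "tasks in the first wave,", local, "of them data-local")
```

With two slots per node, the three nodes holding the data fill up after six
tasks, and the other six slots in the cluster are used for non-local reads
rather than left idle.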
5. Are the multiple mappers and reducers run as separate threads
within the same TaskTracker process?
No, they are run as child processes.
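As a generic illustration of what running a task as a child process means
(plain Python, not Hadoop code), the sketch below launches a separate
interpreter and shows that it has its own process ID. The function name
run_task_in_child is made up for the example.

```python
import os
import subprocess
import sys


def run_task_in_child():
    """Run a trivial 'task' in a separate process and return the child's PID."""
    code = "import os; print(os.getpid())"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    print("parent pid:", os.getpid())
    print("child pid: ", run_task_in_child())
```

A crash or runaway memory use in the child does not take the parent down with
it, which is the usual reason for preferring processes over threads here.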