FAQ
Hi, all.

I know the dfs.blocksize setting can affect the performance of a Hadoop job.

In my case, I have thousands of directories containing many input files of
widely varying sizes (from about 10 KB to 1 GB).

In this case, how should I choose dfs.blocksize to get the best performance?
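For reference, dfs.blocksize is a client-side setting applied when a file is written, so it can be chosen per upload rather than once for the whole cluster; files already in HDFS keep the block size they were created with. A minimal sketch of setting it for a single copy (the paths and the 256 MB figure are only placeholders, assuming the newer dfs.blocksize key name):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: sets the block size used for files written by this client.
public class BlockSizeOnWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // 256 MB blocks for new writes; files already in HDFS are unaffected.
    conf.setLong("dfs.blocksize", 256L * 1024 * 1024);

    FileSystem fs = FileSystem.get(conf);
    // Hypothetical paths, just to show where the setting takes effect.
    fs.copyFromLocalFile(new Path("/local/data/big-input.dat"),
                         new Path("/user/hadoop/input/big-input.dat"));
    fs.close();
  }
}

Note that a larger block size only changes how the 1 GB files are laid out and split; the 10 KB files still become one block and, with the default input format, one map task each.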

11/02/22 17:45:49 INFO input.FileInputFormat: Total input paths to process : 15407
11/02/22 17:45:54 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
11/02/22 17:45:54 INFO mapreduce.JobSubmitter: number of splits:15411
11/02/22 17:45:54 INFO mapreduce.JobSubmitter: adding the following namenodes' delegation tokens:null
11/02/22 17:45:54 INFO mapreduce.Job: Running job: job_201102221737_0002
11/02/22 17:45:55 INFO mapreduce.Job: map 0% reduce 0%
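For context on these numbers: the stock FileInputFormat creates at least one split per non-empty input file, which is why 15407 input paths become 15411 splits here; only the few files larger than one block are split further, so dfs.blocksize alone will not shrink the map-task count for the small files. One common way to do that is to pack small files into combined splits. A minimal driver sketch, assuming a Hadoop release that ships CombineTextInputFormat (2.x-era API); the mapper, paths, and the 256 MB cap are placeholders:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Sketch of a map-only job that packs many small files into fewer splits.
public class CombineSmallFilesDriver {

  // Placeholder mapper that passes records through unchanged.
  public static class IdentityMapper
      extends Mapper<LongWritable, Text, LongWritable, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      context.write(key, value);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "combine-small-files");
    job.setJarByClass(CombineSmallFilesDriver.class);
    job.setMapperClass(IdentityMapper.class);
    job.setNumReduceTasks(0);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);

    // Pack many small files into each split, capped at roughly 256 MB,
    // instead of one split (and one map task) per file.
    job.setInputFormatClass(CombineTextInputFormat.class);
    CombineTextInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024);

    FileInputFormat.setInputPaths(job, new Path("/user/hadoop/input"));   // placeholder
    FileOutputFormat.setOutputPath(job, new Path("/user/hadoop/output")); // placeholder

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The trade-off is fewer, longer map tasks against weaker per-file data locality, so the 256 MB cap is only a starting point to tune.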

Thanks.

--
Junyoung Kim (juneng603@gmail.com)
