Sorry for sending this email again, but I got no answers to the first one. Please help, or forward it to a mailing list that could.

2011-06-15



***********************************************
* Hailong Yang, PhD. Candidate
* Sino-German Joint Software Institute,
* School of Computer Science&Engineering, Beihang University
* Phone: (86-010)82315908
* Email: hailong.yang1115@gmail.com
* Address: G413, New Main Building in Beihang University,
* No.37 XueYuan Road,HaiDian District,
* Beijing,P.R.China,100191
***********************************************



From: hailong.yang1115
Sent: 2011-06-10 13:28:46
To: general
Cc:
Subject: Problems about the job counters

Dear all,

I am trying the built-in wordcount example with nearly 15 GB of input. When the Hadoop job finished, I got the following counters.


Counter                                  Map              Reduce           Total
Job Counters
  Launched reduce tasks                  0                0                1
  Rack-local map tasks                   0                0                35
  Launched map tasks                     0                0                2,318
  Data-local map tasks                   0                0                2,283
FileSystemCounters
  FILE_BYTES_READ                        22,863,580,656   17,654,943,341   40,518,523,997
  HDFS_BYTES_READ                        154,400,997,459  0                154,400,997,459
  FILE_BYTES_WRITTEN                     33,490,829,403   17,654,943,341   51,145,772,744
  HDFS_BYTES_WRITTEN                     0                2,747,356,704    2,747,356,704


My question is: what does the FILE_BYTES_READ counter mean, and what is the difference between FILE_BYTES_READ and HDFS_BYTES_READ? As I understand it, all of the input is located in HDFS, so where does FILE_BYTES_READ come from during the map phase?
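
For reference, below is a minimal sketch of how these counters can be read programmatically once the job completes. It assumes the new org.apache.hadoop.mapreduce API and a release where Counters exposes findCounter(String group, String name); the class name here is illustrative, not from the thread.

    import org.apache.hadoop.mapreduce.Counter;
    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mapreduce.Job;

    // Illustrative sketch: print the file-system counters of a completed job.
    // Assumes `job` is the org.apache.hadoop.mapreduce.Job that was submitted
    // and that this release supports findCounter(groupName, counterName).
    public class PrintFsCounters {
        public static void printFsCounters(Job job) throws Exception {
            Counters counters = job.getCounters();
            // "FileSystemCounters" is the group name shown in the job output above.
            Counter fileRead = counters.findCounter("FileSystemCounters", "FILE_BYTES_READ");
            Counter hdfsRead = counters.findCounter("FileSystemCounters", "HDFS_BYTES_READ");
            System.out.println("FILE_BYTES_READ = " + fileRead.getValue());
            System.out.println("HDFS_BYTES_READ = " + hdfsRead.getValue());
        }
    }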


Any help will be appreciated!

Hailong

2011-06-10





  • Denny Ye at Jun 29, 2011 at 7:04 am
    hi Hailong

    An important phase between the map and reduce tasks is the 'shuffle'. On the
    map side, output records are first collected in an in-memory buffer and then
    spilled to disk as local spill files (temporary files). If the map task
    produces a large amount of output, there will be several spill files, and
    they must be merged into a single target file on the map side.
    So the map task reads the spill file contents back from disk into memory and
    merges the records out to disk again. FILE_BYTES_READ on the map side counts
    the bytes read from local disk while merging the spill files, and
    FILE_BYTES_WRITTEN is the total number of bytes spilled to disk.

    HDFS_BYTES_READ only represents the map input bytes from HDFS.
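
    To make the byte accounting concrete, here is a hypothetical model of the
    spill-and-merge counting (my own sketch, not Hadoop code; bufferBytes and
    mergeFactor are stand-ins for the io.sort.mb and io.sort.factor settings):

        // Hypothetical model of map-side spill/merge byte accounting.
        public class SpillModel {
            public static void main(String[] args) {
                long mapOutputBytes = 300L << 20; // assume ~300 MB of map output
                long bufferBytes = 100L << 20;    // spill whenever the buffer fills
                int mergeFactor = 10;             // spill files merged per pass

                long spills = (mapOutputBytes + bufferBytes - 1) / bufferBytes;
                long fileBytesWritten = mapOutputBytes; // each record is spilled once
                long fileBytesRead = 0;                 // nothing read back yet

                // Every merge pass reads the current files back from disk and
                // writes them out again, so both counters grow by roughly the
                // map output size per pass.
                long files = spills;
                while (files > 1) {
                    fileBytesRead += mapOutputBytes;
                    fileBytesWritten += mapOutputBytes;
                    files = (files + mergeFactor - 1) / mergeFactor;
                }
                System.out.printf("spills=%d, FILE_BYTES_READ ~ %d, FILE_BYTES_WRITTEN ~ %d%n",
                        spills, fileBytesRead, fileBytesWritten);
            }
        }

    In this model a task with three spill files reads its whole output back once
    and writes it twice, so FILE_BYTES_WRITTEN ends up larger than FILE_BYTES_READ
    by the initial spill, which is consistent with the map-side numbers in the
    table above.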

    This blog post of mine explains the 'shuffle' phase (in Chinese):
    http://langyu.iteye.com/blog/992916

    --Regards
    Denny Ye

  • Hailong.yang1115 at Jun 29, 2011 at 7:20 am
    Hi Denny,

    Thank you very much for your reply. I think you explained the problem quite clearly. I also read your blog, and the articles on Hadoop's mechanisms are very insightful.


    Cheers!

    Hailong

    2011-06-29



