FAQ
I will do that like this: in each map task, I get the input file for
this mapper in configure() and manually read the first line of
that file to get the user ID, then start running the map function.


-Gang


----- Original Message -----
From: Raymond Jennings III <raymondjiii@yahoo.com>
To: common-user@hadoop.apache.org
Date: 2010/1/8 (Fri) 4:23:15 PM
Subject: Is it possible to share a key across maps?

I have large files where the userid is the first line of each file. I want to use that value as the output of the map phase for each subsequent line of the file. If each map task gets a chunk of this file, only one map task will read the key value from the first line. Is there any way I can force the other map tasks to wait until this key is read and then somehow pass this value to the other map tasks? Or is my reasoning incorrect? Thanks.




  • Raymond Jennings III at Jan 9, 2010 at 12:55 am
Hi, do you do this in the map method (open the file and read the first line)? Could you explain a little more how you do it with configure(), thank you.

    --- On Fri, 1/8/10, Gang Luo wrote:
  • Gang Luo at Jan 9, 2010 at 3:03 am
I don't do that in the map method, but in the configure(JobConf) method, which runs before any map method call in that map task. JobConf.get("map.input.file") can tell you which file this map task is processing. Use this path to read the first line of the corresponding file. All of this is done in the configure method, that is, before any map method is called.


    -Gang
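A Hadoop-free sketch of the approach: read the first line of the mapper's input file once, up front, then tag every later record with it. In a real old-API mapper this logic would sit in configure(JobConf), with the path taken from conf.get("map.input.file") and opened via FileSystem.get(conf).open(...) — that Hadoop wiring is assumed here, not shown.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class FirstLineKey {
    static String userId; // the shared key, read before any map() call

    // Stand-in for configure(JobConf): read the first line of the task's input file.
    static void configure(Path inputFile) throws IOException {
        try (BufferedReader r = Files.newBufferedReader(inputFile)) {
            userId = r.readLine(); // first line of the file is the user ID
        }
    }

    // Stand-in for map(): emit (userId, line) for each subsequent line.
    static String map(String line) {
        return userId + "\t" + line;
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("input", ".txt");
        Files.write(f, Arrays.asList("user42", "record-a", "record-b"));
        configure(f);                        // framework calls this once per map task
        System.out.println(map("record-a")); // prints "user42\trecord-a"
    }
}
```

Note that every map task re-reads the first line of the whole file, regardless of which split it was handed, so no coordination between tasks is needed.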



  • Jeff Zhang at Jan 9, 2010 at 4:15 am
Actually you can treat the mapper task as a template design pattern; here is the pseudocode:

    Mapper.configure(JobConf)
    for each record in InputSplit:
        Mapper.map(key, value, outputKey, outputValue)
    Mapper.close()

Any subclass of Mapper can override the three methods configure(), map(), and close() to do customization.
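The pseudocode above can be sketched without Hadoop: the framework owns run(), and subclasses customize configure()/map()/close(). Method names mirror the old org.apache.hadoop.mapred API, but the types here are simplified stand-ins, not the real Hadoop interfaces.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

abstract class MapperTemplate {
    void configure() {}                      // called once, before any map()
    abstract void map(String key, String value, List<String> output);
    void close() {}                          // called once, after the last map()

    // Framework-side driver: the "template method".
    final List<String> run(Map<String, String> split) {
        List<String> output = new ArrayList<>();
        configure();
        for (Map.Entry<String, String> rec : split.entrySet())
            map(rec.getKey(), rec.getValue(), output);
        close();
        return output;
    }
}

// Example customization: upper-case every value.
class UpperMapper extends MapperTemplate {
    void map(String key, String value, List<String> output) {
        output.add(key + "=" + value.toUpperCase());
    }
}

public class TemplateDemo {
    public static void main(String[] args) {
        Map<String, String> split = new LinkedHashMap<>();
        split.put("k1", "a");
        split.put("k2", "b");
        System.out.println(new UpperMapper().run(split)); // prints [k1=A, k2=B]
    }
}
```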



    2010/1/8 Gang Luo <lgpublic@yahoo.com.cn>


    --
    Best Regards

    Jeff Zhang
  • Raymond Jennings III at Jan 11, 2010 at 5:44 pm
It looks like what you are referring to is the deprecated class, which has made for some confusing conversations in the past. It seems like many users still use the older API, and most of the examples still use it. I would like to stay with the more recent API, where it looks like the call is actually setup() instead of configure(). Not sure if it's a one-to-one mapping though.

    --- On Fri, 1/8/10, Jeff Zhang wrote:
  • Raymond Jennings III at Jan 12, 2010 at 7:56 pm
    Hi Gang,
I was able to use this on an older version that uses the JobClient class to run the job, but not on the newer API with the Job class. The Job class appears to use a setup() method instead of a configure() method, but the "map.input.file" attribute does not appear to be available via the Configuration in the setup() method. Have you tried to do what you described using the newer API? Thank you.

    --- On Fri, 1/8/10, Gang Luo wrote:
  • Amogh Vasekar at Jan 12, 2010 at 8:32 pm
    (Sorry for the spam if any, mails are bouncing back for me)

    Hi,
In setup(), use this:
FileSplit split = (FileSplit) context.getInputSplit();
split.getPath() will return the Path.
    Hope this helps.

    Amogh


    On 1/13/10 1:25 AM, "Raymond Jennings III" wrote:

  • Amogh Vasekar at Jan 13, 2010 at 8:09 am
    +1 for the documentation change in mapred-tutorial. Can we do that and publish using a normal apache account?

    Thanks,
    Amogh


    On 1/13/10 2:29 AM, "Raymond Jennings III" wrote:

    Amogh,
You bet it helps! Thanks! Sometimes it's very difficult to map between the old and the new APIs. I was digging for that answer for a while.

    --- On Tue, 1/12/10, Amogh Vasekar wrote:

  • Tom White at Jan 15, 2010 at 4:54 am
    Please submit a patch for the documentation change - perhaps at
    https://issues.apache.org/jira/browse/HADOOP-5973.

    Cheers,
    Tom
    On Wed, Jan 13, 2010 at 12:09 AM, Amogh Vasekar wrote:


  • Raymond Jennings III at Feb 17, 2010 at 8:25 pm
Is there a typo in the Join.java example that comes with Hadoop? It has the line:

JobConf jobConf = new JobConf(getConf(), Sort.class);

Shouldn't that be Join.class? Is there an equivalent example that uses the newer API instead of the deprecated calls?
  • Raymond Jennings III at Mar 25, 2010 at 8:41 pm
    for the input to a mapper or as the output of either mapper or reducer?
  • Raymond Jennings III at Mar 26, 2010 at 1:30 am
    Any pointers on what might be causing this? Thanks!



    java.io.IOException: Spill failed
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1006)
    at java.io.DataOutputStream.write(Unknown Source)
    at org.apache.hadoop.io.Text.write(Text.java:282)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:854)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:504)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at TSPmrV2$TSPMapper3.MapEmit(TSPmrV2.java:587)
    at TSPmrV2$TSPMapper3.map(TSPmrV2.java:571)
    at TSPmrV2$TSPMapper3.map(TSPmrV2.java:1)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
    Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_201003181420_4634/attempt_201003181420_4634_m_000000_0/output/spill142.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:343)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
    at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1183)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:648)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1135)


Discussion Overview
group: common-user
categories: hadoop
posted: Jan 8, '10 at 9:46p
active: Mar 26, '10 at 1:30a
posts: 12
users: 5
website: hadoop.apache.org...
irc: #hadoop


site design / logo © 2022 Grokbase