Dear all,

I am writing a map-reduce job today, and I hoped I could use different
output types for the Mapper and the Combiner. I am using Text as the output
value type of the Mapper and MapWritable as the output value type of the Combiner.

But it looks like Hadoop doesn't support that yet?

I have some code like the following:

public class RawLogMapper extends Mapper<LongWritable, Text, Text, Text> {

public class RawLogCombiner extends Reducer<Text, Text, Text, MapWritable> {

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(MapWritable.class);
job.setOutputFormatClass(TextOutputFormat.class);

But it failed, and the logs reported a type mismatch. Is there
any way I could use different types for the VALUEOUT of the mapper and the
combiner?

Thanks


Best wishes,
Xu Wenhao


  • Harsh J at Feb 16, 2011 at 11:48 am
    The combiner must "have the same input and output key types and the
    same input and output value types" (as per the docs for setting one).

    The combined outputs are treated as typical map outputs after
    processing, so that the reducer still applies to them properly. For
    this to work, your combiner can't change the types the Reducer expects
    as input from the Mappers. Perhaps making the map itself emit <Text,
    MapWritable> in some form would let you still use your combiner
    (although at a little more expense).
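    A minimal, Hadoop-free sketch of why that constraint exists (the class
    and method names here are made up for illustration): the framework may
    run the combiner zero, one, or several times on a key's values, so its
    output must be consumable again as its own input, which forces identical
    input and output types.

    ```java
    import java.util.*;

    // Stand-in for a word-count-style combiner. Because the framework may
    // apply it repeatedly before the real reducer runs, combine() must map
    // a value list to a value list of the SAME type.
    public class CombineSketch {
        // Collapse a list of partial counts into a one-element list of their sum.
        static List<Integer> combine(List<Integer> values) {
            int sum = 0;
            for (int v : values) sum += v;
            return Collections.singletonList(sum);
        }

        public static void main(String[] args) {
            List<Integer> once = combine(Arrays.asList(1, 2, 3));
            // Types match, so combining the combined output again is legal
            // and changes nothing; the reducer sees the same types either way.
            List<Integer> twice = combine(once);
            System.out.println(once + " " + twice);  // [6] [6]
        }
    }
    ```

    If combine() instead returned a MapWritable-like type while its input
    values were Text-like, the second application (and the reducer's
    iterator) would break, which is exactly the mismatch the logs report.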


    --
    Harsh J
    www.harshj.com
  • MONTMORY Alain at Feb 16, 2011 at 1:39 pm
    Hi,

    I think you can use different types for the mapper and combiner; they
    are not linked together. But suppose:

    mapper <KeyTypeA, ValueTypeB>
    reducer <KeyTypeC, ValueTypeD>

    In your mapper you then have to emit:

    public void map(KeyTypeA key, ValueTypeB value)
    {
        ...
        context.write(keyOut, valueOut);  // of types KeyTypeC, ValueTypeD
    }

    Hope this helps!

    regards

    Alain

    [@@THALES GROUP RESTRICTED@@]

    From: Stanley Xu
    Sent: Wednesday, 16 February 2011 12:02
    To: mapreduce-user@hadoop.apache.org
    Subject: Could we use different output Format for the Mapper and Combiner?

  • Benjamin Hiller at Feb 16, 2011 at 9:33 pm
    Hi,

    is it possible to determine the source (the filename, for example) of a
    key-value pair in the mapper? What I need to do is differentiate between
    two sources whose records are of the same kind (so I can't tell the
    sources apart by looking at the records themselves). I guess I could do
    this by injecting some kind of tag in the RecordReader or elsewhere
    (which I haven't figured out yet either), but I hope there is an easier
    way, preferably right in the mapper.

    As additional information: I haven't changed anything in the
    RecordReader or InputSplit yet, because I am working with text files and
    it works just fine without modification. So if I have just missed
    something basic regarding the above question, it would be nice if you
    could point me to some information about it.

    Thanks,
    Ben
  • Alex Kozlov at Feb 16, 2011 at 9:38 pm
    There is a way to get the file name in the new mapreduce API:

    fileName = ((FileSplit) context.getInputSplit()).getPath().toString();

    You usually do it in the setup() method.
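    Once setup() has captured that path string, tagging each record by its
    source is plain string work. A small, Hadoop-free sketch, assuming an
    HDFS-style layout where each source sits in its own directory (the
    helper name and example paths are hypothetical):

    ```java
    // Hypothetical helper for the pattern above: setup() stores
    // ((FileSplit) context.getInputSplit()).getPath().toString() once per
    // split, and map() uses the derived tag for every record in that split.
    public class SourceTag {
        // e.g. "hdfs://nn:9000/input/sourceA/part-00000" -> "sourceA"
        static String sourceOf(String path) {
            String[] parts = path.split("/");
            // the file's parent directory names the source
            return parts.length >= 2 ? parts[parts.length - 2] : "";
        }

        public static void main(String[] args) {
            System.out.println(sourceOf("hdfs://nn:9000/input/sourceA/part-00000"));  // sourceA
        }
    }
    ```

    This works because a split never spans files, so the path captured in
    setup() is valid for every key-value pair the mapper sees.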
  • Benjamin Hiller at Feb 16, 2011 at 9:59 pm
    Thank you, that works fine. =)

  • Harold Lim at Feb 16, 2011 at 10:53 pm
    Hi Ben,

    You can do something like this:

    ((FileSplit) context.getInputSplit()).getPath()


    -Harold

  • Stanley Xu at Feb 17, 2011 at 7:02 am
    Hi Alain,

    I think Harsh is correct; I found the same in the old mails:

    http://search-hadoop.com/m/eSd3VxxvkC1/combiner+input+output+format&subj=What+s+a+valid+combiner+

    And per my test, the combiner cannot have a different output
    type from the mapper, at least up to v0.20.3.

    Thanks for both of you.

    Best wishes,
    Stanley Xu



Discussion Overview
group: mapreduce-user @ hadoop
posted: Feb 16, '11 at 11:02a
active: Feb 17, '11 at 7:02a
posts: 8
users: 6
website: hadoop.apache.org...
irc: #hadoop
