  • David_ca at Jul 16, 2009 at 6:58 pm
    Hi,

    I am very new to Hadoop and I'm hoping someone can help me do a two-column
    sort.
    For my input, I have lines with 3 columns. I would like to sort the first
    column by string ascending and the second column by integer descending.
    The listing below shows an example input and expected output.

    The approach I have taken is to use
    JobConf.setKeyFieldComparatorOptions.
    From reading various resources, these settings:
    conf.setKeyFieldComparatorOptions("-k1 -k2nr");
    conf.set("map.output.key.field.separator", " ");

    should do what I want: sort the first column by string, and the second
    column by number descending. I use a space character to separate the 2 key
    pieces.

    But it doesn't seem to work. The actual output I get is also shown below.
    Any ideas on what I am doing wrong? The first column seems to be sorted
    correctly, but some of the second column's values are not in the right order.
    For example, these two rows should be reversed:
    carrot<adog 1 value_c1
    carrot<adog 3 value_c3

    Any help is greatly appreciated.

    David



    /*sample input*/
    apple<adog 3 value_a3
    apple<adog 1 value_a1
    apple<acat 2 value_a2
    apple<abird 12 value_a2
    carrot<adog 1 value_c1
    carrot<adog 3 value_c3
    carrot<abird 2 value_c2
    banana<acat 1 value_b1
    banana<abird 3 value_b3
    banana<adog 2 value_b2
    banana<adog 11 value_b11
    banana<abird 17 value_b17
    banana<acat 4 value_b4

    /*expected output*/
    apple<abird 12 value_a2
    apple<acat 2 value_a2
    apple<adog 3 value_a3
    apple<adog 1 value_a1
    banana<abird 17 value_b17
    banana<abird 3 value_b3
    banana<acat 4 value_b4
    banana<acat 1 value_b1
    banana<adog 11 value_b11
    banana<adog 2 value_b2
    carrot<abird 2 value_c2
    carrot<adog 3 value_c3
    carrot<adog 1 value_c1

    /*actual output*/
    apple<abird 12 value_a2
    apple<acat 2 value_a2
    apple<adog 1 value_a1
    apple<adog 3 value_a3
    banana<abird 17 value_b17
    banana<abird 3 value_b3
    banana<acat 1 value_b1
    banana<acat 4 value_b4
    banana<adog 11 value_b11
    banana<adog 2 value_b2
    carrot<abird 2 value_c2
    carrot<adog 1 value_c1
    carrot<adog 3 value_c3
  • David_ca at Jul 17, 2009 at 5:58 am
  • Jason hadoop at Jul 17, 2009 at 6:05 am
    In the example code for Pro Hadoop there are some shims for the
    field comparator classes that let you log what is going on in the
    partitioner.

    Also, it is very useful, if cumbersome, to step through that in the debugger.

    --
    Pro Hadoop, a book to guide you from beginner to hadoop mastery,
    http://www.amazon.com/dp/1430219424?tag=jewlerymall
    www.prohadoopbook.com a community for Hadoop Professionals
  • Jason hadoop at Jul 17, 2009 at 6:05 am
    Correction: the shims let you log what is going on in the field comparator
    or field partitioner.
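
One possible explanation for the "actual output" listing in the question, offered as an assumption and not something confirmed in this thread: if the key specification follows Unix sort conventions, a key written as -k1 with no end field runs from field 1 to the end of the key. The whole key would then be compared as a single string, and -k2nr would only break ties. Plain string ordering of the sample input does reproduce the actual output exactly (for instance, "11" sorts before "2"). The sketch below is plain Java with no Hadoop dependency and contrasts the two orderings; the class, field, and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class TwoColumnSortSketch {

    // Sample input copied from the question.
    static final List<String> SAMPLE = Arrays.asList(
        "apple<adog 3 value_a3",
        "apple<adog 1 value_a1",
        "apple<acat 2 value_a2",
        "apple<abird 12 value_a2",
        "carrot<adog 1 value_c1",
        "carrot<adog 3 value_c3",
        "carrot<abird 2 value_c2",
        "banana<acat 1 value_b1",
        "banana<abird 3 value_b3",
        "banana<adog 2 value_b2",
        "banana<adog 11 value_b11",
        "banana<abird 17 value_b17",
        "banana<acat 4 value_b4");

    // Assumption: an unbounded "-k1" acts like one string comparison over
    // the whole key, so the second column is compared character by character.
    static final Comparator<String> WHOLE_KEY = Comparator.naturalOrder();

    // Intended order: field 1 as a string ascending, then field 2 as an
    // integer descending. Fields are separated by a single space.
    static int compareFieldwise(String a, String b) {
        String[] fa = a.split(" ");
        String[] fb = b.split(" ");
        int byFirst = fa[0].compareTo(fb[0]);
        if (byFirst != 0) {
            return byFirst;
        }
        // Arguments swapped to get descending numeric order.
        return Integer.compare(Integer.parseInt(fb[1]), Integer.parseInt(fa[1]));
    }

    // Returns a sorted copy and leaves the input list untouched.
    static List<String> sorted(List<String> lines, Comparator<String> order) {
        List<String> copy = new ArrayList<>(lines);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println("whole key compared as one string:");
        sorted(SAMPLE, WHOLE_KEY).forEach(System.out::println);
        System.out.println();
        System.out.println("field 1 ascending, field 2 numeric descending:");
        sorted(SAMPLE, TwoColumnSortSketch::compareFieldwise)
            .forEach(System.out::println);
    }
}
```

Run as written, the first listing matches the actual output in the question and the second matches the expected output. If the assumption holds for the Hadoop comparator, bounding each key so it covers a single field, as in "-k1,1 -k2,2nr", may be worth trying.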

Discussion Overview
group: common-user @
categories: hadoop
posted: Jul 16, '09 at 5:42p
active: Jul 17, '09 at 6:05a
posts: 5
users: 2
website: hadoop.apache.org...
irc: #hadoop

2 users in discussion

David_ca: 3 posts Jason hadoop: 2 posts

People

Translate

site design / logo © 2022 Grokbase