Dumb question guys
Hey all, after 4 hours of true torture I hope you can help me (the task should be easy):

up = LOAD '/up.log' USING PigStorage(',') AS (upEpoch:long, upInstance:chararray, upKeyword:chararray);
tx = LOAD '/tx.log' USING PigStorage(',') AS (txEpoch:long, txInstance:chararray, txKeyword:chararray);
recordGroup = COGROUP up BY (upInstance), tx BY (txInstance);

recordExtract = FOREACH recordGroup {
    recordFiltered = FILTER up BY upEpoch < tx.txEpoch;
    recordLimited = LIMIT recordFiltered 1;
    GENERATE recordLimited;
}

How do I point Pig at the txEpoch field of the tx bag inside recordGroup? tx::txEpoch, tx.txEpoch, txEpoch and recordGroup::tx.txEpoch all fail...

With tx::txEpoch I always get the same error: "ERROR 1000: Error during parsing. Invalid alias: tx::txEpoch in {upEpoch: long,upInstance: chararray,upKeyword: chararray}"

With tx.txEpoch (I know it resolves to the tx = LOAD relation, but I need the tx bag from recordGroup!) I get: "ERROR 2997: Unable to recreate exception from backed error: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. 1st : (1314835200050,99,sam), 2nd :(1314835200079,99,flin)"
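Why both attempts fail can be seen from the schema of the cogrouped relation. The minimal sketch below reuses the paths and schemas from the question: DESCRIBE should show that each row of recordGroup holds the key `group` plus one bag per input, so inside the nested block tx is a bag of tuples, not a single txEpoch value. The `tx::` prefix is only added to field names by operators such as JOIN or FLATTEN, which is why tx::txEpoch is rejected; and, as the second error shows, tx.txEpoch inside the block was resolved to the top-level relation tx used as a scalar, which fails as soon as that relation has more than one row.

```pig
-- Same inputs as in the question.
up = LOAD '/up.log' USING PigStorage(',') AS (upEpoch:long, upInstance:chararray, upKeyword:chararray);
tx = LOAD '/tx.log' USING PigStorage(',') AS (txEpoch:long, txInstance:chararray, txKeyword:chararray);
recordGroup = COGROUP up BY upInstance, tx BY txInstance;

-- Prints the schema: a group key followed by one bag per input,
-- i.e. up and tx are bags of tuples, not scalar fields.
DESCRIBE recordGroup;
```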

  • Xiaomeng Wan at Sep 13, 2011 at 8:26 pm
    tx is a bag; you cannot use it that way unless it is a scalar. I'm not
    sure about the logic here, but it looks like you should use a JOIN rather
    than a COGROUP:

    recordGroup = join up BY upInstance, tx BY txInstance;
    recordFiltered = FILTER recordGroup BY upEpoch < txEpoch;

    Shawn
  • Marek Miglinski at Sep 14, 2011 at 6:52 am
    Thanks for your reply,

    I can't use JOIN, and I'll explain why. Here is my data:
    UP:
    9,user1,sam1
    5,user1,sam2
    3,user1,sam3
    9,user2,flin

    TX:
    7,user1,wow
    9,user2,pop

    I need to join tx with up by user and by the closest epoch (the first field). If I JOIN by user, I get:
    7,user1,wow,9,user1,sam1
    7,user1,wow,5,user1,sam2
    7,user1,wow,3,user1,sam3
    9,user2,pop,9,user2,flin

    Now I can't filter these records properly in a FOREACH, because from a single row I can't tell whether it is the one I need.

    So I COGROUP instead and get:
    {(7,user1,wow)}, {(9,user1,sam1), (5,user1,sam2), (3,user1,sam3)}
    {(9,user2,pop)}, {(9,user2,flin)}

    Now I can FILTER, ORDER and LIMIT inside a nested FOREACH, because all of a user's data is in one row:

    recordExtract = FOREACH recordGroup {
        recordFiltered = FILTER up BY upEpoch < tx.txEpoch;
        recordOrdered = ORDER recordFiltered BY upEpoch DESC;
        recordLimited = LIMIT recordOrdered 1;
        GENERATE recordLimited;
    }

    So if I could reference tx.txEpoch properly, I would get the desired output:
    7,user1,wow,5,user1,sam2 (upEpoch 5 is closest to txEpoch 7)
    9,user2,pop,9,user2,flin (upEpoch 9 is closest to txEpoch 9)


    Do you have any clues?
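Following Xiaomeng's suggestion, the closest earlier match can be computed without referring to one bag from inside a filter on the other: JOIN by user so every row carries both epochs as plain fields, FILTER on those fields, then GROUP by the tx event and keep the candidate with the largest upEpoch. This is a sketch, not the poster's code. It uses `<=` rather than `<` because the desired output keeps the equal-epoch pair for user2, and it assumes (txInstance, txEpoch) identifies one tx event; the relation names candidates, byTx and closest are illustrative.

```pig
up = LOAD '/up.log' USING PigStorage(',') AS (upEpoch:long, upInstance:chararray, upKeyword:chararray);
tx = LOAD '/tx.log' USING PigStorage(',') AS (txEpoch:long, txInstance:chararray, txKeyword:chararray);

-- One row per (tx event, up event) pair of the same user; every field is a scalar.
joined = JOIN tx BY txInstance, up BY upInstance;

-- Keep only up events at or before the tx event.
candidates = FILTER joined BY up::upEpoch <= tx::txEpoch;

-- Assumed key: one group per tx event (user + epoch).
byTx = GROUP candidates BY (tx::txInstance, tx::txEpoch);

closest = FOREACH byTx {
    -- Latest qualifying up event first, then keep only that one.
    ordered = ORDER candidates BY up::upEpoch DESC;
    best = LIMIT ordered 1;
    GENERATE FLATTEN(best) AS (txEpoch:long, txInstance:chararray, txKeyword:chararray,
                               upEpoch:long, upInstance:chararray, upKeyword:chararray);
}

DUMP closest;
```

On the sample UP/TX data above this should yield the two desired rows. A tx event with no up event at or before it disappears at the FILTER step, and if two up events share the same latest epoch, LIMIT keeps one of them arbitrarily.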

  • Marek Miglinski at Sep 14, 2011 at 10:57 am
    OK... I've done it :P Thanks for your help. I did it with a JOIN plus a new key field (made of txUser and txEpoch) that I later use to identify unique tx records for GROUPing.


    Sincerely,
    Marek M.
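The approach described in this last message, a JOIN plus a key built from the tx user and epoch, might look roughly like the sketch below. The key name txKey, the '_' separator, and the use of Pig's built-in TOP function (which returns the top N tuples of a bag, ranked by a 0-based column index) instead of a nested ORDER/LIMIT are assumptions for illustration, not details given in the thread.

```pig
up = LOAD '/up.log' USING PigStorage(',') AS (upEpoch:long, upInstance:chararray, upKeyword:chararray);
tx = LOAD '/tx.log' USING PigStorage(',') AS (txEpoch:long, txInstance:chararray, txKeyword:chararray);

joined = JOIN tx BY txInstance, up BY upInstance;

-- Hypothetical key identifying one tx event: user + '_' + epoch.
keyed = FOREACH joined GENERATE
    CONCAT(CONCAT(tx::txInstance, '_'), (chararray) tx::txEpoch) AS txKey,
    tx::txEpoch AS txEpoch, tx::txInstance AS txInstance, tx::txKeyword AS txKeyword,
    up::upEpoch AS upEpoch, up::upInstance AS upInstance, up::upKeyword AS upKeyword;

-- Keep only up events at or before the tx event.
candidates = FILTER keyed BY upEpoch <= txEpoch;

byKey = GROUP candidates BY txKey;

-- TOP(1, 4, bag): the tuple with the largest value in column 4 (upEpoch, 0-based).
closest = FOREACH byKey GENERATE FLATTEN(TOP(1, 4, candidates));

DUMP closest;
```

Compared with grouping on a tuple of fields, the explicit key makes the "one group per tx event" intent visible in the data; the output rows carry txKey as an extra leading column.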

Discussion Overview
group: user @
categories: pig, hadoop
posted: Sep 13, '11 at 6:32p
active: Sep 14, '11 at 10:57a
posts: 4
users: 2
website: pig.apache.org

2 users in discussion

Marek Miglinski: 3 posts Xiaomeng Wan: 1 post
