Grokbase Groups Pig user June 2011
Hi all,

I am receiving the following exception:

org.apache.pig.backend.executionengine.ExecException: ERROR 2078: Caught error from UDF: org.apache.pig.piggybank.evaluation.math.DoubleMax [Caught exception processing input row [null]]
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:263)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:269)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.io.IOException: Caught exception processing input row [null]
    at org.apache.pig.piggybank.evaluation.math.DoubleMax.exec(DoubleMax.java:70)
    at org.apache.pig.piggybank.evaluation.math.DoubleMax.exec(DoubleMax.java:57)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:201)
    ... 10 more
Caused by: java.lang.NullPointerException
    ... 13 more

My code:

FFW2 = Load 'final_free_w2.txt';
FFW3 = Load 'final_free_w3.txt';
FFW2_RankG_RankCate = FOREACH FFW2 GENERATE $0, $4, $3;
FFW3_RankG_RankCate = FOREACH FFW3 GENERATE $0, $4, $3;
FF23 = JOIN FFW2_RankG_RankCate BY $0, FFW3_RankG_RankCate BY $0;
FF23_Filtered = Foreach FF23 Generate $0,$2,$5;
STORE FF23_Filtered INTO 'FF23_Filtered.txt';

REGISTER /home/training/Desktop/1pig/pig-0.7.0/contrib/piggybank/piggybank.jar
A = LOAD 'FF23_Filtered.txt' AS (appID, rank2, rank3);
B = FOREACH A GENERATE appID, org.apache.pig.piggybank.evaluation.math.MAX((double)rank2, (double)rank3);
store B into 'FF23_FJM.txt';


Can anyone please let me know the exact reason for the above exception?
I also made sure that the file FF23_Filtered.txt is not NULL.

---
Thanks & Regards,
Narayan.


  • Jonathan Coveney at Jun 16, 2011 at 1:39 pm
    Hm, just to make sure, I ran this against trunk (to see if it's just a 0.7.0
    thing or not).

    A = LOAD 'test.txt'; --this is just a blank one line file
    B = FOREACH A GENERATE org.apache.pig.piggybank.evaluation.math.MAX(1,null);

    I also tested feeding it values from test.txt etc. It fails when there is a
    null value. The cast itself does not fail.

  • Jonathan Coveney at Jun 16, 2011 at 1:45 pm
    Can you check if your rank2 or rank3 values are ever null? If they are,
    there are some ad hoc fixes which you can do until this is fixed (and it is
    easy to fix, just a question of deciding what the desired handling of null
    values should be). I would just do something like...

    A = LOAD 'FF23_Filtered.txt' AS (appID, rank2, rank3);
    B = FILTER A BY rank2 is not null OR rank3 is not null;
    C = FOREACH B GENERATE appID, ( rank2 is null ? rank3 : rank2 ) as rank2, ( rank3 is null ? rank2 : rank3 ) as rank3;

    Obviously you could tweak that for whatever you want to happen if a value is
    null.
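
    To check whether the rank values are ever null, one option is to pull out the offending rows before calling MAX. This is only a sketch, assuming the same file and field names as in the question; the alias `suspect` is made up for illustration.

    ```pig
    A = LOAD 'FF23_Filtered.txt' AS (appID, rank2, rank3);
    -- Keep only rows that would hand MAX at least one null argument.
    suspect = FILTER A BY rank2 IS NULL OR rank3 IS NULL;
    DUMP suspect;
    ```

    If DUMP prints any rows, those are the inputs that make MAX throw.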

  • Daniel Dai at Jun 16, 2011 at 6:32 pm
    Jonathan is right. math.MAX does not handle null input. You need to check
    for null before feeding values into MAX.

    Daniel
  • Lakshminarayana Motamarri at Jun 17, 2011 at 3:15 am
    Hi all,

    Thanks, Jonathan and Daniel, for the prompt responses.

    Based on your suggestions, I tried the following.

    Code:

    REGISTER /home/training/Desktop/1pig/pig-0.7.0/contrib/piggybank/piggybank.jar

    -- Each of these 3 variants of A was followed by each of the 4 variants of B:
    A = LOAD 'FF23_Filtered1.txt' AS (appID: float, rankW2: float, rankW3: float);
    A = LOAD 'FF23_Filtered1.txt' AS (appID: int, rankW2: int, rankW3: int);
    A = LOAD 'FF23_Filtered1.txt' AS (appID, rankW2, rankW3);

    -- B, variant 1: received NullPointerException
    B = FOREACH A GENERATE appID, org.apache.pig.piggybank.evaluation.math.MAX((double)rankW2, (double)rankW3);
    store B into 'FF23_FJM.txt';

    -- B, variant 2: received NullPointerException
    B = FOREACH A GENERATE appID, org.apache.pig.piggybank.evaluation.math.MAX(((double)rankW2 is null ? (double)rankW3 : (double)rankW2), ((double)rankW3 is null ? (double)rankW2 : (double)rankW3));
    store B into 'FF23_FJM.txt';

    -- B, variant 3: received invalid alias error
    B = FOREACH A GENERATE appID, org.apache.pig.piggybank.evaluation.math.MAX(((double)rankW2 is null ? (double)rankW3 : (double)rankW2) AS (double)rankW2, ((double)rankW3 is null ? (double)rankW2 : (double)rankW3) AS (double)rankW3));

    -- B, variant 4: received invalid alias error
    B = FOREACH A GENERATE appID, org.apache.pig.piggybank.evaluation.math.MAX((rankW2 is null ? rankW3 : rankW2) AS (double)rankW2, (rankW3 is null ? rankW2 : rankW3) AS (double)rankW3));

    As noted above, all 12 combinations in these trials produced the exception
    listed next to the corresponding B. Please advise if I missed something.

    The details of both exceptions are:
    1) org.apache.pig.backend.executionengine.ExecException: ERROR 2078: Caught error from UDF: org.apache.pig.piggybank.evaluation.math.DoubleMax [Caught exception processing input row [null]]
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:263)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:269)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
    Caused by: java.io.IOException: Caught exception processing input row [null]
        at org.apache.pig.piggybank.evaluation.math.DoubleMax.exec(DoubleMax.java:70)
        at org.apache.pig.piggybank.evaluation.math.DoubleMax.exec(DoubleMax.java:57)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:201)
        ... 10 more
    Caused by: java.lang.NullPointerException
        ... 13 more

    2) ERROR 1000: Error during parsing. Invalid alias: org in {appID: float,rankW2: float,rankW3: float}

    org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Invalid alias: org in {appID: float,rankW2: float,rankW3: float}
        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1037)
        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:981)
        at org.apache.pig.PigServer.registerQuery(PigServer.java:383)
        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:717)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:273)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
        at org.apache.pig.Main.main(Main.java:363)
    Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: org in {appID: float,rankW2: float,rankW3: float}
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:6731)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:6575)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:4682)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:4579)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.java:4525)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:4434)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:4360)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:4326)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:4252)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:4175)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:4119)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:3528)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:2938)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1314)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:893)
        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:682)
        at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1031)
        ... 8 more
    ---
    Thanks & Regards,
    Narayan.
  • Jonathan Coveney at Jun 17, 2011 at 1:55 pm
    First, when troubleshooting (and just in general), I prefer to break steps
    out into multiple lines instead of trying to be overly expressive in one
    line. Pig scripts are generally small enough that breaking steps out costs
    little and helps a lot with debugging, but this is of course personal style.

    I create a file thing.txt, whose contents are as follows:

    1,1
    1,2
    1,3
    1,4
    ,
    ,
    1,
    2,
    ,3
    4,
    6,6
    4,1
    2,3


    8,
    9
    9


    So there are some empty lines, some lines with only one value or the other,
    etc. Here is the script I ran. Caveat: I'm running pig trunk.

    register /home/jcoveney/pig/build/ivy/lib/Pig/antlr-runtime-3.2.jar;
    register /home/jcoveney/pig/contrib/piggybank/java/piggybank.jar;

    A = LOAD 'thing.txt' USING PigStorage(',') AS (rank1,rank2);
    B = FILTER A BY rank1 is not null OR rank2 is not null;
    C = FOREACH B GENERATE ( rank1 is null ? rank2 : rank1 ) as rank1, ( rank2 is null ? rank1 : rank2 ) as rank2;
    D = FOREACH C GENERATE org.apache.pig.piggybank.evaluation.math.MAX(rank1,rank2);

    This worked fine.
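
    A note on the "Invalid alias: org" parse error from the one-line attempts: AS names a field produced by GENERATE, so it belongs after the whole expression rather than inside the argument list of MAX. With AS inside the parentheses the parser seems to give up on the function call and then tries to read `org` as a column name, which would match the AliasFieldOrSpec frame in the trace. The following is a sketch under that reading, reusing the rankW2/rankW3 names from the thread. The declared double types and the name `max_rank` are assumptions for illustration.

    ```pig
    REGISTER /home/training/Desktop/1pig/pig-0.7.0/contrib/piggybank/piggybank.jar;

    A = LOAD 'FF23_Filtered1.txt' AS (appID, rankW2:double, rankW3:double);
    -- Rows where both ranks are null must go first; otherwise each
    -- bincond below still evaluates to null and MAX fails again.
    B = FILTER A BY rankW2 IS NOT NULL OR rankW3 IS NOT NULL;
    C = FOREACH B GENERATE appID,
        org.apache.pig.piggybank.evaluation.math.MAX(
            (rankW2 IS NULL ? rankW3 : rankW2),
            (rankW3 IS NULL ? rankW2 : rankW3)
        ) AS max_rank;  -- AS sits outside MAX(...), on the generated field
    ```

    This keeps the single-statement style while staying parseable; splitting it into separate FOREACH steps is equivalent and easier to debug.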

  • Lakshminarayana Motamarri at Jun 18, 2011 at 5:56 am
    Hi all,

    Thanks, Jonathan, once again for your response.

    First of all:
    1) What is antlr-runtime-3.2.jar?
    I can't find it in my Pig installation path: /*/*/pig/ivy/*

    2) Coming back to the previous NULL problem:
    You are right, it would have worked.
    Later I realized that not just my rank columns but also the initial ID
    column is null in one case, namely the last line of the file.

    So I have to handle that case as well,
    i.e. with: A2 = FILTER A BY appID is not null;

    Anyway, it worked out great and I got the results. Thanks for your help.

    Thanks & Regards,
    Narayan.
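
    Putting the pieces from this thread together, the working pipeline might look roughly like the sketch below. It assumes the file and field names from the original question; the declared double types and the output name `max_rank` are assumptions, and the null policy (fill a missing rank from the other one, drop rows with no appID or no ranks at all) is just the one discussed here.

    ```pig
    REGISTER /home/training/Desktop/1pig/pig-0.7.0/contrib/piggybank/piggybank.jar;

    A  = LOAD 'FF23_Filtered.txt' AS (appID, rank2:double, rank3:double);
    -- A blank trailing line loads as a row with a null appID; drop it.
    A2 = FILTER A BY appID IS NOT NULL;
    -- Drop rows where neither rank is present.
    B  = FILTER A2 BY rank2 IS NOT NULL OR rank3 IS NOT NULL;
    -- Fill a single missing rank from the other column.
    C  = FOREACH B GENERATE appID,
             (rank2 IS NULL ? rank3 : rank2) AS rank2,
             (rank3 IS NULL ? rank2 : rank3) AS rank3;
    -- MAX now only ever sees non-null arguments.
    D  = FOREACH C GENERATE appID,
             org.apache.pig.piggybank.evaluation.math.MAX(rank2, rank3) AS max_rank;
    STORE D INTO 'FF23_FJM.txt';
    ```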
  • Jonathan Coveney at Jun 18, 2011 at 1:48 pm
    Ignore the antlr runtime thing, I simply forgot to remove it. It's a weird
    hack that was necessary on my system to get pig trunk withouthadoop to work.


Discussion Overview
group: user @
categories: pig, hadoop
posted: Jun 16, '11 at 10:36a
active: Jun 18, '11 at 1:48p
posts: 8
users: 3
website: pig.apache.org
