IsEmpty returns the wrong value after using LIMIT
-------------------------------------------------

Key: PIG-1543
URL: https://issues.apache.org/jira/browse/PIG-1543
Project: Pig
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Justin Hu


1. Two input files:

1a: limit_empty.input_a
1
1
1

1b: limit_empty.input_b
2
2

2.
The pig script: limit_empty.pig

-- A contains only 1's & B contains only 2's
A = load 'limit_empty.input_a' as (a1:int);
B = load 'limit_empty.input_b' as (b1:int);

C = COGROUP A by a1, B by b1;
D = FOREACH C generate A, B, (IsEmpty(A)? 0:1), (IsEmpty(B)? 0:1), COUNT(A), COUNT(B);
store D into 'limit_empty.output/d';
-- After the script finishes, we see the correct results:
-- {(1),(1),(1)} {} 1 0 3 0
-- {} {(2),(2)} 0 1 0 2

C1 = foreach C { Alim = limit A 1; Blim = limit B 1; generate Alim, Blim; }
D1 = FOREACH C1 generate Alim,Blim, (IsEmpty(Alim)? 0:1), (IsEmpty(Blim)? 0:1), COUNT(Alim), COUNT(Blim);
store D1 into 'limit_empty.output/d1';
-- After the script finishes, we see the unexpected results:
-- {(1)} {} 1 1 1 0
-- {} {(2)} 1 1 0 1

dump D;
dump D1;

3. Run the script and redirect stdout (the two dumps) to a file. There are two issues:

The major one:

IsEmpty() returns FALSE for an empty bag in limit_empty.output/d1/*, while it returns the correct value in limit_empty.output/d/*.

The only difference is that LIMIT was applied to the bags before IsEmpty() was called in the d1 case.

The minor one:

The redirected output only contains the first dump:

({(1),(1),(1)},{},1,0,3L,0L)
({},{(2),(2)},0,1,0L,2L)

We expect two more lines like:
({(1)},{},1,1,1L,0L)
({},{(2)},1,1,0L,1L)

In addition, the following error is logged:

[main] ERROR org.apache.pig.backend.hadoop.executionengine.HJob - java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.pig.data.Tuple
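For comparison, the values the report treats as correct for D1 can be worked out with a small Python model of the script. This is only a sketch of the intended semantics, not Pig code: `cogroup`, `flag`, and `d1_rows` are hypothetical helper names, bags are modeled as lists of one-field tuples, and B is assumed to be loaded from limit_empty.input_b as the script's comment describes.

```python
def cogroup(a, b):
    """Group two lists of ints by value, yielding (key, bag_a, bag_b)."""
    keys = sorted(set(a) | set(b))
    return [(k, [(x,) for x in a if x == k], [(x,) for x in b if x == k])
            for k in keys]


def flag(bag):
    """Mirror (IsEmpty(bag) ? 0 : 1) from the script."""
    return 0 if len(bag) == 0 else 1


def d1_rows(a, b, n=1):
    """Model: nested LIMIT n on each bag, then the D1 projection."""
    rows = []
    for _, bag_a, bag_b in cogroup(a, b):
        alim, blim = bag_a[:n], bag_b[:n]
        rows.append((alim, blim, flag(alim), flag(blim), len(alim), len(blim)))
    return rows


for row in d1_rows([1, 1, 1], [2, 2]):
    print(row)
# ([(1,)], [], 1, 0, 1, 0)
# ([], [(2,)], 0, 1, 0, 1)
```

Under this model the IsEmpty flags for D1 are 1 0 and 0 1, matching the pattern seen in D, whereas the affected version reports 1 1 for both rows.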


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Olga Natkovich (JIRA) at Aug 16, 2010 at 10:33 pm
    [ https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Olga Natkovich updated PIG-1543:
    --------------------------------

    Fix Version/s: 0.8.0
  • Olga Natkovich (JIRA) at Aug 27, 2010 at 8:31 pm
    [ https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Olga Natkovich reassigned PIG-1543:
    -----------------------------------

    Assignee: Daniel Dai

    Daniel, can you check whether this is related to the limit optimizer and whether it was addressed by the new optimizer? (This can be done post branch since it is a bug split.)
  • Daniel Dai (JIRA) at Aug 30, 2010 at 7:07 pm
    [ https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904291#action_12904291 ]

    Daniel Dai commented on PIG-1543:
    ---------------------------------

    This does not seem to be a logical-layer problem, and the new optimizer does not address it. It might be related to [PIG-747|https://issues.apache.org/jira/browse/PIG-747]; it needs further investigation.
  • Daniel Dai (JIRA) at Sep 1, 2010 at 7:29 pm
    [ https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Daniel Dai updated PIG-1543:
    ----------------------------

    Attachment: PIG-1543-1.patch

    This patch fixes the first issue. The problem is that we erroneously put a null in the bag when an empty bag is expected.

    The second issue is a side effect of the first. BinInterSedes assumes that a bag contains only tuples, so it does not expect a null inside a bag. This issue is fixed automatically once the fix for the first issue is in.
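The explanation above can be illustrated with a simplified Python model, not Pig's actual classes. It assumes that the emptiness check looks only at the number of entries in the bag and that the count skips null entries; both are assumptions chosen to match the output in the report (IsEmpty false, COUNT 0), and `is_empty` and `count` are hypothetical names.

```python
def is_empty(bag):
    """Size-based emptiness check: only the number of entries matters."""
    return len(bag) == 0


def count(bag):
    """Count tuples whose first field is non-null, skipping null entries."""
    return sum(1 for t in bag
               if t is not None and len(t) > 0 and t[0] is not None)


correct_bag = []      # an empty bag, as expected after LIMIT on an empty input
buggy_bag = [None]    # a bag that erroneously holds a single null entry

print(is_empty(correct_bag), count(correct_bag))  # True 0
print(is_empty(buggy_bag), count(buggy_bag))      # False 0
```

In this model the bag holding a null is reported as non-empty while still counting zero tuples, which is the combination shown in the d1 output.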
  • Daniel Dai (JIRA) at Sep 1, 2010 at 7:29 pm
    [ https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Daniel Dai updated PIG-1543:
    ----------------------------

    Status: Patch Available (was: Open)
  • Daniel Dai (JIRA) at Sep 2, 2010 at 5:07 pm
    [ https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905587#action_12905587 ]

    Daniel Dai commented on PIG-1543:
    ---------------------------------

    test-patch result:

    [exec] +1 overall.
    [exec]
    [exec] +1 @author. The patch does not contain any @author tags.
    [exec]
    [exec] +1 tests included. The patch appears to include 3 new or modified tests.
    [exec]
    [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
    [exec]
    [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    [exec]
    [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
    [exec]
    [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

    All tests pass
  • Richard Ding (JIRA) at Sep 3, 2010 at 6:37 pm
    [ https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906008#action_12906008 ]

    Richard Ding commented on PIG-1543:
    -----------------------------------

    +1. Looks good.
  • Daniel Dai (JIRA) at Sep 3, 2010 at 9:51 pm
    [ https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Daniel Dai updated PIG-1543:
    ----------------------------

    Status: Resolved (was: Patch Available)
    Hadoop Flags: [Reviewed]
    Resolution: Fixed

    Patch committed to both trunk and 0.8 branch.

Discussion Overview
group: dev @
categories: pig, hadoop
posted: Aug 12, '10 at 9:00p
active: Sep 3, '10 at 9:51p
posts: 9
users: 1
website: pig.apache.org

1 user in discussion

Daniel Dai (JIRA): 9 posts
