IsEmpty returns the wrong value after using LIMIT
-------------------------------------------------

Key: PIG-1543
URL: https://issues.apache.org/jira/browse/PIG-1543
Project: Pig
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Justin Hu

1. Two input files:

1a: limit_empty.input_a
1
1
1

1b: limit_empty.input_b
2
2

2. The pig script: limit_empty.pig

-- A contains only 1's & B contains only 2's
A = load 'limit_empty.input_a' as (a1:int);
B = load 'limit_empty.input_b' as (b1:int);

C = COGROUP A by a1, B by b1;
D = FOREACH C generate A, B, (IsEmpty(A)? 0:1), (IsEmpty(B)? 0:1), COUNT(A), COUNT(B);
store D into 'limit_empty.output/d';
-- After the script is done, we see the right results:
-- {(1),(1),(1)} {} 1 0 3 0
-- {} {(2),(2)} 0 1 0 2

C1 = foreach C { Alim = limit A 1; Blim = limit B 1; generate Alim, Blim; }
D1 = FOREACH C1 generate Alim,Blim, (IsEmpty(Alim)? 0:1), (IsEmpty(Blim)? 0:1), COUNT(Alim), COUNT(Blim);
store D1 into 'limit_empty.output/d1';
-- After the script is done, we see the unexpected results:
-- {(1)} {} 1 1 1 0
-- {} {(2)} 1 1 0 1

dump D;
dump D1;

3. Run the script and redirect the stdout (2 dumps) to a file. There are two issues:

The major one:

IsEmpty() returns FALSE for an empty bag in limit_empty.output/d1/*, while IsEmpty() returns the correct value in limit_empty.output/d/*.

The only difference is that "LIMIT" was applied to the bags before IsEmpty() in the first case.

The minor one:

The redirected output only contains the first dump:

({(1),(1),(1)},{},1,0,3L,0L)
({},{(2),(2)},0,1,0L,2L)

We expect two more lines like:
({(1)},{},1,1,1L,0L)
({},{(2)},1,1,0L,1L)

In addition, the following error is reported:

[main] ERROR org.apache.pig.backend.hadoop.executionengine.HJob - java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.pig.data.Tuple

at Aug 16, 2010 at 10:33 pm
[ https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1543:
--------------------------------

Fix Version/s: 0.8.0
at Aug 27, 2010 at 8:31 pm
[ https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich reassigned PIG-1543:
-----------------------------------

Assignee: Daniel Dai

Daniel, can you check whether this is related to the limit optimizer and whether it was addressed by the new optimizer? (This can be done post branch since it is a bug split.)
at Aug 30, 2010 at 7:07 pm
[ https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904291#action_12904291 ]

Daniel Dai commented on PIG-1543:
---------------------------------

This does not seem to be a logical layer problem, and the new optimizer does not address it. It might be related to [PIG-747|https://issues.apache.org/jira/browse/PIG-747]; it needs further investigation.
at Sep 1, 2010 at 7:29 pm
[ https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1543:
----------------------------

Attachment: PIG-1543-1.patch

This patch fixes the first issue. The problem is that we erroneously put a null in the bag when an empty bag is expected.

The second issue is a side effect of the first issue. BinInterSedes assumes that a bag contains only tuples, so it does not expect a null inside a bag. This issue is fixed automatically once the fix for the first issue is in.
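
To illustrate the explanation above, here is a minimal sketch in Python. It is a toy model, not Pig source code: a bag is modelled as a list, a tuple as a Python tuple, and a Java null as None. All function names are hypothetical. The assumptions that the emptiness check looks only at the bag size and that COUNT skips null entries are chosen to match the output reported in this issue. The serialize function only mimics a writer that assumes every bag element is a tuple; it does not reproduce the exact ClassCastException.

```python
# Toy model (assumption): bag = list, tuple = Python tuple, null = None.
# None of these functions are real Pig APIs.

def is_empty(bag):
    """Emptiness check based only on the number of entries in the bag."""
    return len(bag) == 0

def count(bag):
    """COUNT-like aggregate that skips null entries (assumed behaviour)."""
    return sum(1 for t in bag if t is not None)

def limit_buggy(bag, n):
    """Assumed faulty behaviour: an empty input yields a bag holding a null."""
    return bag[:n] if bag else [None]

def limit_fixed(bag, n):
    """Intended behaviour: an empty input yields an empty bag."""
    return bag[:n]

def serialize(bag):
    """Writer that assumes every element of a bag is a tuple."""
    for t in bag:
        if not isinstance(t, tuple):
            raise TypeError(type(t).__name__ + " cannot be treated as a tuple")
    return repr(bag)

def row(a, b, limit):
    """Builds the four computed columns of D1 for one group."""
    alim, blim = limit(a, 1), limit(b, 1)
    return (0 if is_empty(alim) else 1,
            0 if is_empty(blim) else 1,
            count(alim),
            count(blim))

group_a, group_b = [(1,), (1,), (1,)], []
print(row(group_a, group_b, limit_buggy))  # flags disagree with the counts
print(row(group_a, group_b, limit_fixed))  # flags agree with the counts
```

In this model the null entry makes the bag look non-empty to is_empty while count still reports 0, which matches the mismatch seen in limit_empty.output/d1. The same null entry is what trips the writer, so removing it addresses both symptoms.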
at Sep 1, 2010 at 7:29 pm
[ https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1543:
----------------------------

Status: Patch Available (was: Open)
at Sep 2, 2010 at 5:07 pm
[ https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905587#action_12905587 ]

Daniel Dai commented on PIG-1543:
---------------------------------

test-patch result:

[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 3 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

All tests pass
at Sep 3, 2010 at 6:37 pm
[ https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906008#action_12906008 ]

Richard Ding commented on PIG-1543:
-----------------------------------

+1. Looks good.
at Sep 3, 2010 at 9:51 pm
[ https://issues.apache.org/jira/browse/PIG-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1543:
----------------------------

Status: Resolved (was: Patch Available)
Resolution: Fixed

Patch committed to both trunk and 0.8 branch.
