Split -> distinct or order -> cogroup fails
-------------------------------------------

Key: PIG-425
URL: https://issues.apache.org/jira/browse/PIG-425
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: types_branch
Reporter: Alan Gates
Assignee: Alan Gates
Fix For: types_branch


A script like:

{code}
a = load 'myfile' as (name:chararray, age:int, gpa:double);
split a into a1 if age > 50, a2 if name < 'm';
b2 = distinct a2;
b1 = order a1 by name;
c = cogroup b2 by name, b1 by name;
d = foreach c generate flatten(group), COUNT($1), COUNT($2);
store d into 'OUTPATH';
{code}

Will abort with the error:
{code}
08/09/09 11:46:50 ERROR mapReduceLayer.Launcher: Error message from task (map) tip_200809080906_0185_m_000000 java.lang.ClassCastException: org.apache.pig.data.DefaultTuple cannot be cast to org.apache.pig.data.IndexedTuple
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:81)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:135)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:75)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
{code}

The issue is that the RearrangeAdjuster in MRCompiler does not properly recognize this as a cogroup, and so does not move the local rearrange out of the reduce and into the map.


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Alan Gates (JIRA) at Sep 9, 2008 at 8:58 pm
    [ https://issues.apache.org/jira/browse/PIG-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Alan Gates reassigned PIG-425:
    ------------------------------

    Assignee: Shravan Matthur Narayanamurthy (was: Alan Gates)
  • Olga Natkovich (JIRA) at Sep 10, 2008 at 7:04 pm
    [ https://issues.apache.org/jira/browse/PIG-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Olga Natkovich updated PIG-425:
    -------------------------------

    Priority: Critical (was: Major)
  • Shravan Matthur Narayanamurthy (JIRA) at Sep 11, 2008 at 5:16 pm
    [ https://issues.apache.org/jira/browse/PIG-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Shravan Matthur Narayanamurthy updated PIG-425:
    -----------------------------------------------

    Status: Patch Available (was: Open)

    The MRCompiler currently tries to pack as many operators as possible into a single phase. So when we have two cogroups one after the other, the LR (local rearrange) in the second cogroup gets pushed into the reducer. Since the store just stores away the LR's output, if we load it and pass it to the GR (global rearrange) we should be just fine.

    However, since IndexedTuple isn't implemented as a new kind of Tuple with a Factory, the Load on the other side reads the serialized IndexedTuple back as a DefaultTuple, which happens to succeed because of the way IndexedTuple is serialized. This can't be carried any further, though: when the mapper tries to collect what it expects to be an IndexedTuple, the cast fails.

    The fix I have is threefold. First, I have modified IndexedTuple's serialization to suit the solution. Second, I have made IndexedTuple a type of tuple by writing a different value to the marker byte, indicating that this is an IndexedTuple (the same way we identify null and non-null tuples). Third, I have modified DataReaderWriter's readDatum method to check whether we have an IndexedTuple and process it according to IndexedTuple's serialization format.

    To test this, I removed the RearrangeAdjuster from MRCompiler to see whether my hypothesis is correct. The unit tests passed, except for the MRCompiler test, which fails due to GoldenPlan issues. We need to run all the end-to-end tests against this patch and confirm that it works.
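
The marker-byte idea described in the message above can be sketched roughly as follows. This is only an illustration of tagged serialization in general, not the contents of 425.patch: the class names, the marker values, and the wire layout are all assumptions made up for the example, and none of them are Pig's real DefaultTuple, IndexedTuple, or DataReaderWriter.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only; these are NOT Pig's tuple classes.
class MarkerByteSketch {
    // Assumed marker values, chosen for illustration.
    static final byte PLAIN_TUPLE = 1;
    static final byte INDEXED_TUPLE = 2;

    static class PlainTuple {
        final List<String> fields;
        PlainTuple(List<String> fields) { this.fields = fields; }
    }

    // A tuple that also carries the index of the cogroup input it came from.
    static class IndexedTupleSketch extends PlainTuple {
        final int index;
        IndexedTupleSketch(int index, List<String> fields) {
            super(fields);
            this.index = index;
        }
    }

    // Writer: the first byte records which kind of tuple follows.
    static byte[] write(PlainTuple t) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            if (t instanceof IndexedTupleSketch) {
                out.writeByte(INDEXED_TUPLE);
                out.writeInt(((IndexedTupleSketch) t).index);
            } else {
                out.writeByte(PLAIN_TUPLE);
            }
            out.writeInt(t.fields.size());
            for (String field : t.fields) {
                out.writeUTF(field);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reader: dispatch on the marker byte so the concrete type survives the round trip.
    static PlainTuple read(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            byte marker = in.readByte();
            if (marker != PLAIN_TUPLE && marker != INDEXED_TUPLE) {
                throw new IOException("Unknown marker byte: " + marker);
            }
            int index = (marker == INDEXED_TUPLE) ? in.readInt() : -1;
            int size = in.readInt();
            List<String> fields = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                fields.add(in.readUTF());
            }
            return (marker == INDEXED_TUPLE)
                    ? new IndexedTupleSketch(index, fields)
                    : new PlainTuple(fields);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PlainTuple back = read(write(new IndexedTupleSketch(1, Arrays.asList("alice", "52"))));
        // The reader reconstructs the indexed type instead of a plain tuple.
        System.out.println(back instanceof IndexedTupleSketch);
    }
}
```

Without a distinguishing marker, a reader that always builds the plain type would hand back an object that a later cast to the indexed type rejects, which is the shape of the ClassCastException in the report.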
  • Shravan Matthur Narayanamurthy (JIRA) at Sep 11, 2008 at 5:16 pm
    [ https://issues.apache.org/jira/browse/PIG-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Shravan Matthur Narayanamurthy updated PIG-425:
    -----------------------------------------------

    Attachment: 425.patch
  • Olga Natkovich (JIRA) at Sep 11, 2008 at 8:42 pm
    [ https://issues.apache.org/jira/browse/PIG-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630356#action_12630356 ]

    Olga Natkovich commented on PIG-425:
    ------------------------------------

    Thanks, Shravan, for figuring out what the problem was and providing the patch!
  • Olga Natkovich (JIRA) at Sep 11, 2008 at 8:42 pm
    [ https://issues.apache.org/jira/browse/PIG-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Olga Natkovich reassigned PIG-425:
    ----------------------------------

    Assignee: Alan Gates (was: Shravan Matthur Narayanamurthy)

    I verified that all the tests run with the patch. However, we are not going to commit the patch, since IndexedTuple is going away and Alan will integrate these changes together with his NULL work on join.
  • Alan Gates (JIRA) at Sep 17, 2008 at 11:18 pm
    [ https://issues.apache.org/jira/browse/PIG-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632002#action_12632002 ]

    Alan Gates commented on PIG-425:
    --------------------------------

    I found one significant downside to the approach of this patch. Not moving the local rearrange into the map removes the possibility of running the combiner. So if you have a query like:

    C = cogroup A, B;
    D = foreach C generate flatten(A), (B);
    E = group D by $0;
    F = foreach E generate group, COUNT(D);

    the count will not be done in the combiner now. That seems like a significant downside.
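
To make the cost concrete, here is a minimal sketch of what a combiner buys for a count. It is a generic illustration, not Pig or Hadoop code: the class and method names are hypothetical, and the two lists stand in for the keys emitted by two map tasks. Each map task collapses its own records into partial counts, so only one record per key per task is shuffled to the reducer, which then sums the partials.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of combiner-style counting; not Pig or Hadoop code.
class CombinerCountSketch {
    // Map side: collapse one map task's keys into (key, partialCount) pairs.
    static Map<String, Long> combine(List<String> keysFromOneMap) {
        Map<String, Long> partial = new LinkedHashMap<>();
        for (String key : keysFromOneMap) {
            partial.merge(key, 1L, Long::sum);
        }
        return partial;
    }

    // Reduce side: sum the partial counts received from all map tasks.
    static Map<String, Long> reduce(List<Map<String, Long>> partials) {
        Map<String, Long> total = new LinkedHashMap<>();
        for (Map<String, Long> partial : partials) {
            for (Map.Entry<String, Long> entry : partial.entrySet()) {
                total.merge(entry.getKey(), entry.getValue(), Long::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Seven input records shrink to four shuffled records (two per map task).
        Map<String, Long> map1 = combine(Arrays.asList("a", "a", "b"));
        Map<String, Long> map2 = combine(Arrays.asList("a", "b", "b", "b"));
        System.out.println(reduce(Arrays.asList(map1, map2))); // prints {a=3, b=4}
    }
}
```

If the local rearrange stays in the reduce, there is no map-side step in which this partial aggregation can run, so every record has to be shuffled and counted in the reducer.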
  • Alan Gates (JIRA) at Sep 19, 2008 at 9:29 pm
    [ https://issues.apache.org/jira/browse/PIG-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Alan Gates updated PIG-425:
    ---------------------------

    Resolution: Fixed
    Status: Resolved (was: Patch Available)

    This issue was resolved by the patch that fixed PIG-361.
