Grokbase Groups Pig dev August 2008
Union doesn't work
------------------

Key: PIG-390
URL: https://issues.apache.org/jira/browse/PIG-390
Project: Pig
Issue Type: Bug
Environment: Mac OS X
Reporter: Arthur Zwiegincew


data files:

$ cat ~/tmp/data
1 1
2 1
3 10

$ cat ~/tmp/data-2
4 20
5 20

pig script:
data = load '/Users/arthur/tmp/data' as (x, y);
data2 = load '/Users/arthur/tmp/data-2' as (x, y);
both = union data, data2;
dump both;

result (only the rows from data-2 are returned; the three rows from data are missing):
(4, 20)
(5, 20)


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Arthur Zwiegincew (JIRA) at Aug 31, 2008 at 12:20 am
    [ https://issues.apache.org/jira/browse/PIG-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627261#action_12627261 ]

    Arthur Zwiegincew commented on PIG-390:
    ---------------------------------------

    Here's a workaround I'm using:

    package com.cooliris.analytics;

    import java.io.IOException;

    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.DataBag;
    import org.apache.pig.data.Tuple;

    /**
     * Implements a UNIONALL Pig function. It accepts a tuple of the format <unused, {bag-1}, {bag-2}, {bag-3}, ...>
     * and outputs a set of tuples corresponding to UNION bag-1, bag-2, bag-3, ... . This is intended as a workaround
     * for bug PIG-390 (Union doesn't work).
     *
     * Instead of:
     *     combined = UNION data1, data2, data3;
     * ...do the following:
     *     cg_combined = COGROUP data1 BY 1, data2 BY 1, data3 BY 1;
     *     combined = FOREACH cg_combined GENERATE FLATTEN(com.cooliris.analytics.UNIONALL(*));
     *
     * @author arthur@cooliris.com
     */
    public class UNIONALL extends EvalFunc<DataBag> {

        @Override
        public void exec(Tuple input, DataBag output) throws IOException {
            // Field 0 is the COGROUP key (unused), so start at field 1.
            for (int i = 1; i < input.arity(); ++i) {
                // Copy every tuple of bag i into the output bag.
                for (Tuple nested : input.getBagField(i)) {
                    output.add(nested);
                }
            }
        }
    }
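
    For readers without a Pig installation, the COGROUP-then-flatten logic of the workaround can be modelled in plain Java. The sketch below is an illustration only and uses no Pig classes: UnionAllSketch, cogroupByConstant, and unionAll are hypothetical names, and rows and bags are represented as nested lists rather than Pig's Tuple and DataBag. It runs on the sample data from the issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the COGROUP + UNIONALL workaround (assumption: rows are
// List<Object> and bags are List<List<Object>>; these are not Pig types).
class UnionAllSketch {

    // Models "COGROUP a BY 1, b BY 1, ...": every input shares the constant
    // key, so the result is a single tuple <key, {bag-1}, {bag-2}, ...>.
    static List<Object> cogroupByConstant(List<List<List<Object>>> inputs) {
        List<Object> grouped = new ArrayList<>();
        grouped.add(1);          // field 0: the group key, unused downstream
        grouped.addAll(inputs);  // fields 1..n: one bag per input relation
        return grouped;
    }

    // Models UNIONALL.exec: skip field 0 and concatenate the remaining bags.
    @SuppressWarnings("unchecked")
    static List<List<Object>> unionAll(List<Object> grouped) {
        List<List<Object>> output = new ArrayList<>();
        for (int i = 1; i < grouped.size(); ++i) {
            output.addAll((List<List<Object>>) grouped.get(i));
        }
        return output;
    }

    public static void main(String[] args) {
        // Sample rows from the issue's two data files.
        List<List<Object>> data = Arrays.asList(
                Arrays.<Object>asList(1, 1),
                Arrays.<Object>asList(2, 1),
                Arrays.<Object>asList(3, 10));
        List<List<Object>> data2 = Arrays.asList(
                Arrays.<Object>asList(4, 20),
                Arrays.<Object>asList(5, 20));

        List<List<Object>> both =
                unionAll(cogroupByConstant(Arrays.asList(data, data2)));
        System.out.println(both);
        // prints [[1, 1], [2, 1], [3, 10], [4, 20], [5, 20]]
    }
}
```

    All five rows appear, which is what the original UNION statement was expected to return. The list model preserves input order; Pig itself makes no ordering guarantee for the result of a union.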

  • Kevin Weil (JIRA) at Oct 6, 2008 at 12:34 am
    [ https://issues.apache.org/jira/browse/PIG-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636975#action_12636975 ]

    Kevin Weil commented on PIG-390:
    --------------------------------

    An update on this bug: it appears to be fixed in the types branch, using Hadoop 0.18.1 and the stable-1 svn tag.

    Kevin

Posted: Aug 21, '08 at 10:03p
Last active: Oct 6, '08 at 12:34a