
[Pig-user] Union of multiple loads using HBaseStorage not working as expected.

Eduardo Afonso Ferreira
Sep 6, 2011 at 4:50 pm
Hi there,

We hit a possible issue with Pig (version 0.9.1) and HBaseStorage where we try to LOAD multiple sets of data and UNION them. Here's a simple example that shows the problem:

HBase Data (use hbase shell to create table and add rows):


create 'test', {NAME => 'data', VERSIONS => 1}

put 'test', '11111', 'data:value', '1'
put 'test', '11112', 'data:value', '2'
put 'test', '11113', 'data:value', '3'
put 'test', '22221', 'data:value', '4'
put 'test', '22222', 'data:value', '5'
put 'test', '22223', 'data:value', '6'

Pig Statements (create file test.pig):

load1 = LOAD 'hbase://test' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('data:*','-loadKey -gte 11110 -lte 22220') AS (key:chararray, map:map[]);
load2 = LOAD 'hbase://test' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('data:*','-loadKey -gte 22220 -lte 33330') AS (key:chararray, map:map[]);
result = UNION load1, load2;
dump result;


Run Script:
pig -x local test.pig


Result:
(11111,[value#1])
(11112,[value#2])
(11113,[value#3])
(11111,[value#1])
(11112,[value#2])
(11113,[value#3])



The result should be the following:
(11111,[value#1])
(11112,[value#2])
(11113,[value#3])
(22221,[value#4])
(22222,[value#5])
(22223,[value#6])

If we dump load1 or load2 on its own, we see the rows we expect, but after the UNION the rows from load2 are missing and the rows from load1 appear twice.

Is this a known issue with Pig/HBaseStorage, or are we not using them as we should?
If it's a usage problem, what would be the proper way to load multiple sets of data and union them?

Thanks in advance.
Eduardo.

2 responses

  • Dmitriy Ryaboy at Sep 6, 2011 at 4:56 pm
    Hi Eduardo, there is no 0.9.1. Do you mean you built it from the 0.9
    branch?
    Could you try trunk?

  • Eduardo Afonso Ferreira at Sep 6, 2011 at 5:41 pm
    Hey, Dmitriy,

    We built from code we got from the 0.9 branch a couple of weeks ago.

    But we just built from trunk and now it works as expected.

    Thanks for the help.
    Eduardo.
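
A quick way to check whether a given build shows the behavior described in this thread is to count how often each row key appears after the UNION. The following is a minimal sketch, assuming the same 'test' table and key ranges as in the original post. The aliases grouped, counts, and dupes are illustrative names, and the map field is called cols here only to keep it distinct from the type name.

```pig
-- Same two range scans as in the original script (map field renamed to cols).
load1 = LOAD 'hbase://test' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('data:*','-loadKey -gte 11110 -lte 22220') AS (key:chararray, cols:map[]);
load2 = LOAD 'hbase://test' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('data:*','-loadKey -gte 22220 -lte 33330') AS (key:chararray, cols:map[]);
result = UNION load1, load2;

-- Count how many times each row key occurs in the unioned relation.
grouped = GROUP result BY key;
counts = FOREACH grouped GENERATE group AS key, COUNT(result) AS n;

-- Keep only the keys that show up more than once.
dupes = FILTER counts BY n > 1;
DUMP dupes;
```

Because the two key ranges share no rows in the sample data, dupes should come back empty on a build where UNION behaves as expected. With the faulty output reported in the original post, keys 11111 through 11113 would each be counted twice.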



