Grokbase Groups Pig user May 2011
Hi,

While testing a very simple Pig 0.8.0 script that counts the number of
rows in one of my HBase tables, I got a strange result: the reported
row count was only half of what it should have been (compared with a
'count' run in the HBase shell).

It appears that the HBaseStorage loader loads only a single region of
my table.

Any ideas? Is this a known regression?

Here is my script:

myRows = LOAD 'hbase://<my table>' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:deviceid') AS
(deviceid:chararray);
allRows = GROUP myRows ALL;
nbRows = FOREACH allRows GENERATE COUNT(myRows);
DUMP nbRows;


--

*Vincent BARAT, UBIKOD, CTO*


vbarat@ubikod.com Mob +33 (0)6 15 41 15 18

UBIKOD Paris, c/o ESSEC VENTURES, Avenue Bernard Hirsch, 95021
Cergy-Pontoise cedex, FRANCE, Tel +33 (0)1 34 43 28 89

UBIKOD Rennes, 10 rue Duhamel, 35000 Rennes, FRANCE, Tel. +33 (0)2
99 65 69 13


www.ubikod.com <http://www.ubikod.com/>@ubikod
<http://twitter.com/ubikod>

www.capptain.com <http://www.capptain.com/>@capptain_hq
<http://twitter.com/capptain_hq>


IMPORTANT NOTICE – UBIKOD and CAPPTAIN are registered trademarks of
UBIKOD S.A.R.L., all copyrights are reserved. The contents of this
email and attachments are confidential and may be subject to legal
privilege and/or protected by copyright. Copying or communicating
any part of it to others is prohibited and may be unlawful. If you
are not the intended recipient you must not use, copy, distribute or
rely on this email and should please return it immediately or notify
us by telephone. At present the integrity of email across the
Internet cannot be guaranteed. Therefore UBIKOD S.A.R.L. will not
accept liability for any claims arising as a result of the use of
this medium for transmissions by or to UBIKOD S.A.R.L.. UBIKOD
S.A.R.L. may exercise any of its rights under relevant law, to
monitor the content of all electronic communications. You should
therefore be aware that this communication and any responses might
have been monitored, and may be accessed by UBIKOD S.A.R.L. The
views expressed in this document are that of the individual and may
not necessarily constitute or imply its endorsement or
recommendation by UBIKOD S.A.R.L. The content of this electronic
mail may be subject to the confidentiality terms of a
"Non-Disclosure Agreement" (NDA).


  • Jameson Lopp at May 23, 2011 at 3:11 pm
    This sounds like a problem I also ran into a while back. I believe I solved it by setting:

    SET pig.splitCombination 'false';

    There may be a better way (turning off split combination feels like a bad thing to do), but it is
    the only thing that worked for me when I was seeing only partial data being loaded.
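
    Applied to the script from the original post, the workaround might look like the sketch
    below. The table name is the same placeholder used in the question, and putting the SET
    statement at the top of the script is an assumption; it only needs to take effect before
    the job is launched.

    ```pig
    -- Workaround sketch for Pig 0.8.0: turn off split combination before
    -- loading from HBase, so that input splits are not merged together.
    SET pig.splitCombination 'false';

    -- '<my table>' is a placeholder, as in the original script.
    myRows = LOAD 'hbase://<my table>' USING
        org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:deviceid') AS
        (deviceid:chararray);
    allRows = GROUP myRows ALL;
    nbRows = FOREACH allRows GENERATE COUNT(myRows);
    DUMP nbRows;
    ```

    With split combination off, the job will typically run more, smaller map tasks, so the
    setting is probably best treated as a temporary measure.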
    --
    Jameson Lopp
    Software Engineer
    Bronto Software, Inc.
  • Dmitriy Ryaboy at May 23, 2011 at 8:19 pm
    I believe we fixed this issue in 0.8.1 (but for 0.8.0, the solution is
    what Jameson suggests: turning off split combination completely).

    Please let us know if this still happens on 0.8.1 when split combination is on.

    D
  • Vincent Barat at May 24, 2011 at 5:34 am
    Actually, I tested Pig 0.8.1 a few days ago and I had the same issue.

    Furthermore, Pig 0.8.1 uses HBase 0.90, while Pig 0.8.0 uses HBase
    0.20.6. I thought this difference was the reason why Pig 0.8.1
    didn't load all of my data (I use HBase 0.20.6).
    So I went back to Pig 0.8.0 and discovered that the issue was
    also present in that version.

    I think that this bug makes Pig useless when working with HBase, and
    I'm very disappointed to see that the new HBase loader has such a bug!

  • Dmitriy Ryaboy at May 24, 2011 at 5:44 am
    You couldn't possibly have tested with Pig 0.8.1 successfully, as it
    does not work with HBase 0.20.6 at all. This issue should not show up
    if you use Pig 0.8.1 and HBase 0.90+.

    Upgrade HBase. The reason I decided this was an acceptable bump in a
    minor release was that 0.20.6 has a lot of scaling issues that have been
    fixed in 0.90; anyone running HBase in production should be upgrading
    immediately unless they really like manually rescuing regions.

    Of course, like Jameson suggested, you can also just turn off split
    combination in 0.8. The bug is not in either of the features; it's in
    how they interact, which is why we didn't catch it until it was too
    late :-(.
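
    One way to check whether the loader is covering the whole table, rather than a single
    region, is to load the row keys and look at the key range that Pig actually read. The
    following is only a sketch: it assumes the '-loadKey' option of HBaseStorage (which
    returns the row key as the first field) and row keys that are readable as strings. The
    aliases keyed, keysOnly, grouped, and keyRange are illustrative names, not taken from
    this thread.

    ```pig
    -- Diagnostic sketch: report how many rows were read, plus the smallest and
    -- largest row keys seen, to compare against the table's region boundaries.
    -- Assumes HBaseStorage's '-loadKey' option; '<my table>' is a placeholder.
    keyed = LOAD 'hbase://<my table>' USING
        org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:deviceid', '-loadKey') AS
        (rowkey:chararray, deviceid:chararray);
    keysOnly = FOREACH keyed GENERATE rowkey;
    grouped = GROUP keysOnly ALL;
    keyRange = FOREACH grouped GENERATE COUNT(keysOnly), MIN(keysOnly.rowkey), MAX(keysOnly.rowkey);
    DUMP keyRange;
    ```

    If the reported key range stops at a region boundary, or the count changes once
    pig.splitCombination is set to 'false', the job is probably skipping regions.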

    D
  • Vincent Barat at May 24, 2011 at 7:03 am
    You're right, I tested Pig 0.8.0 with HBase 0.20.6, and not Pig 0.8.1
    (very sorry).

    On 24/05/11 07:44, Dmitriy Ryaboy wrote:
    > You couldn't have possibly tested with Pig 0.8.1 successfully, as it
    > does not work with HBase 0.20.6 at all. This issue should not show up
    > if you use Pig 0.8.1 and HBase 0.90+
    >
    > Upgrade HBase. The reason I decided this was an acceptable bump in a
    > minor release was that 20.6 has a lot of scaling issues that have been
    > fixed in 90; anyone running HBase in production should be upgrading
    > immediately unless they really like manually rescuing regions.

    Yes, we were planning to upgrade to HBase 0.90, but were blocked
    by the Pig 0.8.0 limitation to HBase 0.20.6. Now that 0.8.1 is
    out, we can upgrade.

    > Of course, like Jameson suggested, you can also just turn off split
    > combination in 0.8. The bug is not in either of the features, it's in
    > how they interact, which is why we didn't catch it until it was too
    > late :-(.

    Thanks a lot for this clarification.

    Regards,

Discussion Overview
group: user @
categories: pig, hadoop
posted: May 23, '11 at 2:59p
active: May 24, '11 at 7:03a
posts: 6
users: 4
website: pig.apache.org
