Grokbase Groups Pig user July 2011
  • Vincent Barat at Jul 26, 2011 at 5:41 pm
    Hi,

    I'm using Pig 0.8.1 with HBase 0.90, and the following script
    sometimes returns an empty set and sometimes works:

    start_sessions = LOAD 'startSession' USING
      org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:infoid meta:imei meta:timestamp')
      AS (sid:chararray, infoid:chararray, imei:chararray, start:long);
    end_sessions = LOAD 'endSession' USING
      org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:timestamp meta:locid')
      AS (sid:chararray, end:long, locid:chararray);
    infos = LOAD 'info.mde253811.preprod.ubithere.com' USING
      org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:infoid')
      AS (infoid:chararray);
    sessions = JOIN start_sessions BY sid, end_sessions BY sid;
    sessions = JOIN sessions BY infoid, infos BY infoid;
    dump sessions;

    (Dumping "infos" before "sessions" seems to make it work.)

    Any idea what causes this very irritating behavior?
  • Dmitriy Ryaboy at Jul 26, 2011 at 6:16 pm
    Vincent, can you try replacing the HBase classes with those from trunk?
    A couple of fixes went in that might address that.
    Also, make sure you are running HBase 0.90.3.

    D
  • Corbin Hoenes at Jul 26, 2011 at 7:09 pm
    Dmitriy,

    Do the HBaseStorage class from Pig 0.8.1 and HBase 0.90.3 work together? We
    just upgraded our HBase cluster, and a developer found some issues that we
    aren't sure are related to the upgrade.


  • Dmitriy Ryaboy at Jul 26, 2011 at 7:39 pm
    They'd better; that's the combination I'm running in production :).

    D
  • Vincent Barat at Jul 27, 2011 at 12:39 pm
    More info on this issue:

    1- I use PIG 0.8.1 and HBase 0.90.3 and Hadoop 0.20-append
    2- The issue can be reproduced with PIG trunk too

    The script:

    start_sessions = LOAD 'startSession.mde253811.preprod.ubithere.com' USING
      org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:infoid meta:imei meta:timestamp')
      AS (sid:chararray, infoid:chararray, imei:chararray, start:long);
    end_sessions = LOAD 'endSession.mde253811.preprod.ubithere.com' USING
      org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:timestamp meta:locid')
      AS (sid:chararray, end:long, locid:chararray);
    sessions = JOIN start_sessions BY sid, end_sessions BY sid;
    sessions = FILTER sessions BY end > start AND end - start < 86400000L;
    sessions = FOREACH sessions GENERATE start_sessions::sid, imei, start, end;
    sessions = LIMIT sessions 100;
    dump sessions;
    <output 1>
    dump sessions;
    <output 2>

    The issue:

    <output 1> is empty
    <output 2> is 100 lines

    I can reproduce the issue systematically.

    Please advise: this issue prevents me from moving to HBase 0.90.3 in
    production, as I need to upgrade to Pig 0.8.1 at the same time!
  • Vincent Barat at Jul 27, 2011 at 1:22 pm
    A clarification: the HBase classes from the Pig trunk cannot be compiled
    inside Pig 0.8.1, so I was unable to test whether a fix was introduced in
    the latest version of these classes.
    So point 2 of my previous message should be disregarded.

  • Vincent Barat at Jul 27, 2011 at 2:31 pm
    I built the Pig trunk with the HBase 0.90.3 client library (ant
    -Dhbase.version=0.90.3) and the issue is still there.

    It makes me think of an issue in the optimizer... In any case, my
    query is not complex, so I wonder how such an issue could get through
    the Pig test suite!

    Any help?

  • Thejas Nair at Jul 27, 2011 at 5:44 pm
    I looked at the query plan for the query using EXPLAIN, and it looks
    correct.
    As you said, this is a simple use case; I would be very surprised if
    there is an optimizer bug here.
    I suspect that something is going wrong when loading the data from HBase.
    Are you able to get a simple load-store script working consistently?
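
    A minimal sketch of such a load-only check, assuming the table name and
    column list from Vincent's script. The GROUP ALL / COUNT step is an
    illustrative addition, not something posted in this thread. If the two
    dumps print different counts, the problem would appear to be in the load
    itself rather than in the JOIN or FILTER:

    ```pig
    -- Load one HBase table with no JOIN or FILTER involved.
    start_sessions = LOAD 'startSession.mde253811.preprod.ubithere.com' USING
      org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:infoid meta:imei meta:timestamp')
      AS (sid:chararray, infoid:chararray, imei:chararray, start:long);
    -- Count the loaded rows (illustrative; any cheap aggregate would do).
    all_rows = GROUP start_sessions ALL;
    row_count = FOREACH all_rows GENERATE COUNT(start_sessions);
    -- Dump twice, mirroring the double dump in the failing script.
    dump row_count;
    dump row_count;
    ```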

    Thanks,
    Thejas

  • Vincent Barat at Jul 27, 2011 at 10:45 pm
    Yes: if I remove the FILTER or the JOIN clause, the data loads
    correctly and consistently.
    I will do more testing, but yes, I suspect the HBase loader is
    working incorrectly in my case...

    The same query works perfectly with HBase 0.20.6 and Pig 0.6.1.

  • Raghu Angadi at Jul 27, 2011 at 2:59 pm
    Vincent,

    Is the behavior random, or the same each time?

    A couple of things to narrow it down:
    - Attach the entire console output from the Pig run when this happened.
    - Only load start_sessions and end_sessions and store them.
    - Load the data stored in the previous step and run the same Pig
    command.
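
    The two isolation steps above could be sketched roughly as follows. The
    HDFS paths 'start_sessions' and 'end_sessions' are placeholder names, and
    running the steps as two separate scripts is an assumption made to keep
    the HBase read and the join in different jobs:

    ```pig
    -- Script 1: copy both HBase tables to HDFS, with no JOIN or FILTER.
    start_sessions = LOAD 'startSession.mde253811.preprod.ubithere.com' USING
      org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:infoid meta:imei meta:timestamp')
      AS (sid:chararray, infoid:chararray, imei:chararray, start:long);
    end_sessions = LOAD 'endSession.mde253811.preprod.ubithere.com' USING
      org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:timestamp meta:locid')
      AS (sid:chararray, end:long, locid:chararray);
    STORE start_sessions INTO 'start_sessions';
    STORE end_sessions INTO 'end_sessions';

    -- Script 2 (run separately): reload from HDFS and run the same query.
    start_sessions = LOAD 'start_sessions'
      AS (sid:chararray, infoid:chararray, imei:chararray, start:long);
    end_sessions = LOAD 'end_sessions'
      AS (sid:chararray, end:long, locid:chararray);
    sessions = JOIN start_sessions BY sid, end_sessions BY sid;
    sessions = FILTER sessions BY end > start AND end - start < 86400000L;
    sessions = FOREACH sessions GENERATE start_sessions::sid, imei, start, end;
    sessions = LIMIT sessions 100;
    dump sessions;
    ```

    If script 2 returns rows on the first dump while the HBase-backed version
    does not, that would point at the loader rather than at the query plan.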

    Consider filing a JIRA; it might be a better place to go into more detail.

    -Raghu.
  • Vincent Barat at Jul 27, 2011 at 10:51 pm
    The behavior is not random.
    The first dump is always empty, and the second always works.
    I will try what you ask, and if I have more details, I will create a
    JIRA issue.

    Thanks.

  • Vincent Barat at Jul 28, 2011 at 8:26 am
    So, I've tried the exact same query but loading the data from HDFS
    files (using the regular Pig loader): it works!

    Here is the query loading from HDFS:

    start_sessions = LOAD 'start_sessions' AS (sid:chararray,
    infoid:chararray, imei:chararray, start:long);
    end_sessions = LOAD 'end_sessions' AS (sid:chararray, end:long,
    locid:chararray);
    infos = LOAD 'infos' AS (infoid:chararray, network_type:chararray,
    network_subtype:chararray, locale:chararray, version_name:chararray,
    carrier_country:chararray, carrier_name:chararray,
    phone_manufacturer:chararray, phone_model:chararray,
    firmware_version:chararray, firmware_name:chararray);
    sessions = JOIN start_sessions BY sid, end_sessions BY sid;
    sessions = FILTER sessions BY end > start AND end - start < 86400000L;
    sessions = JOIN sessions BY infoid, infos BY infoid;
    sessions = LIMIT sessions 100;
    dump sessions;

    The same query loading from HBase doesn't work:

    start_sessions = LOAD 'startSession' USING
      org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:infoid meta:imei meta:timestamp')
      AS (sid:chararray, infoid:chararray, imei:chararray, start:long);
    end_sessions = LOAD 'endSession' USING
      org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:timestamp meta:locid')
      AS (sid:chararray, end:long, locid:chararray);
    infos = LOAD 'info' USING
      org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:infoid data:networkType data:networkSubtype data:locale data:applicationVersionName data:carrierCountry data:carrierName data:phoneManufacturer data:phoneModel data:firmwareVersion data:firmwareName')
      AS (infoid:chararray, network_type:chararray,
      network_subtype:chararray, locale:chararray, version_name:chararray,
      carrier_country:chararray, carrier_name:chararray,
      phone_manufacturer:chararray, phone_model:chararray,
      firmware_version:chararray, firmware_name:chararray);
    sessions = JOIN start_sessions BY sid, end_sessions BY sid;
    sessions = FILTER sessions BY end > start AND end - start < 86400000L;
    sessions = JOIN sessions BY infoid, infos BY infoid;
    sessions = LIMIT sessions 100;
    dump sessions;

    I guess this definitely means there is a nasty bug in the HBase loader.

    Here is the Pig console output for the failing query:

    aws09:~# pig
    2011-07-28 08:17:36,329 [main] INFO org.apache.pig.Main - Logging
    error messages to: /root/pig_1311841056328.log
    2011-07-28 08:17:36,641 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
    Connecting to hadoop file system at:
    hdfs://aws09.preprod.ubithere.com:9000
    2011-07-28 08:17:36,923 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
    Connecting to map-reduce job tracker at: aws09.preprod.ubithere.com:9001
    grunt> start_sessions = LOAD 'startSession.mde253811.preprod.ubithere.com' USING
      org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:infoid meta:imei meta:timestamp')
      AS (sid:chararray, infoid:chararray, imei:chararray, start:long);
    grunt> end_sessions = LOAD 'endSession.mde253811.preprod.ubithere.com' USING
      org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid meta:timestamp meta:locid')
      AS (sid:chararray, end:long, locid:chararray);
    grunt> infos = LOAD 'info.mde253811.preprod.ubithere.com' USING
      org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:infoid data:networkType data:networkSubtype data:locale data:applicationVersionName data:carrierCountry data:carrierName data:phoneManufacturer data:phoneModel data:firmwareVersion data:firmwareName')
      AS (infoid:chararray, network_type:chararray,
      network_subtype:chararray, locale:chararray, version_name:chararray,
      carrier_country:chararray, carrier_name:chararray,
      phone_manufacturer:chararray, phone_model:chararray,
      firmware_version:chararray, firmware_name:chararray);
    grunt> sessions = JOIN start_sessions BY sid, end_sessions BY sid;
    grunt> sessions = FILTER sessions BY end > start AND end - start < 86400000L;
    grunt> sessions = JOIN sessions BY infoid, infos BY infoid;
    grunt> sessions = LIMIT sessions 100;
    grunt> dump sessions;
    2011-07-28 08:17:50,275 [main] INFO
    org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
    script: HASH_JOIN,FILTER,LIMIT
    2011-07-28 08:17:50,275 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
    pig.usenewlogicalplan is set to true. New logical plan will be used.
    2011-07-28 08:17:51,213 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
    (Name: sessions:
    Store(hdfs://aws09.preprod.ubithere.com:9000/tmp/temp-1404953096/tmp819396740:org.apache.pig.impl.io.InterStorage)
    - scope-93 Operator Key: scope-93)
    2011-07-28 08:17:51,225 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler
    - File concatenation threshold: 100 optimistic? false
    2011-07-28 08:17:51,281 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer
    - Rewrite: POPackage->POForEach to POJoinPackage
    2011-07-28 08:17:51,281 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer
    - Rewrite: POPackage->POForEach to POJoinPackage
    2011-07-28 08:17:51,350 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
    - MR plan size before optimization: 3
    2011-07-28 08:17:51,350 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
    - MR plan size after optimization: 3
    2011-07-28 08:17:51,402 [main] INFO
    org.apache.pig.tools.pigstats.ScriptState - Pig script settings are
    added to the job
    2011-07-28 08:17:51,411 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
    - mapred.job.reduce.markreset.buffer.percent is not set, set to
    default 0.3
    2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
    - Client environment:zookeeper.version=3.3.2-1031432, built on
    11/05/2010 05:32 GMT
    2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
    - Client environment:host.name=aws09.machine.com
    2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
    - Client environment:java.version=1.6.0_22
    2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
    - Client environment:java.vendor=Sun Microsystems Inc.
    2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
    - Client environment:java.home=/usr/lib/jvm/java-6-sun-1.6.0.22/jre
    2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
    - Client
    environment:java.class.path=/opt/pig/bin/../conf:/usr/lib/jvm/java-6-sun/jre/lib/tools.jar:/opt/pig/bin/../pig-0.8.1-core.jar:/opt/pig/bin/../build/pig-*-SNAPSHOT.jar:/opt/pig/bin/../lib/commons-el-1.0.jar:/opt/pig/bin/../lib/commons-lang-2.4.jar:/opt/pig/bin/../lib/commons-logging-1.1.1.jar:/opt/pig/bin/../lib/guava-r06.jar:/opt/pig/bin/../lib/hbase-0.90.3.jar:/opt/pig/bin/../lib/hsqldb-1.8.0.10.jar:/opt/pig/bin/../lib/jackson-core-asl-1.0.1.jar:/opt/pig/bin/../lib/jackson-mapper-asl-1.0.1.jar:/opt/pig/bin/../lib/javacc-4.2.jar:/opt/pig/bin/../lib/javacc.jar:/opt/pig/bin/../lib/jetty-util-6.1.14.jar:/opt/pig/bin/../lib/jline-0.9.94.jar:/opt/pig/bin/../lib/joda-time-1.6.jar:/opt/pig/bin/../lib/jsch-0.1.38.jar:/opt/pig/bin/../lib/junit-4.5.jar:/opt/pig/bin/../lib/jython-2.5.0.jar:/opt/pig/bin/../lib/log4j-1.2.14.jar:/opt/pig/bin/../lib/pigudfs.jar:/opt/pig/bin/../lib/slf4j-log4j12-1.4.3.jar:/opt/pig/bin/../lib/zookeeper-3.3.2.jar:/opt/hadoop/conf_computation:/opt/hbase/conf:/opt/pig/lib/hadoop-0.20-append-core.jar
    2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
    - Client
    environment:java.library.path=/usr/lib/jvm/java-6-sun-1.6.0.22/jre/lib/amd64/server:/usr/lib/jvm/java-6-sun-1.6.0.22/jre/lib/amd64:/usr/lib/jvm/java-6-sun-1.6.0.22/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
    2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
    - Client environment:java.io.tmpdir=/tmp
    2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
    - Client environment:java.compiler=<NA>
    2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
    - Client environment:os.name=Linux
    2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
    - Client environment:os.arch=amd64
    2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
    - Client environment:os.version=2.6.21.7-2.fc8xen-ec2-v1.0
    2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
    - Client environment:user.name=root
    2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
    - Client environment:user.home=/root
    2011-07-28 08:17:51,470 [main] INFO org.apache.zookeeper.ZooKeeper
    - Client environment:user.dir=/root
    2011-07-28 08:17:51,471 [main] INFO org.apache.zookeeper.ZooKeeper
    - Initiating client connection, connectString=aws09.machine.com:2222
    sessionTimeout=60000 watcher=hconnection
    2011-07-28 08:17:51,493 [main-SendThread()] INFO
    org.apache.zookeeper.ClientCnxn - Opening socket connection to
    server aws09.machine.com/10.83.1.244:2222
    2011-07-28 08:17:51,499 [main-SendThread(aws09.machine.com:2222)]
    INFO org.apache.zookeeper.ClientCnxn - Socket connection
    established to aws09.machine.com/10.83.1.244:2222, initiating session
    2011-07-28 08:17:51,508 [main-SendThread(aws09.machine.com:2222)]
    INFO org.apache.zookeeper.ClientCnxn - Session establishment
    complete on server aws09.machine.com/10.83.1.244:2222, sessionid =
    0x131617dada6054b, negotiated timeout = 60000
    2011-07-28 08:17:51,575 [main] DEBUG
    org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
    - Lookedup root region location,
    connection=org.apache.hadoop.hbase.client.HConnectionManager$[email protected];
    hsa=aws03.machine.com:60020
    2011-07-28 08:17:51,687 [main] DEBUG
    org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
    - Cached location for .META.,,1.1028785192 is aws03.machine.com:60020
    2011-07-28 08:17:51,696 [main] DEBUG
    org.apache.hadoop.hbase.client.MetaScanner - Scanning .META.
    starting at
    row=endSession.mde253811.preprod.ubithere.com,,00000000000000 for
    max=10 rows
    2011-07-28 08:17:51,700 [main] DEBUG
    org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
    - Cached location for
    endSession.mde253811.preprod.ubithere.com,,1311086199483.706685579
    is aws03.machine.com:60020
    2011-07-28 08:17:51,726 [main] DEBUG
    org.apache.hadoop.hbase.client.MetaScanner - Scanning .META.
    starting at
    row=startSession.mde253811.preprod.ubithere.com,,00000000000000 for
    max=10 rows
    2011-07-28 08:17:51,729 [main] DEBUG
    org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
    - Cached location for
    startSession.mde253811.preprod.ubithere.com,,1311086198252.1334391323 is
    aws03.machine.com:60020
    2011-07-28 08:17:53,328 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
    - Setting up single store job
    2011-07-28 08:17:53,335 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
    - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=0
    2011-07-28 08:17:53,335 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
    - Neither PARALLEL nor default parallelism is set for this job.
    Setting number of reducers to 1
    2011-07-28 08:17:53,442 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - 1 map-reduce job(s) waiting for submission.
    2011-07-28 08:17:53,944 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - 0% complete
    2011-07-28 08:17:53,989 [Thread-13] INFO
    org.apache.zookeeper.ZooKeeper - Initiating client connection,
    connectString=aws09.machine.com:2222 sessionTimeout=60000
    watcher=hconnection
    2011-07-28 08:17:53,990 [Thread-13-SendThread()] INFO
    org.apache.zookeeper.ClientCnxn - Opening socket connection to
    server aws09.machine.com/10.83.1.244:2222
    2011-07-28 08:17:53,991
    [Thread-13-SendThread(aws09.machine.com:2222)] INFO
    org.apache.zookeeper.ClientCnxn - Socket connection established to
    aws09.machine.com/10.83.1.244:2222, initiating session
    2011-07-28 08:17:53,996
    [Thread-13-SendThread(aws09.machine.com:2222)] INFO
    org.apache.zookeeper.ClientCnxn - Session establishment complete on
    server aws09.machine.com/10.83.1.244:2222, sessionid =
    0x131617dada6054c, negotiated timeout = 60000
    2011-07-28 08:17:54,000 [Thread-13] DEBUG
    org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
    - Lookedup root region location,
    connection=org.apache.hadoop.hbase.client.HConnectionManager$[email protected];
    hsa=aws03.machine.com:60020
    2011-07-28 08:17:54,005 [Thread-13] DEBUG
    org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
    - Cached location for .META.,,1.1028785192 is aws03.machine.com:60020
    2011-07-28 08:17:54,006 [Thread-13] DEBUG
    org.apache.hadoop.hbase.client.MetaScanner - Scanning .META.
    starting at
    row=endSession.mde253811.preprod.ubithere.com,,00000000000000 for
    max=10 rows
    2011-07-28 08:17:54,011 [Thread-13] DEBUG
    org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
    - Cached location for
    endSession.mde253811.preprod.ubithere.com,,1311086199483.706685579
    is aws03.machine.com:60020
    2011-07-28 08:17:54,017 [Thread-13] INFO
    org.apache.zookeeper.ZooKeeper - Initiating client connection,
    connectString=aws09.machine.com:2222 sessionTimeout=60000
    watcher=hconnection
    2011-07-28 08:17:54,017 [Thread-13-SendThread()] INFO
    org.apache.zookeeper.ClientCnxn - Opening socket connection to
    server aws09.machine.com/10.83.1.244:2222
    2011-07-28 08:17:54,018
    [Thread-13-SendThread(aws09.machine.com:2222)] INFO
    org.apache.zookeeper.ClientCnxn - Socket connection established to
    aws09.machine.com/10.83.1.244:2222, initiating session
    2011-07-28 08:17:54,025
    [Thread-13-SendThread(aws09.machine.com:2222)] INFO
    org.apache.zookeeper.ClientCnxn - Session establishment complete on
    server aws09.machine.com/10.83.1.244:2222, sessionid =
    0x131617dada6054d, negotiated timeout = 60000
    2011-07-28 08:17:54,029 [Thread-13] DEBUG
    org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
    - Lookedup root region location,
    connection=org.apache.hadoop.hbase.client.HConnectionManager$[email protected];
    hsa=aws03.machine.com:60020
    2011-07-28 08:17:54,032 [Thread-13] DEBUG
    org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
    - Cached location for .META.,,1.1028785192 is aws03.machine.com:60020
    2011-07-28 08:17:54,033 [Thread-13] DEBUG
    org.apache.hadoop.hbase.client.MetaScanner - Scanning .META.
    starting at
    row=endSession.mde253811.preprod.ubithere.com,,00000000000000 for
    max=10 rows
    2011-07-28 08:17:54,037 [Thread-13] DEBUG
    org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
    - Cached location for
    endSession.mde253811.preprod.ubithere.com,,1311086199483.706685579
    is aws03.machine.com:60020
    2011-07-28 08:17:54,039 [Thread-13] DEBUG
    org.apache.hadoop.hbase.client.MetaScanner - Scanning .META.
    starting at
    row=endSession.mde253811.preprod.ubithere.com,,00000000000000 for
    max=2147483647 rows
    2011-07-28 08:17:54,067 [Thread-13] DEBUG
    org.apache.hadoop.hbase.mapreduce.TableInputFormatBase - getSplits:
    split -> 0 -> aws03.machine.com:,
    2011-07-28 08:17:54,068 [Thread-13] INFO
    org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat - Got 1
    splits.
    2011-07-28 08:17:54,068 [Thread-13] INFO
    org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat -
    Returning 1 splits.
    2011-07-28 08:17:54,109 [Thread-13] INFO
    org.apache.zookeeper.ZooKeeper - Initiating client connection,
    connectString=aws09.machine.com:2222 sessionTimeout=60000
    watcher=hconnection
    2011-07-28 08:17:54,110 [Thread-13-SendThread()] INFO
    org.apache.zookeeper.ClientCnxn - Opening socket connection to
    server aws09.machine.com/10.83.1.244:2222
    2011-07-28 08:17:54,111
    [Thread-13-SendThread(aws09.machine.com:2222)] INFO
    org.apache.zookeeper.ClientCnxn - Socket connection established to
    aws09.machine.com/10.83.1.244:2222, initiating session
    2011-07-28 08:17:54,119
    [Thread-13-SendThread(aws09.machine.com:2222)] INFO
    org.apache.zookeeper.ClientCnxn - Session establishment complete on
    server aws09.machine.com/10.83.1.244:2222, sessionid =
    0x131617dada6054e, negotiated timeout = 60000
    2011-07-28 08:17:54,123 [Thread-13] DEBUG
    org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
    - Lookedup root region location,
    connection=org.apache.hadoop.hbase.client.HConnectionManager$[email protected];
    hsa=aws03.machine.com:60020
    2011-07-28 08:17:54,140 [Thread-13] DEBUG
    org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
    - Cached location for .META.,,1.1028785192 is aws03.machine.com:60020
    2011-07-28 08:17:54,142 [Thread-13] DEBUG
    org.apache.hadoop.hbase.client.MetaScanner - Scanning .META.
    starting at
    row=startSession.mde253811.preprod.ubithere.com,,00000000000000 for
    max=10 rows
    2011-07-28 08:17:54,148 [Thread-13] DEBUG
    org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
    - Cached location for
    startSession.mde253811.preprod.ubithere.com,,1311086198252.1334391323 is
    aws03.machine.com:60020
    2011-07-28 08:17:54,154 [Thread-13] INFO
    org.apache.zookeeper.ZooKeeper - Initiating client connection,
    connectString=aws09.machine.com:2222 sessionTimeout=60000
    watcher=hconnection
    2011-07-28 08:17:54,158 [Thread-13-SendThread()] INFO
    org.apache.zookeeper.ClientCnxn - Opening socket connection to
    server aws09.machine.com/10.83.1.244:2222
    2011-07-28 08:17:54,159
    [Thread-13-SendThread(aws09.machine.com:2222)] INFO
    org.apache.zookeeper.ClientCnxn - Socket connection established to
    aws09.machine.com/10.83.1.244:2222, initiating session
    2011-07-28 08:17:54,161
    [Thread-13-SendThread(aws09.machine.com:2222)] INFO
    org.apache.zookeeper.ClientCnxn - Session establishment complete on
    server aws09.machine.com/10.83.1.244:2222, sessionid =
    0x131617dada6054f, negotiated timeout = 60000
    2011-07-28 08:17:54,164 [Thread-13] DEBUG
    org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
    - Lookedup root region location,
    connection=org.apache.hadoop.hbase.client.HConnectionManager$[email protected];
    hsa=aws03.machine.com:60020
    2011-07-28 08:17:54,167 [Thread-13] DEBUG
    org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
    - Cached location for .META.,,1.1028785192 is aws03.machine.com:60020
    2011-07-28 08:17:54,169 [Thread-13] DEBUG
    org.apache.hadoop.hbase.client.MetaScanner - Scanning .META.
    starting at
    row=startSession.mde253811.preprod.ubithere.com,,00000000000000 for
    max=10 rows
    2011-07-28 08:17:54,172 [Thread-13] DEBUG
    org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
    - Cached location for
    startSession.mde253811.preprod.ubithere.com,,1311086198252.1334391323 is
    aws03.machine.com:60020
    2011-07-28 08:17:54,173 [Thread-13] DEBUG
    org.apache.hadoop.hbase.client.MetaScanner - Scanning .META.
    starting at
    row=startSession.mde253811.preprod.ubithere.com,,00000000000000 for
    max=2147483647 rows
    2011-07-28 08:17:54,180 [Thread-13] DEBUG
    org.apache.hadoop.hbase.mapreduce.TableInputFormatBase - getSplits:
    split -> 0 -> aws03.machine.com:,
    2011-07-28 08:17:54,180 [Thread-13] INFO
    org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat - Got 1
    splits.
    2011-07-28 08:17:54,180 [Thread-13] INFO
    org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat -
    Returning 1 splits.
    2011-07-28 08:17:55,037 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - HadoopJobId: job_201107251336_0314
    2011-07-28 08:17:55,037 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - More information at:
    http://aws09.preprod.ubithere.com:50030/jobdetails.jsp?jobid=job_201107251336_0314
    2011-07-28 08:19:06,924 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - 8% complete
    2011-07-28 08:19:15,971 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - 16% complete
    2011-07-28 08:19:18,985 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - 19% complete
    2011-07-28 08:19:25,035 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - 27% complete
    2011-07-28 08:20:14,810 [main] INFO
    org.apache.pig.tools.pigstats.ScriptState - Pig script settings are
    added to the job
    2011-07-28 08:20:14,812 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
    - mapred.job.reduce.markreset.buffer.percent is not set, set to
    default 0.3
    2011-07-28 08:20:14,830 [main] INFO org.apache.zookeeper.ZooKeeper
    - Initiating client connection, connectString=aws09.machine.com:2222
    sessionTimeout=60000 watcher=hconnection
    2011-07-28 08:20:14,831 [main-SendThread()] INFO
    org.apache.zookeeper.ClientCnxn - Opening socket connection to
    server aws09.machine.com/10.83.1.244:2222
    2011-07-28 08:20:14,832 [main-SendThread(aws09.machine.com:2222)]
    INFO org.apache.zookeeper.ClientCnxn - Socket connection
    established to aws09.machine.com/10.83.1.244:2222, initiating session
    2011-07-28 08:20:14,838 [main-SendThread(aws09.machine.com:2222)]
    INFO org.apache.zookeeper.ClientCnxn - Session establishment
    complete on server aws09.machine.com/10.83.1.244:2222, sessionid =
    0x131617dada60556, negotiated timeout = 60000
    2011-07-28 08:20:14,842 [main] DEBUG
    org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
    - Lookedup root region location,
    connection=org.apache.hadoop.hbase.client.HConnectionManager$[email protected];
    hsa=aws03.machine.com:60020
    2011-07-28 08:20:14,847 [main] DEBUG
    org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
    - Cached location for .META.,,1.1028785192 is aws03.machine.com:60020
    2011-07-28 08:20:14,849 [main] DEBUG
    org.apache.hadoop.hbase.client.MetaScanner - Scanning .META.
    starting at row=info.mde253811.preprod.ubithere.com,,00000000000000
    for max=10 rows
    2011-07-28 08:20:14,852 [main] DEBUG
    org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
    - Cached location for
    info.mde253811.preprod.ubithere.com,,1311086202955.1975990008 is
    aws03.machine.com:60020
    2011-07-28 08:20:16,311 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
    - Setting up single store job
    2011-07-28 08:20:16,324 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
    - BytesPerReducer=1000000000 maxReducers=999
    totalInputFileSize=198330658
    2011-07-28 08:20:16,324 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
    - Neither PARALLEL nor default parallelism is set for this job.
    Setting number of reducers to 1
    2011-07-28 08:20:16,341 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - 1 map-reduce job(s) waiting for submission.
    2011-07-28 08:20:16,656 [Thread-32] INFO
    org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input
    paths to process : 1
    2011-07-28 08:20:16,656 [Thread-32] INFO
    org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil -
    Total input paths to process : 1
    2011-07-28 08:20:16,693 [Thread-32] INFO
    org.apache.zookeeper.ZooKeeper - Initiating client connection,
    connectString=aws09.machine.com:2222 sessionTimeout=60000
    watcher=hconnection
    2011-07-28 08:20:16,694 [Thread-32-SendThread()] INFO
    org.apache.zookeeper.ClientCnxn - Opening socket connection to
    server aws09.machine.com/10.83.1.244:2222
    2011-07-28 08:20:16,695
    [Thread-32-SendThread(aws09.machine.com:2222)] INFO
    org.apache.zookeeper.ClientCnxn - Socket connection established to
    aws09.machine.com/10.83.1.244:2222, initiating session
    2011-07-28 08:20:16,702
    [Thread-32-SendThread(aws09.machine.com:2222)] INFO
    org.apache.zookeeper.ClientCnxn - Session establishment complete on
    server aws09.machine.com/10.83.1.244:2222, sessionid =
    0x131617dada60557, negotiated timeout = 60000
    2011-07-28 08:20:16,705 [Thread-32] DEBUG
    org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
    - Lookedup root region location,
    connection=org.apache.hadoop.hbase.client.HConnectionManager$[email protected];
    hsa=aws03.machine.com:60020
    2011-07-28 08:20:16,709 [Thread-32] DEBUG
    org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
    - Cached location for .META.,,1.1028785192 is aws03.machine.com:60020
    2011-07-28 08:20:16,710 [Thread-32] DEBUG
    org.apache.hadoop.hbase.client.MetaScanner - Scanning .META.
    starting at row=info.mde253811.preprod.ubithere.com,,00000000000000
    for max=10 rows
    2011-07-28 08:20:16,714 [Thread-32] DEBUG
    org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
    - Cached location for
    info.mde253811.preprod.ubithere.com,,1311086202955.1975990008 is
    aws03.machine.com:60020
    2011-07-28 08:20:16,716 [Thread-32] INFO
    org.apache.zookeeper.ZooKeeper - Initiating client connection,
    connectString=aws09.machine.com:2222 sessionTimeout=60000
    watcher=hconnection
    2011-07-28 08:20:16,717 [Thread-32-SendThread()] INFO
    org.apache.zookeeper.ClientCnxn - Opening socket connection to
    server aws09.machine.com/10.83.1.244:2222
    2011-07-28 08:20:16,718
    [Thread-32-SendThread(aws09.machine.com:2222)] INFO
    org.apache.zookeeper.ClientCnxn - Socket connection established to
    aws09.machine.com/10.83.1.244:2222, initiating session
    2011-07-28 08:20:16,720
    [Thread-32-SendThread(aws09.machine.com:2222)] INFO
    org.apache.zookeeper.ClientCnxn - Session establishment complete on
    server aws09.machine.com/10.83.1.244:2222, sessionid =
    0x131617dada60558, negotiated timeout = 60000
    2011-07-28 08:20:16,723 [Thread-32] DEBUG
    org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
    - Lookedup root region location,
    connection=org.apache.hadoop.hbase.client.HConnectionManager$[email protected];
    hsa=aws03.machine.com:60020
    2011-07-28 08:20:16,726 [Thread-32] DEBUG
    org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
    - Cached location for .META.,,1.1028785192 is aws03.machine.com:60020
    2011-07-28 08:20:16,727 [Thread-32] DEBUG
    org.apache.hadoop.hbase.client.MetaScanner - Scanning .META.
    starting at row=info.mde253811.preprod.ubithere.com,,00000000000000
    for max=10 rows
    2011-07-28 08:20:16,730 [Thread-32] DEBUG
    org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
    - Cached location for
    info.mde253811.preprod.ubithere.com,,1311086202955.1975990008 is
    aws03.machine.com:60020
    2011-07-28 08:20:16,732 [Thread-32] DEBUG
    org.apache.hadoop.hbase.client.MetaScanner - Scanning .META.
    starting at row=info.mde253811.preprod.ubithere.com,,00000000000000
    for max=2147483647 rows
    2011-07-28 08:20:16,772 [Thread-32] DEBUG
    org.apache.hadoop.hbase.mapreduce.TableInputFormatBase - getSplits:
    split -> 0 -> aws03.machine.com:,
    2011-07-28 08:20:16,772 [Thread-32] INFO
    org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat - Got 1
    splits.
    2011-07-28 08:20:16,772 [Thread-32] INFO
    org.apache.pig.backend.hadoop.hbase.HBaseTableInputFormat -
    Returning 1 splits.
    2011-07-28 08:20:17,500 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - HadoopJobId: job_201107251336_0315
    2011-07-28 08:20:17,500 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - More information at:
    http://aws09.preprod.ubithere.com:50030/jobdetails.jsp?jobid=job_201107251336_0315
    2011-07-28 08:20:28,075 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - 37% complete
    2011-07-28 08:20:34,106 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - 41% complete
    2011-07-28 08:20:37,124 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - 50% complete
    2011-07-28 08:20:46,168 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - 51% complete
    2011-07-28 08:20:49,183 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - 61% complete
    2011-07-28 08:20:52,198 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - 62% complete
    2011-07-28 08:20:55,214 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - 64% complete
    2011-07-28 08:21:01,244 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - 66% complete
    2011-07-28 08:21:07,311 [main] INFO
    org.apache.pig.tools.pigstats.ScriptState - Pig script settings are
    added to the job
    2011-07-28 08:21:07,312 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
    - mapred.job.reduce.markreset.buffer.percent is not set, set to
    default 0.3
    2011-07-28 08:21:08,770 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
    - Setting up single store job
    2011-07-28 08:21:08,778 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - 1 map-reduce job(s) waiting for submission.
    2011-07-28 08:21:08,910 [Thread-47] INFO
    org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input
    paths to process : 1
    2011-07-28 08:21:08,910 [Thread-47] INFO
    org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil -
    Total input paths to process : 1
    2011-07-28 08:21:08,911 [Thread-47] INFO
    org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil -
    Total input paths (combined) to process : 1
    2011-07-28 08:21:09,280 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - HadoopJobId: job_201107251336_0316
    2011-07-28 08:21:09,280 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - More information at:
    http://aws09.preprod.ubithere.com:50030/jobdetails.jsp?jobid=job_201107251336_0316
    2011-07-28 08:21:16,321 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - 83% complete
    2011-07-28 08:21:34,439 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - 100% complete
    2011-07-28 08:21:34,441 [main] INFO
    org.apache.pig.tools.pigstats.PigStats - Script Statistics:

    HadoopVersion PigVersion UserId StartedAt FinishedAt Features
    0.20-append 0.8.1-SNAPSHOT root 2011-07-28 08:17:51 2011-07-28 08:21:34 HASH_JOIN,FILTER,LIMIT

    Success!

    Job Stats (time in seconds):
    JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs
    job_201107251336_0314 2 1 75 66 70 63 63 63 end_sessions,sessions,start_sessions HASH_JOIN
    job_201107251336_0315 4 1 15 6 12 24 24 24 infos,sessions HASH_JOIN
    job_201107251336_0316 1 1 3 3 3 12 12 12 hdfs://aws09.preprod.ubithere.com:9000/tmp/temp-1404953096/tmp819396740,

    Input(s):
    Successfully read 2069446 records from:
    "endSession.mde253811.preprod.ubithere.com"
    Successfully read 2072419 records from:
    "startSession.mde253811.preprod.ubithere.com"
    Successfully read 19441 records from:
    "info.mde253811.preprod.ubithere.com"

    Output(s):
    Successfully stored 0 records in:
    "hdfs://aws09.preprod.ubithere.com:9000/tmp/temp-1404953096/tmp819396740"

    Counters:
    Total records written : 0
    Total bytes written : 0
    Spillable Memory Manager spill count : 0
    Total bags proactively spilled: 1
    Total records proactively spilled: 1944943

    Job DAG:
    job_201107251336_0314 -> job_201107251336_0315,
    job_201107251336_0315 -> job_201107251336_0316,
    job_201107251336_0316


    2011-07-28 08:21:34,472 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - Success!
    2011-07-28 08:21:34,500 [main] INFO
    org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input
    paths to process : 1
    2011-07-28 08:21:34,501 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil -
    Total input paths to process : 1
    grunt>




    On 27/07/11 16:59, Raghu Angadi wrote:
    Vincent,

    is the behavior random or the same each time?

    Couple of things to narrow it down..
    - attach the entire console output from PIG run when this happened.
    - only load start_sessions and end_sessions and store them..
    - load the data from tables from previous step and run the same pig
    command

    Consider filing a JIRA. It might be a better place to go into more detail.

    -Raghu.

    On Wed, Jul 27, 2011 at 5:38 AM, Vincent Barat wrote:
    More info on this issue:

    1- I use PIG 0.8.1 and HBase 0.90.3 and Hadoop 0.20-append
    2- The issue can be reproduced with PIG trunk too

    The script:

    start_sessions = LOAD 'startSession.mde253811.preprod.ubithere.com'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid
    meta:infoid meta:imei meta:timestamp') AS (sid:chararray, infoid:chararray,
    imei:chararray, start:long);
    end_sessions = LOAD 'endSession.mde253811.preprod.ubithere.com'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid
    meta:timestamp meta:locid') AS (sid:chararray, end:long, locid:chararray);
    sessions = JOIN start_sessions BY sid, end_sessions BY sid;
    sessions = FILTER sessions BY end > start AND end - start < 86400000L;
    sessions = FOREACH sessions GENERATE start_sessions::sid, imei, start, end;
    sessions = LIMIT sessions 100;
    dump sessions;
    <output 1>
    dump sessions;
    <output 2>

    The issue:

    <output 1> is empty
    <output 2> is 100 lines

    I can reproduce the issue systematically.

    Please advise: this issue prevents me from moving to HBase 0.90.3 in
    production, as I need to upgrade to PIG 0.8.1 at the same time!
  • Vincent Barat at Jul 28, 2011 at 9:14 am
    I've reported the issue here:
    https://issues.apache.org/jira/browse/PIG-2193

    Still investigating, but so far it seems that the FILTER clause makes
    the HBase loader lose all fields that are not explicitly used in
    the script.

    I stripped the request down to:

    start_sessions = LOAD 'startSession.mde253811.preprod.ubithere.com'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid
    meta:infoid meta:imei meta:timestamp') AS (sid:chararray,
    infoid:chararray, imei:chararray, start:long);
    end_sessions = LOAD 'endSession.mde253811.preprod.ubithere.com'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid
    meta:timestamp meta:locid') AS (sid:chararray, end:long,
    locid:chararray);
    sessions = JOIN start_sessions BY sid, end_sessions BY sid;
    sessions = FILTER sessions BY end > start AND end - start < 86400000L;
    dump sessions;

    and in the result, the fields "infoid", "imei" and "locid" are
    empty, whereas the fields "sid", "start" and "end" are present.

    (00000A2A33254B8FAE1E9AEAB2428EBE,,,1310649832970,00000A2A33254B8FAE1E9AEAB2428EBE,1310649838390,)
    (00001DCECDC842C0A745C151B9EC295F,,,1310628836846,00001DCECDC842C0A745C151B9EC295F,1310628839075,)
    (00001F8F2B3148D393963928188C72B6,,,1310681918742,00001F8F2B3148D393963928188C72B6,1310681949182,)
    ...

    When using the HDFS loader, everything works correctly:

    (00000A2A33254B8FAE1E9AEAB2428EBE,b87ac86bcf1d4cb44202aa826554a7b2,4e77d62e1839a470ec8386d42b85a076,1310649832970,00000A2A33254B8FAE1E9AEAB2428EBE,1310649838390,)
    (00001DCECDC842C0A745C151B9EC295F,4a4bb0fff26e368c8209f1e480fdf70b,db3924d2e4b88bd103fa19aaa30a9af4,1310628836846,00001DCECDC842C0A745C151B9EC295F,1310628839075,)
    (00001F8F2B3148D393963928188C72B6,5d7e58f68366b55d55862815f863a996,79e1ba90aa555e3e1041df4be657a11d,1310681918742,00001F8F2B3148D393963928188C72B6,1310681949182,)

    ...
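
The empty columns in the dumped tuples above can also be checked outside Pig, which avoids referencing the fields in the script itself (doing so might change which fields the loader is asked for). The following is a minimal sketch in Python, not part of the original report: the helper name `empty_field_positions` and the `FIELD_NAMES` labels are assumptions made for illustration, with the field order taken from the two AS clauses of the script. It assumes flat tuples whose values contain no commas.

```python
def empty_field_positions(dump_line):
    """Return the 0-based positions of empty fields in one line of Pig `dump` output."""
    # Drop the surrounding parentheses, then split on commas.
    # Assumption: flat tuples, no commas or nested tuples/bags inside values.
    fields = dump_line.strip().strip("()").split(",")
    return [i for i, value in enumerate(fields) if value == ""]


# Illustrative labels: field order after the JOIN, following the AS clauses
# of start_sessions and then end_sessions.
FIELD_NAMES = ["start_sessions::sid", "infoid", "imei", "start",
               "end_sessions::sid", "end", "locid"]

# First tuple from the HBase-loaded result shown above.
line = ("(00000A2A33254B8FAE1E9AEAB2428EBE,,,1310649832970,"
        "00000A2A33254B8FAE1E9AEAB2428EBE,1310649838390,)")

missing = [FIELD_NAMES[i] for i in empty_field_positions(line)]
print(missing)  # ['infoid', 'imei', 'locid']
```

Run against the first tuple of the HDFS-loaded result, the same check reports only the last position ("locid"), so the difference between the two loaders is in "infoid" and "imei".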
  • Vincent Barat at Jul 27, 2011 at 8:36 am
    Hi,

    We are using HBase 0.90.3 and PIG 0.8.1.
    I will try trunk classes and report to you ASAP...

    Cheers,

    On 26/07/11 20:16, Dmitriy Ryaboy wrote:
    Vincent, can you try replacing the HBase classes with those from trunk?
    A couple of fixes went in that might address that.
    Also, make sure you are running 0.90.3

    D

  • Vincent Barat at Aug 26, 2011 at 2:54 pm
    FYI, this was fixed by PIG-2193.

  • Ashutosh Chauhan at Aug 26, 2011 at 4:16 pm
    Thanks Vincent for confirming that issue is resolved.

    Ashutosh

Discussion Overview
group: user @
categories: pig, hadoop
posted: Jul 26, '11 at 5:40p
active: Aug 26, '11 at 4:16p
posts: 17
users: 7
website: pig.apache.org
