Grokbase Groups Pig user July 2009
FAQ
Hi,

I've imported data from a MySQL db thanks to sqoop. However when I try
to order this data on 2 fields it does not return the same answer as
MySQL does (which is the correct result)

Here is the code I use :

grunt> A = LOAD 'hdfs://hadoopM:54310/user/hadoop/rental' USING
PigStorage(',') AS (rental_id:int, rental_date:chararray,
inventory_id:int, customer_id:int);
grunt> B = ORDER A BY inventory_id DESC, customer_id ASC;
grunt> C = LIMIT B 20;
grunt> DUMP C;

Here is the result with Pig :

(132,2005-05-25 21:46:54.0,3367,479)
(263,2005-05-26 15:47:40.0,1160,449)
(324,2005-05-27 01:00:04.0,3364,292)
(359,2005-05-27 06:48:33.0,1156,152)
(582,2005-05-28 11:33:46.0,4579,198)
(711,2005-05-29 03:49:03.0,4581,215)
(809,2005-05-29 19:10:20.0,2114,222)
(927,2005-05-30 12:16:40.0,1158,167)
(1084,2005-05-31 11:10:17.0,4577,12)
(1341,2005-06-15 12:26:18.0,3363,344)
(1493,2005-06-15 21:50:32.0,4581,235)
(1537,2005-06-16 00:52:51.0,4577,594)
(1625,2005-06-16 07:49:08.0,3367,39)
(1729,2005-06-16 15:29:47.0,3364,523)
(1945,2005-06-17 07:51:26.0,3366,207)
(2137,2005-06-17 21:18:28.0,1158,581)
(2149,2005-06-17 22:50:00.0,3365,333)
(2321,2005-06-18 09:42:42.0,1160,565)
(2799,2005-06-19 19:15:21.0,4579,576)
(2806,2005-06-19 19:30:48.0,2114,510)

Here is the result with MySQL :

mysql> select rental_id, rental_date, inventory_id, customer_id from
rental order by inventory_id desc , customer_id asc limit 20

+-----------+---------------------+--------------+-------------+
rental_id | rental_date | inventory_id | customer_id |
+-----------+---------------------+--------------+-------------+
711 | 2005-05-29 03:49:03 | 4581 | 215 |
6712 | 2005-07-12 13:24:47 | 4581 | 226 |
1493 | 2005-06-15 21:50:32 | 4581 | 235 |
9701 | 2005-07-31 07:32:21 | 4581 | 401 |
12894 | 2005-08-19 03:49:28 | 4581 | 541 |
10479 | 2005-08-01 10:11:25 | 4580 | 275 |
15916 | 2005-08-23 17:56:01 | 4580 | 327 |
5274 | 2005-07-09 14:34:09 | 4579 | 108 |
582 | 2005-05-28 11:33:46 | 4579 | 198 |
12458 | 2005-08-18 11:22:53 | 4579 | 277 |
8289 | 2005-07-29 02:23:24 | 4579 | 459 |
2799 | 2005-06-19 19:15:21 | 4579 | 576 |
11453 | 2005-08-02 21:00:05 | 4578 | 84 |
12456 | 2005-08-18 11:21:51 | 4578 | 85 |
6664 | 2005-07-12 11:28:22 | 4578 | 351 |
1084 | 2005-05-31 11:10:17 | 4577 | 12 |
5972 | 2005-07-11 00:08:54 | 4577 | 30 |
12854 | 2005-08-19 02:18:51 | 4577 | 362 |
9644 | 2005-07-31 05:40:35 | 4577 | 441 |
1537 | 2005-06-16 00:52:51 | 4577 | 594 |
+-----------+---------------------+--------------+-------------+
20 rows in set (3.10 sec)

thanks

--
Cyril SCETBON

Search Discussions

  • Tamir Kamara at Jul 24, 2009 at 2:38 pm
    Hi,

    I seen this happen in past versions of pig. What version are you using?
    There were a few issues with order by followed by limit that were fixed in
    recent weeks:
    https://issues.apache.org/jira/browse/PIG-797
    https://issues.apache.org/jira/browse/PIG-876

    With latest SVN it should work as expected (it does for me). You can try it
    ...

    Tamir

    On Fri, Jul 24, 2009 at 10:32 AM, Cyril Scetbon wrote:

    Hi,

    I've imported data from a MySQL db thanks to sqoop. However when I try to
    order this data on 2 fields it does not return the same answer as MySQL does
    (which is the correct result)

    Here is the code I use :

    grunt> A = LOAD 'hdfs://hadoopM:54310/user/hadoop/rental' USING
    PigStorage(',') AS (rental_id:int, rental_date:chararray, inventory_id:int,
    customer_id:int);
    grunt> B = ORDER A BY inventory_id DESC, customer_id ASC;
    grunt> C = LIMIT B 20;
    grunt> DUMP C;

    Here is the result with Pig :

    (132,2005-05-25 21:46:54.0,3367,479)
    (263,2005-05-26 15:47:40.0,1160,449)
    (324,2005-05-27 01:00:04.0,3364,292)
    (359,2005-05-27 06:48:33.0,1156,152)
    (582,2005-05-28 11:33:46.0,4579,198)
    (711,2005-05-29 03:49:03.0,4581,215)
    (809,2005-05-29 19:10:20.0,2114,222)
    (927,2005-05-30 12:16:40.0,1158,167)
    (1084,2005-05-31 11:10:17.0,4577,12)
    (1341,2005-06-15 12:26:18.0,3363,344)
    (1493,2005-06-15 21:50:32.0,4581,235)
    (1537,2005-06-16 00:52:51.0,4577,594)
    (1625,2005-06-16 07:49:08.0,3367,39)
    (1729,2005-06-16 15:29:47.0,3364,523)
    (1945,2005-06-17 07:51:26.0,3366,207)
    (2137,2005-06-17 21:18:28.0,1158,581)
    (2149,2005-06-17 22:50:00.0,3365,333)
    (2321,2005-06-18 09:42:42.0,1160,565)
    (2799,2005-06-19 19:15:21.0,4579,576)
    (2806,2005-06-19 19:30:48.0,2114,510)

    Here is the result with MySQL :

    mysql> select rental_id, rental_date, inventory_id, customer_id from rental
    order by inventory_id desc , customer_id asc limit 20

    +-----------+---------------------+--------------+-------------+
    rental_id | rental_date | inventory_id | customer_id |
    +-----------+---------------------+--------------+-------------+
    711 | 2005-05-29 03:49:03 | 4581 | 215 |
    6712 | 2005-07-12 13:24:47 | 4581 | 226 |
    1493 | 2005-06-15 21:50:32 | 4581 | 235 |
    9701 | 2005-07-31 07:32:21 | 4581 | 401 |
    12894 | 2005-08-19 03:49:28 | 4581 | 541 |
    10479 | 2005-08-01 10:11:25 | 4580 | 275 |
    15916 | 2005-08-23 17:56:01 | 4580 | 327 |
    5274 | 2005-07-09 14:34:09 | 4579 | 108 |
    582 | 2005-05-28 11:33:46 | 4579 | 198 |
    12458 | 2005-08-18 11:22:53 | 4579 | 277 |
    8289 | 2005-07-29 02:23:24 | 4579 | 459 |
    2799 | 2005-06-19 19:15:21 | 4579 | 576 |
    11453 | 2005-08-02 21:00:05 | 4578 | 84 |
    12456 | 2005-08-18 11:21:51 | 4578 | 85 |
    6664 | 2005-07-12 11:28:22 | 4578 | 351 |
    1084 | 2005-05-31 11:10:17 | 4577 | 12 |
    5972 | 2005-07-11 00:08:54 | 4577 | 30 |
    12854 | 2005-08-19 02:18:51 | 4577 | 362 |
    9644 | 2005-07-31 05:40:35 | 4577 | 441 |
    1537 | 2005-06-16 00:52:51 | 4577 | 594 |
    +-----------+---------------------+--------------+-------------+
    20 rows in set (3.10 sec)

    thanks

    --
    Cyril SCETBON
  • Cyril Scetbon at Jul 28, 2009 at 10:01 am
    Hi,

    I'm using the version 0.2.0 provided by cloudera. Myabe I'll try the
    version from your svn
    Hi,

    I seen this happen in past versions of pig. What version are you using?
    There were a few issues with order by followed by limit that were fixed in
    recent weeks:
    https://issues.apache.org/jira/browse/PIG-797
    https://issues.apache.org/jira/browse/PIG-876

    With latest SVN it should work as expected (it does for me). You can try it
    ...

    Tamir


    On Fri, Jul 24, 2009 at 10:32 AM, Cyril Scetbon wrote:

    Hi,

    I've imported data from a MySQL db thanks to sqoop. However when I try to
    order this data on 2 fields it does not return the same answer as MySQL does
    (which is the correct result)

    Here is the code I use :

    grunt> A = LOAD 'hdfs://hadoopM:54310/user/hadoop/rental' USING
    PigStorage(',') AS (rental_id:int, rental_date:chararray, inventory_id:int,
    customer_id:int);
    grunt> B = ORDER A BY inventory_id DESC, customer_id ASC;
    grunt> C = LIMIT B 20;
    grunt> DUMP C;

    Here is the result with Pig :

    (132,2005-05-25 21:46:54.0,3367,479)
    (263,2005-05-26 15:47:40.0,1160,449)
    (324,2005-05-27 01:00:04.0,3364,292)
    (359,2005-05-27 06:48:33.0,1156,152)
    (582,2005-05-28 11:33:46.0,4579,198)
    (711,2005-05-29 03:49:03.0,4581,215)
    (809,2005-05-29 19:10:20.0,2114,222)
    (927,2005-05-30 12:16:40.0,1158,167)
    (1084,2005-05-31 11:10:17.0,4577,12)
    (1341,2005-06-15 12:26:18.0,3363,344)
    (1493,2005-06-15 21:50:32.0,4581,235)
    (1537,2005-06-16 00:52:51.0,4577,594)
    (1625,2005-06-16 07:49:08.0,3367,39)
    (1729,2005-06-16 15:29:47.0,3364,523)
    (1945,2005-06-17 07:51:26.0,3366,207)
    (2137,2005-06-17 21:18:28.0,1158,581)
    (2149,2005-06-17 22:50:00.0,3365,333)
    (2321,2005-06-18 09:42:42.0,1160,565)
    (2799,2005-06-19 19:15:21.0,4579,576)
    (2806,2005-06-19 19:30:48.0,2114,510)

    Here is the result with MySQL :

    mysql> select rental_id, rental_date, inventory_id, customer_id from rental
    order by inventory_id desc , customer_id asc limit 20

    +-----------+---------------------+--------------+-------------+
    rental_id | rental_date | inventory_id | customer_id |
    +-----------+---------------------+--------------+-------------+
    711 | 2005-05-29 03:49:03 | 4581 | 215 |
    6712 | 2005-07-12 13:24:47 | 4581 | 226 |
    1493 | 2005-06-15 21:50:32 | 4581 | 235 |
    9701 | 2005-07-31 07:32:21 | 4581 | 401 |
    12894 | 2005-08-19 03:49:28 | 4581 | 541 |
    10479 | 2005-08-01 10:11:25 | 4580 | 275 |
    15916 | 2005-08-23 17:56:01 | 4580 | 327 |
    5274 | 2005-07-09 14:34:09 | 4579 | 108 |
    582 | 2005-05-28 11:33:46 | 4579 | 198 |
    12458 | 2005-08-18 11:22:53 | 4579 | 277 |
    8289 | 2005-07-29 02:23:24 | 4579 | 459 |
    2799 | 2005-06-19 19:15:21 | 4579 | 576 |
    11453 | 2005-08-02 21:00:05 | 4578 | 84 |
    12456 | 2005-08-18 11:21:51 | 4578 | 85 |
    6664 | 2005-07-12 11:28:22 | 4578 | 351 |
    1084 | 2005-05-31 11:10:17 | 4577 | 12 |
    5972 | 2005-07-11 00:08:54 | 4577 | 30 |
    12854 | 2005-08-19 02:18:51 | 4577 | 362 |
    9644 | 2005-07-31 05:40:35 | 4577 | 441 |
    1537 | 2005-06-16 00:52:51 | 4577 | 594 |
    +-----------+---------------------+--------------+-------------+
    20 rows in set (3.10 sec)

    thanks

    --
    Cyril SCETBON


    --
    Cyril SCETBON

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupuser @
categoriespig, hadoop
postedJul 24, '09 at 2:18p
activeJul 28, '09 at 10:01a
posts3
users2
websitepig.apache.org

2 users in discussion

Cyril Scetbon: 2 posts Tamir Kamara: 1 post

People

Translate

site design / logo © 2021 Grokbase