Grokbase Groups Pig user March 2013
Hi!

I am using Pig 0.10.0 with HBase in distributed mode to read records,
and I ran the command below.

fields = LOAD 'hbase://documents'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'field:fields_j', '-loadKey true -limit 5')
    AS (rowkey, fields:map[]);

I want Pig to limit the output to only 5 records, but the result is
quite different. Please see the logs below.

Input(s):
Successfully read 250 records (16520 bytes) from: "hbase://documents"

Output(s):
Successfully stored 250 records (19051 bytes) in:
"hdfs://LucidN1:50001/tmp/temp1510040776/tmp1443083789"

Counters:
Total records written : 250
Total bytes written : 19051
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201303121846_0056

2013-03-13 14:43:10,186 [main] WARN
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 250 time(s).
2013-03-13 14:43:10,186 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Success!
2013-03-13 14:43:10,210 [main] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths
to process : 51
2013-03-13 14:43:10,211 [main] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
input paths to process : 51

Am I using the 'limit' keyword the wrong way?

Please let me know your suggestions.

Thanks,


  • Bill Graham at Mar 13, 2013 at 3:37 pm
    The -limit passed to HBaseStorage is the limit per mapper reading from
    HBase. If you want to limit overall records, also use LIMIT:

    fields = LIMIT fields 5;
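    For example, here is a minimal end-to-end sketch combining the two
    (the fields_top alias and the DUMP are just for illustration):

    -- per-region/per-mapper cap: reduces how much is scanned from HBase
    fields = LOAD 'hbase://documents'
        USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
            'field:fields_j', '-loadKey true -limit 5')
        AS (rowkey, fields:map[]);

    -- global cap: guarantees at most 5 records in the final result
    fields_top = LIMIT fields 5;
    DUMP fields_top;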


  • Kiran chitturi at Mar 13, 2013 at 4:18 pm
Thank you, that clears up my doubt.

  • Dmitriy Ryaboy at Mar 15, 2013 at 1:51 am
    To explain what's going on: -limit for HBaseStorage limits the number
    of rows returned from *each region* of the HBase table. It's an
    optimization: there is no way for the LIMIT operator to be pushed down
    to the loader, so you can apply the cap explicitly when you know you
    only need a few rows and don't want to pull the rest out of HBase just
    to drop them on the floor after they've been extracted and sent to
    your mappers.
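
    That per-region cap is also consistent with the numbers in the log
    above: assuming the table has 50 regions, the job reads

        5 rows/region × 50 regions = 250 records

    which matches the counters, rather than the 5 records one might expect.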


  • Kiran chitturi at Mar 15, 2013 at 3:17 am
    Is this a better way to limit records than using Pig's LIMIT (fields =
    LIMIT fields 5;), since the filtering is already done while loading?

    Thanks,


Discussion Overview
group: user
categories: pig, hadoop
posted: Mar 13, 2013 at 2:49 PM
active: Mar 15, 2013 at 3:17 AM
posts: 5
users: 3
website: pig.apache.org
