Grokbase Groups Pig user March 2013
To explain what's going on:
-limit for HBaseStorage limits the number of rows returned from *each
region* of the HBase table. It's an optimization: there is no way for the
LIMIT operator to be pushed down to the loader, so you can set it explicitly
if you know you only need a few rows and don't want to pull the rest from
HBase just to drop them on the floor once they've been extracted and sent
to your mappers.


On Wed, Mar 13, 2013 at 9:17 AM, kiran chitturi wrote:
Thank you. This cleared my doubt.

On Wed, Mar 13, 2013 at 11:37 AM, Bill Graham wrote:

The -limit passed to HBaseStorage is the limit per mapper reading from
HBase. If you want to limit overall records, also use LIMIT:

fields = LIMIT fields 5;
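
Putting the two together, a minimal sketch that reuses the table and column names from the original question (the alias first_five is only illustrative, and this has not been run against the poster's cluster):

```pig
-- -limit 5 caps the rows read per region/mapper, so the loader
-- can still return more than 5 rows in total.
fields = LOAD 'hbase://documents' USING
    org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'field:fields_j', '-loadKey true -limit 5')
    AS (rowkey, fields:map[]);

-- LIMIT caps the overall result at 5 records.
first_five = LIMIT fields 5;

DUMP first_five;
```

Without an ORDER BY, LIMIT does not guarantee which 5 records come back. The 250 records in the reported run would be consistent with 50 regions each returning up to 5 rows, though the region count is not stated in this thread.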


On Wed, Mar 13, 2013 at 7:48 AM, kiran chitturi wrote:
Hi!

I am using Pig 0.10.0 with HBase in distributed mode to read
records, and I used the command below.

fields = load 'hbase://documents' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('field:fields_j', '-loadKey true -limit 5')
as (rowkey, fields:map[]);

I want Pig to limit the output to only 5 records, but the result is quite
different. Please see the logs below.

Input(s):
Successfully read 250 records (16520 bytes) from: "hbase://documents"

Output(s):
Successfully stored 250 records (19051 bytes) in:
"hdfs://LucidN1:50001/tmp/temp1510040776/tmp1443083789"

Counters:
Total records written : 250
Total bytes written : 19051
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201303121846_0056

2013-03-13 14:43:10,186 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 250 time(s).
2013-03-13 14:43:10,186 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2013-03-13 14:43:10,210 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 51
2013-03-13 14:43:10,211 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 51

Am I using the 'limit' keyword the wrong way?

Please let me know your suggestions.

Thanks,
--
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>


