-limit for HBaseStorage limits the number of rows returned from *each
region* of the HBase table. It's an optimization -- there is no way for the
LIMIT operator to be pushed down to the loader, so you can apply it
explicitly if you know you only need a few rows and don't want to pull the
rest from HBase just to drop them on the floor once they've been extracted
and sent to your mappers.
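That per-region cap is presumably also where the numbers in the logs below
come from: with -limit 5 on a table spanning roughly 50 regions, each mapper
returns up to 5 rows, and 5 rows x 50 regions = 250 records.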
On Wed, Mar 13, 2013 at 9:17 AM, kiran chitturi wrote:
Thank you. This cleared my doubt.
--
Kiran Chitturi
<http://www.linkedin.com/in/kiranchitturi>
On Wed, Mar 13, 2013 at 11:37 AM, Bill Graham wrote:
The -limit passed to HBaseStorage is the limit per mapper reading from
HBase. If you want to limit overall records, also use LIMIT:
fields = LIMIT fields 5;
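Putting the two together with the load statement from your script below, a
sketch of the combined approach would be:

-- -limit 5 caps the scan at 5 rows per region, so the mappers stay cheap;
-- LIMIT 5 then trims the combined result to 5 records overall.
fields = load 'hbase://documents' using
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('field:fields_j',
    '-loadKey true -limit 5') as (rowkey, fields:map[]);
fields = LIMIT fields 5;

The LIMIT still runs in the Pig plan, but thanks to -limit far fewer rows
are pulled out of HBase only to be discarded.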
On Wed, Mar 13, 2013 at 7:48 AM, kiran chitturi wrote:
Hi!
I am using Pig 0.10.0 with HBase in distributed mode to read the records,
and I have used the command below.
fields = load 'hbase://documents' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('field:fields_j', '-loadKey
true -limit 5') as (rowkey, fields:map[]);
I want Pig to limit the records to only 5, but the result is quite
different. Please see the logs below.
Input(s):
Successfully read 250 records (16520 bytes) from: "hbase://documents"
Output(s):
Successfully stored 250 records (19051 bytes) in:
"hdfs://LucidN1:50001/tmp/temp1510040776/tmp1443083789"
Counters:
Total records written : 250
Total bytes written : 19051
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201303121846_0056
2013-03-13 14:43:10,186 [main] WARN
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Encountered Warning FIELD_DISCARDED_TYPE_CONVERSION_FAILED 250 time(s).
2013-03-13 14:43:10,186 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Success!
2013-03-13 14:43:10,210 [main] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths
to process : 51
2013-03-13 14:43:10,211 [main] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
input paths to process : 51
Am I using the 'limit' keyword the wrong way?
Please let me know your suggestions.
Thanks,
--
Kiran Chitturi
<http://www.linkedin.com/in/kiranchitturi>
--
*Note that I'm no longer using my Yahoo! email address. Please email me at
billgraham@gmail.com going forward.*