-
Hi, I have a pig script which does a simple GROUPing followed by counting and I get this error. My data is certainly not that big for it to cause this out of memory error. Is there a chance that this ...
Rohini U
Mar 21, 2012 at 7:34 pm
Mar 23, 2012 at 8:23 pm -
We currently have 100s of GB of uncompressed data which we would like to zip using some compression that is block compression so that we can use multiple input splits. Does pig support any such ...
Mohit Anchlia
Mar 28, 2012 at 4:45 pm
Apr 5, 2012 at 3:05 pm -
Hi guys, I use Pig to process some clickstream data. I need to track a new field, so I added a new field to my avro schema, and changed my Pig script accordingly. It works fine with the new files ...
IGZ Nick
Mar 28, 2012 at 8:22 pm
Apr 2, 2012 at 5:34 pm -
I need to put a small shared file on the distributed cache so I can load it in my UDF in Pig 0.7. We are using Hadoop 0.20.2+228. I tried to run it using ...
Felix gao
Mar 17, 2012 at 12:32 am
Mar 20, 2012 at 12:58 am -
Hi I'm following a short tutorial from http://blog.whitepages.com/2011/10/27/hbase-storage-and-pig/ I have a running HBase cluster and Hadoop cluster. Steps I've performed: - prepared a sample input ...
Marcin Cylke
Mar 7, 2012 at 3:48 pm
Mar 9, 2012 at 3:07 pm -
I am running a script to load data in the database. When I use [0-4] I see 2 rows being created for every record that I process. But when I run them individually then it works. Could someone please ...
Mohit Anchlia
Mar 23, 2012 at 11:57 pm
Mar 28, 2012 at 5:04 am -
I am reading a bunch of columns from a flat file and inserting them into the database. Is there a way to also insert a date?
Mohit Anchlia
Mar 22, 2012 at 7:48 pm
Mar 23, 2012 at 1:04 am -
Hi all, I'm using pig with protobuf and I have some byte fields containing serialized protobuf data. Is it possible to handle this nested serialized data with pig? ex. message A { required bytes data ...
Benjamin Juhn
Mar 26, 2012 at 10:30 pm
Apr 3, 2012 at 11:47 pm -
Hello all, I'm trying to store a bag of tuples using AvroStorage but am not able to figure out what I'm doing wrong (or if it's supported). What I have is the following: grunt> illustrate c; .... ... ...
Dan Young
Mar 25, 2012 at 4:36 am
Apr 3, 2012 at 6:21 pm -
Hello, I'm new to these lists. I'm trying to get Pig working for the first time. I have set up Hadoop and HBase (on HDFS) using the pseudo-distributed setup, all on one machine. I am able to run ...
Ryan Cole
Mar 23, 2012 at 1:17 am
Mar 23, 2012 at 3:16 pm -
Hi, I need to initialize the HBase connection, which I normally do in configure() in the Mapper, and then my mapper uses it. How do I do it in Pig? I am ready to define a UDF that will return a ...
Mark Kerzner
Mar 7, 2012 at 1:02 am
Mar 9, 2012 at 2:31 am -
Hi, I'm loading a bunch of data into Pig using CassandraStorage. When I do a dump and/or store, the amount of data that is outputted is actually only 2-3% of the amount of data in Cassandra ...
Dan Feldman
Mar 29, 2012 at 6:25 pm
Apr 10, 2012 at 1:25 am -
Hey all, I'm trying to write a script to pull the count of a dataset that I've filtered. Here's the script so far: /* scans by title */ scans = LOAD '/hive/scans/*' USING PigStorage(',') AS ...
Jason Alexander
Mar 22, 2012 at 8:29 pm
Mar 22, 2012 at 9:46 pm -
https://jira.mongodb.org/browse/HADOOP-26 -- Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
Russell Jurney
Mar 2, 2012 at 1:20 am
Mar 2, 2012 at 10:24 pm -
Hello all, I used to run pig on the same node where the hadoop job tracker is running and everything was fine. Now I am trying to run pig on my laptop to access the cluster where hadoop is running ...
Iman E
Mar 28, 2012 at 5:19 pm
Mar 28, 2012 at 7:25 pm -
One record in a 125MB avro file is killing my script. I could patch AvroStorage() to catch the exception and return null after logging an error - I think. Should I? -- Russell Jurney ...
Russell Jurney
Mar 24, 2012 at 2:03 am
Mar 25, 2012 at 11:04 pm -
Hi All, I have a situation where I need to create a relation by a combination of UDF and parameter values. For example, first field will be generated by UDF UUIDGenerator, second field by parameter ...
Rakesh sharma
Mar 13, 2012 at 9:38 pm
Mar 14, 2012 at 2:35 am -
- dev@pig + user@pig What command are you using to run this? Are you upping the max heap? 2012/3/28 Herbert Mühlburger <herbert.muehlburger@gmail.com>
Jonathan Coveney
Mar 28, 2012 at 4:28 pm
Mar 30, 2012 at 6:38 am -
In this - http://pig.apache.org/docs/r0.9.2/start.html#properties The following precedence order is supported: pig.properties -D Pig property -P properties file set command. This means that if the ...
赵新刚
Mar 27, 2012 at 6:44 pm
Mar 28, 2012 at 8:13 am -
I'm having a possible issue with a simple pig load that writes to an HBase table. The issue is that when I run the test pig script it does not invoke the region observer coprocessor on the table. I ...
Nick
Mar 23, 2012 at 5:54 am
Mar 28, 2012 at 5:05 am -
Hey guys, Continuing on in my Pig education, I'm trying to pivot my previous script to give me a break down of count by title. The script I have so far is: /* scans grouped by title */ scans = LOAD ...
Jason Alexander
Mar 26, 2012 at 5:39 pm
Mar 26, 2012 at 7:56 pm -
Pig users and developers, The Apache Pig PMC is pleased to announce new additions to the Pig project: * Jonathan Coveney is now an Apache Pig committer * Julien Le Dem is now an Apache Pig PMC member ...
Daniel Dai
Mar 20, 2012 at 12:04 am
Mar 20, 2012 at 8:56 pm -
Hi all, I just tested a very simple Pig script as follows: records = LOAD '$input' AS (hash:chararray, domain:chararray, host:chararray, page:chararray, freq:int); grpd = GROUP records BY (domain, ...
Yen SYU
Mar 13, 2012 at 7:00 pm
Mar 16, 2012 at 2:28 pm -
Dear All: this is the wiki's description of distinct: grunt> A = load 'mydata' using PigStorage() as (a, b, c); grunt> B = group A by a; grunt> C = foreach B { D = distinct A.b; generate ...
Guoyun
Mar 6, 2012 at 3:20 am
Mar 16, 2012 at 2:03 am -
Hi, I am running a pig query on around 500 GB input data. The current block size is 128 MB and split size is the default 128 MB. I have also specified 16 reducers and around 3800 mappers are running. ...
Austin Chungath
Mar 13, 2012 at 12:25 pm
Mar 14, 2012 at 9:12 pm -
I am trying to process the output which has key in it from the map-reduce job. Is there a way I can ignore the key when I load data from that file? When I load data in the variable I don't want the ...
Mohit Anchlia
Mar 8, 2012 at 10:56 pm
Mar 13, 2012 at 1:12 pm -
Hi, We want to do linear regression analysis to achieve interpolation for a set of values, using Pig scripts. Are there any built-in functions for this? If not, how can we achieve it? Thanks & ...
Chethan
Mar 12, 2012 at 7:22 am
Mar 13, 2012 at 5:39 am -
Hello, I think there is a bug in PIG when using COUNT on Bag of Tuple with empty element. Here is a minimal script to reproduce this bug : I've this CSV file : ,a 1,a 2,a ,a 3,b 4,b 5,b I use that ...
Kevin Lion
Mar 8, 2012 at 4:56 pm
Mar 9, 2012 at 5:59 am -
As I wanted to increment some counters in some UDFs I wrote, I came across http://squarecog.wordpress.com/2010/12/24/incrementing-hadoop-counters-in-apache-pig/ as THE answer which basically says I ...
Ahmed Sobhi
Mar 30, 2012 at 9:00 am
Mar 30, 2012 at 3:57 pm -
I know there has been lots of discussion on git going on. I've been wanting a place to stick useful UDFS that are pretty generic as well as nice place to share other people's work. I was thinking ...
Corbin Hoenes
Mar 26, 2012 at 11:16 pm
Mar 27, 2012 at 2:56 am -
In the relational database we have a large key, value type of data in 2 tables. Let’s call it Entity and EntityAttribute. Table: Entity Columns: Entity ID, Entity Type Table: EntityAttribute ...
Shan s
Mar 21, 2012 at 6:50 pm
Mar 23, 2012 at 3:38 pm -
Hi guys, Thanks again for your awesome hint about sqoop. I have another question: The data I'm working with is stored as Avro files in Hadoop. When I try to glob them everything works just ...
Markus Resch
Mar 21, 2012 at 3:02 pm
Mar 22, 2012 at 2:13 am -
I wanted to share a deck that has some details regarding how we use Pig for one of the projects at Salesforce. Essentially, we merged Force.com platform with Hadoop/Pig to generate very critical ...
Prashant Kommireddi
Mar 20, 2012 at 9:07 pm
Mar 21, 2012 at 6:35 am -
I want to read a small reference data file from a UDF. How do I make use of the distributed cache for this purpose ? Sam William sampd@stumbleupon.com
Sam William
Mar 9, 2012 at 11:18 pm
Mar 13, 2012 at 9:06 pm -
Hello, I am using: hadoop-0.20.2-cdh3u2, hbase-0.90.4-cdh3u3, pig-0.8.1-cdh3u3 I have successfully loaded data into HBase tables (implying my Hadoop & HBase setup is good). I can look at the data ...
Something Something
Mar 8, 2012 at 6:30 am
Mar 8, 2012 at 9:55 pm -
Hello All, I was wondering if there was a way for me to store the DESCRIBE on an alias in a file. Often we have many fields to store and we keep adding fields that we want to store, it would be great ...
Gayatri Rao
Mar 29, 2012 at 9:39 pm
Mar 29, 2012 at 11:10 pm -
In this page :http://pig.apache.org/docs/r0.9.2/basic.html#arithmetic Xingang 2012/3/28 This email (including any attachments) is confidential and may be legally privileged. If you received this ...
赵新刚
Mar 28, 2012 at 4:01 pm
Mar 28, 2012 at 4:29 pm -
Hi, There is a trivial issue with PigStats (during HASHJOIN), it does not print correct record count. My job does a LEFT OUTER join operation and hence the row count with input B should match output ...
Subir S
Mar 27, 2012 at 10:45 am
Mar 28, 2012 at 6:44 am -
Hi All, I have a statement like this: -- A is omitted, loads data B = FOREACH A GENERATE FLATTEN(data1.b.v) as dataPoint1, FLATTEN(data2.b.v) as dataPoint2; C = FILTER B BY dataPoint1 == ...
Michael Moore
Mar 19, 2012 at 7:49 pm
Mar 27, 2012 at 3:30 am -
Folks -- how are folks handling the "productionalization" of their Pig submit nodes? For our PROD environment, I originally thought we'd just have a few VMs from which Pig jobs would be submitted ...
Norbert Burger
Mar 21, 2012 at 1:50 pm
Mar 21, 2012 at 2:55 pm -
Hi, Can I write a UDF that overrides LOAD SimpleTextLoader without MapReduce? I am a bit confused about the use of MapReduce, because I am not able to follow the flow of the LOAD SimpleTextLoader when the ...
Chethan
Mar 12, 2012 at 11:51 am
Mar 16, 2012 at 7:23 am -
Hi all. I'm trying to write a simple filter function (to be used with the FILTER operator) in python, but I don't seem to find the right way to specify its schema. I'm using pig 0.9.2. The filter's ...
Marco Cova
Mar 15, 2012 at 11:03 pm
Mar 16, 2012 at 7:08 am -
Hi Folks, I'm currently working on a framework that's going to do some awesome graphing stuff grabbing data out using Pig. What I'm wondering is, is there any way I can put embedded pig in a module ...
Eli Finkelshteyn
Mar 14, 2012 at 6:16 am
Mar 14, 2012 at 3:12 pm -
I tried to return a Set<String> from my UDF, but it seems to cause some problems. What are the allowed return data types in a UDF? Is it constrained to those in the "Pig Types" section in ...
Yang
Mar 12, 2012 at 6:16 pm
Mar 12, 2012 at 8:27 pm -
I tried to subscribe but a mail client box came up, not what I wanted, so we'll see if this works. I wrote this script: register s3n://uw-cse344-code/myudfs.jar -- load the test file into Pig --raw = ...
Colleen Ross
Mar 10, 2012 at 3:49 pm
Mar 11, 2012 at 4:58 am -
I have "set mapred.map.tasks 5" in the pig job and still I am seeing around 214 map tasks and around 30 actively running jobs. I was expecting only 5 map tasks. My cluster has 5 nodes.
Mohit Anchlia
Mar 10, 2012 at 12:39 am
Mar 10, 2012 at 1:16 am -
Hi, I have a UDF that parses a line and then returns a bag, and sometimes the line is bad so I'm returning null in the UDF. In my pig script, I'd like to filter those nulls like this: raw = LOAD ...
Dexin Wang
Mar 2, 2012 at 12:46 am
Mar 7, 2012 at 11:08 pm -
Hi, can I see the user payload for the MapReduce job that is created by Pig? How? I.e., the Map and Reduce function code that is generated by the Pig script. Thanks,
Shan shan
Mar 6, 2012 at 1:28 pm
Mar 6, 2012 at 6:01 pm -
Hello everyone, Is Pig capable of appending to a file? I know that an exception is thrown when a file exists using PigStorage, but is there a way to get around this? Thanks, Daan.
Daan Gerits
Mar 5, 2012 at 9:34 am
Mar 5, 2012 at 6:14 pm -
I've created a vim snipmate plugin for PigLatin which saves me a lot of time developing Pig jobs. For those unfamiliar: Snipmate is a Vim plugin for code completion. I've made a small writeup here ...
Rob Verkuylen
Mar 3, 2012 at 1:06 am
Mar 3, 2012 at 9:23 am
Group Overview
group | user |
categories | pig, hadoop |
discussions | 86 |
posts | 341 |
users | 77 |
website | pig.apache.org |
77 users for March 2012