-
Hi folks, The Analytics Infra team at Twitter will be hosting a Pig hackday on May 11. On the agenda: - get newcomers set up with the apache ticket process - review and commit a bunch of stuff that's ...
Dmitriy Ryaboy
Apr 18, 2012 at 9:18 pm
May 4, 2012 at 7:55 pm -
Hi Folks, I'm currently trying to do something I figured would be trivial, but actually wound up being a bit of work for me, so I'm wondering if I'm missing something. All I want to do is get a cross ...
Eli Finkelshteyn
Apr 4, 2012 at 6:19 pm
Apr 7, 2012 at 11:28 pm -
Does HBaseStorage work with HBase 0.95? This code was working with HBase 0.92 and Pig 0.9 but fails on HBase 0.95 and Pig 0.11 (built from source): register ...
Royston Sellman
Apr 19, 2012 at 8:48 pm
May 4, 2012 at 4:24 pm -
Hi, Sorry, this is a PIG newbie question... When I use FLATTEN, I don't understand the structure of the relation that is returned. For example, the following relation A is the result of using ...
James Newhaven
Apr 11, 2012 at 1:46 pm
Apr 13, 2012 at 9:52 am -
Is there a way I can just unit test my pig UDF? What's the best way to unit test in pig. I saw pigunittest but couldn't find a way to unit test udf.
Mohit Anchlia
Apr 20, 2012 at 12:05 am
Apr 24, 2012 at 10:46 pm -
I am able to write with Snappy compression. But I don't think pig provides anything to read such records. Can someone suggest or point me to relevant code that might help me write LoadFunc for it?
Mohit Anchlia
Apr 26, 2012 at 7:32 pm
May 1, 2012 at 12:39 am -
I have a lot of pig stuff like this: /* Load Avro jars and define shortcut */ register /me/pig/build/ivy/lib/Pig/avro-1.5.3.jar register /me/pig/build/ivy/lib/Pig/json-simple-1.1.jar register ...
Russell Jurney
Apr 28, 2012 at 11:22 pm
Apr 29, 2012 at 11:25 pm -
Hi all, On behalf of the Pig PMC, I'm very happy to announce that Bill Graham has been invited to become a Pig committer. Bill's been involved in the Pig project for a long time now, and has made a ...
Dmitriy Ryaboy
Apr 5, 2012 at 9:55 pm
Apr 6, 2012 at 2:37 pm -
Given data: (1, 55, abc) (2, 23, asd) (1, 85, xyz) (1, 2, aaa) I would like to group on $0 and then have my grouped tuple be ordered by $1. Is this possible? The output should look like this: (1, ...
Chan, Tim
Apr 16, 2012 at 8:31 pm
Apr 17, 2012 at 10:53 am -
Is it possible to use DBStorage to load data from MySQL by running a supplied SQL query? Something like: mydata = LOAD 'jdbc://localhost/enron' USING DBStorage('SELECT foo.value1, bar.value2 FROM foo ...
Russell Jurney
Apr 28, 2012 at 7:22 am
Apr 29, 2012 at 8:14 pm -
I am new to pig and I have gone through the reference. I am getting used to how this works but I keep getting questions as I write my scripts. I have couple of questions: i) I use FILTER with ...
Mohit Anchlia
Apr 12, 2012 at 12:28 am
Apr 16, 2012 at 2:57 pm -
I am trying to get distinct from 2 fields in a record. something like select distinct a, b from c; So I wrote this in pig which is actually not working. I did: A = LOAD ...
Mohit Anchlia
Apr 11, 2012 at 8:53 pm
Apr 12, 2012 at 3:02 pm -
Hi, I have a really large data set of about 10 to 15 billion rows. I wanted to do some aggregates like sum, count distinct, max etc but this is taking forever to run the script. What hints or ...
Sonia gehlot
Apr 3, 2012 at 12:28 am
Apr 3, 2012 at 7:19 pm -
Hi Guys, Has anyone used any tools to profile a pig query? Or can anyone guide me on "How to profile a pig query"? I am trying to figure out the CPU, disk I/O, RAM usage. I have tried Starfish but ...
Atul Thapliyal
Apr 12, 2012 at 4:04 pm
May 13, 2012 at 2:35 am -
Hi, I'm storing data into a partitioned table using Hive in RCFile format, but I want to use Pig to do the aggregation of that data. In my array<string> in Hive, I have colon delimited data, E.g ...
Malcolm Tye
Apr 5, 2012 at 12:59 pm
May 3, 2012 at 12:30 pm -
Sorry for the previous incomplete message. Here is the take 2: When I use a Replicated Join only 2 map tasks get scheduled (compared to 100+ tasks for the other steps) What is the idea behind this? ...
Shan s
Apr 30, 2012 at 2:55 pm
May 2, 2012 at 9:31 pm -
I am writing unit test but I had a doubt. My understanding is that complete record is a tuple. So record "a b {(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx; Xxxxx X Xxxxxx)} {(OCCUP:xxxxxxx ...
Mohit Anchlia
Apr 20, 2012 at 11:23 pm
Apr 23, 2012 at 9:00 am -
Hey all, Sorry if I sound naive, but how should one implement outputSchema of an EvalFunc that returns a tuple? The way I do it is: public Schema outputSchema(Schema input) { List<FieldSchema> list = ...
Rajgopal Vaithiyanathan
Apr 19, 2012 at 2:03 am
Apr 20, 2012 at 7:05 am -
Dear All, I have 2 data dumps (comma separated) each with around 53,000 records ( just sample data. it could be 10times more than this in real time). I need to write a script to - 1. find matching ...
Sarath
Apr 7, 2012 at 9:39 am
Apr 19, 2012 at 3:28 pm -
I am currently getting “Type mismatch in key from map: expected org.apache.pig.impl.io.NullableBytesWritable, recieved org.apache.pig.impl.io.NullableText “ I looked up the PIG-919 and related ...
Shan s
Apr 10, 2012 at 7:03 pm
Apr 11, 2012 at 3:25 pm -
Hi all, my date is of the form YYYY-MM-DD HH:MM:SS.XXX. How do I find the week for this (week of the year, 1-52, for a given date)? Is there any UDF for this? Thanks and Regards,
Shin Chan
Apr 20, 2012 at 6:09 am
Apr 23, 2012 at 2:06 am -
Hello everyone, new pig user here. I've been toying around with pig for a while now, and I wanted to create my first UDF in Python. It all went fine until I wanted to use RegExps. I tried importing ...
Fernando Doglio
Apr 19, 2012 at 7:30 pm
Apr 19, 2012 at 8:59 pm -
Hi I have data something like f1,f2,f3,f4,f5 Rows with 5 fields I have to produce final dump output as f1,f2,f3, SUM( all fields at f4 position) , COUNT ( number of fields at f5 position ) , f4 , f5 ...
Shin Chan
Apr 15, 2012 at 5:44 am
Apr 16, 2012 at 7:39 am -
Where will the outputSchema be executed? in the client or as a mapreduce ? I've planned to keep the output schema as an XML and let the outputSchema method read it and generate the Schema object with ...
Rajgopal Vaithiyanathan
Apr 13, 2012 at 8:32 am
Apr 13, 2012 at 4:18 pm -
Hi all, How to replace some value in string at particular location For example abcd Replace values from index 0-1 with mn mncd as output Any built in UDF or i should write own UDF?. I checked ...
Shin Chan
Apr 13, 2012 at 12:18 pm
Apr 13, 2012 at 4:07 pm -
Hi, I need to divide a large bag into 10 smaller bags of equal size. Does anyone know of a function that can do this easily? I've had a look at the standard functions and the PiggyBank and can't find ...
James Newhaven
Apr 11, 2012 at 3:53 pm
Apr 11, 2012 at 7:57 pm -
Hi, I am trying to limit the output size using LIMIT. I want the limit size to be 5 percent of the total output size like this: -- Put all the inids in a bag so we can count them. G = GROUP F ...
James Newhaven
Apr 10, 2012 at 8:33 pm
Apr 10, 2012 at 10:14 pm -
Hi, I'm using pig 0.9.2 on cdh3u3 with a snapshot-build of elephant bird in order to get json parsing. I have an incredibly unusual error that I see with certain gzip compressed files. It's probably ...
Joe Crobak
Apr 5, 2012 at 3:44 pm
Apr 9, 2012 at 9:33 pm -
Hi, I'm having some challenges with a load function. It only seems to work with a void constructor. The Java code has a void constructor and a String constructor, much like the SimpleTextLoader ...
Walker, Alan
Apr 6, 2012 at 8:39 pm
Apr 9, 2012 at 6:00 pm -
Hi All I am trying to use param file to import certain regex variables into Pig I have file written something like isoDate = '[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z' When i try to use ...
Shin Chan
Apr 4, 2012 at 1:52 pm
Apr 8, 2012 at 6:22 pm -
Hi, Has anyone used the XMLloader to parse an XML file? If so, can you please share a few lines of your scripts? I tried using the example given by pig.apache.org but am not sure how to use it ...
krishnan N
Apr 17, 2012 at 10:43 pm
May 16, 2012 at 3:16 pm -
Hi, I'm using pig 0.9.2 with the JsonLoader included in elephant-bird 2.2.2 to process geojson data(Flickr shapefiles: http://code.flickr.com/blog/2011/01/08/flickr-shapefiles-public-dataset-2-0/ ) ...
Fabio Souto Moure
Apr 19, 2012 at 2:35 am
Apr 19, 2012 at 2:33 pm -
Hey everyone, I have a new question about how best to handle a very common issue: We have a LOAD statement loading AVRO files using globbing by a given regex. For some weird reason this might ...
Markus Resch
Apr 10, 2012 at 3:58 pm
Apr 11, 2012 at 5:39 pm -
Hello! I wonder if Pig can be used in this way: I have many records, each record contains a name and an age, and now I want to find out one person's age. I know it is easy to do in Pig Latin, but I just wonder ...
凯氏图腾
Apr 4, 2012 at 1:52 pm
Apr 4, 2012 at 5:24 pm -
Hi, I have set the PIG_CLASSPATH and CLASS_PATH pointing to the location of jython jar file and my python program is in the same location as my pig script. But I am encountering the following error ...
Kumar palaniappan
Apr 19, 2012 at 8:31 pm
Apr 30, 2012 at 11:25 pm -
The documentation page called "Pig Latin Basics" at http://pig.apache.org/docs/r0.9.2/basic.html has three occurrences of "INNER" in upper case. From these three occurrences it seems that INNER is a ...
Fred zemke
Apr 25, 2012 at 7:21 pm
Apr 26, 2012 at 7:37 pm -
Does pig ever share same instance of object concurrently at run time? Or does it create a new instance for every invocation? I wrote UDF with a public data member (not static) but I wonder if that is ...
Mohit Anchlia
Apr 24, 2012 at 11:06 pm
Apr 25, 2012 at 7:20 am -
I have a file on HDFS with a reduced block size. I created this overriding the dfs.block.size param on the hadoop fs -put command . hadoop fsck shows that this file has 15 blocks (as opposed to the ...
Sam William
Apr 23, 2012 at 11:30 pm
Apr 24, 2012 at 4:17 am -
Hi All, I am pretty new to pig and am having some issues with dereferencing. My data in simplified form looks like below data = load 'visitevent' using PigStorage() AS (visit:tuple(visitorid, ...
Mustafi, Priyo
Apr 23, 2012 at 7:05 pm
Apr 23, 2012 at 8:50 pm -
I am simply reading 1.5 gb file with 2458220 records, and storing it back. I am getting Java Heap space error The current setting is mapred.child.java.opts -Xmx1073741824 Below is the error from ...
Shan s
Apr 20, 2012 at 8:19 pm
Apr 22, 2012 at 10:50 pm -
Hello All If i give following variable from command line it does not work -param TIMESTAMP = 'date +%c' It gives error java.lang.RuntimeException: Encountered unexpected arguments on command line - ...
Shin Chan
Apr 19, 2012 at 9:40 am
Apr 20, 2012 at 6:43 am -
Been looking around for this, but couldn't find an answer. Is there any way for me to define a function or procedure inside a pig script so I don't have to copy&paste my piglatin code several times ...
Fernando Doglio
Apr 19, 2012 at 9:22 pm
Apr 19, 2012 at 9:33 pm -
Alan Gates
Apr 18, 2012 at 12:04 am
Apr 18, 2012 at 6:19 pm -
Hi All, I have one holiday file and one daily log file. I have to mark particular day as holiday in daily log file , if that date is matching to holiday File dates holidayFile = load 'holidayList' as ...
Shin Chan
Apr 16, 2012 at 2:12 pm
Apr 17, 2012 at 10:48 am -
I specified -Djava.io.tmpdir=/my/big/partition/path to PIG_OPTS and I can see that this is indeed set on the JVM args, but when I ran pig -x local my_pig_script it still dumped temp files into /tmp, ...
Yang
Apr 17, 2012 at 1:07 am
Apr 17, 2012 at 3:55 am -
Can anyone comment on whether or not JavaScript UDFs are here to stay? On the wiki it states "Note: JavaScript UDFs are an experimental feature." Regards, Dan
Dan Young
Apr 12, 2012 at 9:06 pm
Apr 12, 2012 at 9:21 pm -
Is it possible to say something like F = JOIN A BY (FILE_NAME,CREATED_DATE,FORM_ID,FORM_ID_ROOT), B BY (FILE_NAME,CREATED_DATE,FORM_ID,FORM_ID_ROOT) AND FILTER A BY FORM_ID == 0; Also, how far does ...
Mohit Anchlia
Apr 11, 2012 at 10:39 pm
Apr 12, 2012 at 5:03 pm -
Am I doing something wrong or is this just the limitation? Basically I want to group 2 sets into one and locate them in the same row. grunt> NM_CT_ST_FILTER = FILTER A by (FIELD_ID == 'NAM2' OR ...
Mohit Anchlia
Apr 11, 2012 at 11:21 pm
Apr 12, 2012 at 2:43 pm -
I have created a bug (https://issues.apache.org/jira/browse/PIG-2636) based on the following (simplified) script: A = LOAD 'bug.in' AS a:tuple(x:int, y:int); B1 = FOREACH A GENERATE a.x, a.y; B2 = ...
Peter Gieser
Apr 10, 2012 at 6:27 am
Apr 10, 2012 at 2:53 pm -
Hi Guys!! I'm over here trying to get my feet wet with Hadoop and my first task just happens to be a complex one. I was hoping you could help me out. I'm trying to read nested JSON structures (data ...
Anurag Gulati
Apr 4, 2012 at 10:38 pm
Apr 6, 2012 at 5:05 pm
Group Overview
group | user |
categories | pig, hadoop |
discussions | 96 |
posts | 377 |
users | 89 |
website | pig.apache.org |
89 users for April 2012