Grokbase Groups Pig user April 2012

96 discussions - 377 posts

  • Hi folks, The Analytics Infra team at Twitter will be hosting a Pig hackday on May 11. On the agenda: - get newcomers set up with the apache ticket process - review and commit a bunch of stuff that's ...
    Dmitriy Ryaboy
    Apr 18, 2012 at 9:18 pm
    May 4, 2012 at 7:55 pm
  • Hi Folks, I'm currently trying to do something I figured would be trivial, but actually wound up being a bit of work for me, so I'm wondering if I'm missing something. All I want to do is get a cross ...
    Eli Finkelshteyn
    Apr 4, 2012 at 6:19 pm
    Apr 7, 2012 at 11:28 pm
  • Does HBaseStorage work with HBase 0.95? This code was working with HBase 0.92 and Pig 0.9 but fails on HBase 0.95 and Pig 0.11 (built from source): register ...
    Royston Sellman
    Apr 19, 2012 at 8:48 pm
    May 4, 2012 at 4:24 pm
  • Hi, Sorry, this is a PIG newbie question... When I use FLATTEN, I don't understand the structure of the relation that is returned. For example, the following relation A is the result of using ...
    James Newhaven
    Apr 11, 2012 at 1:46 pm
    Apr 13, 2012 at 9:52 am
  • Is there a way I can just unit test my Pig UDF? What's the best way to unit test in Pig? I saw pigunittest but couldn't find a way to unit test a UDF.
    Mohit Anchlia
    Apr 20, 2012 at 12:05 am
    Apr 24, 2012 at 10:46 pm
  • I am able to write with Snappy compression. But I don't think pig provides anything to read such records. Can someone suggest or point me to relevant code that might help me write LoadFunc for it?
    Mohit Anchlia
    Apr 26, 2012 at 7:32 pm
    May 1, 2012 at 12:39 am
  • I have a lot of pig stuff like this: /* Load Avro jars and define shortcut */ register /me/pig/build/ivy/lib/Pig/avro-1.5.3.jar register /me/pig/build/ivy/lib/Pig/json-simple-1.1.jar register ...
    Russell Jurney
    Apr 28, 2012 at 11:22 pm
    Apr 29, 2012 at 11:25 pm
  • Hi all, On behalf of the Pig PMC, I'm very happy to announce that Bill Graham has been invited to become a Pig committer. Bill's been involved in the Pig project for a long time now, and has made a ...
    Dmitriy Ryaboy
    Apr 5, 2012 at 9:55 pm
    Apr 6, 2012 at 2:37 pm
  • Given data: (1, 55, abc) (2, 23, asd) (1, 85, xyz) (1, 2, aaa) I would like to group on $0 and then have my grouped tuple be ordered by $1. Is this possible? The output should look like this: (1, ...
    Chan, Tim
    Apr 16, 2012 at 8:31 pm
    Apr 17, 2012 at 10:53 am
  • Is it possible to use DBStorage to load data from MySQL by running a supplied SQL query? Something like: mydata = LOAD 'jdbc://localhost/enron' USING DBStorage('SELECT foo.value1, bar.value2 FROM foo ...
    Russell Jurney
    Apr 28, 2012 at 7:22 am
    Apr 29, 2012 at 8:14 pm
  • I am new to Pig and I have gone through the reference. I am getting used to how this works but I keep getting questions as I write my scripts. I have a couple of questions: i) I use FILTER with ...
    Mohit Anchlia
    Apr 12, 2012 at 12:28 am
    Apr 16, 2012 at 2:57 pm
  • I am trying to get distinct from 2 fields in a record. something like select distinct a, b from c; So I wrote this in pig which is actually not working. I did: A = LOAD ...
    Mohit Anchlia
    Apr 11, 2012 at 8:53 pm
    Apr 12, 2012 at 3:02 pm
  • Hi, I have a really large data set of about 10 to 15 billion rows. I wanted to do some aggregates like sum, count distinct, max etc but this is taking forever to run the script. What hints or ...
    Sonia gehlot
    Apr 3, 2012 at 12:28 am
    Apr 3, 2012 at 7:19 pm
  • Hi Guys, Has anyone used any tools to profile a Pig query? Or can anyone guide me on how to profile a Pig query? I am trying to figure out the CPU, disk I/O, and RAM usage. I have tried Starfish but ...
    Atul Thapliyal
    Apr 12, 2012 at 4:04 pm
    May 13, 2012 at 2:35 am
  • Hi, I'm storing data into a partitioned table using Hive in RCFile format, but I want to use Pig to do the aggregation of that data. In my array<string> in Hive, I have colon-delimited data, e.g. ...
    Malcolm Tye
    Apr 5, 2012 at 12:59 pm
    May 3, 2012 at 12:30 pm
  • Sorry for the previous incomplete message. Here is the take 2: When I use a Replicated Join only 2 map tasks get scheduled (compared to 100+ tasks for the other steps) What is the idea behind this? ...
    Shan s
    Apr 30, 2012 at 2:55 pm
    May 2, 2012 at 9:31 pm
  • I am writing unit test but I had a doubt. My understanding is that complete record is a tuple. So record "a b {(ST:NC),(ZIP:28613),(CITY:Xxxxxxx),(NAM2:Xxxxx X &xxx; Xxxxx X Xxxxxx)} {(OCCUP:xxxxxxx ...
    Mohit Anchlia
    Apr 20, 2012 at 11:23 pm
    Apr 23, 2012 at 9:00 am
  • Hey all, Sorry if I sound naive, but how should one implement outputSchema of an EvalFunc that returns a tuple? The way I do it is: public Schema outputSchema(Schema input) { List<FieldSchema> list = ...
    Rajgopal Vaithiyanathan
    Apr 19, 2012 at 2:03 am
    Apr 20, 2012 at 7:05 am
  • Dear All, I have 2 data dumps (comma separated), each with around 53,000 records (just sample data; it could be 10 times more than this in practice). I need to write a script to - 1. find matching ...
    Sarath
    Apr 7, 2012 at 9:39 am
    Apr 19, 2012 at 3:28 pm
  • I am currently getting “Type mismatch in key from map: expected org.apache.pig.impl.io.NullableBytesWritable, recieved org.apache.pig.impl.io.NullableText”. I looked up PIG-919 and related ...
    Shan s
    Apr 10, 2012 at 7:03 pm
    Apr 11, 2012 at 3:25 pm
  • Hi all, my date is of the form YYYY-MM-DD HH:MM:SS.XXX. How do I find the week of the year (1-52) for a given date? Is there any UDF for this? Thanks and Regards,
    Shin Chan
    Apr 20, 2012 at 6:09 am
    Apr 23, 2012 at 2:06 am
  • Hello everyone, new Pig user here. I've been toying around with Pig for a while now, and I wanted to create my first UDF in Python. It all went fine until I wanted to use RegExps. I tried importing ...
    Fernando Doglio
    Apr 19, 2012 at 7:30 pm
    Apr 19, 2012 at 8:59 pm
  • Hi, I have data like f1,f2,f3,f4,f5 (rows with 5 fields). I have to produce final dump output as f1, f2, f3, SUM(all fields at f4 position), COUNT(number of fields at f5 position), f4, f5 ...
    Shin Chan
    Apr 15, 2012 at 5:44 am
    Apr 16, 2012 at 7:39 am
  • Where will outputSchema be executed: in the client or as part of a MapReduce job? I've planned to keep the output schema as XML and let the outputSchema method read it and generate the Schema object with ...
    Rajgopal Vaithiyanathan
    Apr 13, 2012 at 8:32 am
    Apr 13, 2012 at 4:18 pm
  • Hi all, How can I replace part of a string at a particular location? For example, given abcd, replacing the characters at index 0-1 with mn should give mncd as output. Is there a built-in UDF, or should I write my own UDF? I checked ...
    Shin Chan
    Apr 13, 2012 at 12:18 pm
    Apr 13, 2012 at 4:07 pm
  • Hi, I need to divide a large bag into 10 smaller bags of equal size. Does anyone know of a function that can do this easily? I've had a look at the standard functions and the PiggyBank and can't find ...
    James Newhaven
    Apr 11, 2012 at 3:53 pm
    Apr 11, 2012 at 7:57 pm
  • Hi, I am trying to limit the output size using LIMIT. I want the limit size to be 5 percent of the total output size, like this: -- Put all the inids in a bag so we can count them. G = GROUP F ...
    James Newhaven
    Apr 10, 2012 at 8:33 pm
    Apr 10, 2012 at 10:14 pm
  • Hi, I'm using pig 0.9.2 on cdh3u3 with a snapshot-build of elephant bird in order to get json parsing. I have an incredibly unusual error that I see with certain gzip compressed files. It's probably ...
    Joe Crobak
    Apr 5, 2012 at 3:44 pm
    Apr 9, 2012 at 9:33 pm
  • Hi, I'm having some challenges with a load function. It only seems to work with a void constructor. The Java code has a void constructor and a String constructor, much like the SimpleTextLoader ...
    Walker, Alan
    Apr 6, 2012 at 8:39 pm
    Apr 9, 2012 at 6:00 pm
  • Hi All, I am trying to use a param file to import certain regex variables into Pig. I have a file written something like isoDate = '[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z' When I try to use ...
    Shin Chan
    Apr 4, 2012 at 1:52 pm
    Apr 8, 2012 at 6:22 pm
  • Hi, Has anyone used the XMLLoader to parse an XML file? If so, can you please share a few lines of your scripts? I tried using the example given by pig.apache.org but I'm not sure how to use it ...
    krishnan N
    Apr 17, 2012 at 10:43 pm
    May 16, 2012 at 3:16 pm
  • Hi, I'm using Pig 0.9.2 with the JsonLoader included in elephant-bird 2.2.2 to process GeoJSON data (Flickr shapefiles: http://code.flickr.com/blog/2011/01/08/flickr-shapefiles-public-dataset-2-0/) ...
    Fabio Souto Moure
    Apr 19, 2012 at 2:35 am
    Apr 19, 2012 at 2:33 pm
  • Hey everyone, I have a new question about how best to handle a very common issue: We have a LOAD statement loading Avro files using globbing by a given regex. For some weird reason this might ...
    Markus Resch
    Apr 10, 2012 at 3:58 pm
    Apr 11, 2012 at 5:39 pm
  • Hello! I wonder if Pig can be used in this way: I have many records, each record contains a name and an age, and now I want to find out one person's age. I know it is easy to do in Pig Latin, but I just wonder ...
    凯氏图腾
    Apr 4, 2012 at 1:52 pm
    Apr 4, 2012 at 5:24 pm
  • Hi, I have set the PIG_CALSSPATH and CLASS_PATH pointing to the location of jython jar file and my python program is in the same location as my pig script. But I am encountering the following error ...
    Kumar palaniappan
    Apr 19, 2012 at 8:31 pm
    Apr 30, 2012 at 11:25 pm
  • The documentation page called "Pig Latin Basics" at http://pig.apache.org/docs/r0.9.2/basic.html has three occurrences of "INNER" in upper case. From these three occurrences it seems that INNER is a ...
    Fred zemke
    Apr 25, 2012 at 7:21 pm
    Apr 26, 2012 at 7:37 pm
  • Does pig ever share same instance of object concurrently at run time? Or does it create a new instance for every invocation? I wrote UDF with a public data member (not static) but I wonder if that is ...
    Mohit Anchlia
    Apr 24, 2012 at 11:06 pm
    Apr 25, 2012 at 7:20 am
  • I have a file on HDFS with a reduced block size. I created this overriding the dfs.block.size param on the hadoop fs -put command . hadoop fsck shows that this file has 15 blocks (as opposed to the ...
    Sam William
    Apr 23, 2012 at 11:30 pm
    Apr 24, 2012 at 4:17 am
  • Hi All, I am pretty new to pig and am having some issues with dereferencing. My data in simplified form looks like below data = load 'visitevent' using PigStorage() AS (visit:tuple(visitorid, ...
    Mustafi, Priyo
    Apr 23, 2012 at 7:05 pm
    Apr 23, 2012 at 8:50 pm
  • I am simply reading a 1.5 GB file with 2458220 records and storing it back. I am getting a Java heap space error. The current setting is mapred.child.java.opts -Xmx1073741824. Below is the error from ...
    Shan s
    Apr 20, 2012 at 8:19 pm
    Apr 22, 2012 at 10:50 pm
  • Hello All, If I give the following variable from the command line it does not work: -param TIMESTAMP = 'date +%c' It gives the error java.lang.RuntimeException: Encountered unexpected arguments on command line - ...
    Shin Chan
    Apr 19, 2012 at 9:40 am
    Apr 20, 2012 at 6:43 am
  • Been looking around for this, but couldn't find an answer. Is there any way for me to define a function or procedure inside a pig script so I don't have to copy&paste my piglatin code several times ...
    Fernando Doglio
    Apr 19, 2012 at 9:22 pm
    Apr 19, 2012 at 9:33 pm
  • Alan Gates
    Apr 18, 2012 at 12:04 am
    Apr 18, 2012 at 6:19 pm
  • Hi All, I have one holiday file and one daily log file. I have to mark a particular day as a holiday in the daily log file if that date matches one of the holiday file dates. holidayFile = load 'holidayList' as ...
    Shin Chan
    Apr 16, 2012 at 2:12 pm
    Apr 17, 2012 at 10:48 am
  • I specified -Djava.io.tmpdir=/my/big/partition/path to PIG_OPTS and I can see that this is indeed set on the JVM args, but when I ran pig -x local my_pig_script it still dumped temp files into /tmp, ...
    Yang
    Apr 17, 2012 at 1:07 am
    Apr 17, 2012 at 3:55 am
  • Can anyone comment on whether or not JavaScript UDFs are here to stay? On the wiki it states "Note: JavaScript UDFs are an experimental feature." Regards, Dan
    Dan Young
    Apr 12, 2012 at 9:06 pm
    Apr 12, 2012 at 9:21 pm
  • Is it possible to say something like F = JOIN A BY (FILE_NAME,CREATED_DATE,FORM_ID,FORM_ID_ROOT), B BY (FILE_NAME,CREATED_DATE,FORM_ID,FORM_ID_ROOT) AND FILTER A BY FORM_ID == 0; Also, how far does ...
    Mohit Anchlia
    Apr 11, 2012 at 10:39 pm
    Apr 12, 2012 at 5:03 pm
  • Am I doing something wrong or is this just the limitation? Basically I want to group 2 sets into one and locate them in the same row. grunt> NM_CT_ST_FILTER = FILTER A by (FIELD_ID == 'NAM2' OR ...
    Mohit Anchlia
    Apr 11, 2012 at 11:21 pm
    Apr 12, 2012 at 2:43 pm
  • I have created a bug (https://issues.apache.org/jira/browse/PIG-2636) based on the following (simplified) script: A = LOAD 'bug.in' AS a:tuple(x:int, y:int); B1 = FOREACH A GENERATE a.x, a.y; B2 = ...
    Peter Gieser
    Apr 10, 2012 at 6:27 am
    Apr 10, 2012 at 2:53 pm
  • Hi Guys!! I'm over here trying to get my feet wet with Hadoop and my first task just happens to be a complex one. I was hoping you could help me out. I'm trying to read nested JSON structures (data ...
    Anurag Gulati
    Apr 4, 2012 at 10:38 pm
    Apr 6, 2012 at 5:05 pm
Group Overview
group: user @
categories: pig, hadoop
discussions: 96
posts: 377
users: 89
website: pig.apache.org

89 users for April 2012

  • Dmitriy Ryaboy: 38 posts
  • Mohit Anchlia: 32 posts
  • Jon Coveney: 25 posts
  • Prashant Kommireddi: 22 posts
  • Russell Jurney: 22 posts
  • Shin Chan: 16 posts
  • Gianmarco De Francisci Morales: 12 posts
  • Bill Graham: 11 posts
  • Rajgopal Vaithiyanathan: 11 posts
  • Norbert Burger: 9 posts
  • Dan Feldman: 8 posts
  • Dan Young: 8 posts
  • James Newhaven: 8 posts
  • Sarath: 7 posts
  • Benjamin Juhn: 6 posts
  • Kumar palaniappan: 6 posts
  • Shan s: 6 posts
  • krishnan N: 5 posts
  • Royston Sellman: 5 posts
  • Alan Gates: 4 posts