53 discussions - 244 posts

  • so I have module.py, and I want to be able to use it in a pig script. It has no special imports or anything. I do have @outputSchemaFunction("output:chararray) In my pig script, I have this register ...
    Jonathan Coveney
    Dec 27, 2010 at 9:31 pm
    Dec 30, 2010 at 2:12 am
  • It's very hard to search for this among the docs because it's so generic, so I thought I'd ask... I'm sure the answer is painfully easy. Taking a look at this code that I found online, for example -- ...
    Jonathan Coveney
    Dec 3, 2010 at 9:04 pm
    Dec 8, 2010 at 8:04 pm
  • Does anyone know of any existing StoreFunc to specify a maximum output file size? Or would I need to write a custom StoreFunc to do this? I am running into a problem on Amazon's EMR where the files ...
    Zach Bailey
    Dec 21, 2010 at 10:52 pm
    Jan 7, 2011 at 11:48 pm
  • I kept seeing a Failed to create DataStorage error when trying to run pig $ java -cp pig-0.7.0-core.jar:$HADOOP_CONF_DIR org.apache.pig.Main -x mapreduce 10/12/09 20:35:31 INFO pig.Main: Logging error ...
    Felix gao
    Dec 10, 2010 at 1:48 am
    Dec 13, 2010 at 5:37 pm
  • I have a list of regex patterns that I would like to run against a data set, and if it matches a particular pattern in the list, tag it with the predefined tag for that pattern. Has this been done, or ...
    Brian Adams
    Dec 6, 2010 at 8:26 pm
    Dec 8, 2010 at 12:59 am
  • Hi, I loaded a csv file with about 10 fields into PigStorage and tried to do a GROUP BY on one of the fields. The MapReduce job gets created, and the Mappers finish execution. But after that, the job ...
    Deepak N85
    Dec 13, 2010 at 8:19 am
    Dec 14, 2010 at 6:48 am
  • Hello, Is there some sort of mechanism by which I could cause a value to accumulate within a relation? What I'd like to do is something along the lines of having a long called accumulator, and an ...
    Kris Coward
    Dec 17, 2010 at 7:31 pm
    Dec 19, 2010 at 10:49 pm
  • I'm not sure if Pig can handle this...perhaps in this specific case there is something more clever that can be done, although I think it points to a bigger question. Basically, let's say I have ...
    Jonathan Coveney
    Dec 14, 2010 at 6:33 pm
    Dec 14, 2010 at 9:20 pm
  • Hi, In our application Hive is used as a database. i.e. a result set from a select query is consumed outside of hadoop cluster. The consumption process is not Hadoop friendly as in it is network ...
    Jae Lee
    Dec 7, 2010 at 2:41 pm
    Dec 8, 2010 at 6:34 pm
  • Hello, All. I have what I hope is a quick, noob question, but I am having trouble flattening one of my relations, G. -------------------------------------------------- ...
    Michael Moss
    Dec 28, 2010 at 7:39 pm
    Dec 28, 2010 at 10:36 pm
  • Hi, How do I change the default timeout for reducer with Pig? I have some reducer that needs to take longer than 10 minutes to finish. It is pretty frustrating to see many of them get to 95% complete and ...
    Dexin Wang
    Dec 21, 2010 at 9:23 pm
    Dec 23, 2010 at 4:30 pm
  • Guys, I'm starting to use pig (0.8) now and I went to Pig Wiki for some directives and tutorials. I already found some errors and have some suggestions to contribute. How ( or to whom ) I could send ...
    Charles Gonçalves
    Dec 21, 2010 at 11:56 am
    Dec 22, 2010 at 3:10 pm
  • I am having a hard time getting comparison to work. I am comparing two long values but I keep getting a cast long to String error Backend error message --------------------- ...
    John Hui
    Dec 15, 2010 at 9:22 pm
    Dec 17, 2010 at 6:22 am
  • Hi guys, I'm having some trouble finishing jobs that run smoothly on a smaller dataset, but always fail at 99% if I try to run the job on the whole set. I can see a few killed map and a few killed ...
    Jr
    Dec 8, 2010 at 2:09 pm
    Dec 15, 2010 at 2:06 am
  • Hi, We'd like to patch our pig AvroStorage function and would highly appreciate any kinds of comments. doc: http://snaprojects.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data ...
    Lin Guo
    Dec 1, 2010 at 5:05 am
    Dec 9, 2010 at 8:20 am
  • Hi all, In pig, we can do pattern matching with regular expression. For a pattern like "abc|.*", how to write the regular expression? I tried the following: A = FILTER B BY (name matches 'abc\|.*'); ...
    Zhen Guo
    Dec 3, 2010 at 3:32 am
    Dec 3, 2010 at 2:43 pm
  • Hi, Consider this use case: There is a program store cpu usage metrics to a HBase table. This HBase table has a column family called cpu, and individual cpu core usage is stored in columns like, ...
    Eric Yang
    Dec 29, 2010 at 7:10 am
    Dec 30, 2010 at 7:33 pm
  • So, I made a dumb little python script that parses a pig script, sees what stores there are, and then uses pig's describe function to get the schema of the object being stored and then uses that ...
    Jonathan Coveney
    Dec 28, 2010 at 10:09 pm
    Dec 29, 2010 at 12:49 am
  • Hello fellow pig users, I have told pig to use a separate disk for its temp files by setting PIG_OPTS=-Dhadoop.tmp.dir=/mnt/hadoop-tmp but it still keeps a lot of its files in /tmp: ...
    David Vrensk
    Dec 16, 2010 at 4:18 pm
    Dec 16, 2010 at 11:03 pm
  • I am getting an error I have not seen before and would love some help. I did a DESCRIBE and it parses fine, but when you actually try and execute, that is when it blows up. Here is the error: ...
    Jonathan Coveney
    Dec 15, 2010 at 9:38 pm
    Dec 16, 2010 at 7:37 pm
  • A = load 'foo.txt' using PigStorage as (x : chararray, y : int); B = group A by x; C = group B by group; describe C; -- we got -- C: {group: chararray,B: {group: chararray,A: {x: chararray,y: int}}} ...
    Lin Guo
    Dec 9, 2010 at 11:35 pm
    Dec 14, 2010 at 6:11 am
  • Hello I have this problem to solve using Pig. *Input* 1. Relation A which has only one field of type chararray. Sample of A follows: *abc* *xyz gh* *zzz yy* *red* Approximate numbers of rows in A = ...
    Arun A K
    Dec 2, 2010 at 6:54 pm
    Dec 2, 2010 at 7:50 pm
  • I'm using Pig 0.8 via Eclipse/PigServer. I can't figure out how to set the logging level to ERROR, rather than the default INFO. I tried everything I could think of: log4j.properties in Pig and ...
    Andreas Paepcke
    Dec 28, 2010 at 7:47 pm
    Jan 17, 2011 at 5:55 pm
  • Pig users, I wrote up a short overview of some new features in Pig 0.8: https://squarecog.wordpress.com/2010/12/19/new-features-in-apache-pig-0-8/ Cheers -Dmitriy
    Dmitriy Ryaboy
    Dec 20, 2010 at 5:19 pm
    Dec 20, 2010 at 9:48 pm
  • All, Not sure if this is the right mailing list for this question. I am using pig to do some data analysis and I am wondering if there is a way to tell pig when it encountered a bad log files either due ...
    Felix gao
    Dec 20, 2010 at 8:07 pm
    Dec 20, 2010 at 8:40 pm
  • Is it possible to increment a counter in Pig UDF (in either Load/Eval/Store Func). Since we have access to counters using the org.apache.hadoop.mapred.Reporter: ...
    Dexin Wang
    Dec 16, 2010 at 1:17 am
    Dec 16, 2010 at 5:09 pm
  • Hi Folks, I have 2 datasets (T1, T2) to be joined. I need to join T1 with T2 based on some criteria. COGROUP does it based on == condition. ex: COGROUP T1 by f1, T2 by f2 (but I need to filter T2.f2 ...
    Rajesh Balamohan
    Dec 14, 2010 at 1:05 am
    Dec 15, 2010 at 10:24 pm
  • Hello, I'm having an issue with a script that uses an EvalFunc I wrote. The issue is the final output contains characters that I am not expecting (commas - followed by what I'm guessing are null ...
    Michael Moss
    Dec 8, 2010 at 9:50 pm
    Dec 10, 2010 at 12:07 am
  • Seems after FLATTEN, the rows with null values get dropped. I have two test files: % cat test1.txt 1 a b 2 c d 3 e f % cat test2.txt 1 x 2 y 6 z 8 w I'm trying to cogroup the two on the first column: ...
    Dexin Wang
    Dec 31, 2010 at 12:36 am
    Dec 31, 2010 at 4:03 am
  • Basically, I want a way to be able to see the schema of something from within a pig script outside of pig, ideally without having to connect to hadoop to do so. So for example, we take a random ...
    Jonathan Coveney
    Dec 28, 2010 at 4:22 pm
    Dec 28, 2010 at 11:39 pm
  • Hey all, I was wondering if anyone could give me some pointers on a good approach for temporally clustering a data set I have. The data set consists of web page crawl data - for the sake of this ...
    Zach Bailey
    Dec 20, 2010 at 11:36 pm
    Dec 21, 2010 at 5:32 am
  • Pig team is happy to announce Pig 0.8.0 release. Apache Pig provides a high-level data-flow language and execution framework for parallel computation on Hadoop clusters. More details about Pig can be ...
    Daniel Dai
    Dec 18, 2010 at 2:14 am
    Dec 20, 2010 at 4:58 pm
  • Hi, I have a question about map parallelism in Pig. I am using Pig to stream a file through a Python script that performs some computationally expensive transforms. This process is assigned to a ...
    Charles W
    Dec 15, 2010 at 4:11 am
    Dec 15, 2010 at 6:39 pm
  • Hi, I've recently gotten stumped by a problem where my attempts to dump the relations produced by a GROUP command give the following error (though illustrating the same relation works fine): ...
    Kris Coward
    Dec 8, 2010 at 9:53 pm
    Dec 8, 2010 at 11:20 pm
  • Hi, All, I want to check in some functions into piggybank and have some questions about dependent jars: 1. it depends on some new jars, where should I add them? updating ivy.xml under trunk to ...
    Lin Guo
    Dec 8, 2010 at 8:17 am
    Dec 8, 2010 at 6:44 pm
  • Hi, This might be a dumb question. Is it possible to pass anything other than the input tuple to a UDF Eval function? Basically in my UDF, I need to do some user info lookup. So the input will be: ...
    Dexin Wang
    Dec 7, 2010 at 7:44 pm
    Dec 7, 2010 at 8:09 pm
  • In order to facilitate more robust loading, I have 2 questions. 1) I know that you can use some wildcards in loading... for example, if you have 2 files, dog1.txt and dog2.txt, you can load dog*.txt ...
    Jonathan Coveney
    Dec 1, 2010 at 3:58 pm
    Dec 1, 2010 at 4:57 pm
  • I wrote a UDF which produces the following DataBag with 2 tuples. I want to sort the results based on the first tuple but get an error. grunt> I2L2_f = FOREACH I2L2_grp_t GENERATE ...
    Matt Tanquary
    Dec 23, 2010 at 4:46 pm
    Dec 23, 2010 at 4:51 pm
  • This set results from a JOIN: (04f4c2fd-8be2-41c3-b045-283de80909ba,1966,2L) (04f4c2fd-8be2-41c3-b045-283de80909ba,3845,2L) Using PIG, I group this and get: ...
    Matt Tanquary
    Dec 21, 2010 at 4:32 pm
    Dec 21, 2010 at 10:44 pm
  • [junit] Tests run: 17, Failures: 0, Errors: 0, Time elapsed: 584.868 sec [junit] Running org.apache.pig.test.TestAdd [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.284 sec [junit] ...
    Corbin Hoenes
    Dec 17, 2010 at 4:23 am
    Dec 17, 2010 at 5:15 am
  • Hi all, My company uses pig a lot and I have been looking for some examples on how to use pigstats and there seems to be very little material about it. Can someone point me to some useful references on ...
    Felix gao
    Dec 16, 2010 at 9:20 pm
    Dec 17, 2010 at 1:51 am
  • I'm either unhappy with streaming bags, or I'm doing something wrong. I have a relation, and I want to apply a user-defined operation on its elements, conditioned on the field "source". Let's say ...
    Dragos Munteanu
    Dec 14, 2010 at 5:42 pm
    Dec 14, 2010 at 7:33 pm
  • Hey guys, Any update on the state of the Pig 0.8 release? Found the following thread, looks like all issues with fix version 0.8.0 have been closed out. ...
    Eli Collins
    Dec 14, 2010 at 12:54 am
    Dec 14, 2010 at 1:19 am
  • Hi again, Now I'm having a problem where, after JOINing 2 relations, I can't get an ILLUSTRATE to work on the resulting relation (though a DUMP is working just fine). The error produced is: ...
    Kris Coward
    Dec 9, 2010 at 12:14 am
    Dec 9, 2010 at 12:38 am
  • I'm currently running into an issue where I have a bag of tuples like so: ( {(a,b,c,d,e), (1,2,3,4,5)}, ... , {(f,g,h,i,j), (6,7,8,9,10)} ) Each one of the tuples has the same number of fields. So I ...
    Xavier Stevens
    Dec 7, 2010 at 7:26 pm
    Dec 8, 2010 at 12:44 am
  • Hi, I want to store relations A and B into one file. The output file should look like this: A1, B1 A2, B2 etc. Any ideas? Is this even possible with piglatin?
    Mrkos
    Dec 6, 2010 at 7:00 pm
    Dec 7, 2010 at 1:20 am
  • I just ran a script in which I reuse some of the alias names in different foreach blocks (by careless copying). It is like: ... b = foreach a { x = ...; y = ...; ...} ... i = foreach h { x = ...; y = ...
    Xiaomeng Wan
    Dec 2, 2010 at 10:04 pm
    Dec 2, 2010 at 10:08 pm
  • Wrote this up over the holidays, since this is asked pretty often on the user list: https://squarecog.wordpress.com/2010/12/24/incrementing-hadoop-counters-in-apache-pig/ Cheers D
    Dmitriy Ryaboy
    Dec 27, 2010 at 9:45 pm
    Dec 27, 2010 at 9:45 pm
  • Hi all, I'm excited to announce that Amazon Elastic MapReduce is now hosting the Google Books n-gram dataset in Amazon S3. The data has been converted to SequenceFile format to make it easy to ...
    Andrew Hitchcock
    Dec 24, 2010 at 12:49 am
    Dec 24, 2010 at 12:49 am
  • After some grouping, re-tupling, and grouping again, I end up with the following: grunt> describe I2L2; I2L2: {group: (lvl2: {B::lvl2: int}),I2L2_tuple: ...
    Matt Tanquary
    Dec 20, 2010 at 5:49 pm
    Dec 20, 2010 at 5:49 pm
Group Overview
period: Dec 2010
group: user @
categories: pig, hadoop
discussions: 53
posts: 244
users: 49
website: pig.apache.org

49 users for December 2010

  • Dmitriy Ryaboy: 35 posts
  • Jonathan Coveney: 30 posts
  • Daniel Dai: 21 posts
  • Zach Bailey: 12 posts
  • Dexin Wang: 8 posts
  • Felix gao: 8 posts
  • Kris Coward: 8 posts
  • Lin Guo: 7 posts
  • Thejas M Nair: 7 posts
  • Ashutosh Chauhan: 6 posts
  • Deepak N85: 5 posts
  • Alan Gates: 5 posts
  • Anze: 5 posts
  • Michael Moss: 5 posts
  • Sheeba George: 5 posts
  • Charles Gonçalves: 4 posts
  • Jae Lee: 4 posts
  • Jeff Zhang: 4 posts
  • John Hui: 4 posts
  • Matt Tanquary: 4 posts