Grokbase Groups Pig user October 2011

49 discussions - 207 posts

  • I have a problem where I don't know how or if pig is even suitable to solve it. I have a schema like this: student-id,student-name,start-time,duration,course 1,marco,1319708213,500,math ...
    Marco Cadetg
    Oct 27, 2011 at 9:57 am
    Nov 14, 2011 at 6:11 pm
  • Hi All, I am trying to use the sequence file loader in piggybank for my custom writable object. I am working with pig 0.8, It looks like it does not work for user defined custom writables? Any ...
    Gayatri Rao
    Oct 24, 2011 at 7:21 am
    Oct 29, 2011 at 12:50 am
  • Hi After grouping a data set, how do I save each group in a separate file. ex: A = E:/data.txt' USING PigStorage(','); B = GROUP A BY $0; cat data.txt; (1,2,3) (4,2,1) (8,3,4) (4,3,3) (7,2,5) (8,4,3) ...
    Oct 13, 2011 at 4:34 am
    Oct 13, 2011 at 12:12 pm
  • Hi, I'd like to store the output relation partitioned by
    Stan Rosenberg
    Oct 4, 2011 at 3:10 am
    Oct 5, 2011 at 5:27 pm
  • Hi, If I create two schemas like this: Schema s1 = Utils.getSchemaFromString("b: {t: (f0: [])}"); Schema s2 = SchemaUtil.newBagSchema(new Byte[]{DataType.MAP}); then compare them like this: boolean ...
    Andrew Clegg
    Oct 31, 2011 at 7:21 pm
    Nov 1, 2011 at 11:06 am
  • Hi there, I would need to do something like this: A = LOAD 'student' USING PigStorage() AS (name:chararray, region:chararry, iq:int); DUMP A; (John, There, 10) (Alf, There, 10) (ET, There, 10) (Mary, ...
    Marco Cadetg
    Oct 11, 2011 at 3:21 pm
    Oct 12, 2011 at 4:04 pm
  • Hi, When you have a UDF that returns a bag, and you're writing the outputSchema method, do you have to explicitly include the mandatory 'container' tuple within the bag, or is this implicit? i.e. if ...
    Andrew Clegg
    Oct 3, 2011 at 3:28 pm
    Oct 5, 2011 at 10:41 pm
  • I have 3 Pig scripts that load data from the same log file, but filter & group this data differently. If I combine these 3 into one & LOAD only once, performance seems to have improved, but now I am ...
    Something Something
    Oct 3, 2011 at 11:54 pm
    Oct 4, 2011 at 11:35 pm
  • Hi Folks, I came across a use case where I'd like to do something like this: FOREACH X { if (!IsEmpty(t)) }
    Stan Rosenberg
    Oct 2, 2011 at 4:23 pm
    Oct 3, 2011 at 6:33 pm
  • Hi, What's a proper way to deploy python udfs? I've dropped the latest version of jython.jar in $PIG_HOME/lib. Things work in "local" mode, but when I run on a cluster, built-in python modules cannot ...
    Stan Rosenberg
    Oct 18, 2011 at 2:49 am
    Mar 14, 2012 at 6:32 am
  • Hi guys, It seems like our 'collected' option for group is pretty limited. Imagine I have the following (silly example) script: tweets = load 'tweets' using TweetLoader() as (id:long, uid:long, ...
    Dmitriy Ryaboy
    Oct 6, 2011 at 3:51 pm
    Oct 8, 2011 at 1:27 am
  • Hi pig users, I implemented a custom StoreFunc to write some data in a binary format to a Sequence File. private RecordWriter<NullWritable, BytesWritable> writer; private BytesWritable bytes; private ...
    Gianmarco De Francisci Morales
    Oct 28, 2011 at 4:38 pm
    Nov 3, 2011 at 5:25 pm
  • Hi, I try to figure out why PIG is using so many zookeeper connections (from the frontend machine) when using HBaseStorage(). I added a trace in the constructor of HBaseStorage() I wrote a simple ...
    Vincent Barat
    Oct 25, 2011 at 9:49 am
    Oct 28, 2011 at 6:20 pm
  • Hi, I have three constant udfs in jython: @outputSchema("m:map[bag{tuple()}]") def dummy1(): return {"key":[("value1", "value2")]} @outputSchema("m:map[tuple()]") def dummy2(): return ...
    Stan Rosenberg
    Oct 12, 2011 at 11:30 pm
    Oct 13, 2011 at 8:58 pm
  • Hi, I try to set a reducer number in the following way: java -Dmapred.reduce.tasks=8 -cp pig.jar:$HADOOP_HOME/conf org.apache.pig.Main ./L1.pig but it doesn't work, the reducers number remain the ...
    Hui Qi
    Oct 12, 2011 at 8:29 pm
    Oct 13, 2011 at 12:42 am
  • Greetings everyone, My pig script contains a call to my custom udf and I seem to be running into a couple of classloader issues when running it. Below are the specifics (the call stack), but I have ...
    Babak Farhang
    Oct 11, 2011 at 11:40 pm
    Oct 12, 2011 at 12:35 pm
  • Hello all, I'm new to Hadoop and Pig, and I've got a question. I've got relation that looks like this via GROUP ((customer1,2011-10-07,GET,200),{....}) ((customer1,2011-10-07,PUT,201),{....}) ...
    Dustin Whitney
    Oct 10, 2011 at 4:44 pm
    Oct 10, 2011 at 7:49 pm
  • I am getting the below exception when trying to execute PIG latin script. Failed! Failed Jobs: JobId Alias Feature Message Outputs job_201110042009_0005 A MAP_ONLY Message: Job failed! ...
    Oct 4, 2011 at 11:52 am
    Oct 7, 2011 at 8:13 am
  • Pig is version 0.9. I have a script that under version 0.8, begins running quickly...but under version 0.9, it takes 10 minutes to parse (I did -debug ALL and it builds a big old AST). I am curious ...
    Jonathan Coveney
    Oct 21, 2011 at 10:37 pm
    Oct 25, 2011 at 10:08 pm
  • Hello list! I am new here, I am trying to add Pig on a Hadoop cluster. I have no idea how to make code Pig, I am system administrator. We have 10 node cluster with Hadoop and we use it with Nutch. ...
    Josu Lazkano
    Oct 19, 2011 at 8:11 am
    Oct 20, 2011 at 8:09 pm
  • Hi How can I ignore the separator character in the middle of a column value? eg: Separator char is ‘|’. The Record values are | separated xyz|1234|98798|”xyz|abc”| Regards Kiran.G
    Oct 18, 2011 at 4:36 am
    Oct 18, 2011 at 4:28 pm
  • Hi, I have a simple python udf which takes a variable number of (string) arguments and returns the first non-empty one. I can see that the udf is invoked from pig but no arguments are being passed. ...
    Stan Rosenberg
    Oct 17, 2011 at 4:54 pm
    Oct 17, 2011 at 8:01 pm
  • Hi guys, I know Gianmarco recently worked on the nested foreach -- any chance nested group got done at the same time? :) D
    Dmitriy Ryaboy
    Oct 12, 2011 at 9:12 pm
    Oct 13, 2011 at 6:13 am
  • The job I am trying to run performs some projections and aggregations. I see that maps continuously fail with an OOM with the following stack trace: Error: java.lang.OutOfMemoryError: Java heap space ...
    Shubham Chopra
    Oct 10, 2011 at 9:23 pm
    Oct 11, 2011 at 9:02 pm
  • Hi there, I would like to replace the value of a field based on its value. E.g.: A = LOAD 'student' USING PigStorage() AS (name:chararray); DUMP A; (John) (Mary) (Bill) (Joe) (John) Now I would like ...
    Marco Cadetg
    Oct 11, 2011 at 7:52 am
    Oct 11, 2011 at 8:56 am
  • Hello, I am using pig through the pig server. I need to pass some parameters to the pig script which I am passing by calling the pigServer.registerScript(pigScript, params); If my parameters have ...
    Alex Rovner
    Oct 18, 2011 at 11:52 pm
    Jan 10, 2012 at 3:07 pm
  • Does this ticket mean that inner and outer are deprecated for group/cogroup? It sounds that way, but I just wanted to make sure. (We may need to refactor some things if so.) ...
    Jeremy Hanna
    Oct 26, 2011 at 10:29 pm
    Oct 27, 2011 at 3:49 pm
  • I'm wondering if this is a known bug, or what. Here is a test script that isolates it: a = load 'data1' as (x:int); b = load 'data2' as (y:int); val1 = foreach (filter (cogroup a by x, b by y) by COUNT(b) ...
    Jonathan Coveney
    Oct 14, 2011 at 8:16 pm
    Oct 15, 2011 at 6:30 pm
  • Newbie question - I have an inner bag of tuples that I'd like to convert into an outer bag/relation and I'm struggling to figure out how. For example if I have ({(1,2),(3,4),(5,6)}) ({(7,8),(9,10)}) I'd ...
    Pete Warden
    Oct 15, 2011 at 5:54 am
    Oct 15, 2011 at 2:54 pm
  • HI Can I use 2 methods while STORING the output, If yes how? Ex: REGISTER contrib/piggybank/java/piggybank.jar; A= LOAD 'E:/data/June_PAG_Sample.txt' USING PigStorage('|'); B = GROUP A BY $3; C= ...
    Oct 13, 2011 at 10:21 am
    Oct 13, 2011 at 8:05 pm
  • Dear All, I am consistently getting following error on execution of my pig script: " [main] ERROR - ERROR 2997: Unable to recreate exception from backed error: ...
    Ipshita chatterji
    Oct 10, 2011 at 7:35 pm
    Oct 11, 2011 at 9:52 pm
  • Hello, I read the blog and tried to understand the architecture of the HBase. There is one thing that makes me confusing. If I ...
    Oct 8, 2011 at 7:02 am
    Oct 8, 2011 at 10:19 am
  • Hi, I'm having trouble updating jar files containing udf's. In my testing, I often find that I need to change a udf but when I redeploy a jar for it, I can't seem to get pig to acknowledge the new ...
    Eric Czech
    Oct 6, 2011 at 1:41 am
    Oct 6, 2011 at 9:26 pm
  • I'm trying to add and run a new e2e test, but I'm having trouble getting it to run. I copied one of the existing tests in nightly.conf and changed the 'name' value (see attached diff). Next I run ...
    Mark Roddy
    Oct 5, 2011 at 1:27 am
    Oct 5, 2011 at 1:48 am
  • Hi, assuming I have a table with the following sample (two columns separated by a space): a 1,2,3 b 4,5,6 I would like to convert it to a 1 a 2 a 3 b 4 b 5 b 6 basically split the second column and use ...
    Walter Chang
    Oct 3, 2011 at 7:23 am
    Oct 3, 2011 at 7:59 am
  • Hello, I am running for the first time the tutorial at and found that these lines seem not correct: [snip] Edit the build.xml file in the tutorial ...
    Matteo Moci
    Oct 31, 2011 at 9:03 pm
    Nov 1, 2011 at 6:11 pm
  • Hi, In pig 0.8.1 I wrote a custom load func which returns a bag of tuples. Do i have to explicitly specify the schema in that case? Do I have to implement LoadMetadata in that case? If I dont specify ...
    Gayatri Rao
    Oct 24, 2011 at 4:32 pm
    Oct 24, 2011 at 5:47 pm
  • Hi all, I am wondering if anyone have some examples on how to write a storefunc that will take a look at some of the tuple's value and construct the output directory based on that. For example if my ...
    Felix gao
    Oct 21, 2011 at 11:57 pm
    Oct 22, 2011 at 10:44 pm
  • Is there a limit on: 1) How long the $FILES string can be? 2) Total # of input paths to process? when I do this in my Pig script... LOAD '$FILES' AS (xyz:chararray, abc:int); ...
    Something Something
    Oct 18, 2011 at 7:21 am
    Oct 18, 2011 at 4:43 pm
  • Hi How can I ignore the character in a record if it contains the separator char in the record? eg: Separator char is ‘|’. The Record values are | separated xyz|1234|98798|”xyz|abc”| Regards Kiran.G
    Oct 17, 2011 at 12:02 pm
    Oct 17, 2011 at 9:02 pm
  • Recently, I have been using the Pig Latin language. When it comes to the limit clause, for example t = load 'input.txt' using PigStorage(','); t2 = order t1 by $1; t3 = limit t2 5; in the process, the number of reduce is ...
    China Alice
    Oct 17, 2011 at 1:40 pm
    Oct 17, 2011 at 3:20 pm
  • Hi I currently have a bunch of data in json format in hdfs. I would like to use pig to load it dedupe it and store it back using snappy compression. Currently I do something like this. raw = LOAD ...
    Cameron Gandevia
    Oct 14, 2011 at 1:30 am
    Oct 15, 2011 at 12:13 am
  • Hi, I have two questions when using udf: 1. if i do system.err.println , where does it print to ? is there a log file ? 2. if i did a group , what's the type of second column of the generated table ? ...
    Walter Chang
    Oct 10, 2011 at 3:04 am
    Oct 10, 2011 at 5:08 am
  • I try to follow the example on for Python UDF, but I got the following error * My command: pig -x local ptest.pig * error message 2011-10-06 ...
    Danfeng Li
    Oct 6, 2011 at 10:36 pm
    Oct 6, 2011 at 11:22 pm
  • Hello, I have been trying to generate 10M rows on 10 nodes and 4G of memory on each. Below my stdout and the last error prior shell disconnect (working through vpn). Job is successful! It took 2655 ...
    Keren Ouaknine
    Oct 14, 2011 at 10:12 am
    Oct 14, 2011 at 10:12 am
  • Hey guys, I've encountered with interesting situation when I run a Pig job through Oozie. When Pig doesn't get any input data it throws "Message: org.apache.pig.backend.executionengine.ExecException: ...
    Marek Miglinski
    Oct 6, 2011 at 6:14 pm
    Oct 6, 2011 at 6:14 pm
  • Pig team is happy to announce Pig 0.9.1 release. Apache Pig provides a high-level data-flow language and execution framework for parallel computation on Hadoop clusters. More details about Pig can be ...
    Daniel Dai
    Oct 5, 2011 at 5:11 pm
    Oct 5, 2011 at 5:11 pm
  • We have Hadoop 0.20.2 and I upgraded from Pig 0.8 to 0.8.1 and now I get the following error (downgrading to 0.8 fixed it): (Hadoop core jar does not have this class) Backend error message ...
    Ayon Sinha
    Oct 4, 2011 at 9:53 pm
    Oct 4, 2011 at 9:53 pm
  • Hi, I just follow the instructions to run pigmix2 test. After the command "ant jar pigperf", it shows build successfully. But then, when I try to run ./, I get the error: Exception in ...
    Hui Qi
    Oct 3, 2011 at 6:54 pm
    Oct 3, 2011 at 6:54 pm
Group Overview
group: user @
categories: pig, hadoop

48 users for October 2011

  • Dmitriy Ryaboy: 28 posts
  • Thejas Nair: 18 posts
  • Stan Rosenberg: 14 posts
  • Kiranprasad: 12 posts
  • Alan Gates: 10 posts
  • Marco Cadetg: 10 posts
  • Gayatri Rao: 9 posts
  • Jonathan Coveney: 9 posts
  • Andrew Clegg: 7 posts
  • Norbert Burger: 7 posts
  • Daniel Dai: 6 posts
  • Alex Rovner: 4 posts
  • Ayon Sinha: 4 posts
  • Something Something: 4 posts
  • Ashutosh Chauhan: 3 posts
  • Babak Farhang: 3 posts
  • Dustin Whitney: 3 posts
  • Gianmarco De Francisci Morales: 3 posts
  • Guy Bayes: 3 posts
  • Jeremy Hanna: 3 posts