Grokbase Groups Pig user March 2010

60 discussions - 237 posts

  • Hi folks, We (but mostly Kevin Weil) just open-sourced some of the code we use at Twitter to make working with Hadoop and Pig easier. Most of what is currently included in "Elephant Bird" deals with ...
    Dmitriy Ryaboy
    Mar 29, 2010 at 9:51 pm
    Apr 7, 2010 at 7:32 pm
  • Hi, I have a function (eval) that needs to use an external jar. In M/R world this can be accomplished by uploading the jar to the dfs and using DistributedCache.addFileToClassPath. How do I do the ...
    Tamir Kamara
    Mar 10, 2010 at 8:53 am
    Mar 15, 2010 at 9:32 pm
  • Classification: UNCLASSIFIED Caveats: NONE Hello, I'm using pig0.6.0 running the following script on a 27 datanode cluster running RedHat Enterprise 5.4: -- Holds the Pig UDF wrapper around the ...
    Winkler, Robert (Civ, ARL/CISD)
    Mar 5, 2010 at 6:12 pm
    Mar 12, 2010 at 1:10 am
  • Hi, I've got a long-running daemon application that periodically kicks off Pig jobs via Quartz (Pig version 0.4.0). It uses a wrapper class that initializes an instance of PigServer before parsing and ...
    Bill Graham
    Mar 9, 2010 at 5:30 pm
    Mar 22, 2010 at 5:14 pm
  • Hello everybody, I've been trying to use org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor from piggybank that comes with pig 0.6.0, but i don't seem to be able to set the output ...
    Johannes Rußek
    Mar 16, 2010 at 6:59 pm
    Mar 17, 2010 at 5:29 pm
  • Guys, I just ran into a weird exception 500 lines into writing a pig script... Below attached is the error. Does anybody have any idea about how to debug this? I don't even know which step of my 500 ...
    Hc busy
    Mar 9, 2010 at 1:26 am
    Mar 17, 2010 at 3:44 am
  • On using embedded Pig Server and registering a Pig script for execution: 1) Does Multi Query Optimization happen automatically, or does it have to be explicitly requested? 2) Logical Plan. What one can infer out ...
    Rohan Rai
    Mar 4, 2010 at 5:15 pm
    Mar 6, 2010 at 12:55 pm
  • Hello everybody, I've noticed that when i run some pig scripts, the creation of the actual hadoop jobs takes quite a while, sometimes more than 15 minutes until the first map/reduce job starts. How ...
    Mar 26, 2010 at 11:43 am
    Mar 26, 2010 at 6:16 pm
  • Hi, Can Pig recognize the names of the files it is importing? If so, how? I want to combine them according to filename. E.g.: google_2009_12_21.csv, google_2010_01_21.csv, google_2010_02_21.csv, ...
    Mar 1, 2010 at 11:10 am
    Mar 4, 2010 at 6:48 am
  • Not just processing different formats or performing pre-processing stuff as discussed in the Pig UDF manual. What I want is a function that can decide where to find what files to load and then load ...
    Jiang licht
    Mar 10, 2010 at 1:44 am
    Mar 16, 2010 at 1:25 am
  • Guys, I have some data that has null bag. Looking at the it seems that it is an error condition for the bag passed in to be null (instead of zero for example.) I tried to change it to an ...
    Hc busy
    Mar 5, 2010 at 11:06 pm
    Mar 10, 2010 at 1:46 pm
  • I'm writing a fairly involved pig script to do some data munging and after all was said and done, I end up getting the warnings mentioned in the subject. The grunt output is actually: 2010-03-26 ...
    Eric Tschetter
    Mar 26, 2010 at 5:56 pm
    Apr 8, 2010 at 4:38 pm
  • Hello everybody, I've been searching the web trying to find a nice way to use GeoIP from Pig / Hadoop. So far the only things I've been able to find are a Perl script to use with streaming and some ...
    Johannes Rußek
    Mar 17, 2010 at 5:34 pm
    Mar 19, 2010 at 4:22 am
  • Okay. Here's the bag that I have: {group: (a: int,b: chararray,c: chararray,d: int), TABLE: {number1: int, number2:int}} and I want to do this: grunt> CALCULATE = FOREACH TABLE_group GENERATE group, ...
    Hc busy
    Mar 8, 2010 at 10:44 pm
    Mar 10, 2010 at 6:16 pm
  • Hi there, I am trying to load in a relation with nested bags using something like a = LOAD '*' AS (x:chararray, y:bag{t:tuple(z:chararray, b:bag{t1:tuple(u:chararray, v:long)})}); But get the ...
    Xiaomeng Wan
    Mar 8, 2010 at 4:19 pm
    Mar 8, 2010 at 9:17 pm
  • Hi everyone, I posted a patch that adds a few common String functions to the piggybank (things like Length and Split). Alan and I are discussing some naming and design issues, would appreciate your ...
    Dmitriy Ryaboy
    Mar 1, 2010 at 10:34 pm
    Mar 4, 2010 at 7:24 pm
  • Here is my data file: a,b,c,{(15,good),(24,total),(9,bad)} a,b,d,{(2,bad),(6,good),(8,total)} I tried the following combinations but none of them work: r1 = load '/tmp/prasen/foo1.txt' using ...
    Prasenjit mukherjee
    Mar 1, 2010 at 9:57 am
    Mar 1, 2010 at 9:57 am
  • Hi All, Is there a way to get current InputSplit in a UDF (more specifically, a filter function)? I have a filter function that validates input rows according to certain criteria and I would like to ...
    Sandesh Devaraju
    Mar 30, 2010 at 8:53 pm
    Mar 31, 2010 at 3:05 pm
  • Hi, I am trying to get the elements of B not in A. My code is like this C = JOIN A BY id RIGHT OUTER, B BY id; D = FILTER C BY A::id is null; id is an INT, and this doesn't work. I also tried A::id ...
    Kent Shi
    Mar 30, 2010 at 2:06 am
    Mar 30, 2010 at 6:14 pm
  • Hi Alexander Alexander Schätzle wrote: Translating the SPARQL algebra into PigLatin scripts is IMHO a nice idea. With a small(?) amount of glue code you join two big communities delivering value to ...
    Paolo Castagna
    Mar 26, 2010 at 10:48 am
    Mar 29, 2010 at 4:30 pm
  • Just a quick q, is there a way to evaluate "contains"? contains((1,2), {(2,4),(4,5),(6,7),(1,2)}); thnx
    Hc busy
    Mar 17, 2010 at 6:44 pm
    Mar 20, 2010 at 12:34 am
  • Hi everyone. I implemented LoadFunc and Slicer interface and I tried to use it with the following pig script: a = LOAD '/data/part-00000.anc.gz' USING MyStorage() AS (source:chararray); b = FILTER a ...
    Bae, Jae Hyeon
    Mar 16, 2010 at 6:02 am
    Mar 16, 2010 at 9:29 am
  • I am trying to play around with pig, but i have problems running even the basic tutorial. Both scripts just hang. I tried it in cygwin in vista, as well as on a RedHat box with the same results. ...
    Pavel Gutin
    Mar 9, 2010 at 4:28 pm
    Mar 15, 2010 at 6:54 pm
  • Hi everyone, I want to use LZO splittable IO in Pig. I saw in Kevin Weil's GitHub repository that he wrote LZO LoadFunc functionality, but he hasn't contributed it yet. According to him, for Pig jobs, ...
    Bae, Jae Hyeon
    Mar 12, 2010 at 6:54 am
    Mar 12, 2010 at 8:55 am
  • It seems like at the end of 2008 it wasn't possible to get the field aliases in a custom store function. Today, it seems that ...
    Sean Timm
    Mar 8, 2010 at 6:56 pm
    Mar 8, 2010 at 7:04 pm
  • grunt> r1 = load '/tmp/agg_qat.txt' USING PigStorage (',') AS (f1:chararray, f2:chararray,f3:chararray, i1:int,i2:int,i3:int); grunt> tmp = group r1 by (f1,f2); grunt> tmp1 = foreach tmp generate ...
    Prasenjit mukherjee
    Mar 1, 2010 at 9:52 am
    Mar 1, 2010 at 12:10 pm
  • Hello, I'm trying to use a Pig variable in a foreach and I can't. I'm passing a var to the pig script. Let's call it $LOCALE and in my script I want to: ' varOut = FOREACH varIn GENERATE $0, $LOCALE; ...
    Alex Parvulescu
    Mar 31, 2010 at 2:57 pm
    Apr 1, 2010 at 8:42 am
  • Hello, I have a quick question for the big brains here: is Pig able to take advantage of reusing intermediate results used in several statements, for example: 1- load table1 and table2 2- group ...
    Vincent Barat
    Mar 23, 2010 at 1:26 pm
    Mar 24, 2010 at 4:53 am
  • Hi, I have a question, in a remark that Alan Gates made a few months ago on these mailing lists regarding the computability and expressibility of Pig, Hive, and the MapReduce model. In particular, it ...
    Rob Stewart
    Mar 20, 2010 at 11:55 pm
    Mar 23, 2010 at 2:25 am
  • This is so basic I'm almost afraid to ask but I don't seem to be able to find the answer. I have pig and Hadoop running and I have done the tutorials through the command line. My question is: how do ...
    Katie legere
    Mar 22, 2010 at 4:21 pm
    Mar 23, 2010 at 12:00 am
  • Hello everybody, I'm trying to use pig with compressed input files. I have a bunch of 1-2GB big apache log files which are compressed down to 30-40MB by using bzip2. I tried to simply load the .bz2 ...
    Johannes Rußek
    Mar 17, 2010 at 3:15 pm
    Mar 17, 2010 at 4:21 pm
  • Hi, I was wondering where I could get the schema information (basically the output of 'describe') to use in a StoreFunc. I'd like to get the schema for all tuples in a given bag ahead of time ...
    Zaki rahaman
    Mar 16, 2010 at 12:08 am
    Mar 16, 2010 at 7:25 pm
  • I always receive the failed delivery message from Google. -- Best Regards Jeff Zhang
    Jeff Zhang
    Mar 11, 2010 at 2:08 am
    Mar 11, 2010 at 8:57 pm
  • Just wondering if Pig can work with HBase to load/store data and process fields of various types in HBase (or is there a roadmap for this)? Thanks, -- Michael
    Jiang licht
    Mar 2, 2010 at 7:32 pm
    Mar 3, 2010 at 12:19 am
  • I have written several variations of UDFs to achieve the following: They take a bag of tuples, sort the tuples on a field, then perform operations using this order. That could be simply returning a ...
    Russell Jurney
    Mar 29, 2010 at 9:08 pm
    Mar 31, 2010 at 5:34 pm
  • Is compiling pig to run against Hadoop 0.19.x supported anymore? If so, how would one accomplish this? Cheers, Anthony
    Anthony Urso
    Mar 30, 2010 at 2:12 am
    Mar 30, 2010 at 6:03 pm
  • I have a sequence of uniq tokens and I would like to add a sequential unique integer id to each token. I appreciate that this is going to be difficult because mapping is likely to be performed in ...
    Edward Middleton
    Mar 27, 2010 at 5:36 pm
    Mar 27, 2010 at 5:58 pm
  • Hello Guys! Is there a method in the UDF classes i can overwrite for when the class/object is being destroyed? I'd need to let go of some resources when this happens. Or is the jvm purged after a pig ...
    Mar 23, 2010 at 12:04 pm
    Mar 23, 2010 at 12:11 pm
  • Hi all, Is there any way to pass to pig? I notice that my multi-core nodes are under-utilized with pig job, only processor is busy all the time. I notice is set to ...
    Qiming He
    Mar 21, 2010 at 4:24 pm
    Mar 22, 2010 at 4:16 pm
  • Hi, I wonder if it is faster to first extract only the interesting fields from a bag of tuples before performing other operations on it, or if it is automatically handled by the optimizer: For ...
    Vincent Barat
    Mar 18, 2010 at 10:23 pm
    Mar 18, 2010 at 10:30 pm
  • Can anyone give advice on testing the output of a Hadoop system or unit testing for Pig scripts? The best I have so far is to create a minimal data set and run that through the system, and check ...
    Paul Rogers
    Mar 18, 2010 at 9:01 pm
    Mar 18, 2010 at 9:23 pm
  • I'm using the 0.6.0 build and I was trying out the ILLUSTRATE command on the example given in the Pig Latin Reference Manual 2 for JOIN (inner). I have the following Pig script: A = LOAD 'data1' AS ...
    Ming-Hay Luk
    Mar 17, 2010 at 8:58 am
    Mar 17, 2010 at 9:17 am
  • forwarding to hdfs and pig mailing-lists for responses from wider audience. ---------- Forwarded message ---------- From: prasenjit mukherjee < Date: Tue, Mar 16, 2010 at 11:47 AM ...
    Prasenjit mukherjee
    Mar 16, 2010 at 12:36 pm
    Mar 16, 2010 at 5:19 pm
  • Hi, quick question: What is the origin of the formula for specifying the number of reducers for a Pig job (using PARALLEL)? I have it as: <number of nodes> * <maximum number of reducers per node> * 0.9 ...
    Rob Stewart
    Mar 16, 2010 at 10:35 am
    Mar 16, 2010 at 1:16 pm
  • Is there a way to use the bincond operator, or some other way to do a conditional exit in Pig script? Say if the value of variable given to script is x then do this, but exit if it is y. Thanks!
    Kelvin Moss
    Mar 9, 2010 at 4:19 am
    Mar 9, 2010 at 10:34 pm
  • Hi, Does anyone have experience running MultiStorage-like UDF on Elastic MapReduce? Basically we are trying to store output into multiple directories based on certain field values. We have some ...
    Jialong Wu
    Mar 4, 2010 at 7:16 pm
    Mar 4, 2010 at 7:24 pm
  • Hi, I suddenly started getting this error and I don't understand why since the input path exists on the dfs. It started a couple of days ago when I checked out trunk again (the previous working ...
    Tamir Kamara
    Mar 2, 2010 at 6:54 am
    Mar 2, 2010 at 12:58 pm
  • I'd be happy to put these together into a NOOB faq =). Please feel free to forward me to the docs where I might have missed this. How do I generate a simple Tuple? I have a value, say a sum, and I ...
    Cory Radcliff
    Mar 1, 2010 at 7:29 am
    Mar 1, 2010 at 4:50 pm
  • Thanks, Deqiang
    Deqiang sun
    Mar 1, 2010 at 2:26 pm
    Mar 1, 2010 at 4:32 pm
  • the store line has backticks around id.out and it causes this very confusing message to show up in the logs with pig 0.5: Pig Stack ...
    Wilkes, Chris
    Mar 27, 2010 at 12:06 am
    Mar 27, 2010 at 12:06 am
Group Overview
group: user @
categories: pig, hadoop

63 users for March 2010

Dmitriy Ryaboy: 30 posts Hc busy: 22 posts Jr: 14 posts Alan Gates: 13 posts Mridul Muralidharan: 11 posts Ashutosh Chauhan: 10 posts Jeff Zhang: 9 posts Rohan Rai: 8 posts Zaki Rahaman: 8 posts Bill Graham: 7 posts Tamir Kamara: 7 posts Bae, Jae Hyeon: 6 posts Jiang licht: 6 posts Prasenjit mukherjee: 5 posts Thejas Nair: 5 posts Rob Stewart: 4 posts Winkler, Robert (Civ, ARL/CISD): 4 posts Jumping: 3 posts Richard Ding: 3 posts Romain Rigaux: 3 posts