Grokbase Groups Pig user January 2010

44 discussions - 182 posts

  • Hi folks, I have some initial results to run through with you. I have a number of implementations ready to push onto the Hadoop cluster, but I have finalized the tests for Hive, JAQL and Pig for the ...
    Rob Stewart
    Jan 19, 2010 at 12:35 am
    May 18, 2010 at 12:54 am
  • Hi there. I am well underway with comparing Pig, Hive, JAQL etc... The DataGenerator is proving a valuable tool for me. Thanks for that. I have one query. I am able to use it in local mode, no ...
    Rob Stewart
    Jan 14, 2010 at 2:50 pm
    Jan 15, 2010 at 7:16 am
  • Hi Mridul, Thanks, your approach works fine. This is what my current Pig script looks like: define CMD `` SHIP('/root/'); r1 = LOAD '/ip/s3fetch_input_files' AS ...
    Prasenjit mukherjee
    Jan 26, 2010 at 3:58 pm
    Jan 30, 2010 at 4:30 am
  • Hi, I've written a Ruby DSL for writing Pig scripts, which I hope might interest some of you. It makes it possible to do a lot of things you can't do in Pig Latin, like loops, reuse code through ...
    Theo Hultberg
    Jan 13, 2010 at 6:39 pm
    Jan 15, 2010 at 4:58 pm
  • I had a question about storing data to different files. The basic gist of what we are doing is taking a large set of data, performing a group by and then storing each group's dataBag into a distinct ...
    Jennie Cochran-Chinn
    Jan 30, 2010 at 2:32 am
    Feb 4, 2010 at 7:26 am
  • Hi all, I just downloaded it and, when following the instructions to build, there are compilation errors. Please let me know how to fix this. Thanks, Felix ---------------------------------------- ...
    Felix gao
    Jan 27, 2010 at 4:34 pm
    Feb 1, 2010 at 4:09 am
  • A cluster I'm using was recently upgraded to PIG 0.6. Since then, I've been having problems with scripts that use PiggyBank functions. All the map jobs for the script fail with: WARN ...
    Jeff Dalton
    Jan 9, 2010 at 11:22 pm
    Jan 11, 2010 at 6:11 pm
  • Hi, I have summary data created in directories every 10 minutes and I have a job that needs to LOAD from all directories in a one hour period. I was hoping to use Hadoop file path globbing, but it ...
    Bill Graham
    Jan 21, 2010 at 7:03 pm
    Jan 21, 2010 at 8:34 pm
  • Based on a talk I gave at work recently. Hope it might help someone as an intro to Pig. Mat
    Mat Kelcey
    Jan 17, 2010 at 4:21 am
    Jan 19, 2010 at 9:31 pm
  • I want to use Pig to parallelize processing of a number of requests. There are ~300 requests which need to be processed. Each one consists of the following: 1. Fetch file from s3 to local 2. Do ...
    Prasenjit mukherjee
    Jan 24, 2010 at 10:46 am
    Feb 10, 2010 at 7:16 am
  • Currently, Pig's SUBSTRING (in piggybank) takes parameters (string, startIndex, endIndex). If endIndex is past the end of the string, an error is logged and the string is dropped (a null is ...
    Dmitriy Ryaboy
    Jan 22, 2010 at 6:20 pm
    Jan 25, 2010 at 6:18 am
  • Hi again, I would like to know whether the "skewed" optimization is only applicable to JOINs. Is it (or will it be) available for GROUP BYs and ORDER BYs? Or is it not logically possible? ...
    Rob Stewart
    Jan 18, 2010 at 10:36 am
    Jan 20, 2010 at 6:15 pm
  • Hi folks, I have a somewhat obvious question that needs asking (for my sake). Pig can do joins, I realise that. But take for example: Table_1 ---------------------- 1 foo.dat 2 bar.dat 3 harry.dat ...
    Rob Stewart
    Jan 12, 2010 at 10:58 pm
    Jan 13, 2010 at 12:14 am
  • My apologies if this is the wrong mailing list to ask this question. I've started playing around with Pig and Hadoop, with the intention of using it to do some analysis of a collection of MySQL slow ...
    Chris Hartjes
    Jan 11, 2010 at 6:29 pm
    Jan 12, 2010 at 8:28 am
  • Hi everyone, this should be a simple task, but I couldn't find an efficient way to do it. I have a relation that looks like: a 3 b 10 c 7 I want to convert the raw metrics into percentages. The expected relation ...
    Xiaomeng Wan
    Jan 6, 2010 at 8:19 pm
    Jan 7, 2010 at 4:35 am
  • Hello, There seems to be a bug in Pig when ORDER BY is used with the DESC modifier. I have the following script: imei_start = FOREACH sessions GENERATE imei, start; imei_starts = GROUP imei_start BY ...
    Vincent Barat
    Jan 5, 2010 at 2:44 pm
    Jan 6, 2010 at 8:35 am
  • Hello, Is there a (standard) way to output traces from within a custom PIG UDF function (in order to see these traces inside the set of regular traces ?) Thanks
    Vincent Barat
    Jan 5, 2010 at 1:17 pm
    Jan 5, 2010 at 8:34 pm
  • Hi, I want to create a UDF which compares a tuple with a string value, like this: public class IsEqual extends FilterFunc { public Boolean exec(Tuple input,String str) throws IOException { // binary ...
    Ramana Venkata
    Jan 28, 2010 at 6:16 pm
    Jan 29, 2010 at 9:55 am
  • Hi again, The results have been produced. I can tell you that I made the following improvements: 1. Removed unnecessary "words = FOREACH myinput GENERATE FLATTEN(TOKENIZE(\$0));" 2. Using PigStorage, ...
    Rob Stewart
    Jan 20, 2010 at 9:59 am
    Jan 21, 2010 at 7:33 pm
  • I can't figure out how to run a UDF on the result of "GROUP BY" from the current documentation. I'd like to do something along these lines: A = LOAD 'A'; B = LOAD 'B'; C = JOIN A BY $0, B by $0; D = ...
    Anthony Urso
    Jan 18, 2010 at 7:54 am
    Jan 18, 2010 at 6:17 pm
  • Hi all, I am currently working on a JIRA which will change the interface of Tuple and DataBag: PIG-1166. So here I'd like to know whether Pig users have ...
    Jeff Zhang
    Jan 6, 2010 at 4:15 pm
    Jan 15, 2010 at 11:49 pm
  • Is this supported? Say I have a map [f2#(1,6)] I cannot figure out how to de-reference the (1,6) tuple, I either get type conversion failure and () returned, or a 1066 error message "ERROR 1066: ...
    Guy Bayes
    Jan 5, 2010 at 5:10 am
    Jan 6, 2010 at 3:01 am
  • I am using pig-0.3.1. Is there a way to pass Hadoop params (like -Dmapred.task.timeout=0) to the pig executable? Even if it's not straightforward, can I achieve that by modifying the script ...
    Prasenjit mukherjee
    Jan 29, 2010 at 3:25 pm
    Jan 29, 2010 at 6:08 pm
  • Hi, Every time I run a Pig script I get a number of Job jars left in the /tmp directory of my client, 1 per MR job it seems. The file names look like /tmp/Job875278192.jar. I have scripts that run ...
    Bill Graham
    Jan 26, 2010 at 6:31 pm
    Jan 27, 2010 at 7:33 pm
  • I have a question on how to handle data that I would usually store in an array, or into a normalized child table in a database. The input data is a set of key/value pairs where one key can be ...
    Jan 21, 2010 at 1:38 pm
    Jan 25, 2010 at 9:22 pm
  • Hi all, I want to use a generate to munge my data into some tuples. (This is a contrived example to illustrate my problem) grunt> dump test (a,1,8) (b,2,4) (c,3,1) I can use generate to reorder and ...
    Mat Kelcey
    Jan 21, 2010 at 12:59 pm
    Jan 22, 2010 at 4:50 am
  • Hi, I'm interested in running the PigMix benchmark described at to test some scheduling work in Hadoop. However, I can't find any code for it in Pig 0.5.0 or trunk. ...
    Matei Zaharia
    Jan 16, 2010 at 2:31 am
    Jan 16, 2010 at 7:48 pm
  • Hello fellow Pig users. I am brand new to Pig/hadoop, and am having trouble with something that I am guessing is very basic. I have a relation where I did a group by several values, then counted the ...
    Jan 14, 2010 at 9:21 pm
    Jan 15, 2010 at 2:56 pm
  • Hi again, Here's the scenario. I'm doing a word count: ----------- myinput = LOAD 'Inputs/WordCount/wordsTestLarge.dat' USING TextLoader(); words = FOREACH myinput GENERATE FLATTEN(TOKENIZE(\$0)); ...
    Rob Stewart
    Jan 13, 2010 at 11:35 am
    Jan 13, 2010 at 3:53 pm
  • Hi, We are trying to test a pig script using JUnit and the PigServer. This pig script uses parameters. It works great running it through the pig command line, but the registerScript function blows ...
    Jan 6, 2010 at 4:56 pm
    Jan 6, 2010 at 6:20 pm
  • When I run Pig, I connect to the local file system, when I run (java -cp pig-0.5.0-core.jar:$HADOOP_HOME/conf org.apache.pig.Main) I connect to hdfs. It seems like Pig is not finding my hadoop conf ...
    Aryeh Berkowitz
    Jan 27, 2010 at 7:57 pm
    Jan 27, 2010 at 10:07 pm
  • I'm trying to use Pig to solve a fairly common SQL scenario that I run into. I have boiled the problem down into its most basic form: You have a table of transactions defined as so: CREATE TABLE ...
    Mike Roberts
    Jan 26, 2010 at 5:21 pm
    Jan 26, 2010 at 5:34 pm
  • Hey pig gurus - I'm having an issue with cast-to-tuple errors, such as: ERROR 2999: Unexpected internal error.$DefaultDataBagIterator cannot be cast to ...
    Travis Crawford
    Jan 16, 2010 at 9:51 pm
    Jan 18, 2010 at 6:01 am
  • Is there any way to retrieve job parameters and task execution / environment variables from inside a pig script? I'm trying to grab the name of the file I am processing using map.input.file thanks Guy
    Guy Bayes
    Jan 17, 2010 at 3:50 am
    Jan 17, 2010 at 4:06 am
  • Hello fellow pig users. I am new to both hadoop and pig, with a background in relational databases and perl scripting. Yesterday I ran a fairly simple pig script that ran in around 45 minutes on our ...
    Wickham, Jeremy
    Jan 15, 2010 at 5:32 pm
    Jan 15, 2010 at 6:46 pm
  • Dear users, I'm new to Hadoop and Pig and I really feel I need some help... I managed to set up a Hadoop cluster on two Ubuntu boxes. All Hadoop daemons begin without any problems. I can also ...
    Anastasia Theodouli
    Jan 4, 2010 at 9:01 pm
    Jan 4, 2010 at 9:43 pm
  • Hi, I'm coming across a problem creating a file with DataGenerator and then uploading to the HDFS, Here's what I'm running: ------------------- -conf $conf_file -s , -i allFiles.dat -f theDir1.dat ...
    Rob Stewart
    Jan 29, 2010 at 1:16 pm
    Jan 29, 2010 at 1:16 pm
  • Greetings, I'm in the Bay Area doing startup-stuff this week, so Nick Dimiduk will be running this meetup again. You can reach him at and 614-657-0267 A friendly reminder that the ...
    Bradford Stephens
    Jan 26, 2010 at 12:53 am
    Jan 26, 2010 at 12:53 am
  • NOTE: Please forward this message to local user groups, especially those that might be less active on the core Apache mailing lists. Feel free to translate into local languages. Hadoop Fans, Over the ...
    Christophe Bisciglia
    Jan 25, 2010 at 7:57 pm
    Jan 25, 2010 at 7:57 pm
  • Dear Colleagues! With this letter we send you information about the NINTH INTERNATIONAL SCIENTIFIC - PRACTICAL CONFERENCE “Research, Development and Application of High Technologies in Industry”, - ...
    Organization Committee
    Jan 19, 2010 at 6:01 pm
    Jan 19, 2010 at 6:01 pm
  • Hadoop Fans, We're looking forward to a full house at our upcoming sessions in January (Bay Area) and February (NYC). For those of you who have inquired recently, we're sorry we sold out so early ...
    Christophe Bisciglia
    Jan 12, 2010 at 8:41 pm
    Jan 12, 2010 at 8:41 pm
  • Greetings, A friendly reminder that the Seattle Hadoop, NoSQL, etc. meetup is on January 27th at University of Washington in the Allen Computer Science Building, room 303. I believe Razorfish will be ...
    Bradford Stephens
    Jan 12, 2010 at 5:15 am
    Jan 12, 2010 at 5:15 am
  • Dear Hadoop and Pig Users, This is just to let you know that the submission deadline for ICS'10 is two weeks from today. ICS is a premier forum for research in ...
    Viraj Bhat
    Jan 5, 2010 at 11:23 pm
    Jan 5, 2010 at 11:23 pm
  • (our apologies if you receive this announcement multiple times) ------------------------------------------------------------------- CALL FOR PAPERS The First International Workshop on MapReduce and ...
    Gilles Fedak
    Jan 4, 2010 at 9:18 am
    Jan 4, 2010 at 9:18 am
Group Overview
group: user @
categories: pig, hadoop

47 users for January 2010

  • Dmitriy Ryaboy: 29 posts
  • Alan Gates: 22 posts
  • Rob Stewart: 21 posts
  • Mridul Muralidharan: 11 posts
  • Jeff Zhang: 9 posts
  • Prasenjit mukherjee: 7 posts
  • Rekha Joshi: 7 posts
  • Bill Graham: 6 posts
  • Vincent Barat: 6 posts
  • Thejas Nair: 5 posts
  • Theo Hultberg: 5 posts
  • Jeff Dalton: 4 posts
  • Mat Kelcey: 4 posts
  • Guy Bayes: 3 posts
  • Scott: 3 posts
  • Amogh Vasekar: 2 posts
  • Bradford Stephens: 2 posts
  • Christophe Bisciglia: 2 posts
  • Felix gao: 2 posts
  • Matei Zaharia: 2 posts