Grokbase Groups Pig user April 2010

50 discussions - 192 posts

  • Hi. I have a question, how to generate complex structures in pig. My question can be illustrated by the following example: $ cat test_data.txt 1,a,b,c,d 2,u,v,x, a = load 'test_data.txt' using ...
    Andrey S
    Apr 19, 2010 at 6:32 am
    Apr 21, 2010 at 5:44 am
  • Hi All, I am using PIG-1229 to write pig query output to a database. However, I noticed that because of speculative execution, spurious records end up being written. I was wondering if there is a way ...
    Sandesh Devaraju
    Apr 13, 2010 at 11:17 pm
    Apr 14, 2010 at 1:18 am
  • Guys, I have a row containing a map 'id','data', {((1,2)), ((2,3)), ((4,5))} What is the expected behavior when I flatten on that bag? I had expected it to result in 'id','data', (1,2) 'id','data', ...
    Hc busy
    Apr 2, 2010 at 6:30 pm
    Apr 2, 2010 at 10:30 pm
  • Hi, I've developed a UDF that receives two bags as inputs and outputs one bag. One of the bags is different in every group and the other is always the same. Example code: A = LOAD 'a' AS (group, ...
    Jordi Deu-Pons
    Apr 30, 2010 at 11:32 am
    May 1, 2010 at 6:38 am
  • There doesn't seem to be any "default" case (like else) in the SPLIT command. This forces a user to unnecessarily create a complex condition specifically for the "default" case, which could be painful. Am ...
    Prasenjit mukherjee
    Apr 14, 2010 at 8:11 pm
    Apr 14, 2010 at 10:40 pm
  • Finally, sit down to work on this, kind of a new feature of pig. Goal: load data from a list of files through one load operator: a = load 'filelist' using someloadfunc(...) as (...); where "filelist" ...
    Jiang licht
    Apr 8, 2010 at 12:36 am
    Apr 8, 2010 at 6:24 am
  • Guys, is there an easy way to generate a unique row id that is guaranteed to be unique? R = foreach T generate *, globally_unique() as id; The reason why I need this is because I have a really nasty ...
    Hc busy
    Apr 23, 2010 at 6:48 pm
    Apr 23, 2010 at 7:45 pm
  • Hi folks, I just want to get a show of hands -- does anyone actually use the current implementation of HBaseStorage in production? Thanks, -Dmitriy
    Dmitriy Ryaboy
    Apr 12, 2010 at 6:57 am
    Apr 22, 2010 at 5:30 pm
  • Hello, I want to use IS NULL in a FILTER but the behavior seems to be a Bug: I make a LeftJoin with a result of 7 tuples with fields 's' and 'nick'. 4 tuples have a value for 'nick', the other 3 ...
    Alexander Schätzle
    Apr 21, 2010 at 10:07 am
    Apr 22, 2010 at 6:51 am
  • Hello, We have a file hierarchy we want to be accessible with MR/Hive/Pig. In this way everyone can pick favorites :) Currently the layout looks like this. ...
    Edward Capriolo
    Apr 20, 2010 at 4:37 pm
    Apr 20, 2010 at 7:25 pm
  • Hi, I have encountered the following error in using pig's built in function AVG. "ERROR 1045: Could not infer the matching function for org.apache.pig.builtin.AVG as multiple or none of them fit. ...
    Katukuri, Jay
    Apr 29, 2010 at 6:11 pm
    May 4, 2010 at 3:47 am
  • hi all, i have a case where i want to avoid a divide by zero case relation2 = foreach relation1 { val = (n==0 ? 0 : val/n); generate val; } the trouble is the right hand side of the bincond; val/n is ...
    Mat Kelcey
    Apr 25, 2010 at 10:39 am
    Apr 28, 2010 at 6:36 pm
  • Hello, I have two pig scripts and a java program that need to be chained in the following order: Pig-script1 -- Java Program -- Pig-Script2 That is the output of the Pig-Script1 in HDFS is the input ...
    Katukuri, Jay
    Apr 27, 2010 at 2:42 am
    Apr 28, 2010 at 1:10 am
  • Hello dear pig list, with the help of our "java" guy we've been debugging pig today and seem to have found a workaround to pig losing constructor arguments during execution. (very annoying if you ...
    Johannes Rußek
    Apr 15, 2010 at 5:00 pm
    Apr 22, 2010 at 6:53 pm
  • Hi there, I've got a bunch of pig scripts that produce some output that I would like to assert based on some known good data. I've found that running the scripts in local mode (via bash scripts) on ...
    Corbin Hoenes
    Apr 15, 2010 at 4:58 pm
    Apr 22, 2010 at 7:54 am
  • Hi, I've hit a somewhat obscure bug in the scripts I'm writing caused by the combination of a few factors: multiple column groups, PARALLEL 1 for grouping, and a nested for-each body following the ...
    Michael Dalton
    Apr 7, 2010 at 7:08 am
    Apr 8, 2010 at 9:57 am
  • Hey guys, I'm trying to figure out why my pig script crashes half way (after 2 or 3 mr's) The error is below. I took a dump of the plan and didn't see anything suspicious except that the missing temp ...
    Hc busy
    Apr 29, 2010 at 11:05 pm
    May 1, 2010 at 2:28 am
  • I am new to Pig. I installed Pig 0.6.0, but my cluster's Hadoop version is 0.19.2. Does that work for me? Thx
    Apr 19, 2010 at 4:51 am
    Apr 19, 2010 at 4:52 pm
  • CDH2 Pig 0.5+. Mapred mode, with CDH2 0.20.1+ Both latest as of 2 weeks ago. Joins on multiple columns have null key values matching. IN = LOAD 'test_nulls' using PigStorage(',') as (ind:chararray, ...
    Scott Carey
    Apr 15, 2010 at 11:25 pm
    Apr 16, 2010 at 6:29 pm
  • I'm writing because I was having some issues with the register command not working how I expected it to. Specifically, it seemed like the way a jar was specified as an entry in the -cp list passed to ...
    Eric Tschetter
    Apr 8, 2010 at 2:05 am
    Apr 16, 2010 at 4:40 pm
  • Here are my input: file 'lala' a b c d File 'lele': b c Here are my pig commands: A = load 'lala' as (url); B = load 'lele' as (url); joining = join A by url left outer, B by url USING "replicated"; ...
    Doug Luu
    Apr 9, 2010 at 12:28 am
    Apr 9, 2010 at 5:45 pm
  • I'm writing a user defined LoadFunc. In the bindTo function the fileName parameter appears as the verbatim text passed as the parameter to the LOAD function in my script. In the case where I'm ...
    Andrew Rothstein
    Apr 29, 2010 at 8:13 pm
    Apr 29, 2010 at 9:46 pm
  • okay, sometimes I end up with a bag of one item. I wonder if it'll be quicker if we had a udf called takeOne() that takes one thing out of the bag and returns it. I know if the bag has one item ...
    Hc busy
    Apr 27, 2010 at 12:03 am
    Apr 27, 2010 at 5:34 am
  • Hello, can anybody tell me what the LEFT OUTER JOIN produces in case of non matching tuples? I thought it would produce nulls for the right relation but a later test for IS NULL does not produce the ...
    Alexander Schätzle
    Apr 22, 2010 at 9:41 am
    Apr 22, 2010 at 12:42 pm
  • Hi All, so, after Pig generates the DAG with each node representing a MapReduce job, it will sort it in topological order. All the MapReduce jobs, in the form of a sequence, will be packed into a jar ...
    Gang Luo
    Apr 17, 2010 at 9:31 pm
    Apr 20, 2010 at 12:47 am
  • Hey folks, My understanding of the elephant bird code Twitter recently released is that 'repeated' protocol buffer fields map to DataBags in Pig. I'm getting an error saying ...
    Vikram Oberoi
    Apr 19, 2010 at 9:58 pm
    Apr 20, 2010 at 12:28 am
  • I've seen keyword 'arrange' mentioned in an error message: Was expecting one of: "filter" ... "order" ... "arrange" ... "distinct" ... "limit" ... but I could not find any mention of it in ...
    Apr 17, 2010 at 1:00 am
    Apr 19, 2010 at 4:54 pm
  • Hello , I have few questions about the out-of-memory issues that I am running into. If you could please answer them, that will be great. I am using Pig0.40 on hadoop 0.18.3 in map reduce mode. The ...
    Katukuri, Jay
    Apr 13, 2010 at 10:54 pm
    Apr 17, 2010 at 2:09 am
  • Hi, Can anyone advise on how to ship a ruby program with the "DEFINE.... SHIP( )" command, when the ruby program is actually on an S3 or HDFS instance instead of on local HDD? This pig script runs ...
    Apr 5, 2010 at 6:59 am
    Apr 7, 2010 at 2:57 am
  • I see in the pig source code and want to do something similar in my own tests.... tried just copying that class but it's failing with "Can't assign requested address" trying to assign ...
    Corbin Hoenes
    Apr 29, 2010 at 4:33 pm
    May 12, 2010 at 8:25 pm
  • Folks, I just noticed that the Pig wiki didn't have a powered by page, and added it: It's seeded with some entries I pulled from the Hadoop Powered By page, but I ...
    Dmitriy Ryaboy
    Apr 22, 2010 at 6:56 pm
    May 6, 2010 at 7:58 pm
  • Is there a JSON load function already available for Pig?
    Anthony Urso
    Apr 29, 2010 at 10:47 pm
    Apr 29, 2010 at 11:37 pm
  • Hi, We are using PigServer local mode to unit test our pig scripts and under eclipse everything works. Running under maven however fails. We tried a very simple script and it does work but more ...
    Corbin Hoenes
    Apr 29, 2010 at 5:16 pm
    Apr 29, 2010 at 5:34 pm
  • guys, I'm looking at the docs for CROSS join and noticed that it's not really a cross join, but rather just a cross: alias = CROSS alias, alias [, alias …] [PARALLEL n]; there's no join key to do: ...
    Hc busy
    Apr 28, 2010 at 2:21 am
    Apr 28, 2010 at 10:43 am
  • Hello folks, Those of you in or near NYC and using Lucene or Solr should come to "Lucandra - a Cassandra-based backend for Lucene and Solr" on April 26th: ...
    Otis Gospodnetic
    Apr 22, 2010 at 5:31 pm
    Apr 26, 2010 at 4:08 pm
  • Hi, I'm trying to read in a comma-separated file with a simple command: a = load 'myfile' using PigStorage(','); However, some lines in my file have the , inside a quoted string, and Pig is picking ...
    Toli Kuznets
    Apr 24, 2010 at 2:28 am
    Apr 24, 2010 at 2:29 pm
  • Hello, what does the FLATTEN Operator produce in case of an empty Bag? Example: FLATTEN ({(a)}, {}) What is the result of this? Thx, Alex
    Alexander Schätzle
    Apr 22, 2010 at 9:32 am
    Apr 22, 2010 at 5:28 pm
  • I have a few questions: 1) Is 0.5 the last version of Pig to have its own local mode? It's quite fast! 2) When using Pig's local mode with large aggregate jobs, I often run out of memory. What's the ...
    Brian Donaldson
    Apr 19, 2010 at 7:08 pm
    Apr 20, 2010 at 12:33 am
  • Hi folks, I'm having a new issue with Pig and have not been able to find a solution using Google. I was wondering if anyone here knew off the top of their head what's going wrong? The only comparable ...
    Matthew Moloney
    Apr 19, 2010 at 10:08 pm
    Apr 19, 2010 at 11:49 pm
  • Can someone send an example of the proper usage of pig -c <clustername>? I'm having trouble convincing pig to connect to a different cluster than localhost. Thanks, -Brian
    Brian Donaldson
    Apr 19, 2010 at 9:15 pm
    Apr 19, 2010 at 9:27 pm
  • Hello, I'm working with the Cloudera Distribution for Hadoop (CDH3 Beta) which uses Pig 0.5.0. I want to write a custom Load-Function for RDF-Data in N3-Format. My development platform is NetBeans ...
    Alexander Schätzle
    Apr 9, 2010 at 2:28 pm
    Apr 16, 2010 at 1:03 am
  • Dear Pig users, Please assist in obtaining 2 skewed data sets and 2 non-skewed data sets for testing joins. Appreciate your help. Thank you, RP
    Radhika Parvathaneni
    Apr 16, 2010 at 12:48 am
    Apr 16, 2010 at 12:53 am
  • Hello everybody, I'm trying to split apache log data into two output sets, one for all uris that match a certain criteria and one for uris that don't match a certain criteria. I've been trying SPLIT ...
    Apr 8, 2010 at 10:09 am
    Apr 8, 2010 at 10:50 am
  • Elephant Bird has a nice little counter UDF to allow incrementing hadoop counters but I am not exactly sure how to use it from a pig script. Is it mostly used from a FOREACH statement?
    Corbin Hoenes
    Apr 2, 2010 at 2:19 pm
    Apr 2, 2010 at 2:19 pm
  • Hello, I'm looking for some feedback on how real world pig jobs are run in production, how are they managed and scheduled etc...? We are currently using bash scripts and cron to fire jobs off and we ...
    Corbin Hoenes
    Apr 1, 2010 at 2:56 am
    Apr 1, 2010 at 3:26 am
  • Hi, The heap size option in pig is set using JAVA_HEAP_MAX=-Xmx????m Does it override the heap size option set in hadoop-site.xml using <property><name></name><value> ...
    Katukuri, Jay
    Apr 29, 2010 at 11:07 pm
    Apr 29, 2010 at 11:07 pm
  • I have a pig job that is processing around 61,000 gzip files. It is failing on 1 file, that appears to be corrupted. I have looked at every log file I can find, but can't find the name of the bad ...
    Scott Kester
    Apr 21, 2010 at 11:48 pm
    Apr 21, 2010 at 11:48 pm
  • Hey there! Wanted to let you all know about our next meetup, April 28th. We've got a killer new venue thanks to Amazon. Check out the details at the link: ...
    Bradford Stephens
    Apr 21, 2010 at 10:38 pm
    Apr 21, 2010 at 10:38 pm
  • hi Team, Can you please provide me some skewed and non-skewed datasets for checking the performance of different join types in PIG. Thank you in advance Radhika
    Radhikadevi Parvathaneni
    Apr 16, 2010 at 10:00 pm
    Apr 16, 2010 at 10:00 pm
  • Hadoop Fans, we wanted to share some news with the Hadoop community about new upcoming courses, new locations, and a substantial discount on next week's session in the Bay Area. We're excited to ...
    Christophe Bisciglia
    Apr 13, 2010 at 4:44 pm
    Apr 13, 2010 at 4:44 pm
Group Overview
group: user @
categories: pig, hadoop

51 users for April 2010

  • Dmitriy Ryaboy: 33 posts
  • Hc busy: 31 posts
  • Alan Gates: 15 posts
  • Jr: 7 posts
  • Katukuri, Jay: 7 posts
  • Corbin Hoenes: 6 posts
  • Mridul Muralidharan: 6 posts
  • Alexander Schätzle: 5 posts
  • Ashutosh Chauhan: 5 posts
  • Andrey Stepachev: 4 posts
  • Jiang licht: 4 posts
  • Michael Dalton: 4 posts
  • Brian Donaldson: 3 posts
  • Edward Capriolo: 3 posts
  • Eric Tschetter: 3 posts
  • Prasenjit mukherjee: 3 posts
  • Rekha Joshi: 3 posts
  • Richard Ding: 3 posts
  • Zaki Rahaman: 3 posts
  • Andrew Rothstein: 2 posts