Grokbase Groups Pig user June 2010

37 discussions - 171 posts

  • Is there a UDF for generating the top X % of results? For example, in a log parsing context, it might be the set of search queries that represent the top 80% of all queries. I see in the piggybank ...
    Dave Viner
    Jun 29, 2010 at 3:16 pm
    Jun 30, 2010 at 9:45 pm
  • I'm curious to hear how other people are scaling the code on big Pig projects. Thousands of lines of dataflow code can get pretty hairy for a team of developers - and practices to ensure code sanity ...
    Russell Jurney
    Jun 22, 2010 at 5:40 pm
    Jun 24, 2010 at 5:25 pm
  • Hi everyone, today I came across a particular query that I don't know how to model in Pig. Part of my data looks like this: Id1 Id2 Sc Va P1 P2 --------- --------- ----- --------- ----- ---- ...
    Renato Marroquín Mogrovejo
    Jun 10, 2010 at 1:55 am
    Jun 23, 2010 at 4:11 am
  • I am trying to get Pig to query my HBase table, but I cannot find any examples on the web. Can anyone provide me with a simple example? The best I could find so far, was a little blurb on the ...
    Pavel Gutin
    Jun 28, 2010 at 3:01 pm
    Jul 1, 2010 at 4:42 pm
  • What is the best way to manage multiple Pig jobs so that they each get a chance to run simultaneously? W/o priority control, some jobs will block other jobs (a small job with e.g. a mapper and a ...
    Jiang licht
    Jun 21, 2010 at 5:00 pm
    Jun 28, 2010 at 10:05 pm
  • Title really says it all. I'm looking to run a job that takes the output of a pig script and writes that to an excel file for further analysis. Can somebody point me to a past thread or what commands ...
    Matthew Smith
    Jun 25, 2010 at 6:14 pm
    Jun 25, 2010 at 10:50 pm
  • Hi all, the JOIN operator of Pig produces duplicate columns in its output. Let's say the statement is like this: C = JOIN A BY (var1, var2), B BY (var1, var2); Then C contains var1 and var2 two times ...
    Alexander Schätzle
    Jun 8, 2010 at 11:46 am
    Jun 11, 2010 at 5:44 pm
  • Wrote a... thing about Pig at LinkedIn that might be useful to some: Russ
    Russell Jurney
    Jun 24, 2010 at 6:52 pm
    Jun 25, 2010 at 12:40 am
  • I'm having trouble using S3 as a data source for files in the LOAD statement. From research, it definitely appears that I want s3n://, not s3:// because the file was placed there by another ...
    Dave Viner
    Jun 14, 2010 at 2:37 am
    Jun 14, 2010 at 3:46 pm
  • Hi all, my script looks like this: A = LOAD 'left_rel.txt' AS (var1, var2); B = LOAD 'right_rel.txt' AS (var1, var3); C = JOIN A BY var1 LEFT OUTER, B BY var1; D = FILTER C BY $2 is null; DUMP D; But ...
    Alexander Schätzle
    Jun 7, 2010 at 7:27 am
    Jun 8, 2010 at 11:11 am
  • lsc = LOAD '/user/hadoop/radio_event/listenerStateChange/2010-06-30' AS (daterecorded:chararray, listener_id:long, to_state:chararray, from_state:chararray); describe lsc; lscg = group lsc by ...
    Jun 30, 2010 at 9:19 pm
    Jun 30, 2010 at 10:35 pm
  • Is there any documentation on how to read this output when I 'set debug on' I get in my reducer syslog: DEBUG: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce - New ...
    Corbin Hoenes
    Jun 16, 2010 at 6:53 pm
    Jun 17, 2010 at 5:06 pm
  • Hello, this is my first contact with Pig and its community ;-) I need to generate all the possible permutations from a bag. Let me explain it with examples: A = LOAD 'data' AS f1:chararray; DUMP A; ...
    Jun 12, 2010 at 5:45 pm
    Jun 16, 2010 at 6:48 pm
  • Are there any reporting tools that can run on top of Pig or using Pig? Or does everyone load TSV results into some type of Excel? I will need to create reports with labels and sequential pig ...
    Jun 14, 2010 at 11:25 pm
    Jun 15, 2010 at 2:43 am
  • Hi I'm absolutely new with using Pig, only just picked it up like 3 days ago, and still trying to wrap my head around it. I'm stuck with putting together a query. A DUMP of my sample dataset is as ...
    Jun 30, 2010 at 4:41 pm
    Jul 1, 2010 at 3:35 pm
  • Hello, Does this make sense? I'm generating reports using Pig where I only want to report on rows matching a set of regular expressions, but those regular expressions are pretty numerous. Some reports ...
    Mike Subelsky
    Jun 29, 2010 at 6:14 pm
    Jun 29, 2010 at 7:38 pm
  • Hello, I'm implementing a custom Store UDF using StoreFuncInterface. I need access to the ResourceSchema object each time I do a putNext operation, but am unable to do this since checkSchema() [which ...
    Harsh J
    Jun 28, 2010 at 10:08 am
    Jun 28, 2010 at 4:10 pm
  • Hello Everybody, I'm looking for a way to run REPLACE on multiple columns in a dataset to escape some characters that would confuse loading after processing in pig. Is there an easy way to do that ...
    Jun 16, 2010 at 1:11 pm
    Jun 18, 2010 at 6:21 pm
  • Need to combine a bunch of unrelated aliases into a single alias (so I can pass this alias into my UDF). Is it possible to do this? Or is it possible to pass a number of Tuple objects into an ...
    Corbin Hoenes
    Jun 18, 2010 at 7:17 am
    Jun 18, 2010 at 7:35 am
  • Hi, I am trying to find a way to return only DISTINCT values within a bag. Any ideas? Thanks Scott A = LOAD 'file.tsv' USING PigStorage() AS (id:chararray, mname:chararray); B = FILTER A BY mname IS ...
    Scott Wine
    Jun 16, 2010 at 8:14 pm
    Jun 16, 2010 at 8:35 pm
  • I am having some trouble getting cogroup and flattening to work as I'd like. The cogroup statement looks like: cg = COGROUP A BY aid INNER, B BY bid; The cg group has rows in which the information in ...
    Dave Viner
    Jun 1, 2010 at 6:31 pm
    Jun 2, 2010 at 1:25 am
  • Hi, What would be the simplest way of writing the EXISTS clause (Oracle) in Pig?
    Syed Wasti
    Jun 21, 2010 at 6:13 pm
    Jun 21, 2010 at 9:41 pm
  • Hi, I need a suggestion on how I can write this query in pig. I have 3 tables, some records of table A may be present in table B and some of A in table C. I want to write a query where I will pick ...
    Syed Wasti
    Jun 16, 2010 at 7:14 am
    Jun 16, 2010 at 7:29 am
  • it says that I should use the setUDFContext function to communicate between the getTuple function and the LoadPushDown.pushProjection(RequiredFieldList) function implementations. Where do I put the ...
    Andrew Rothstein
    Jun 4, 2010 at 2:00 am
    Jun 15, 2010 at 3:14 pm
  • Hello, I face a difficult issue: I need to extract some data from HBase columns whose names include non-ASCII characters like "Cinéma" or even white spaces " " and commas ",". Example: activity = LOAD ...
    Vincent Barat
    Jun 14, 2010 at 7:54 pm
    Jun 14, 2010 at 7:54 pm
  • I am having a problem getting Pig 0.7.0 to use a variable I add from a UDF. Here's the basic pig script: LOGS = LOAD '$INPUT' USING PigStorage('\t') ; IMP_SID = FOREACH IMPRESSIONS_ONLY GENERATE *, ...
    Dave Viner
    Jun 12, 2010 at 4:52 am
    Jun 14, 2010 at 2:30 am
  • Hi, I have written a UDF to sort the grouped data on a given field (in my case date field) and return the sorted data in a databag. I want my method to get the schema of my fields within the input ...
    Syed Wasti
    Jun 11, 2010 at 11:29 pm
    Jun 12, 2010 at 9:06 pm
  • Hello Fellow Hadoopists, We are meeting at 7:15 pm on June 17th at the University Heights Community Center 5031 University Way NE Seattle WA 98105 Room #110 We are looking for people to present. So ...
    Sean Jensen-Grey
    Jun 10, 2010 at 1:41 am
    Jun 10, 2010 at 7:47 pm
  • Hello, How do we implement the following "if-else" SQL logic in Pig ? Select column-1, column-2, if(column-3 =200, 20, column-3) column_new_3 In the above SQL , if the value of a field (column-3) is ...
    Katukuri, Jay
    Jun 9, 2010 at 11:59 pm
    Jun 10, 2010 at 12:34 am
  • Hi all, I'm Carmelo Badalamenti, and I'm working with Pig with great satisfaction :) I have a problem, indeed... I Load a file into pig script like this: raw = LOAD 'filename' USING PigStorage(',') ...
    Carmelo Badalamenti
    Jun 9, 2010 at 9:52 am
    Jun 10, 2010 at 12:31 am
  • Hi all, the conditions of the Merge Join say that there are only FILTER and FOREACH allowed between the LOAD and the Merge Join. I wonder why it is not possible to order the loaded input on the join ...
    Alexander Schätzle
    Jun 4, 2010 at 9:49 am
    Jun 4, 2010 at 1:50 pm
  • How does Pig 0.7.0 handle the schema of data? Sometimes, 1) there is a missing column in the input data to be loaded and the total number of columns to be read is smaller than that specified in the ...
    Jiang licht
    Jun 28, 2010 at 7:04 pm
    Jun 28, 2010 at 7:04 pm
  • I finally got around to open sourcing the beginnings of the 'PigPen' web app I was working on here: Video of it in action is here: ...
    Russell Jurney
    Jun 24, 2010 at 4:06 am
    Jun 24, 2010 at 4:06 am
  • I have a Pig job that is supposed to save some results in HDFS. Most of the time it runs just fine. But it failed once and no result was generated. There was no dead node. And namenode, tasktracker and ...
    Jiang licht
    Jun 24, 2010 at 12:58 am
    Jun 24, 2010 at 12:58 am
  • I have a UDF using: FileLocalizer.openDFSFile(filename); in the ctor; it complains it can't open the file while running locally--does this opening of the file need to exist somewhere else like in the ...
    Corbin Hoenes
    Jun 22, 2010 at 5:55 pm
    Jun 22, 2010 at 5:55 pm
  • Alan Gates
    Jun 1, 2010 at 5:04 pm
    Jun 1, 2010 at 5:04 pm
Group Overview
group: user @
categories: pig, hadoop

47 users for June 2010

  • Hc busy: 17 posts
  • Dmitriy Ryaboy: 14 posts
  • Alan Gates: 12 posts
  • Russell Jurney: 12 posts
  • Dave Viner: 9 posts
  • Jiang licht: 7 posts
  • Scott Carey: 7 posts
  • Syed Wasti: 7 posts
  • Pavel Gutin: 6 posts
  • Thejas Nair: 5 posts
  • Alexander Schätzle: 4 posts
  • Ankur C. Goel: 4 posts
  • Ashutosh Chauhan: 4 posts
  • BalaSundaraRaman: 4 posts
  • Corbin Hoenes: 4 posts
  • Renato Marroquín Mogrovejo: 4 posts
  • Dave Viner: 3 posts
  • Elein: 3 posts
  • Mark Stetzer: 3 posts
  • Matthew Smith: 3 posts