-
Is there a UDF for generating the top X % of results? For example, in a log parsing context, it might be the set of search queries that represent the top 80% of all queries. I see in the piggybank ...
Dave Viner
Jun 29, 2010 at 3:16 pm
Jun 30, 2010 at 9:45 pm -
I'm curious to hear how other people are scaling the code on big Pig projects. Thousands of lines of dataflow code can get pretty hairy for a team of developers - and practices to ensure code sanity ...
Russell Jurney
Jun 22, 2010 at 5:40 pm
Jun 24, 2010 at 5:25 pm -
Hi everyone, today I came across a particular query that I don't know how to model in Pig. Part of my data looks like this: Id1 Id2 Sc Va P1 P2 --------- --------- ----- --------- ----- ---- ...
Renato Marroquín Mogrovejo
Jun 10, 2010 at 1:55 am
Jun 23, 2010 at 4:11 am -
I am trying to get Pig to query my HBase table, but I cannot find any examples on the web. Can anyone provide me with a simple example? The best I could find so far was a little blurb on the ...
Pavel Gutin
Jun 28, 2010 at 3:01 pm
Jul 1, 2010 at 4:42 pm -
What is the best way to manage multiple Pig jobs so that they get a chance to run simultaneously? Without priority control, some jobs will block other jobs (a small job with e.g. a mapper and a ...
Jiang licht
Jun 21, 2010 at 5:00 pm
Jun 28, 2010 at 10:05 pm -
Title really says it all. I'm looking to run a job that takes the output of a Pig script and writes that to an Excel file for further analysis. Can somebody point me to a past thread or what commands ...
Matthew Smith
Jun 25, 2010 at 6:14 pm
Jun 25, 2010 at 10:50 pm -
Hi all, the JOIN operator of Pig produces duplicate columns in its output. Let's say the statement is like this: C = JOIN A BY (var1, var2), B BY (var1, var2); Then C contains var1 and var2 two times ...
Alexander Schätzle
Jun 8, 2010 at 11:46 am
Jun 11, 2010 at 5:44 pm -
Wrote a... thing about Pig at LinkedIn that might be useful to some: http://sna-projects.com/blog/2010/06/when-pigs-fly-apache-pig-open-source-and-understanding-systems/ Russ
Russell Jurney
Jun 24, 2010 at 6:52 pm
Jun 25, 2010 at 12:40 am -
I'm having trouble using S3 as a data source for files in the LOAD statement. From research, it definitely appears that I want s3n://, not s3:// because the file was placed there by another ...
Dave Viner
Jun 14, 2010 at 2:37 am
Jun 14, 2010 at 3:46 pm -
Hi all, my script looks like this: A = LOAD 'left_rel.txt' AS (var1, var2); B = LOAD 'right_rel.txt' AS (var1, var3); C = JOIN A BY var1 LEFT OUTER, B BY var1; D = FILTER C BY $2 is null; DUMP D; But ...
Alexander Schätzle
Jun 7, 2010 at 7:27 am
Jun 8, 2010 at 11:11 am -
lsc = LOAD '/user/hadoop/radio_event/listenerStateChange/2010-06-30' AS (daterecorded:chararray, listener_id:long, to_state:chararray, from_state:chararray); describe lsc; lscg = group lsc by ...
Elein
Jun 30, 2010 at 9:19 pm
Jun 30, 2010 at 10:35 pm -
Is there any documentation on how to read the output I get when I 'set debug on'? In my reducer syslog I see: DEBUG: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce - New ...
Corbin Hoenes
Jun 16, 2010 at 6:53 pm
Jun 17, 2010 at 5:06 pm -
Hello, this is my first contact with Pig and its community ;-) I need to generate all the possible permutations from a bag. Let me explain it with examples: A = LOAD 'data' AS f1:chararray; DUMP A; ...
Christian
Jun 12, 2010 at 5:45 pm
Jun 16, 2010 at 6:48 pm -
Are there any reporting tools that can run on top of Pig or using Pig? Or does everyone load TSV results into some type of Excel? I will need to create reports with labels and sequential pig ...
Elein
Jun 14, 2010 at 11:25 pm
Jun 15, 2010 at 2:43 am -
Hi I'm absolutely new with using Pig, only just picked it up like 3 days ago, and still trying to wrap my head around it. I'm stuck with putting together a query. A DUMP of my sample dataset is as ...
Diagnostix
Jun 30, 2010 at 4:41 pm
Jul 1, 2010 at 3:35 pm -
Hello, does this make sense? I'm generating reports using Pig where I only want to report on rows matching a set of regular expressions, but those regular expressions are pretty numerous. Some reports ...
Mike Subelsky
Jun 29, 2010 at 6:14 pm
Jun 29, 2010 at 7:38 pm -
Hello, I'm implementing a custom Store UDF using StoreFuncInterface. I need access to the ResourceSchema object each time I do a putNext operation, but am unable to do this since checkSchema() [which ...
Harsh J
Jun 28, 2010 at 10:08 am
Jun 28, 2010 at 4:10 pm -
Hello everybody, I'm looking for a way to run REPLACE on multiple columns in a dataset to escape some characters that would confuse loading after processing in Pig. Is there an easy way to do that ...
Jr
Jun 16, 2010 at 1:11 pm
Jun 18, 2010 at 6:21 pm -
I need to combine a bunch of unrelated aliases into a single alias (so I can pass this alias into my UDF). Is it possible to do this? Or is it possible to pass a number of Tuple objects into an ...
Corbin Hoenes
Jun 18, 2010 at 7:17 am
Jun 18, 2010 at 7:35 am -
Hi, I am trying to find a way to return only DISTINCT values within a bag. Any ideas? Thanks Scott A = LOAD 'file.tsv' USING PigStorage() AS (id:chararray, mname:chararray); B = FILTER A BY mname IS ...
Scott Wine
Jun 16, 2010 at 8:14 pm
Jun 16, 2010 at 8:35 pm -
I am having some trouble getting cogroup and flattening to work as I'd like. The cogroup statement looks like: cg = COGROUP A BY aid INNER, B BY bid; The cg group has rows in which the information in ...
Dave Viner
Jun 1, 2010 at 6:31 pm
Jun 2, 2010 at 1:25 am -
Hi, what would be the simplest way of writing the EXISTS clause (Oracle) in Pig?
Syed Wasti
Jun 21, 2010 at 6:13 pm
Jun 21, 2010 at 9:41 pm -
Hi, I need a suggestion on how I can write this query in Pig. I have 3 tables; some records of table A may be present in table B and some of A in table C. I want to write a query where I will pick ...
Syed Wasti
Jun 16, 2010 at 7:14 am
Jun 16, 2010 at 7:29 am -
it says that I should use the setUDFContext function to communicate between the getTuple function and the LoadPushDown.pushProjection(RequiredFieldList) function implementations. Where do I put the ...
Andrew Rothstein
Jun 4, 2010 at 2:00 am
Jun 15, 2010 at 3:14 pm -
Hello, I face a difficult issue: I need to extract some data from HBase columns whose names include non-ASCII characters like "Cinéma" or even white spaces " " and commas ",". Example: activity = LOAD ...
Vincent Barat
Jun 14, 2010 at 7:54 pm
Jun 14, 2010 at 7:54 pm -
I am having a problem getting Pig 0.7.0 to use a variable I add from a UDF. Here's the basic pig script: LOGS = LOAD '$INPUT' USING PigStorage('\t') ; IMP_SID = FOREACH IMPRESSIONS_ONLY GENERATE *, ...
Dave Viner
Jun 12, 2010 at 4:52 am
Jun 14, 2010 at 2:30 am -
Hi, I have written a UDF to sort the grouped data on a given field (in my case date field) and return the sorted data in a databag. I want my method to get the schema of my fields within the input ...
Syed Wasti
Jun 11, 2010 at 11:29 pm
Jun 12, 2010 at 9:06 pm -
Hello Fellow Hadoopists, We are meeting at 7:15 pm on June 17th at the University Heights Community Center 5031 University Way NE Seattle WA 98105 Room #110 We are looking for people to present. So ...
Sean Jensen-Grey
Jun 10, 2010 at 1:41 am
Jun 10, 2010 at 7:47 pm -
Hello, how do we implement the following "if-else" SQL logic in Pig? Select column-1, column-2, if(column-3 =200, 20, column-3) column_new_3 In the above SQL, if the value of a field (column-3) is ...
Katukuri, Jay
Jun 9, 2010 at 11:59 pm
Jun 10, 2010 at 12:34 am -
Hi all, I'm Carmelo Badalamenti, and I'm working with Pig with great satisfaction :) I have a problem, though... I load a file into a Pig script like this: raw = LOAD 'filename' USING PigStorage(',') ...
Carmelo Badalamenti
Jun 9, 2010 at 9:52 am
Jun 10, 2010 at 12:31 am -
Hi all, the conditions of the Merge Join say that there are only FILTER and FOREACH allowed between the LOAD and the Merge Join. I wonder why it is not possible to order the loaded input on the join ...
Alexander Schätzle
Jun 4, 2010 at 9:49 am
Jun 4, 2010 at 1:50 pm -
How does Pig 0.7.0 handle the schema of data? Sometimes, 1) there is a missing column in the input data to be loaded and the total number of columns to be read is smaller than that specified in the ...
Jiang licht
Jun 28, 2010 at 7:04 pm
Jun 28, 2010 at 7:04 pm -
I finally got around to open sourcing the beginnings of the 'PigPen' web app I was working on here: http://github.com/rjurney/Cloud-Stenography Video of it in action is here: http://vimeo.com/6032078 ...
Russell Jurney
Jun 24, 2010 at 4:06 am
Jun 24, 2010 at 4:06 am -
I have a Pig job that is supposed to save some results in HDFS. Most of the time it runs just fine. But it failed once and no result was generated. There was no dead node. And namenode, tasktracker and ...
Jiang licht
Jun 24, 2010 at 12:58 am
Jun 24, 2010 at 12:58 am -
I have a UDF using: FileLocalizer.openDFSFile(filename); in the ctor; it complains it can't open the file while running locally--does this opening of the file need to exist somewhere else like in the ...
Corbin Hoenes
Jun 22, 2010 at 5:55 pm
Jun 22, 2010 at 5:55 pm -
Alan Gates
Jun 18, 2010 at 8:28 pm
Jun 18, 2010 at 8:28 pm -
Alan Gates
Jun 1, 2010 at 5:04 pm
Jun 1, 2010 at 5:04 pm
Group Overview
group | user |
categories | pig, hadoop |
discussions | 37 |
posts | 171 |
users | 47 |
website | pig.apache.org |
47 users for June 2010