Is anyone else seeing a bunch of errors when trying to run the pig or piggybank unit tests? I had to change ivy to use xercesImpl 2.9.1 instead of xerces 2.4.4 to make this exception go away: ...
Bill Graham
Jan 9, 2012 at 11:55 pm
Jan 12, 2012 at 7:02 am -
Hi All, I am quite new to the Hadoop world and am trying to work on a project using Hadoop and Pig. The data is continuously being written to Hadoop by many producers. All producers concurrently write data ...
Rakesh sharma
Jan 10, 2012 at 8:43 pm
Jan 19, 2012 at 6:26 pm -
14
Pig UI
Is there anything out there that allows you to issue Pig queries/scripts through a browser/UI? Thanks, Prashant
Prashant Kommireddi
Jan 18, 2012 at 2:36 am
Jan 22, 2012 at 6:13 pm -
Hi: I want to implement a new Pig backend. Can I replace the Hadoop backend with a hadoop-streaming-only backend? I decided to use streaming to implement backend.storage and backend ...
Devdoer bird
Jan 9, 2012 at 3:50 am
Feb 21, 2012 at 9:43 am -
Ran into this today, using trunk (0.11). If you are using a custom loader and are trying to get input split information in prepareToRead(), getWrappedSplit() is providing the first split instead of ...
Alex Rovner
Jan 6, 2012 at 9:50 pm
Jan 10, 2012 at 2:31 pm -
How could I use the CombineFileInputFormat in Pig? I have a performance issue with lots of small files which I want to get rid of. I think by default the FileInputFormat is used.
Marcel Holle
Jan 11, 2012 at 1:11 am
Jan 13, 2012 at 1:12 am -
I have downloaded the Jython jar and installed it, but I couldn't run Pig scripts using a py script. I have created a .py in the pig/bin folder. How do I run it? The Pig wiki is not so clear. Please help. ...
Rahul raghavendhra
Jan 12, 2012 at 6:25 am
Jan 12, 2012 at 9:25 am -
Hi folks, Is there another way to perform string concat on multiple columns instead of using the built-in CONCAT function, which only takes 2 arguments? I can do CONCAT(str1, CONCAT(str2, str3)), ...
Michael Lok
Jan 19, 2012 at 7:40 am
Jan 25, 2012 at 9:22 am -
I have a pig script that does basically a map-only job: raw = LOAD 'input.txt' ; processed = FOREACH raw GENERATE convert_somehow($1,$2...); store processed into 'output.txt'; I have many nodes on my ...
Yang
Jan 12, 2012 at 2:13 am
Jan 17, 2012 at 9:29 pm -
Hello, When I run a simple Pig script to LOAD and STORE Avro data, I get: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to org.apache.avro.generic.IndexedRecord ...
Andrew Kenworthy
Jan 9, 2012 at 9:16 am
Jan 16, 2012 at 9:14 pm -
Hello, I am wondering if there is a way for me to load multiple files into pig, while still keeping track of what record came from what file. To give some background, I have about half a million ...
Yulia Tolskaya
Jan 9, 2012 at 6:44 am
Jan 13, 2012 at 5:40 pm -
Hi Guys, I came across a use case that seems to require an 'explode' operation which to my knowledge is not currently available. That is, given a tuple (x,y,z), 'explode' would generate the tuples ...
Stan Rosenberg
Jan 26, 2012 at 3:11 am
Jan 30, 2012 at 4:05 pm -
Hi folks, I've got one resultset which I need to run a comparison with all the rows within the same resultset. For example: R1 R2 R3 R4 R5 Take R1, I'll need to compare R1 with all rows from R2-R5. ...
Michael Lok
Jan 19, 2012 at 8:55 am
Jan 20, 2012 at 6:04 pm -
Hi folks, I've a simple script which does CROSS join (thanks to Dimitry for the tip :D) and calls a UDF to perform simple matching between 2 values from the joined result. The script was initially ...
Michael Lok
Jan 6, 2012 at 7:52 am
Jan 7, 2012 at 4:17 pm -
Let's say I have this dataset: 1,undefined,text1 1,,text2 1,event1,text3 1,undefined,text4 1,event2,text5 1,event3,text6 I would like to group by 1st value, but not quite an ordinary grouping. I ...
Grig Gheorghiu
Jan 27, 2012 at 12:02 am
Feb 3, 2012 at 3:36 am -
Dear Pig users, I tried to load several files with AvroStorage by using a comma separated list. The statement I used is: test_data= LOAD 'repo_1/part-r-00000.avro,repo_2/part-r-00000.avro' USING ...
Philipp
Jan 24, 2012 at 9:27 am
Jan 25, 2012 at 7:06 pm -
I need to visualize some Pig queries for some writing I'm doing. Does PigPen work these days? What versions of Eclipse/Pig/etc. do I need to use to make it go? We look for things. Things to make us ...
Russell Jurney
Jan 18, 2012 at 8:50 am
Feb 12, 2012 at 5:16 am -
Hi folks, I have 2 tables which I'd like to perform joins via "!=" condition similar to the SQL syntax below: select * from yee a left join yer b on a.loc != b.loc I've read through Pig Latin basics ...
Michael Lok
Jan 3, 2012 at 8:23 am
Jan 5, 2012 at 1:58 am -
Hi folks, I use replicated joins, and recently I encountered an issue: my rightmost relation seems to become too big and, even if I don't get any "Java heap space", the time it takes to finish the ...
Vincent Barat
Jan 27, 2012 at 1:16 pm
Jan 31, 2012 at 8:44 am -
Hi Folks, While working with Pig it came to my mind that currently there's no way to call a set of Pig Latin commands or a script from Java code and receive results directly into Java. Or, maybe ...
Vlad S
Jan 27, 2012 at 4:40 pm
Jan 27, 2012 at 6:08 pm -
Hi, I'm looking for a solution to load/store LZO-compressed files for Pig-0.9.1. Any pointers would be useful. Thanks, Sam William sampd@stumbleupon.com
Sam William
Jan 23, 2012 at 9:42 pm
Jan 25, 2012 at 11:20 pm -
I have been stuck on this for several hours and I cannot figure out what I am doing wrong. I have a relation "grouped" with the schema of grouped: {seedword: chararray,baggy: {outertup: (groupy: ...
Yulia Tolskaya
Jan 12, 2012 at 6:24 pm
Jan 12, 2012 at 10:54 pm -
Hello, Is there a way for Pig to connect to HBase in standalone mode? That is, Pig is standalone and HBase is also standalone. I tried the org.apache.pig.backend.hadoop.hbase.HBaseStorage function. And ...
Yonghu
Jan 9, 2012 at 4:49 pm
Jan 10, 2012 at 6:21 pm -
Hi folks, I've got a dataset as below: 10,234324234,NAME 1,3 10,346464646,NAME 1,3 10,438389232,NAME 1,3 20,397383737,NAME 2,4 20,383783234,NAME 2,4 20,387382828,NAME 2,4 20,309323333,NAME 2,4 ...
Michael Lok
Jan 25, 2012 at 7:20 am
Jan 27, 2012 at 3:32 am -
4
DBLoader
Hi, Quick question: is anyone aware of a DBLoad UDF, preferably based on hadoop's DBInputFormat? I am aware that there are other better solutions, e.g., sqoop. I can see DBStorage in piggybank, but ...
Stan Rosenberg
Jan 24, 2012 at 10:10 pm
Jan 25, 2012 at 7:00 pm -
I can't order my relations in Pig 0.9. If I do, the script fails. Has anyone else seen this behavior? -- Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
Russell Jurney
Jan 21, 2012 at 4:15 am
Jan 22, 2012 at 12:26 am -
Hello, I wonder if you guys can help. I'm running Pig 0.9.1 with Hadoop 0.20.203.0, and am having the below problem consistently. I have tried various cast operators, none of which are working, so I ...
Ian Meyers
Jan 19, 2012 at 10:06 am
Jan 20, 2012 at 10:57 am -
Hello, My Pig version is 0.8.1. I got some information from the mailing list. I rebuilt Pig using: ant jar-withouthadoop and replaced the hadoop jar file in /pig_home/build/ivy/lib/pig with the ...
Yonghu
Jan 18, 2012 at 10:04 am
Jan 18, 2012 at 1:17 pm -
Hey Guys, Is there any way I can see the M/R jobs that Pig runs internally for a given Pig script? I wanted to get unique values for a particular column. For that I wrote the following ...
Praveenesh kumar
Jan 16, 2012 at 5:48 am
Jan 16, 2012 at 9:09 am -
I have a small pig script that outputs the top 500 of a simple computed relation. It works fine on a small data set but fails on a larger (45 GB) data set. I don’t see errors in the hadoop logs (but ...
William Dowling
Jan 5, 2012 at 10:16 pm
Jan 6, 2012 at 9:36 pm -
Hello, I'm having an out of memory problem that seems rather weird to me. Perhaps you can help me. Here's what I do: dump = LOAD '/user/accounting/dump_2012-01-05.lst' AS ( ts:chararray, ...
Mario Lassnig
Jan 18, 2012 at 10:07 pm
Jan 20, 2012 at 7:32 am -
In pig-0.9.1, the current HBase version is 0.90.0. My question is: could pig-0.9.1 work with the latest stable HBase, version 0.90.5? Thank you.
Lulynn_2008
Jan 18, 2012 at 3:37 am
Jan 18, 2012 at 12:02 pm -
Hi folks, Is it possible to run Pig scripts via an API call from a remote server? My plan is to have a web app which users can use to submit and monitor their Pig scripts and jobs. Thanks.
Michael Lok
Jan 11, 2012 at 9:35 am
Jan 11, 2012 at 7:07 pm -
Hi, What I would like to do is to store outputs to different directories based on record value. Essentially I want to read the date from a field and store the output in yyyy/mm/dd directory ...
IGZ Nick
Jan 10, 2012 at 5:40 am
Jan 10, 2012 at 6:58 am -
Hi, We have a use-case where it would be beneficial to "select" multiple files to process by a regex pattern (or a loop-like functionality to dynamically adjust which files to pick). We have files of ...
Meyer, Dennis
Jan 2, 2012 at 2:27 pm
Jan 3, 2012 at 10:12 am -
Hi, To increase performance of my computation, I would like to use a merge join between two tables. I wrote this code to do that : pigServer.registerQuery("start_sessions = LOAD During the first job ...
Kevin Lion
Jan 24, 2012 at 4:33 pm
Mar 6, 2012 at 1:59 pm -
Hi, I have a group of records that gets output like the below. ((1010046645226466896,http://www.url.com/),1277793285) ((1010046645226466896,http:///www.url.com/?image=580),1277793315) ...
David Houston
Jan 25, 2012 at 10:13 am
Jan 27, 2012 at 2:00 am -
Hi, I'm trying to write a pig script to create a list of the top N ip entries per hour. Currently I have something like this: PER_IP = GROUP CFP_LOGS_CLICKS_WITHOUT_0 BY (dayNumber, hourNumber, ip); ...
Peter Maas
Jan 26, 2012 at 9:20 am
Jan 26, 2012 at 12:08 pm -
I would like to generate a set of data that represents the items not found in another set. How would I do this using Pig? I'm thinking I would do an outer join and then filter off the items that were ...
Chan, Tim
Jan 24, 2012 at 9:48 pm
Jan 24, 2012 at 10:10 pm -
Greetings All! Hopefully this isn't too annoying of a newbie question. I'd like to transpose the columns in a relation into a relation consisting of rows of bags (i.e., something akin to matrix ...
David Langer
Jan 20, 2012 at 12:48 am
Jan 20, 2012 at 3:50 pm -
Hi there, AFAICT the STORE function doesn't provide a way to overwrite the output. I guess you could use your own storage UDF to accomplish that but is there also another way of doing that? Thanks ...
Marco Cadetg
Jan 17, 2012 at 2:22 pm
Jan 18, 2012 at 12:04 pm -
Hi all, I'm new to Pig (and a bit rusty with Java!) and still just playing around with it, nothing serious yet. I might be misunderstanding something important here. I'm trying to write a custom ...
Rory McCann
Jan 13, 2012 at 12:13 pm
Jan 13, 2012 at 6:28 pm -
I plan to write a simple classification algorithm on any CSV using Pig and embedded Python. Can I write generic Pig scripts that apply to any dataset? Are there any guidelines or examples for ...
Rahul raghavendhra
Jan 12, 2012 at 9:02 am
Jan 12, 2012 at 9:23 am -
Hi folks, Not sure if this is related to Pig or Hadoop in general; but I'm posting this here since I'm running Pig scripts :) Anyway, I've been trying to perform a CROSS join between 2 files which ...
Michael Lok
Jan 10, 2012 at 12:18 am
Jan 10, 2012 at 12:46 am -
Can anyone help me understand the "Explain" operator in Pig? I know it gives the logical, physical, and Map/Reduce plans for the Pig script we execute, but it's kind of tricky to understand the output ...
Praveenesh kumar
Jan 31, 2012 at 9:33 am
Jan 31, 2012 at 5:27 pm -
Hi there, I am trying to load in some data using PigStorage with a schema, but I can't seem to get the schema right and was hoping someone could point out my mistake. Here is the data being loaded ...
Sandopolus
Jan 26, 2012 at 5:46 pm
Jan 27, 2012 at 12:20 am -
This question was asked a few days ago. After that, I dug around the source code and found some knobs to turn, but after adding the following, I still could not get Pig to use smaller split sizes ...
Yang
Jan 26, 2012 at 2:07 am
Jan 26, 2012 at 7:25 pm -
I tried this with both the 0.9.2.tar.gz distro and the GitHub source: ant eclipse-files, then "create java project from an existing ant build file" in Eclipse, choosing the build.xml. Then Eclipse gave ...
Yang
Jan 26, 2012 at 6:42 pm
Jan 26, 2012 at 6:46 pm -
Hello Pig Users, I was wondering what the best way to store my output into an existing file would be. When I was looking online I found this JIRA: https://issues.apache.org/jira/browse/PIG-259 It ...
Yulia Tolskaya
Jan 24, 2012 at 7:34 pm
Jan 24, 2012 at 8:05 pm -
Hi Marek, Moving question to user@pig.apache.org which may be more relevant. (BCC'd mapreduce-user@, CC'd you) -- Harsh J Customer Ops. Engineer, Cloudera
Harsh J
Jan 17, 2012 at 6:43 pm
Jan 17, 2012 at 6:57 pm
Group Overview
group | user |
categories | pig, hadoop |
discussions | 68 |
posts | 364 |
users | 63 |
website | pig.apache.org |
63 users for January 2012