Grokbase Groups Pig user January 2012

68 discussions - 364 posts

  • Is anyone else seeing a bunch of errors when trying to run the pig or piggybank unit tests? I had to change ivy to use xercesImpl 2.9.1 instead of xerces 2.4.4 to make this exception go away: ...
    Bill Graham
    Jan 9, 2012 at 11:55 pm
    Jan 12, 2012 at 7:02 am
  • Hi All, I am quite new to hadoop world and trying to work on a project using hadoop and pig. The data is continuously being written in hadoop by many producers. All producers concurrently write data ...
    Rakesh sharma
    Jan 10, 2012 at 8:43 pm
    Jan 19, 2012 at 6:26 pm
  • Is there anything out there that allows you to issue Pig queries/scripts through a browser/UI? Thanks, Prashant
    Prashant Kommireddi
    Jan 18, 2012 at 2:36 am
    Jan 22, 2012 at 6:13 pm
  • Hi: I want to implement a new pig backend. Can I replace the hadoop backend with a hadoop--streaming only backend? I decide to use streaming to implement and backend ...
    Devdoer bird
    Jan 9, 2012 at 3:50 am
    Feb 21, 2012 at 9:43 am
  • Ran into this today. Using trunk (0.11) If you are using a custom loader and are trying to get input split information In prepareToRead(), getWrappedSplit() is providing the first split instead of ...
    Alex Rovner
    Jan 6, 2012 at 9:50 pm
    Jan 10, 2012 at 2:31 pm
  • How could I use the CombineFileInputFormat in Pig? I have a performance issue with lots of small files which I want to get rid of. I think by default the FileInputFormat is used.
    Marcel Holle
    Jan 11, 2012 at 1:11 am
    Jan 13, 2012 at 1:12 am
  • i have downloaded jython jar and installed it.. but i couldn't run pig scripts using py script... i have created a .py in pig/bin folder.. how to run the pig.. pig wiki is not so clear..please help. ...
    Rahul raghavendhra
    Jan 12, 2012 at 6:25 am
    Jan 12, 2012 at 9:25 am
  • Hi folks, Is there an another way to perform string concat on multiple columns instead of using the built in CONCAT function which only takes 2 arguments? I can do CONCAT(str1, CONCAT(str2, str3)), ...
    Michael Lok
    Jan 19, 2012 at 7:40 am
    Jan 25, 2012 at 9:22 am
  • I have a pig script that does basically a map-only job: raw = LOAD 'input.txt' ; processed = FOREACH raw GENERATE convert_somehow($1,$2...); store processed into 'output.txt'; I have many nodes on my ...
    Jan 12, 2012 at 2:13 am
    Jan 17, 2012 at 9:29 pm
  • Hallo, When I run a simple pig script to LOAD and STORE avro data, I get:- java.lang.ClassCastException: cannot be cast to org.apache.avro.generic.IndexedRecord ...
    Andrew Kenworthy
    Jan 9, 2012 at 9:16 am
    Jan 16, 2012 at 9:14 pm
  • Hello, I am wondering if there is a way for me to load multiple files into pig, while still keeping track of what record came from what file. To give some background, I have about half a million ...
    Yulia Tolskaya
    Jan 9, 2012 at 6:44 am
    Jan 13, 2012 at 5:40 pm
  • Hi Guys, I came across a use case that seems to require an 'explode' operation which to my knowledge is not currently available. That is, given a tuple (x,y,z), 'explode' would generate the tuples ...
    Stan Rosenberg
    Jan 26, 2012 at 3:11 am
    Jan 30, 2012 at 4:05 pm
  • Hi folks, I've got one resultset which I need to run a comparison with all the rows within the same resultset. For example: R1 R2 R3 R4 R5 Take R1, I'll need to compare R1 with all rows from R2-R5. ...
    Michael Lok
    Jan 19, 2012 at 8:55 am
    Jan 20, 2012 at 6:04 pm
  • Hi folks, I've a simple script which does CROSS join (thanks to Dimitry for the tip :D) and calls a UDF to perform simple matching between 2 values from the joined result. The script was initially ...
    Michael Lok
    Jan 6, 2012 at 7:52 am
    Jan 7, 2012 at 4:17 pm
  • Let's say I have this dataset: 1,undefined,text1 1,,text2 1,event1,text3 1,undefined,text4 1,event2,text5 1,event3,text6 I would like to group by 1st value, but not quite an ordinary grouping. I ...
    Grig Gheorghiu
    Jan 27, 2012 at 12:02 am
    Feb 3, 2012 at 3:36 am
  • Dear Pig users, I tried to load several files with AvroStorage by using a comma separated list. The statement I used is: test_data= LOAD 'repo_1/part-r-00000.avro,repo_2/part-r-00000.avro' USING ...
    Jan 24, 2012 at 9:27 am
    Jan 25, 2012 at 7:06 pm
  • I need to visualize some Pig queries for some writing I'm doing. Does PigPen work these days? What versions of Eclipse/Pig/etc. do I need to use to make it go? We look for things. Things to make us ...
    Russell Jurney
    Jan 18, 2012 at 8:50 am
    Feb 12, 2012 at 5:16 am
  • Hi folks, I have 2 tables which I'd like to perform joins via "!=" condition similar to the SQL syntax below: select * from yee a left join yer b on a.loc != b.loc I've read through Pig Latin basics ...
    Michael Lok
    Jan 3, 2012 at 8:23 am
    Jan 5, 2012 at 1:58 am
  • Hi folks, I use replicated joins, and recently I encountered an issue : my rightmost relation seems to become too big and, even if I don't get any "Java heap space" the time it take to finish the ...
    Vincent Barat
    Jan 27, 2012 at 1:16 pm
    Jan 31, 2012 at 8:44 am
  • Hi Folks, While working with Pig it came to my mind that currently there's no way to call Pig Latin set of commands or a script from Java code and receive results directly into Java. Or, maybe ...
    Vlad S
    Jan 27, 2012 at 4:40 pm
    Jan 27, 2012 at 6:08 pm
  • Hi, I'm looking for a solution to load/store lzo compressed files for Pig-0.9.1. Any pointers would be useful. Thanks, Sam William
    Sam William
    Jan 23, 2012 at 9:42 pm
    Jan 25, 2012 at 11:20 pm
  • I have been stuck on this for several hours and I cannot figure out what I am doing wrong. I have a relation "grouped" with the schema of grouped: {seedword: chararray,baggy: {outertup: (groupy: ...
    Yulia Tolskaya
    Jan 12, 2012 at 6:24 pm
    Jan 12, 2012 at 10:54 pm
  • Hello, If there is a way for pig to connect HBase in standalone mode. It means pig is standalone and HBase is also standalone. I tried org.apache.pig.backend.hadoop.hbase.HBaseStorage function. And ...
    Jan 9, 2012 at 4:49 pm
    Jan 10, 2012 at 6:21 pm
  • Hi folks, I've got a dataset as below: 10,234324234,NAME 1,3 10,346464646,NAME 1,3 10,438389232,NAME 1,3 20,397383737,NAME 2,4 20,383783234,NAME 2,4 20,387382828,NAME 2,4 20,309323333,NAME 2,4 ...
    Michael Lok
    Jan 25, 2012 at 7:20 am
    Jan 27, 2012 at 3:32 am
  • Hi, Quick question: is anyone aware of a DBLoad UDF, preferably based on hadoop's DBInputFormat? I am aware that there are other better solutions, e.g., sqoop. I can see DBStorage in piggybank, but ...
    Stan Rosenberg
    Jan 24, 2012 at 10:10 pm
    Jan 25, 2012 at 7:00 pm
  • I can't order my relations in Pig 0.9. If I do, the script fails. Has anyone else seen this behavior? -- Russell Jurney
    Russell Jurney
    Jan 21, 2012 at 4:15 am
    Jan 22, 2012 at 12:26 am
  • Hello, I wonder if you guys can help. I'm running Pig 0.9.1 with Hadoop, and am having the below problem consistently. I have tried various cast operators, none of which are working, so I ...
    Ian Meyers
    Jan 19, 2012 at 10:06 am
    Jan 20, 2012 at 10:57 am
  • Hello, My pig version is 0.8.1. I have got some information of the mailing list. I rebuilt the pig using: ant jar-withouthadoop and replace the hadoop jar file in /pig_home/build/ivy/lib/pig with the ...
    Jan 18, 2012 at 10:04 am
    Jan 18, 2012 at 1:17 pm
  • Hey Guys, Is there any way through which I can see the M/R jobs that pig runs internally for a given pig script? I wanted to get unique values for a particular column. For that I wrote the following ...
    Praveenesh kumar
    Jan 16, 2012 at 5:48 am
    Jan 16, 2012 at 9:09 am
  • I have a small pig script that outputs the top 500 of a simple computed relation. It works fine on a small data set but fails on a larger (45 GB) data set. I don’t see errors in the hadoop logs (but ...
    William Dowling
    Jan 5, 2012 at 10:16 pm
    Jan 6, 2012 at 9:36 pm
  • Hello, I'm having an out of memory problem that seems rather weird to me. Perhaps you can help me. Here's what I do: dump = LOAD '/user/accounting/dump_2012-01-05.lst' AS ( ts:chararray, ...
    Mario Lassnig
    Jan 18, 2012 at 10:07 pm
    Jan 20, 2012 at 7:32 am
  • In pig-0.9.1, current hbase is version 0.90.0. My question is could pig-0.9.1 work with latest stable hbase with version 0.90.5? Thank you.
    Jan 18, 2012 at 3:37 am
    Jan 18, 2012 at 12:02 pm
  • Hi folks, Is it possible to run Pig scripts via an API call from a remote server? My plan is to have a web app which users can use to submit and monitor their Pig scripts and jobs. Thanks.
    Michael Lok
    Jan 11, 2012 at 9:35 am
    Jan 11, 2012 at 7:07 pm
  • Hi, What I would like to do is to store outputs to different directories based on record value. Essentially I want to read the date from a field and store the output in yyyy/mm/dd directory ...
    IGZ Nick
    Jan 10, 2012 at 5:40 am
    Jan 10, 2012 at 6:58 am
  • Hi, We have a use-case where it would be beneficial to "select" multiple files to process by a regex pattern (or a loop-like functionality to dynamically adjust which files to pick). We have files of ...
    Meyer, Dennis
    Jan 2, 2012 at 2:27 pm
    Jan 3, 2012 at 10:12 am
  • Hi, To increase performance of my computation, I would like to use a merge join between two tables. I wrote this code to do that : pigServer.registerQuery("start_sessions = LOAD During the first job ...
    Kevin Lion
    Jan 24, 2012 at 4:33 pm
    Mar 6, 2012 at 1:59 pm
  • Hi, I have a group of records that gets outputted like the below. ((1010046645226466896,,1277793285) ((1010046645226466896,http:///,1277793315) ...
    David Houston
    Jan 25, 2012 at 10:13 am
    Jan 27, 2012 at 2:00 am
  • Hi, I'm trying to write a pig script to create a list of the top N ip entries per hour. Currently I have something like this: PER_IP = GROUP CFP_LOGS_CLICKS_WITHOUT_0 BY (dayNumber, hourNumber, ip); ...
    Peter Maas
    Jan 26, 2012 at 9:20 am
    Jan 26, 2012 at 12:08 pm
  • I would like to generate a set of data that represents the items not found in another set. How would I do this using Pig? I'm thinking I would do an outer join and then filter off the items that were ...
    Chan, Tim
    Jan 24, 2012 at 9:48 pm
    Jan 24, 2012 at 10:10 pm
  • Greetings All! Hopefully this isn't too annoying of a newbie question. I'd like to transpose the columns in a relation into a relation consisting of rows of bags (i.e., something akin to matrix ...
    David Langer
    Jan 20, 2012 at 12:48 am
    Jan 20, 2012 at 3:50 pm
  • Hi there, AFAICT the STORE function doesn't provide a way to overwrite the output. I guess you could use your own storage UDF to accomplish that but is there also another way of doing that? Thanks ...
    Marco Cadetg
    Jan 17, 2012 at 2:22 pm
    Jan 18, 2012 at 12:04 pm
  • Hi all, I'm new to Pig (and a bit rusty with Java!) and still just playing around with it, nothing serious yet. I might be misunderstanding something important here. I'm trying to write a custom ...
    Rory McCann
    Jan 13, 2012 at 12:13 pm
    Jan 13, 2012 at 6:28 pm
  • I have a plan to write a simple classification algorithm on any csv using Pig and embed py.. Can i write generic pig scripts that apply for any dataset..? Is there any guidelines or examples for ...
    Rahul raghavendhra
    Jan 12, 2012 at 9:02 am
    Jan 12, 2012 at 9:23 am
  • Hi folks, Not sure if this is related to Pig or Hadoop in general; but I'm posting this here since I'm running Pig scripts :) Anyway, I've been trying to perform a CROSS join between 2 files which ...
    Michael Lok
    Jan 10, 2012 at 12:18 am
    Jan 10, 2012 at 12:46 am
  • Can anyone help me understanding "Explain" Operator in pig ? I know it gives some logical/physical and Map/Reduce plan for the pig script we execute ? But its kind of tricky to understand the output ...
    Praveenesh kumar
    Jan 31, 2012 at 9:33 am
    Jan 31, 2012 at 5:27 pm
  • Hi there I am trying to load in some data using the PigStorage with a schema. But i can't seem to get the schema right and was hoping someone could point out my mistake. Here is the data being loaded ...
    Jan 26, 2012 at 5:46 pm
    Jan 27, 2012 at 12:20 am
  • this question was asked a few days before, after that, I dug around the source code and found some knobs to turn, but after adding the following, I still could not get pig to use smaller split sizes ...
    Jan 26, 2012 at 2:07 am
    Jan 26, 2012 at 7:25 pm
  • I tried this with both the 0.9.2.tar.gz distro and GitHub source, ant eclipse-files then "create java project from an existing ant build file" in eclipse, and choose the build.xml then eclipse gave ...
    Jan 26, 2012 at 6:42 pm
    Jan 26, 2012 at 6:46 pm
  • Hello Pig Users, I was wondering what the best way to store my output into an existing file would be. When I was looking on line I found this jira: It ...
    Yulia Tolskaya
    Jan 24, 2012 at 7:34 pm
    Jan 24, 2012 at 8:05 pm
  • Hi Marek, Moving question to which may be more relevant. (BCC'd mapreduce-user@, CC'd you) -- Harsh J Customer Ops. Engineer, Cloudera
    Harsh J
    Jan 17, 2012 at 6:43 pm
    Jan 17, 2012 at 6:57 pm
Group Overview
group: user @
categories: pig, hadoop

63 users for January 2012

  • Daniel Dai: 37 posts
  • Dmitriy Ryaboy: 35 posts
  • Prashant Kommireddi: 30 posts
  • Jon Coveney: 24 posts
  • Michael Lok: 21 posts
  • Russell Jurney: 18 posts
  • Stan Rosenberg: 18 posts
  • Yulia Tolskaya: 15 posts
  • Alan Gates: 10 posts
  • Yang: 10 posts
  • Alex Rovner: 9 posts
  • Bill Graham: 9 posts
  • Rahul raghavendhra: 9 posts
  • Aniket Mokashi: 7 posts
  • Rakesh sharma: 7 posts
  • Yonghu: 6 posts
  • William Dowling: 5 posts
  • Devdoer bird: 5 posts
  • Grig Gheorghiu: 5 posts
  • IGZ Nick: 5 posts