Grokbase Groups Pig user October 2012

77 discussions - 249 posts

  • Hi everyone. We just upgraded to CDH4.0.0 and are seeing a very weird issue with Pig. Every time I try to run a LOAD command, it dies with the following exception: ERROR 2998: Unhandled internal ...
    Dhaval Shah
    Oct 9, 2012 at 10:31 pm
    Oct 11, 2012 at 12:20 am
  • One of the greatest pains I face when debugging Pig code is that the iteration cycles are really long: the applications for which we use Pig typically deal with large datasets, and if a Pig script ...
    Oct 19, 2012 at 9:10 am
    Nov 7, 2012 at 9:06 pm
  • Here is another Pig committer announcement today. Please welcome Rohini Palaniswamy to be a Pig committer! Thanks, Daniel
    Daniel Dai
    Oct 26, 2012 at 11:38 pm
    Oct 29, 2012 at 6:32 pm
  • All, Please join me in welcoming Cheolsoo Park as our newest Pig committer. He's been contributing to Pig for a while now, helping fix the build and improve Pig. We look forward to him being a ...
    Julien Le Dem
    Oct 26, 2012 at 9:54 pm
    Oct 26, 2012 at 11:27 pm
  • Hi, I am trying to write a Pig UDF. Basically the data is of the format Id,time. What I am trying to do is … parse the time and then see whether it's breakfast, lunch or ...
    Jamal sasha
    Oct 25, 2012 at 9:47 pm
    Nov 12, 2012 at 3:36 pm
  • Hi all, I have this file. I want to perform this operation in Hive and Pig: NAME DATE URL HITCOUNT 2008-08-27 15 2008-08-27 ...
    Yogesh dhari
    Oct 14, 2012 at 2:54 pm
    Oct 18, 2012 at 4:13 am
  • Hi, I am using Pig-0.10.0 & hbase-0.94.2. I am trying to store the processed output to an HBase cluster using a Pig script. I registered the required .jar and set the mapreduce and zookeeper parameters ...
    Manu S
    Oct 25, 2012 at 3:03 pm
    Oct 26, 2012 at 6:45 am
  • Hi, I'm using Pig for my daily job. Pig does a good job in mapreduce mode in our internal Hadoop cluster. But it signals an error whenever I want to run it in local mode. I believe this is due to ...
    Lei tang
    Oct 15, 2012 at 10:44 pm
    Oct 23, 2012 at 12:39 am
  • Hi, I am trying to do matrix multiplication using Pig. Basically I have data in the form: data1.txt item1,item2,0.3 item1, item3, 0.4 item1, item5, 0.6 And then I have other data in the form data2.txt ...
    Jamal sasha
    Oct 22, 2012 at 2:40 am
    Oct 22, 2012 at 10:45 pm
  • I am trying to load some text files in Hive partitions on S3 using the AllLoader function with no success. I get an error which indicates that AllLoader is expecting the files to be on HDFS: a = LOAD ...
    Martin Goodson
    Oct 12, 2012 at 3:49 pm
    Oct 18, 2012 at 9:00 pm
  • Hi, I'm using cdh 4.0.1 with pig-0.9.2+26. I've tried to gather some information about my result files aggregated by Pig with the HadoopJobHistoryLoader() as described here ...
    Zebeljan, Nebojsa
    Oct 10, 2012 at 12:23 pm
    Oct 11, 2012 at 7:06 pm
  • To close this round of announcements. Please welcome Jonathan Coveney as our latest Pig PMC member. Congrats Jonathan! Julien
    Julien Le Dem
    Oct 29, 2012 at 6:28 pm
    Oct 29, 2012 at 9:57 pm
  • Hi, Pig writes a 0-byte _SUCCESS file when the STORE has been successfully done. Is there something like a _FAILURE file that indicates that the STORE failed? I need to determine with another ...
    Zebeljan, Nebojsa
    Oct 18, 2012 at 7:44 am
    Oct 19, 2012 at 4:01 pm
  • Hi all, I have a simple question about join in Pig. I want to do a simple self join on a relation in Pig. So I load two instances of the same relation in this way: I1 = LOAD '/myPath/myFile' as ...
    Alberto Cordioli
    Oct 12, 2012 at 9:47 am
    Oct 15, 2012 at 5:58 pm
  • Not sure if this is a noob question, but I've been digging quite a lot and trying different things and I just can't seem to use a non-static class for Initial/Intermed/Final getters or use ...
    Ugljesa Stojanovic
    Oct 10, 2012 at 1:18 pm
    Oct 11, 2012 at 5:55 pm
  • Hi, I'm fairly new to writing UDFs and Pig in general. I want to be able to write a UDF that can take advantage of MapReduce's sorting of data. Specifically, I'm trying to conceive how I'd write a ...
    Brian Stempin
    Oct 5, 2012 at 3:46 pm
    Oct 5, 2012 at 6:40 pm
  • Hi all, I am new to Pig. In Hive we can optimize the code by using indexing, bucketing, partitions, storing the file in different formats (such as RCFile or sequence file), overriding some property in the ...
    Oct 4, 2012 at 10:19 pm
    Oct 5, 2012 at 1:06 am
  • Hello, I have a script which groups a lot of aliases and does some operations on them. But it can happen that I don't need one of these aliases. To avoid changing my code, I would like to create an empty ...
    Kevin LION
    Oct 18, 2012 at 4:16 pm
    Nov 8, 2012 at 3:43 am
  • Team, Are there any out-of-the-box load functions for fixed-width files?
    Ranjith raghunath
    Oct 23, 2012 at 1:04 pm
    Nov 6, 2012 at 1:59 pm
  • Hi folks, I have a pig script that right now looks like this: … likes = FILTER main_set BY blah == 'a' AND meh == 'b'; likes_time = FOREACH likes GENERATE date, 'likes' AS type; dislikes = FILTER ...
    Eli Finkelshteyn
    Oct 24, 2012 at 1:45 pm
    Oct 24, 2012 at 5:22 pm
  • Hi, I have a file in the format {(1,123,score) ,(1,124,score)} {(2,356,score),(2,678,score)} etc. I am guessing the person who was working on this forgot to flatten this in the last step? How do I read and ...
    Jamal sasha
    Oct 23, 2012 at 3:25 pm
    Oct 23, 2012 at 6:59 pm
  • Based on AvroStorage code and documentation, it looks like compression is enabled by default, with the codec set to "deflate". But the file size is almost the same as that of uncompressed tab separated text ...
    Thejas Nair
    Oct 21, 2012 at 5:23 am
    Oct 23, 2012 at 1:32 pm
  • As a committer, I enjoy nothing more than committing the code of non-committers (except perhaps a rare sunny day of San Francisco). It's great for Pig, and it's great for open source in general. I ...
    Jonathan Coveney
    Oct 19, 2012 at 6:56 pm
    Oct 23, 2012 at 4:42 am
  • BinStorage(), PigDump(), PigStorage(), TextLoader(): loading or storing in which of the above formats will optimize the queries? Can a cache be used anywhere in Pig? How can the cache be useful in Pig? Regards, Abhi
    Oct 5, 2012 at 9:52 pm
    Oct 18, 2012 at 4:10 am
  • Hello all, Please help me store this kind of file using PigStorage. Please find the file format attached; this file is generated by using insert overwrite local directory ...
    Yogesh dhari
    Oct 11, 2012 at 4:57 pm
    Oct 12, 2012 at 8:20 pm
  • Hi all, I am fairly new to Pig. Can anyone tell me how to write the Hive query below in Pig Latin? In this query I am using a Cartesian join to achieve the equivalent of instring or contains in Java. Example col1 -- ...
    Abhishek dodda
    Oct 3, 2012 at 12:05 am
    Oct 12, 2012 at 4:55 pm
  • I would like to be able to decide if I want to use the Algebraic or regular implementation of an EvalFunc on the front end (planning phase), preferably in the function constructor. Is there any way ...
    Ugljesa Stojanovic
    Oct 8, 2012 at 7:01 pm
    Oct 10, 2012 at 1:18 pm
  • Hi all, is there any way to load a text file as a single record (text:chararray) in Pig? I am trying to load a bunch of text files from a directory, but it keeps each line as a single record. -- ...
    Oct 3, 2012 at 4:49 pm
    Oct 4, 2012 at 9:24 pm
  • I have a test.pig script that I am executing from my java application public static void main(String[] args) { try { PigServer pigServer = new PigServer("local"); runIdQuery(pigServer, "passwd"); } ...
    Pankaj Andhale
    Oct 26, 2012 at 9:43 pm
    Oct 29, 2012 at 6:51 pm
  • I am new to Hadoop and Pig. I am trying to get Pig 0.10.0 [1] and Hadoop 2.0.2 [2] working together. I get java.lang.NoSuchMethodError ...
    Nishant Neeraj
    Oct 26, 2012 at 5:15 am
    Oct 26, 2012 at 8:04 am
  • Hi all, is it true that Pig's JOIN operation is not as efficient as Hive's? I have just tried it and found differences in a JOIN query. Hive gave the same result as MySQL but Pig gave some ...
    Yogesh dhari
    Oct 22, 2012 at 6:22 pm
    Oct 23, 2012 at 4:20 am
  • Hello, I wonder if M/R jobs compiled from a Pig script support pipelining between jobs. For example, let's assume there are 5 independent consecutive M/R jobs doing some joining and aggregating task. My ...
    W W
    Oct 22, 2012 at 10:35 am
    Oct 22, 2012 at 3:31 pm
  • Hi, I would like to set the HDFS block size of my Pig script's output files. How do I do that? I tried to use PIG_OPTS="-Dpig.path.block.size=1048576"; which seemed to me the only appropriate option I ...
    Johannes Schwenk
    Oct 15, 2012 at 10:05 am
    Oct 22, 2012 at 2:02 am
  • I have a Pig job that keeps failing near completion. After 3 runs (long ones), I've finally found something out of the ordinary in a log: Anyone have any ideas what could be causing this? Thanks ...
    Lauren Blau
    Oct 21, 2012 at 1:34 pm
    Oct 22, 2012 at 1:16 am
  • Hi all, I am trying to learn and implement Pig optimization rules. Can anyone help me understand the properties below? The amount of memory allocated to bags is determined by ...
    Abhishek dodda
    Oct 16, 2012 at 3:48 am
    Oct 17, 2012 at 3:16 am
  • Greetings. I currently have two sets of data, let's call them QUERY and TARGETS. What I am currently trying to do is the following: 1. For each row in QUERY extract a 'query' property 2. For each ...
    Joshua Penton
    Oct 16, 2012 at 10:06 pm
    Oct 17, 2012 at 1:39 am
  • Hello, We are using PigStorageSchema to store our results on S3 with HDFS still as the file system and we are running into issues writing out the schema file to S3. We are just loading a CSV file ...
    Meghana Narasimhan
    Oct 12, 2012 at 9:00 pm
    Oct 12, 2012 at 11:16 pm
  • Wanted to see if anyone else is seeing this behavior. I have a python file with a single 40 line function/UDF that seems to take 20+ minutes to get registered. I don’t see the same issue when I ...
    Duckworth, Will
    Oct 10, 2012 at 4:11 pm
    Oct 10, 2012 at 6:56 pm
  • I have a script something like DEFINE udf .. DEFINE udf2 .. IMPORT 'macros.pig' rel = calltomacro('string',$keyparam); rel2 = calltomacro('string2',$keyparam); .... if I run this with pig -p ...
    Lauren Blau
    Oct 9, 2012 at 12:53 am
    Oct 10, 2012 at 3:13 am
  • Hi, I have a table in format: Id: int, amount: float, true_date: chararray, time:chararray, state:chararray Fortunately, there are only two states in my db. So if I have a state as “CA” then add +1 ...
    Jamal sasha
    Oct 3, 2012 at 2:54 pm
    Oct 4, 2012 at 2:02 am
  • Hi, I'm having trouble figuring out how to redirect the Pig logs output by the Grunt shell to another directory. Currently, the logs get written to the directory where I execute the Pig script. I ...
    Terry Siu
    Oct 3, 2012 at 7:08 pm
    Oct 3, 2012 at 7:20 pm
  • Hi, I am currently writing a Pig script that works with bags of timestamp tuples. So I am basically working on a data structure like this: (tuple(chararray)), int, bag{tuple(chararray)}) for ...
    Björn-Elmar Macek
    Oct 1, 2012 at 2:42 pm
    Oct 2, 2012 at 8:14 am
  • In my Pig script I am registering some JSON Jackson jars which are newer than what's in Hadoop's default path. But what's happening is that my jar files are not being used. How can I ensure that my jar ...
    Mohit Anchlia
    Oct 31, 2012 at 1:15 am
    Nov 1, 2012 at 10:16 pm
  • I have a cogroup which effectively does a full outer join of two relations. Some of the relations are blank, so I have a FOREACH statement like grouped = COGROUP relation1 BY x, relation2 BY y ...
    David LaBarbera
    Oct 30, 2012 at 5:23 pm
    Oct 30, 2012 at 6:30 pm
  • Hi folks, I have been experimenting with the PigStorageWithInputPath example (see ...
    Diederik van Liere
    Oct 25, 2012 at 4:25 am
    Oct 26, 2012 at 5:46 pm
  • an ideal combination but things are looking good.. but I am not able to solve this out.. filling the above steps using an example.. wrt my original query which is how do I import and package udf ...
    Jamal sasha
    Oct 24, 2012 at 7:45 pm
    Oct 24, 2012 at 7:55 pm
  • Hi, I have a table in HBase from which I want to load all records sorted by row key, which is an integer. Here is my code: library = LOAD 'discovery_rnaseq_library' USING ...
    Oct 24, 2012 at 5:35 pm
    Oct 24, 2012 at 6:32 pm
  • Hi, I have data in form 12345,1 12346,1 output the result So basically Output is 12345,1,135 12346,1,136 How do I do this in pig?
    Jamal sasha
    Oct 24, 2012 at 2:47 pm
    Oct 24, 2012 at 2:58 pm
  • Hi all, do you know the general time frame for the next Pig release? Thanks
    Oct 22, 2012 at 7:50 am
    Oct 22, 2012 at 3:55 pm
  • Hi, all
    Zahra Hajihashemi
    Oct 20, 2012 at 12:45 am
    Oct 20, 2012 at 2:45 am
Group Overview
group: user @
categories: pig, hadoop

78 users for October 2012

  • Cheolsoo Park: 17 posts
  • Dmitriy Ryaboy: 15 posts
  • Abhishek dodda: 13 posts
  • Jamal sasha: 12 posts
  • Prashant Kommireddi: 9 posts
  • Russell Jurney: 9 posts
  • Alan Gates: 8 posts
  • Jon Coveney: 8 posts
  • Yogesh dhari: 8 posts
  • Gianmarco De Francisci Morales: 7 posts
  • Adam Kawa: 5 posts
  • Bill Graham: 5 posts
  • Lauren Blau: 5 posts
  • Ruslan Al-Fakikh: 5 posts
  • Ugljesa Stojanovic: 5 posts
  • Yang: 5 posts
  • Brian Stempin: 4 posts
  • Dhaval Shah: 4 posts
  • Julien Le Dem: 4 posts
  • Thejas Nair: 4 posts