

34 discussions - 134 posts

  • Hi, I am trying to load a large gzip file and process it using pig. Every time I run the following script, I get OutOfMemory errors. The hadoop-site.xml is attached. The pig and the hadoop jobtracker ...
    Irfan Mohammed
    Sep 8, 2009 at 2:58 pm
    Sep 10, 2009 at 3:56 am
  • I'm having trouble working out how to use pig to compute pairwise document similarity. If I have a huge list of word counts: (using dummy names to make it easier to explain) doc_id, word, count doc1, testword1, ...
    Tommy Chheng
    Sep 9, 2009 at 8:22 pm
    Sep 10, 2009 at 7:37 am
  • Hi all, I'm becoming a bit more comfortable writing scripts, but I'm still not always sure of the best way to structure/frame my statements in order to optimize performance. When it comes to Split and ...
    Zaki Rahaman
    Sep 3, 2009 at 8:04 pm
    Sep 9, 2009 at 2:35 am
  • Hello, In the process of trying to add support for HBase 0.20.0 to PIG (trunk), I was trying the tutorial from the PIG documentation: http://hadoop.apache.org/pig/docs/r0.3.0/udf.html#Custom+Slicer ...
    Vincent BARAT
    Sep 14, 2009 at 1:39 pm
    Sep 17, 2009 at 8:08 pm
  • Hi, I have a pig script reading/writing to S3. $ export PIG_OPTS="-Dfs.default.name=s3n://bucket_1/"; $ pig r0 = LOAD 'input2/transaction_ar20090909_14*' using PigStorage('\u0002'); r1 = FILTER r0 by ...
    Irfan Mohammed
    Sep 9, 2009 at 10:03 pm
    Sep 15, 2009 at 12:21 am
  • I'm writing a document similarity script. I created an inverted index and I'm trying to create a pairwise comparison list of docs that share words. I reduced my data to: ({(233),(534)}) ...
    Tommy Chheng
    Sep 4, 2009 at 1:48 am
    Sep 4, 2009 at 6:49 pm
  • I want to read from S3 and write to local HDFS. But when I set -Dfs... it applies to all load/store Pig Latin statements. Is there a way I can instruct pig to read from an S3 filesystem ( ...
    Prasenjit mukherjee
    Sep 24, 2009 at 5:53 pm
    Sep 29, 2009 at 4:22 pm
  • Okay... I can see why some of this would be "Removing duplicate links to the documentation per discussion on the user list." ... but why leave that vendor link there? That's already on the Pig home ...
    Greg Stein
    Sep 11, 2009 at 9:09 pm
    Sep 11, 2009 at 9:59 pm
  • Hi everyone! Recently, I updated my HDFS from 0.18.0 to 0.20.1 and installed PigPen (0.04) for Eclipse Galileo under Windows. With the older version of Hadoop (0.18) the plugin worked correctly and I ...
    Alberto Luengo Cabanillas
    Sep 30, 2009 at 11:07 pm
    Oct 1, 2009 at 8:54 am
  • Hi, I would like to use Pig Latin to process the contents of Nutch files. I understand that the Nutch crawler stores the contents of crawled pages in MapFiles, that is, a binary database. MapReduce ...
    Guillermo Garrido
    Sep 8, 2009 at 4:51 pm
    Sep 15, 2009 at 4:25 pm
  • This is probably more appropriate for the hadoop forum. I am looking for a web-based debugging interface for hadoop on Amazon/EC2. Is there a way I can see (in Amazon/EC2) what is the current status of ...
    Prasenjit mukherjee
    Sep 15, 2009 at 12:49 am
    Sep 15, 2009 at 2:42 am
  • Hello, I'm new to pig. I use it on MacOS, and I wonder if there is a way to avoid the double log traces in the grunt console, and if there is a way to make the ^D key work (the DEL key). I think this ...
    Vincent BARAT
    Sep 11, 2009 at 10:29 am
    Sep 11, 2009 at 4:50 pm
  • I found the following in pig-docs: A = LOAD 'data'; B = STREAM A THROUGH `stream.pl -n 5`; What does stream.pl do? Can I get a simpler example of stream? -Thanks, Prasenjit
    Prasenjit mukherjee
    Sep 7, 2009 at 6:59 pm
    Sep 7, 2009 at 8:51 pm
  • Hi, everyone. I am trying to use pig, but now I am confused about the difference between Pig and Hive, and which is more powerful or more convenient for users. In addition, what is their ...
    刘祥龙
    Sep 8, 2009 at 2:21 pm
    Sep 19, 2009 at 8:55 pm
  • Hello, For those who need this, I have attached to this email a small PIG patch to support HBase 0.20.0. It can be applied on the trunk as of today. It is a minimal patch that only modifies the ...
    Vincent BARAT
    Sep 18, 2009 at 1:34 pm
    Sep 19, 2009 at 4:58 pm
  • Hi, I have a file which uses non-default delimiters. Here is an example: type::click;;date::Sun Apr 26 23:57:20 CDT 2009;; It uses '::' to delimit between the field name, and the value, and ';;' to ...
    Craig Hamilton
    Sep 18, 2009 at 11:35 pm
    Sep 19, 2009 at 1:43 am
  • I am trying to track my pig job's progress so far by inspecting the hadoop/pig logs. I see the following set of logs in my hadoop logdir: 1. seemingly Hadoop's logs: datanode, jobtracker, namenode, ...
    Prasenjit mukherjee
    Sep 16, 2009 at 1:02 am
    Sep 16, 2009 at 3:26 pm
  • I would like to filter a set of rows (of groups) based on the arity of a specified field. Can I do this in my pig script: r1 = LOAD ...... AS f1,f2,f3; gr1 = GROUP r1 BY (f1,f2); gr1 = FILTER gr1 ...
    Prasenjit mukherjee
    Sep 14, 2009 at 10:29 pm
    Sep 15, 2009 at 12:36 pm
  • Hello, I'm trying to use pig (trunk) with hbase 0.20.0 and hadoop 0.20.0. No success so far (I'm still unable to make the TestHBaseStorage unit test run correctly, and I'm lost in the code ...
    Vincent BARAT
    Sep 11, 2009 at 9:19 am
    Sep 11, 2009 at 4:47 pm
  • I have a set of logfiles that I'm parsing and analyzing using Pig in various ways. As of right now, for each different dimension (time, geography, etc.) I am writing a new script each time to ...
    Zaki Rahaman
    Sep 3, 2009 at 8:09 pm
    Sep 8, 2009 at 6:15 pm
  • Hi all, Is there a provision to turn off speculative execution in Pig, similar to hadoop? If yes, can someone kindly let me know how to do that? I tried to search the web but couldn't find any ...
    Palleti, Pallavi
    Sep 2, 2009 at 9:26 am
    Sep 2, 2009 at 1:44 pm
  • I am randomly getting the following stack trace, and having otherwise simple jobs fail. We're running the version of pig-0.4 that comes with Cloudera's new 0.20.1-based distro, so I know it's a ...
    Kevin Weil
    Sep 25, 2009 at 7:37 pm
    Sep 25, 2009 at 8:11 pm
  • Hi, As you know, a lot of work this year went into performance optimization of Pig. One of the main sources of performance problems is high memory usage. In an effort to address this problem we ...
    Olga Natkovich
    Sep 11, 2009 at 6:56 pm
    Sep 21, 2009 at 8:01 pm
  • Any idea why I get the following error? When I run my pig script in local mode it works, but when I run it in hadoop/mapreduce mode it throws the following error: ERROR org.apache.pig.tools.grunt.Grunt ...
    Seshadri bashyam
    Sep 10, 2009 at 4:22 pm
    Sep 11, 2009 at 4:30 pm
  • So for a given pig script 'A = LOAD 'test' USING PigStorage(',') AS (...); B = GROUP A BY (FOO,BAR); Is there a way to dump each different foo,bar group into a different directory?
    Alex Newman
    Sep 2, 2009 at 5:32 pm
    Sep 2, 2009 at 8:17 pm
  • I'm trying to run my pig script on an EC2 large instance using the Cloudera 0.18 distribution. The pig script itself works in local mode on a reduced data set. The map phase went by fast but the script ...
    Tommy Chheng
    Sep 25, 2009 at 5:15 am
    Sep 25, 2009 at 5:15 am
  • Hi all, With PIG-891 checked in (credit to Jeff Zhang), Pig implements the complete set of hadoop dfs commands. The syntax for dfs commands is exactly the same as hadoop's (actually, Pig delegates all ...
    Daniel Dai
    Sep 21, 2009 at 6:13 pm
    Sep 21, 2009 at 6:13 pm
  • Hadoop Fans, we're getting down to the wire for Hadoop World. We couldn't be happier with how the schedule has come together. For a full list of speakers and registration details, see ...
    Christophe Bisciglia
    Sep 15, 2009 at 9:41 pm
    Sep 15, 2009 at 9:41 pm
  • Hi all. I've successfully integrated the PigPen plugin in Eclipse 3.5 (Galileo), as well as the Hadoop plugin (0.18.1). I also executed the example script 'id.pig' (the one that splits the contents from ...
    Alberto Luengo Cabanillas
    Sep 15, 2009 at 4:18 pm
    Sep 15, 2009 at 4:18 pm
  • Hi, There seems to be some confusion about why user documentation links were removed from the wiki page. The main reason is that they migrated to the main site (hadoop.apache.org/pig). They have ...
    Olga Natkovich
    Sep 10, 2009 at 8:03 pm
    Sep 10, 2009 at 8:03 pm
  • Message forwarded from Irfan: ---------- Forwarded message ---------- From: Irfan Mohammed <irfan.ma@gmail.com> Date: Wed, Sep 9, 2009 at 11:26 AM Subject: [Fwd: Re: OutOfMemory Errors when loading a ...
    Prasenjit mukherjee
    Sep 9, 2009 at 3:48 pm
    Sep 9, 2009 at 3:48 pm
  • So right off the bat, I fixed the regex patterns in my split, but I kept getting an error from the multiquery optimizer. Specifically, the following: ERROR 2146: Internal Error. Inconsistency in ...
    Zaki Rahaman
    Sep 4, 2009 at 6:35 pm
    Sep 4, 2009 at 6:35 pm
  • Hi, I want to find out whether pig is integrated with HBase. I mean, does pig support queries and other operations like create/write/append on HBase tables? I found ...
    刘祥龙
    Sep 3, 2009 at 2:04 pm
    Sep 3, 2009 at 2:04 pm
  • Hi, I want to find out whether pig is integrated with HBase. I mean, does pig support queries and other operations like create/write/append on HBase tables? I found ...
    刘祥龙
    Sep 3, 2009 at 8:47 am
    Sep 3, 2009 at 8:47 am
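Craig Hamilton's delimiter thread describes records such as type::click;;date::Sun Apr 26 23:57:20 CDT 2009;; where '::' separates a field name from its value and ';;' separates the pairs. Below is a minimal Python sketch of parsing that format. It assumes one record per line. The names parse_record, to_tsv, and the FIELDS column order are hypothetical illustrations and do not reflect the approach the thread settled on.

```python
"""Sketch: parse records like 'type::click;;date::...;;' into name/value pairs.

Assumptions (hypothetical): one record per line, ';;' separates pairs,
'::' separates a field name from its value, and FIELDS fixes the output order.
"""

FIELDS = ["type", "date"]  # hypothetical column order for tab-separated output


def parse_record(line):
    """Return a dict mapping field name -> value for one record line."""
    record = {}
    for pair in line.rstrip("\n").split(";;"):
        if not pair:
            continue  # a trailing ';;' leaves an empty piece
        name, sep, value = pair.partition("::")  # split on the first '::' only
        if sep:
            record[name] = value
    return record


def to_tsv(record, fields=FIELDS):
    """Join the requested fields with tabs; missing fields become empty strings."""
    return "\t".join(record.get(name, "") for name in fields)


if __name__ == "__main__":
    sample = "type::click;;date::Sun Apr 26 23:57:20 CDT 2009;;"
    print(to_tsv(parse_record(sample)))
```

The code splits on the first '::' only, so single colons inside a value, such as the time 23:57:20, remain intact. To call this from pig, you could wrap the two functions in a loop over sys.stdin and run the script with STREAM, the operator asked about in Prasenjit mukherjee's thread. By default, STREAM exchanges tab-delimited lines with the external command. Writing a custom load function is another option.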
Group Overview
period: Sep 2009
group: user @
categories: pig, hadoop
discussions: 34
posts: 134
users: 37
website: pig.apache.org

37 users for September 2009

Prasenjit mukherjee: 17 posts, Alan Gates: 15 posts, Mridul Muralidharan: 9 posts, Vincent BARAT: 9 posts, Zaki Rahaman: 7 posts, Zjffdu: 7 posts, Irfan Mohammed: 6 posts, Tommy Chheng: 6 posts, Dmitriy Ryaboy: 5 posts, Olga Natkovich: 5 posts, Ted Dunning: 5 posts, Nikhil Gupta: 4 posts, Ashutosh Chauhan: 3 posts, 刘祥龙: 3 posts, Alberto Luengo Cabanillas: 2 posts, Ankur Goel: 2 posts, Benjamin Reed: 2 posts, Christopher Olston: 2 posts, Daniel Dai: 2 posts, George Pang: 2 posts