Grokbase Groups Pig user January 2011

85 discussions - 320 posts

  • Hi, I need to implement an application that is iterative in nature. At the end of each iteration, I need to take the result and provide it as an input for the next iteration. Embedding PIG statements ...
    Deepak N85
    Jan 10, 2011 at 11:10 am
    Jan 13, 2011 at 1:42 pm
  • Hi, I'm processing gzipped compressed files in a directory, but some files are corrupted and can't be decompressed. Is there a way to skip the bad files with a custom load func? -Kim
    Kim Vogt
    Jan 25, 2011 at 10:55 pm
    Jan 26, 2011 at 2:08 am
  • Hi everyone. In considering Pig for our HBase querying needs, I've run into a discrepancy between the size of Pig's result set and the size of the table being queried. I hope this is due to a ...
    Ian Stevens
    Jan 5, 2011 at 9:14 pm
    Jan 27, 2011 at 9:30 am
  • Hello list, I've installed the LZO codecs and now I'm looking into using LZO in Pig. Elephant Bird seems to ...
    Evert Lammerts
    Jan 12, 2011 at 8:12 pm
    Jan 20, 2011 at 8:44 pm
  • So I have a udf, let's call it myudf.bag2bag, which takes a bag which contains "prop," and creates a new bag of tuples based on that. I have data in the form of id prop other1 other2 If all I care ...
    Jonathan Coveney
    Jan 10, 2011 at 5:56 pm
    Jan 11, 2011 at 2:03 am
  • I've been able to isolate the problem, but have no idea what is causing it. The input is in this form (this is correct): {({(a),(b),(c)}),({(a),(b),(c)}),({(a),(b),(c)})} and the output is in this ...
    Jonathan Coveney
    Jan 25, 2011 at 6:25 pm
    Feb 14, 2011 at 6:03 pm
  • I'm looking for some suggestions and ideas for how to handle JAR dependencies in a production environment. Most of the pig scripts I write require multiple JAR files. For instance, I have a pig ...
    Geoffrey Gallaway
    Jan 19, 2011 at 10:24 pm
    Jan 22, 2011 at 1:04 am
  • Hello, I have a pig script that uses piggy bank to calculate date differences. Sometimes, when I get a weird date or wrong format in the input, the script throws an error and aborts. Is there a way ...
    Hadoop n00b
    Jan 11, 2011 at 5:58 am
    Jan 17, 2011 at 12:57 pm
  • Hi, so it seems to be more efficient if storing to hbase partitions by regions and orders by hbase keys. I see that pig 0.8 (pig-282) added custom partitioner in a group but i am not sure if order is ...
    Dmitriy Lyubimov
    Jan 24, 2011 at 7:48 pm
    Jan 25, 2011 at 2:47 am
  • I'd be interested to hear people's experience / best practices for running pig scripts on demand from a web app. What do you use as the calling mechanism? How do you handle priority / scheduling for ...
    Jan 11, 2011 at 12:07 am
    Jan 13, 2011 at 4:46 am
  • Hi all, I'm writing a bit of code to grab some logfiles, parse them, and run some sanity checks on them (before subjecting them to further analysis). Naturally, logfiles being logfiles, they ...
    Kris Coward
    Jan 28, 2011 at 12:41 am
    Jan 28, 2011 at 10:05 pm
  • Hi, I get an error when I try to register my python udf. Why is this happening? grunt> Register '' USING jython AS udf 2011-01-07 19:39:31,818 [main] ERROR - ...
    Deepak N85
    Jan 7, 2011 at 2:27 pm
    Jan 7, 2011 at 5:35 pm
  • A is (val:int) B is (thing:chararray, min:int, max:int) Basically what I want is C = (val, thing) where val is between min and max for that thing. In sql the syntax for this would not be hard, in pig ...
    Jonathan Coveney
    Jan 27, 2011 at 1:10 am
    Jan 27, 2011 at 5:01 pm
  • Hi, I want to write a python udf to split string into bags ------------------------------------------------------------ #!/usr/bin/python import re @outputSchema("y:bag{t:tuple(word:chararray)}") def ...
    Xiaomeng Wan
    Jan 24, 2011 at 11:54 pm
    Jan 26, 2011 at 9:42 pm
  • Hi All, I'm looking into embedding pig latin in a host language using pig trunk. so far, basic features work fine for me. but I need to know how can I get result tuples from the stored bag. I need to ...
    Jan 18, 2011 at 5:34 am
    Jan 20, 2011 at 1:40 am
  • I see there are some builtin string functions, but I don't know how to use them. I got this error when I follow the examples: grunt> REGEX_EXTRACT_ALL('', '(.*)\:(.*)'); 2011-01-12 ...
    Dexin Wang
    Jan 13, 2011 at 1:44 am
    Jan 14, 2011 at 2:25 am
  • I've written a regular expression EvalFunc similar to ExtractAll except this is called FindAll. It returns a tuple of all strings found that match the given pattern. The syntax looks like this: A = ...
    Xavier Stevens
    Jan 27, 2011 at 8:54 pm
    Feb 4, 2011 at 12:00 am
  • I wonder if discussion of the Piggybank and other User Defined Functions is best done here (since it is *using* Pig) or on the Development list (because it is enhancing Pig). I'm trying to load some ...
    Alex McLintock
    Jan 29, 2011 at 12:13 pm
    Jan 30, 2011 at 10:24 pm
  • Hey Guys, I was just wondering if any of you might have come across the FileSystem closed error message as below: Filesystem closed at ...
    Robert Waddell
    Jan 27, 2011 at 2:47 pm
    Jan 28, 2011 at 12:06 am
  • Hi, I'd be grateful if you could help me with this little but quite annoying detail. When I pass specifications to loadFunc, the string argument in case of wide input specifications can become quite ...
    Dmitriy Lyubimov
    Jan 22, 2011 at 12:50 am
    Jan 22, 2011 at 1:32 am
  • Hi, Hope there is some simple answer to this. I have bunch of rows, for each row, I want to add a column which is derived from some existing columns. And I have large number of columns in my input ...
    Dexin Wang
    Jan 12, 2011 at 10:52 pm
    Jan 12, 2011 at 11:44 pm
  • I will implement this if I need to, but it seems to me that SOMEBODY has to have run into this. I don't know if it's possible, but it's worth asking... Basically I have a hadoop cluster of X servers, ...
    Jonathan Coveney
    Jan 7, 2011 at 10:35 pm
    Jan 9, 2011 at 8:35 am
  • I wrote this up for LinkedIn Hadoop Users today, figured it was worth sharing. If you have any other tips, or edits, please submit and I'll put these in a wiki some place: /* Russell's philosophy of ...
    Russell Jurney
    Jan 7, 2011 at 7:45 am
    Jan 8, 2011 at 3:02 am
  • I wasn't quite sure what to title this, but hopefully it'll make sense. I have a couple of questions relating to a query that ultimately seeks to do this You have 1 10 1 12 1 15 1 16 2 1 2 2 2 3 2 6 You ...
    Jonathan Coveney
    Jan 4, 2011 at 7:22 pm
    Jan 4, 2011 at 11:24 pm
  • So I have a relation apa which when DUMPed, ends up getting output just fine, but when I run STORE apa INTO '/rawfiles/f3453efd460348bbaeee2e9496e25871/1294311600/apa' USING PigStorage(','); I get ...
    Kris Coward
    Jan 31, 2011 at 6:30 pm
    Feb 1, 2011 at 1:55 am
  • Hello, Let's say I have two tables like A: 1,11 2,15 and B: 1,10 4,11 5,10 and joining them J = JOIN A by $0 FULL, B by $0 I get J: 1,11,1,21 2,16,, ,,4,11 ,,5,10 which is a full outer join: what I ...
    Cam Bazz
    Jan 30, 2011 at 11:11 am
    Jan 30, 2011 at 4:25 pm
  • Running this script: data = LOAD '$TABLE' USING HBaseStorage('$CF:field_1'); DUMP data; fails with the following error: Failed Jobs: JobId Alias Feature Message Outputs N/A data MAP_ONLY Message: ...
    Jacob Perkins
    Jan 27, 2011 at 10:59 pm
    Jan 28, 2011 at 12:01 am
  • Pig 0.8 executes my script by running six jobs. One of them is identified as "MAP_ONLY" and it always fails, with the innermost error I can find either saying "GC overhead limit exceeded" or "Java ...
    Greg Langmead
    Jan 26, 2011 at 10:49 pm
    Jan 27, 2011 at 6:37 pm
  • Hi all, I'm running a Pig script in local mode, and it finishes successfully. When I use the same dataset and script to run pig in its distributed mode, it hangs at 90% and the hadoop processes in ...
    Martin Z
    Jan 26, 2011 at 3:45 pm
    Jan 26, 2011 at 5:03 pm
  • Howdy, I think I saw in one post that some people use TextMate, but what do those among you who use Windows develop in? Is there any syntax highlighting for vim, notepad++, anything common on windows ...
    Jonathan Coveney
    Jan 25, 2011 at 6:35 pm
    Jan 25, 2011 at 6:44 pm
  • Hello, I have rigged my web application so it generates some sort of custom access log. Each line in my access log has the ipnumber, sessionCookie, idOfPage. How can i count unique visits to per ...
    Cam Bazz
    Jan 17, 2011 at 1:03 am
    Jan 17, 2011 at 2:24 am
  • Hello, I'm looking for some clues to help me fix an annoying error I'm getting using Pig. I need to parse a large JSON file so I grabbed kimsterv's JSON loader, ...
    Geoffrey Gallaway
    Jan 10, 2011 at 4:48 am
    Jan 12, 2011 at 9:25 pm
  • Hi, I have a python UDF, used by a PIG Script. I get a parsing error for some reason. ------------ REGISTER '/path/to/' USING jython AS udf; records = LOAD 'path/to/data' AS ...
    Deepak N85
    Jan 7, 2011 at 6:31 pm
    Jan 8, 2011 at 1:47 pm
  • I'm not sure if this can be done at the UDF level, or if it'd have to be done lower level. Imagine you have a good candidate for a replicated join, but beyond that you know most about the structure ...
    Jonathan Coveney
    Jan 28, 2011 at 3:36 pm
    Feb 7, 2011 at 6:43 pm
  • Hi, I found similar problems on the web but didn't find a solution for it so I'm asking here. I have some pig job that has been working fine for couple of months and it started failing. But the same ...
    Dexin Wang
    Jan 31, 2011 at 9:55 pm
    Jan 31, 2011 at 11:16 pm
  • I'm having problems getting HBaseStorage to work with Pig 0.8 and HBase 0.89. I applied the patch in PIG-1680. Here's the error I see in the JobTracker logs: 2011-01-24 22:51:25,764 INFO ...
    Jan 25, 2011 at 12:15 am
    Jan 25, 2011 at 1:35 am
  • I am seeing this stack when running a script that runs fine in 0.5.0, 0.6.0 and 0.7.0. Is this a known issue? ERROR 2042: Error in new logical plan. Try -Dpig.usenewlogicalplan=false. ...
    Kaluskar, Sanjay
    Jan 20, 2011 at 11:14 am
    Jan 24, 2011 at 7:32 pm
  • Good news. Pig 0.8.0 now is available through maven repository: Thanks -- Richard
    Richard Ding
    Jan 22, 2011 at 12:01 am
    Jan 24, 2011 at 2:21 pm
  • Hi, Is GROUP order-preserving? It is not mentioned in the Pig Latin doc while it does mention that UNION is not. Thanks, T
    Todd Lee
    Jan 17, 2011 at 10:14 pm
    Jan 18, 2011 at 4:53 pm
  • I am looking for a pointer to where I should place the following functionality. I have a Web archive on a remote server, which provides large data record sets fragmented into 2GB, gzipped files. ...
    Andreas Paepcke
    Jan 17, 2011 at 1:07 am
    Jan 17, 2011 at 6:21 am
  • Hello, I was searching online to find more info about network marketing and I came across your information. Can you tell me, are you still involved with your company? If you are, how are things going ...
    Frank E. Calabro Jr
    Jan 15, 2011 at 3:32 pm
    Jan 15, 2011 at 3:47 pm
  • I did a search online, and while someone had the same error, I don't think it was related. From the error log, I see this... Caused by: java.lang.RuntimeException: Final function of ...
    Jonathan Coveney
    Jan 14, 2011 at 11:06 pm
    Jan 15, 2011 at 12:02 am
  • I'm just curious how people usually interact with Maps? Do you write UDFs that do interesting things with the maps and simply take advantage of the fact that pig can deal with datatypes, or do you ...
    Jonathan Coveney
    Jan 11, 2011 at 5:24 pm
    Jan 11, 2011 at 9:41 pm
  • Hi Everybody, I'm a soon-to-graduate student of computer science at the University of Wrocław in Poland. Currently I'm starting to write my master's thesis and I'm looking for some inspirations/ideas. ...
    Michał Anglart
    Jan 4, 2011 at 9:22 pm
    Jan 8, 2011 at 11:19 pm
  • Hey Guys, I am running into a bit of trouble, and I know it's something that must be commonly done. I have created a loader function which uses external JARs, which is fine when run in local mode; the ...
    Jan 7, 2011 at 10:00 pm
    Jan 7, 2011 at 10:37 pm
  • Hi, I've got an outer bag/relation consisting of a bunch of user information, one of the pieces of which is an inner bag of possible events for that user, and the value of those events, should they ...
    Kris Coward
    Jan 7, 2011 at 5:21 pm
    Jan 7, 2011 at 6:53 pm
  • Hi, I don't seem to understand the OutputSchema constructs given in the documentation. What is the significance of the letters 'x', 't', and the parentheses '{}' & '()' ...
    Deepak N85
    Jan 7, 2011 at 1:04 pm
    Jan 7, 2011 at 5:36 pm
  • I have a custom LoadFunc (I'm actually just extending PigStorage) that has some added logic to spider a given path and pick out the paths that I want. I am currently doing the spidering in ...
    Eric Tschetter
    Jan 4, 2011 at 7:52 pm
    Jan 6, 2011 at 12:14 am
  • Fellow Pig users, Join me in extending warm congratulations to the newest committer to the Pig project, Julien Le Dem. Julien has done outstanding work on extending Pig's reach into scripting ...
    Dmitriy Ryaboy
    Jan 4, 2011 at 1:21 am
    Jan 4, 2011 at 4:52 pm
  • Newbie issue. I find myself wanting a spillable hashmap facility within my UDFs. Maybe I'm still not thinking hadoopy enough. But hashmaps are often convenient as temporary tools when operating over ...
    Andreas Paepcke
    Jan 1, 2011 at 5:10 pm
    Jan 1, 2011 at 10:22 pm
Group Overview
group: user @
categories: pig, hadoop

72 users for January 2011

  • Dmitriy Ryaboy: 62 posts
  • Jonathan Coveney: 39 posts
  • Deepak N85: 17 posts
  • Alan Gates: 14 posts
  • Dmitriy Lyubimov: 11 posts
  • Jacob Perkins: 10 posts
  • Kim Vogt: 8 posts
  • Dexin Wang: 7 posts
  • Kris Coward: 7 posts
  • Thejas M Nair: 7 posts
  • 김영우: 6 posts
  • Andreas Paepcke: 6 posts
  • Cam Bazz: 6 posts
  • Daniel Dai: 5 posts
  • Richard Ding: 5 posts
  • Julien Le Dem: 4 posts
  • Kaluskar, Sanjay: 4 posts
  • Mr. Lukas: 4 posts
  • Robert Waddell: 4 posts
  • Xiaomeng Wan: 4 posts