52 discussions - 266 posts

  • Hi all, How can I implement a binary search in Pig? In one relation, there is a bag whose items are sorted, and I want to check whether a specific item exists in the bag. In a UDF, I can't random ...
    Dec 13, 2011 at 3:55 am
    Dec 19, 2011 at 4:56 am
  • Hi all, I want to keep the Pig script and storage schema separate. Is it possible to do this in a clean way? The only way that has worked so far is to do something like: AvroStorage('schema', ...
    IGZ Nick
    Dec 13, 2011 at 10:50 am
    Jan 2, 2012 at 8:56 am
  • I'm trying to figure out why the following pig script takes forever to run. logData = FOREACH flattenedLogData GENERATE opname, host, nanoTime, depth; opNameGroupAll = GROUP logData by opname; ...
    Cameron Gandevia
    Dec 18, 2011 at 2:31 am
    Dec 21, 2011 at 12:01 am
  • Hi, I have this script whose stage 1 has n maps where n = # of input splits (# gz files) but has 1 reducer. I need to understand why my script causes 1 reducer. When I think about how I'd do it in ...
    Ayon Sinha
    Dec 6, 2011 at 7:53 am
    Dec 6, 2011 at 10:36 am
  • Hi, The Pig code is as below: raw_data = load ... as (id:chararray, weight:float); group_id = group raw_data by id; filter_spec_id = filter group_id by group == '1'; count_spec_id = foreach ...
    Dec 3, 2011 at 1:46 am
    Dec 6, 2011 at 2:17 am
  • I have a Pig job that typically finishes in 20 minutes. I tried Pig code from trunk, and it takes more than 1 hour to finish. My input and output are on Amazon S3. One interesting thing is it takes about 40 ...
    Yang Ling
    Dec 29, 2011 at 12:02 am
    Jan 2, 2012 at 5:56 am
  • Hi, I am trying to implement a loader which is partition-aware. As prescribed, my loader implements LoadMetadata, however, getPartitionKeys is never invoked. The script is of this form: X = LOAD ...
    Stan Rosenberg
    Dec 7, 2011 at 5:25 pm
    Jan 2, 2012 at 2:16 am
  • Hello, Noob here. I am trying to analyze some Nginx log files and get some aggregate stats based on date and URL. Here is the beginning of a Pig script I have (I am running this in Elastic MapReduce, ...
    Grig Gheorghiu
    Dec 20, 2011 at 10:20 pm
    Dec 21, 2011 at 1:40 am
  • Hi All, Can someone please point me to any documentation on how exactly REGISTER copies over the job.jar to the slave machines? I have some loader UDFs and some helper methods which are utilized by ...
    Gayatri Rao
    Dec 15, 2011 at 12:15 am
    Dec 15, 2011 at 6:01 pm
  • Dear All, I am doing a PoC on Lzo compression with Protobuf using elephant bird and Pig 0.8.0. I am doing this PoC on a cluster of 10 nodes. I have also done indexing for the Lzo file. I have noticed ...
    Vijaya bhaskar peddinti
    Dec 11, 2011 at 7:12 am
    Dec 12, 2011 at 7:08 pm
  • Hi, I have an EC2 box set up with Pig 0.8.1 which can run my jobs fine in local mode. So now I want to configure the NN & JT such that the job goes to the EMR cluster I've spun up. I have a local ...
    Ayon Sinha
    Dec 2, 2011 at 12:13 am
    Dec 5, 2011 at 8:17 pm
  • Hi, I am trying to use Pig 0.9.1 with CDH3u1 packaged hadoop. I compiled pig without hadoop jars to avoid conflicts, and am using that jar to run pig jobs. Things are running fine in local mode but on ...
    Rohini U
    Dec 15, 2011 at 3:15 pm
    Dec 16, 2011 at 1:46 am
  • I understand that Pig Latin is a data flow language. In that sense it should be theoretically possible to execute Pig Latin in any framework, though currently it is meant to be executed in a ...
    Tharindu Mathew
    Dec 14, 2011 at 12:18 pm
    Dec 15, 2011 at 3:52 pm
  • I am trying to find the most efficient way to count the total number of records in a relation. The simplest way would be to do a GROUP ALL, and then do a COUNT, but doing a GROUP ALL seems to always ...
    Austin Stickney
    Dec 7, 2011 at 8:36 pm
    Dec 7, 2011 at 9:53 pm
  • I am embedding Pig Latin in Java, and want to check the error logs. Where can I find them? The program below runs fine if I don't use the line *pigServer.registerQuery("REGISTER '" +lib+ "';"); But ...
    Prashant Kommireddi
    Dec 5, 2011 at 10:22 pm
    Dec 6, 2011 at 3:48 am
  • I was impressed by MongoDB's Pig integration enough to write about it. Thought it might be of interest to the list: http://datasyndrome.com/post/14631249157/mongodb-is-web-scale-hadoop-mongodb -- ...
    Russell Jurney
    Dec 22, 2011 at 9:36 pm
    Jan 3, 2012 at 9:31 pm
  • I have a query and I want to improve on the following steps: LFV is an alias to a Custom UDF. Step 1: pruneFields = FOREACH logs GENERATE LFV(row, 'organizationId') as orgId, LFV(row, 'userId') as ...
    Prashant Kommireddi
    Dec 7, 2011 at 10:06 pm
    Dec 14, 2011 at 6:15 am
  • Hi, I was trying out a simple example script using ORDER and it doesn't seem to work. Does anyone know if there is any error with ORDER? raw = LOAD '$input' USING PigStorage() AS (a:int, ...
    Gayatri Rao
    Dec 9, 2011 at 2:29 am
    Dec 9, 2011 at 7:24 pm
  • I want to recreate PigPen as a web app. Can to has Illustrate or some kind of sample records in PigStats? That is all that is missing to make this trivial. Looking at this: ...
    Russell Jurney
    Dec 28, 2011 at 6:46 pm
    Jan 3, 2012 at 5:39 pm
  • when using -param input=s3n://foo/bar/baz/*/ blah.pig it throws java.lang.NullPointerException at ...
    Ayon Sinha
    Dec 15, 2011 at 7:18 pm
    Dec 28, 2011 at 7:27 pm
  • Hi, I am using a static HashMap in EvalUDF which needs configuration, so I am initializing it in exec method checking if it is null @Override public String exec(Tuple input) throws IOException { ...
    Rohini U
    Dec 20, 2011 at 11:32 pm
    Dec 21, 2011 at 10:32 pm
  • I see a lot of cases where users store data as SequenceFiles and would like to parse through the Value (ONLY if Text/DataByteArray) similar to PigStorage(String delim). For example, let's say the logs ...
    Prashant Kommireddi
    Dec 14, 2011 at 11:12 pm
    Dec 15, 2011 at 1:40 am
  • TOKENIZE UDF parses input based on defaults " \",()*" We should have a UDF that takes in a delimiter as argument and parses based on that. Thoughts? -Prashant
    Prashant Kommireddi
    Dec 14, 2011 at 11:44 pm
    Dec 15, 2011 at 12:12 am
  • Hi all, I have a strange problem running pig using one of my own UDFs (in java). My java UDF calls one of my util methods which is also part of the same UDF jar. Now, I try to run in local mode, it ...
    Gayatri Rao
    Dec 12, 2011 at 3:57 pm
    Dec 12, 2011 at 9:08 pm
  • Hello guys, I have a group of users aggregated by day period; let's assume there are 10 million users and each user has 1000 transactions. Now is there a way to iterate through this group NOT foreach, ...
    Marek Miglinski
    Dec 2, 2011 at 12:39 pm
    Dec 9, 2011 at 1:33 pm
  • In pig8, the following worked: bag_of_stuff = load 'thing' as (x:int); a = group bag_of_stuff all; b = foreach a generate FLATTEN((IsEmpty(bag_of_stuff) ? null : bag_of_stuff)) as stuff:int; dump b; ...
    Jonathan Coveney
    Dec 6, 2011 at 1:04 am
    Dec 6, 2011 at 10:47 pm
  • This is all I got during the run. What does it mean? Now I have to debug line by line. Any hint is greatly appreciated. 2011-12-05 17:28:01,854 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR ...
    Ayon Sinha
    Dec 6, 2011 at 1:40 am
    Dec 6, 2011 at 5:04 am
  • Hi, I was wondering if there is a place to store common macros and global parameters in pig (pigrc?). This should be available to all the users accessing pig via grunt or script. Please let me know ...
    Aniket Mokashi
    Dec 22, 2011 at 11:40 pm
    Jan 1, 2012 at 8:46 am
  • I recently switched to pig 0.9.1 and noticed it runs slower than the previous version (like 0.6, which was the only recent version supported on Amazon a couple of months ago) in local mode. Haven't tried the ...
    Dexin Wang
    Dec 19, 2011 at 10:24 pm
    Dec 22, 2011 at 1:01 am
  • Hi, I would like to load JSON data, remove any duplicates, and write it back as JSON. I am using the elephant-bird json libraries but I can't figure out how to project a map. DEFINE JsonLoader ...
    Cameron Gandevia
    Dec 17, 2011 at 12:58 am
    Dec 17, 2011 at 10:18 pm
  • Hi, I am currently ingesting lzo compressed log files, running the lzo indexer on them and then running a bunch of pig scripts. The typical size of the lzo files is around 100mb. I am wondering if I ...
    Cameron Gandevia
    Dec 16, 2011 at 8:07 pm
    Dec 17, 2011 at 1:57 am
  • Hi All, I'm trying to play with counters with PigServer and have a couple issues. First, I've found very little documentation on how to do this, so I'm not sure if the method I'm trying is the good ...
    Charles Menguy
    Dec 6, 2011 at 9:49 pm
    Dec 12, 2011 at 8:36 pm
  • I know regex matching is provided via EXTRACT, but is there any such functionality for regex substitution? Or do I need to write my own UDF for that? Thanks, Grig
    Grig Gheorghiu
    Dec 21, 2011 at 8:08 pm
    Jan 3, 2012 at 2:57 pm
  • Hi everyone, I am trying to build Pig from SVN trunk on hadoop 0.20.205. While doing that, I am getting the following error: Any idea why it's happening? Thanks, Praveenesh root@lxe ...
    Praveenesh kumar
    Dec 30, 2011 at 5:29 am
    Dec 31, 2011 at 2:54 am
  • Is there a UDF that could be used to check if a string is numeric (or an Integer)? This would be nice to have (if we don't already) as part of Piggybank. A lot of other tools such as Splunk, AbInitio ...
    Prashant Kommireddi
    Dec 20, 2011 at 7:40 pm
    Dec 20, 2011 at 10:00 pm
  • This might get outdated quickly as EMR upgrades the Pig version and Pig 0.9.1 is being used by everyone anyway. But here is my write-up for your review: The main obstacles for running Pig on Elastic ...
    Ayon Sinha
    Dec 16, 2011 at 8:03 pm
    Dec 17, 2011 at 12:27 am
  • I noticed the property "aggregate.warning" is not being set by default when running PigServer, embedding Pig in Java. I was initially creating a PigServer object this way: PigServer pigServer = new ...
    Prashant Kommireddi
    Dec 9, 2011 at 7:29 am
    Dec 13, 2011 at 1:43 am
  • I wrote up a pig-centric tutorial about how to get started collecting data with Avro, processing it with Pig and publishing it with Voldemort/Sinatra: ...
    Russell Jurney
    Dec 7, 2011 at 10:41 pm
    Dec 8, 2011 at 4:37 am
  • hi, just wondering if there is any String function that returns the length of the string like JAVA does: string.length()? I use pig 0.8.1
    Dan Yi
    Dec 1, 2011 at 10:10 pm
    Dec 2, 2011 at 6:14 pm
  • Hello, is there _any_ way to specify an empty byte array (but not NULL)? There also seems to be no way to specify byte array constants or convert other constants to bytearray. Is there any reason ...
    Dmitriy Lyubimov
    Dec 22, 2011 at 12:52 am
    Dec 22, 2011 at 1:01 am
  • I'm seeing some strange behavior but I don't know if it's a bug. I have a pig script that looks something like: REGISTER myjar.jar raw = LOAD 'mydata' USING myLoader(); partial = FOREACH raw GENERATE ...
    Adam Portley
    Dec 16, 2011 at 1:43 am
    Dec 16, 2011 at 4:21 am
  • In the pig9 branch in svn, running this gives me an error: a = load 'thing' as (x:int); b = group a by x; c = foreach b generate group as x, COUNT(a) as count; d = limit (order c by count DESC) 2000; ...
    Jonathan Coveney
    Dec 13, 2011 at 10:00 pm
    Dec 14, 2011 at 1:57 pm
  • I am seeing something weird with running Pig embedded in Java. Basically the script exits without any information. Here are the steps I am following: pkommireddi@pkommireddi-wsl:~/misc/pig$ echo ...
    Prashant Kommireddi
    Dec 7, 2011 at 10:43 pm
    Dec 8, 2011 at 5:42 pm
  • can anyone tell me why the following won't work i have bags like this: x: (utm_source,3) (sprint_&utm_medium,3) (banner&utm_campaign,3) (sprint,3) i wanna filter out all the bags with 'utm' included ...
    Dan Yi
    Dec 1, 2011 at 10:49 pm
    Dec 1, 2011 at 10:59 pm
  • Hello, Everyone: I am trying to run the Pig e2e test. I found the source in the repo http://svn.apache.org/repos/asf/pig/branches/branch-0.9/test/e2e/ And there is a post that describes how to kick off the ...
    Zhang Xiaoyu
    Dec 1, 2011 at 4:49 am
    Dec 1, 2011 at 9:33 pm
  • Hey guys, I'm running a hadoop-pig job, and it fails after an hour :( on the last mapper, with what appears to me as a memory leak in log4j - could this be the case??? My job has 3370 mappers, and ...
    Hadanny, Ido
    Dec 29, 2011 at 12:02 am
    Dec 29, 2011 at 12:02 am
  • Hi, I have been having trouble figuring out how to set up the classpath on the child JVM. Is there a way for me to set up the classpath of the child JVM in pig? Can someone please point me to any sample ...
    Gayatri Rao
    Dec 15, 2011 at 1:47 am
    Dec 15, 2011 at 1:47 am
  • I'll file a bug if this is a bug, but here's an example of a script that will generate the error: A = LOAD 'thing1' as (x:chararray); B = LOAD 'thing2' AS (y:long); C = LOAD 'thing3' AS ...
    Jonathan Coveney
    Dec 13, 2011 at 11:04 pm
    Dec 13, 2011 at 11:04 pm
  • Hi, I noticed the following as I'm learning Pig, which I thought I'd share to get some insight on. It seems to me that Pig cannot (automatically) distinguish between an empty tuple and a tuple with a ...
    Scott Gorlin
    Dec 13, 2011 at 4:48 pm
    Dec 13, 2011 at 4:48 pm
  • Hello, I am compiling pigmix and encounter the following error: compile-sources-all-warnings: [javac] Compiling 719 source files to /home/kereno/Desktop/pig-0.8.1-test/build/classes * [javac] ...
    Keren Ouaknine
    Dec 6, 2011 at 11:45 pm
    Dec 6, 2011 at 11:45 pm
Group Overview
period: Dec 2011
group: user @
categories: pig, hadoop

49 users for December 2011

Prashant Kommireddi: 43 posts
Dmitriy Ryaboy: 40 posts
Jonathan Coveney: 19 posts
Ayon Sinha: 18 posts
唐亮: 16 posts
Gayatri Rao: 11 posts
Thejas Nair: 11 posts
Cameron Gandevia: 10 posts
Russell Jurney: 9 posts
Daniel Dai: 6 posts
Grig Gheorghiu: 6 posts
Rohini U: 6 posts
Aniket Mokashi: 5 posts
IGZ Nick: 5 posts
Ruslan Al-fakikh: 5 posts
Stan Rosenberg: 5 posts
Alan Gates: 3 posts
Bill Graham: 3 posts
Charles Menguy: 3 posts
Gianmarco De Francisci Morales: 3 posts