Grokbase › Groups › Pig user › May 2012

62 discussions - 281 posts

  • I want to create an RDBMS-like sequence on a Pig relation. Is there any existing UDF that could do this? I am a bit new to Pig; kindly suggest how to proceed. Thanks & Regards, -- Dipesh Kr. Singh
    May 16, 2012 at 5:42 pm
    May 28, 2012 at 10:52 pm
  • Hi all, when we're running a pig job for aggregating some amount of slightly compressed avro data (~160GByte), the time until the first actual mapred job starts takes ages: 15:27:21,052 [main] INFO ...
    Markus Resch
    May 31, 2012 at 9:39 am
    Jun 21, 2012 at 12:13 am
  • I'm having problems using Pig's STRSPLIT (on Amazon's cloud computing environment). I also noticed that STRSPLIT isn't documented in the Pig Latin Reference Manual, so I found out about it through ...
    Nerius Landys
    May 17, 2012 at 5:57 pm
    May 19, 2012 at 3:00 am
  • We upgraded from Pig 0.8.1 to 0.10 and the following nested foreach no longer works: actionBagGrouped = GROUP actionBag BY (deal_id,month); dealCounts = FOREACH actionBagGrouped { sent = FILTER ...
    Steve Bernstein
    May 31, 2012 at 7:09 pm
    Jun 7, 2012 at 11:06 pm
  • I have something like this in pig script. But pig doesn't recognize *.jar regular expression. It seems to be looking for a file name "*.jar" Is there a way to register using regular expression? SET ...
    Mohit Anchlia
    May 16, 2012 at 11:52 pm
    May 18, 2012 at 7:42 am
  • Hello List, I am using Cloudera's distribution (cdh3u3) which comes with pig-0.8.1. I have written a UDF extending FilterFunc that checks if the provided string is contained within the specified ...
    Johannes Schwenk
    May 21, 2012 at 4:37 pm
    May 24, 2012 at 5:16 pm
  • Hi, I am new to Pig and am unable to use Pig builtin functions (please see details below). Is this a CLASSPATH issue? Any ideas on how to resolve? Thanks, John Details ### Line in pig script causing ...
    John Morrison
    May 17, 2012 at 3:24 pm
    May 18, 2012 at 1:45 am
  • Hi, During generating pig jar files, I found the contrib directory is not compiled. I assume maybe this is because the contrib directory is not for pig mainline functions. Am I right? Besides, if I ...
    May 23, 2012 at 3:08 am
    May 23, 2012 at 5:12 am
  • Here in Austin, we've been having a hack day for beginning to intermediate developers. Just wanted to post some slides that were from presentations here: Pig 101 - ...
    Jeremy Hanna
    May 11, 2012 at 8:00 pm
    May 12, 2012 at 12:21 am
  • (Yet another basic udf question) I want my udf to take values of all the columns in a row. For example: If there are 3 records in my input file. (Tab delimited row) John 12 Jeff 33 Chin 20 Currently ...
    May 9, 2012 at 7:01 pm
    May 10, 2012 at 5:38 am
  • I'm trying to do something with Pig that I believe Pig wasn't really designed/intended to handle. Normally, the way I'd do things with Pig is by feeding it data like so: Fred 23 Adam 25 Mary 21 ...
    Nerius Landys
    May 21, 2012 at 11:40 pm
    May 22, 2012 at 8:47 pm
  • Hello list, I have an HDFS file that has 6 columns containing some data stored in an HBase table. The data looks like this - 18.98 2000 1.21 193.46 2.64 58.17 52.49 2000.5 4.32 947.11 2.74 64.45 ...
    Mohammad Tariq
    May 21, 2012 at 11:55 am
    May 22, 2012 at 9:51 am
  • Hi All, I'm attempting to load sequence files for the first time using Elephant Bird's sequence file loader and having absolutely no luck. I did a hadoop fs -text on one of the sequence files and noticed ...
    Chris Diehl
    May 16, 2012 at 6:48 pm
    May 21, 2012 at 8:40 pm
  • We have logs in the following format us, foo us, foo fr, fizz us, bar fr, baz fr, fizz us, foo fr, fizz Where the first column is a country and the second column is a search term. How in the world ...
    May 11, 2012 at 12:24 am
    May 11, 2012 at 9:19 pm
  • Hi, I'm trying to write an EvalFunc UDF... I know that in some rare cases my data will cause my udf to take forever... for such cases, I want the udf to just ignore these cases so I was thinking ...
    Ahmed Sobhi
    May 31, 2012 at 1:47 pm
    Jun 5, 2012 at 2:26 am
  • Most Pig UDF development has moved away from Piggybank. I want to document where these UDFs are in a central place. If you know where some cool Pig macros, streaming examples or UDFs are, be they ...
    Russell Jurney
    May 16, 2012 at 1:35 am
    May 22, 2012 at 12:13 am
  • Here is a sample LOAD statement from Programming Pig book: daily = load 'NYSE_daily' as (exchange:chararray, symbol:chararray, date:chararray, open:float, high:float, low:float, close:float, ...
    Saurabh S
    May 15, 2012 at 10:35 pm
    May 18, 2012 at 5:09 pm
  • Hi, I want to use RCFile to address the IO problem, and I cannot find any documentation about how to install it or how to use it from Pig, so if you have an install or config file, could you share it with me ...
    May 24, 2012 at 6:26 am
    May 25, 2012 at 2:46 pm
  • Hi list, I would like to parse the following XML file using Pig: <page> <id>1</id> <revision> <id>1</id> <username>muehlburger</username> </revision> <revision> <id>2</id> <username>muehlburger</username> ...
    Herbert Mühlburger
    May 17, 2012 at 8:31 am
    May 22, 2012 at 3:36 pm
  • Hi all, We've been getting some funny outputs to some Pig jobs recently that contains a lot of duplicated data. I'm wondering if the cause of this could be Pig, or if we must have duplicates in our ...
    Brendan Gill
    May 18, 2012 at 3:56 pm
    May 18, 2012 at 5:00 pm
  • How can I combine multiple group by that are performed on essentially same relation? In the case below, can I do this in single foreach? e1 = load 'emp' using PigStorage() as (empid, school, ...
    Shan s
    May 15, 2012 at 1:50 pm
    May 16, 2012 at 4:50 pm
  • Hi, Another newbie Pig question. If I have a relation with a structure like this: (city, { (productId, count), (product, count) }). This relation tracks counts of products for each city. So a tuple ...
    James Newhaven
    May 9, 2012 at 11:57 am
    May 9, 2012 at 9:06 pm
  • Hi Everyone, This doesn't seem to be a *pig* error but I'm trying to do a join in pig when I get it. This used to work just fine but I did an update on some packages that left hadoop+pig alone .. is ...
    Nicholas Kolegraff
    May 3, 2012 at 6:57 pm
    May 5, 2012 at 8:28 pm
  • We're analyzing session(s) using Pig and HBase, and this session data is currently stored in a single HBase table, where rowkey is a sessionid-eventid combo (tall table). I'm trying to optimize the ...
    Norbert Burger
    May 29, 2012 at 5:20 pm
    May 30, 2012 at 2:23 am
  • I rambled across this while reviewing one of Jon's patches. Here is the code from DefaultTuple /** * Construct a tuple with a known number of fields. Package level so that callers cannot directly ...
    Prashant Kommireddi
    May 26, 2012 at 8:35 am
    May 27, 2012 at 7:50 am
  • Hi, I need to have pig scripts run automatically every certain amount of time. Also, I need to know if a script is still running so that the next one does not start until the previous one has ...
    Juan Martin Pampliega
    May 26, 2012 at 6:10 pm
    May 26, 2012 at 7:36 pm
  • Hi, I found two jar files in pig-0.10.0 package: pig-0.10.0.jar and pig-0.10.0-withouthadoop.jar. I have questions about them: 1. Seems the differences between these two jars are: pig-0.10.0.jar ...
    May 21, 2012 at 3:03 am
    May 23, 2012 at 3:02 am
  • I am trying to write a UDF that indexes data in elasticsearch after converting it to JSON. I had 2 questions: 1. If I create a static member in UDF class is that one instance per mapper task? 2. Is ...
    Mohit Anchlia
    May 15, 2012 at 11:18 pm
    May 16, 2012 at 6:44 pm
  • I've created a simple HBase table (version 0.90.4-cdh3u3) and I'm attempting to query the contents with Pig (version 0.8.1-cdh3u3). grunt> A = load 'test' using ...
    Neil Yalowitz
    May 16, 2012 at 12:28 am
    May 16, 2012 at 4:35 am
  • Thanks again to Twitter for doing their event and inspiring ours. I just wanted to report on some things we did in Austin for any interested. We had a good turnout of about 30 people. Kevin Safford ...
    Jeremy Hanna
    May 12, 2012 at 6:23 pm
    May 12, 2012 at 9:26 pm
  • I have a bag of tuples like this: { (product, unwanted, count), (product, unwanted, count) } Is it possible in Pig to generate a new bag with a revised tuple structure with one of its columns ...
    James Newhaven
    May 9, 2012 at 5:43 pm
    May 9, 2012 at 9:09 pm
  • Hi all, I am trying to use PigServer for the first time, and I get an exception that I could not find much about on the official website or Google. I have a problem launching a query from a Java program. The ...
    Etienne Dumoulin
    May 31, 2012 at 2:35 pm
    Jun 1, 2012 at 9:26 am
  • Hello, I am trying to run Pig in Hadoop mode with 2 clusters. I have installed Hadoop 1.0.3 and Pig 0.10. When I run Pig statements like "foreach" or if I use "MAX or AVG" i get the following error ...
    Nikhil desai
    May 29, 2012 at 9:52 pm
    Jun 1, 2012 at 9:07 am
  • Hi, I've noticed that I seem to be losing the ordering of my relation after passing the result of an ORDER BY to an EVAL function. For example: D = FOREACH C GENERATE COUNT($1) as countd; E = ORDER D ...
    James Newhaven
    May 29, 2012 at 7:26 pm
    May 30, 2012 at 9:11 pm
  • Hello all, I'd like to verify output from a pig script that does not sort its results prior to output. Thus the order of the tuples in the output is non-deterministic. I would rather not add sorting ...
    Johannes Schwenk
    May 29, 2012 at 12:36 pm
    May 30, 2012 at 9:03 am
  • For those who are writing pig scripts in kate, I have written a basic syntax highlighting file which can be found here: Installation: # mkdir ...
    Johannes Schwenk
    May 29, 2012 at 12:44 pm
    May 30, 2012 at 8:45 am
  • Hello everyone, I have been trying to debug some macros I've written, but I'm finding that every Diagnostic Operator (describe, dump, explain, and illustrate) are all being recognized as illegal ...
    Daniel Duckworth
    May 24, 2012 at 4:52 pm
    May 29, 2012 at 10:28 pm
  • I am trying to use an EVAL pig function (it's called BagSplit from datafu) which accepts a Bag as a parameter. The problem I have is that my current relation is a single Bag, so I'm not sure how to ...
    James Newhaven
    May 28, 2012 at 12:12 pm
    May 28, 2012 at 10:16 pm
  • I need to repeatedly CROSS a data set, then FOREACH it, reduce it with a filter, then group/test it to test if it's done yet, then repeat until it is baked. How do I do that with pig, and maybe some ...
    Russell Jurney
    May 23, 2012 at 6:31 am
    May 26, 2012 at 7:53 am
  • Hi, let's say I have a large tuple or a bag and I want to see if one of the fields matches a string. How would one do that? Similarly, how do you apply a function to all the fields in a tuple? Thanks, ...
    Fabian Alenius
    May 25, 2012 at 9:47 am
    May 25, 2012 at 10:29 am
  • Hi everybody, I'm trying to run some unit tests for a custom LOAD function that use MiniCluster. I get the following exception when running on pig 0.10.0 : For pig 0.8.1 ...
    Johannes Schwenk
    May 24, 2012 at 1:36 pm
    May 24, 2012 at 4:56 pm
  • Hi Guys, I am writing data in hadoop using java client. The source of data for java client is a messaging data. The java client rotates files every 15 minutes. I use PigServer to submit map reduce ...
    Rakesh sharma
    May 23, 2012 at 6:15 pm
    May 23, 2012 at 7:01 pm
  • Hi All, I have written some UDF in Python script ( Now i want to register this python script with PIG-GRUNT. But when i registering this script i am getting below errors. Case 1: When i ...
    Manish Bhoge
    May 20, 2012 at 3:19 am
    May 20, 2012 at 3:43 am
  • I am trying to use the Jackson JSON mapper in a custom Eval Pig function like this: mapper.writeValueAsString(cityChartData); However, when I use my custom function in a pig script, pig errors out ...
    James Newhaven
    May 18, 2012 at 4:18 pm
    May 18, 2012 at 4:52 pm
  • I'm trying to run 0.10 on EMR and noticed this open issue: Hortonworks' release notes state that S3 is supported. Is this the case or not? Regards, Dan
    Dan Young
    May 16, 2012 at 6:39 pm
    May 16, 2012 at 7:14 pm
  • I have a few cases where I need to perform conditional statements like an "if then else ". I have used bash to help with that. How are others solving this problem? Thanks, Ranjith
    May 15, 2012 at 3:18 am
    May 15, 2012 at 3:50 am
  • Hi All, I am trying to read a file and manipulate it but I want to read only specific column, in load do I need to mention all the columns? File.txt Col1 Col2 Col3 Col4 load = 'File.txt' as ...
    krishnan N
    May 14, 2012 at 11:55 pm
    May 15, 2012 at 12:24 am
  • Up to 10 people can skype in to the Pig hackday. Call apachepig :) -- Russell Jurney
    Russell Jurney
    May 11, 2012 at 5:04 pm
    May 11, 2012 at 8:01 pm
  • Hi, I have data in a file which has schema, say like this: (stud_id, Physics, Chemistry, Bio, CS) I need to generate an output which should contain (stud_id , xml_payload) for each record in my ...
    May 8, 2012 at 5:54 pm
    May 9, 2012 at 1:45 am
  • Given a relation that contains this: ({(11),(9)}) ({(8),(7)}) Is it possible for me to SUM the contents of each bag so I get: (20) (15) Thanks, James
    James Newhaven
    May 28, 2012 at 10:41 pm
    May 29, 2012 at 4:00 am
Group Overview
Group: user@
Categories: pig, hadoop

72 users for May 2012

Jonathan Coveney: 25 posts, Prashant Kommireddi: 17 posts, Russell Jurney: 14 posts, James Newhaven: 12 posts, Dan Young: 11 posts, Nerius Landys: 11 posts, Johannes Schwenk: 10 posts, DIPESH KUMAR SINGH: 9 posts, Norbert Burger: 8 posts, Bill Graham: 7 posts, Jeremy Hanna: 7 posts, Lulynn_2008: 7 posts, Alan Gates: 6 posts, Mohammad Tariq: 6 posts, Chris Diehl: 5 posts, Mohit Anchlia: 5 posts, Raghu Angadi: 5 posts, Shan s: 5 posts, John Morrison: 4 posts, krishnan N: 4 posts