Grokbase Groups Pig user March 2011

79 discussions - 321 posts

  • Pig Users and Developers, We are starting to plan the work after Pig 0.9. One thing we need to decide is what name/number to give to the next release: Pig 0.10 or Pig 1.0. I believe that we are ready ...
    Olga Natkovich
    Mar 3, 2011 at 12:53 am
    Mar 4, 2011 at 1:49 pm
  • Hi, I am trying to extract data stored in HBase using Pig. I got Pig to work on Hadoop and then also ran a sample M/R job and got correct results. Now when I try to access HBase using the ...
    SULABH
    Mar 15, 2011 at 12:53 am
    Mar 15, 2011 at 11:39 pm
  • I'm currently getting a really weird error coming from one of my eval functions. It expects a tuple where the first element is a string and then outputs a Map<String,Object> as a result. I put in some ...
    Xavier Stevens
    Mar 29, 2011 at 8:35 pm
    Mar 30, 2011 at 5:01 pm
  • Hi all, Does anybody know if a Percentile UDF exists at all? I've searched through the manual and the piggybank project, but can't seem to find one there. Many thanks, Jon.
    Jonathan Holloway
    Mar 10, 2011 at 2:58 pm
    Mar 12, 2011 at 12:40 am
  • I'm trying to use InvokeForString to call a simple static method that wraps http://mzsanford.github.com/twitter-text-java/docs/api/index.html https://github.com/twitter/twitter-text-java ... ...
    Dan Brickley
    Mar 1, 2011 at 5:09 pm
    Mar 1, 2011 at 11:46 pm
  • Hi Guys, I read the Sawzall <http://labs.google.com/papers/sawzall.html> paper today and wonder if there are any other systems like Pig and Sawzall. Does anyone know of other projects? Thanks -- ...
    Charles Gonçalves
    Mar 18, 2011 at 1:20 am
    Mar 18, 2011 at 3:34 pm
  • Hello, I read that it is good practice to declare the schema in the Pig script as well as in the UDF (by implementing outputSchema), for performance reasons. Now in my case I have an EvalFunc that ...
    Lai Will
    Mar 9, 2011 at 9:55 am
    Mar 18, 2011 at 1:24 am
  • The following pig script runs fine without the 2GB memory setting (see in yellow), but fails with the memory setting. I am not sure what's happening. It's a simple operation of joining one tuple (of 1 ...
    Paltheru, Srikanth
    Mar 14, 2011 at 9:50 pm
    Mar 15, 2011 at 12:55 am
  • Hi guys, We had a lively discussion last week regarding what version number to assign to the major release following Pig 0.9. The discussion can be seen here: http://tinyurl.com/4ng8upa. Based on the ...
    Olga Natkovich
    Mar 7, 2011 at 11:21 pm
    Mar 14, 2011 at 7:57 pm
  • Running Hbase 0.20-0.20.3-1.cloudera - I've tried running this with Pig 0.8 from August 2010 and from trunk on March 25 2011. Do I need to use an older version? My pig script is trying to load from ...
    Jameson Lopp
    Mar 25, 2011 at 2:00 pm
    Mar 30, 2011 at 7:46 am
  • Hi, We recently updated our hadoop from CDH2 to CDH3b4, and had problems using some old python udfs. Running in local mode still works, but in hadoop mode, it gives errors like "could not instantiate ...
    Xiaomeng Wan
    Mar 31, 2011 at 10:07 pm
    Apr 4, 2011 at 3:37 pm
  • Hello, I wrote an EvalFunc implementation that 1) Parses a SQL Query 2) Scans a folder for resource files and creates an index on these files 3) According to certain properties of the SQL Query ...
    Lai Will
    Mar 2, 2011 at 1:30 pm
    Mar 2, 2011 at 5:30 pm
  • I want to do something really simple: I want to pass a string into a Pig script. The string is either "stdout" or some target file name. Say the string gets bound to $OUTPUT. That all works fine. ...
    Andreas Paepcke
    Mar 15, 2011 at 5:41 am
    Apr 20, 2011 at 9:14 pm
  • I have these "rows" ({(155495400)}) ({(199027860),(199027860),(149167529),(203508790),(198488630)}) ({(174255619),(201077556),(199051606),(198778302)}) I believe the correct way to explain them would ...
    Mark
    Mar 31, 2011 at 3:50 pm
    Apr 1, 2011 at 7:14 pm
  • Hi, Below is a list of tuples generated after flattening a bag. (day, age, name, address, ['k1#v1','k2#v2']), (12/2,22,deepak,newyork, ['k1#v1','k2#v2']), (12/3,22,deepak,newjersy, ['k1#v1','k2#v2']) ...
    Deepak kumar v
    Mar 17, 2011 at 6:29 am
    Mar 29, 2011 at 8:35 am
  • We do some processing in hadoop; then, as the last step, we write the result to a database. The database is not good at handling hundreds of concurrent connections and fast writes. So we need to throttle ...
    Dexin Wang
    Mar 17, 2011 at 6:04 pm
    Mar 25, 2011 at 11:04 pm
  • Hi all, I wrote a simple UDF, DicomParser, which reads a line and converts it to a tuple, but when I tried to use it like this: register H:/apps/mypig/mypigudf.jar; A = load 'dicoms/' using ...
    Baraa Mohamad
    Mar 22, 2011 at 6:42 pm
    Mar 22, 2011 at 7:43 pm
  • Is there a way to use STORE with a variable, or some other way to achieve what I need? I have something like this: grunt> DESCRIBE A; A: {f1, f2, f3, ...} grunt> DUMP A; (v1, x2, x3, ...) (v2, x4, x5, ...
    Dexin Wang
    Mar 8, 2011 at 8:30 pm
    Mar 10, 2011 at 8:22 pm
  • So I queued up a batch of jobs last night to run overnight (and into the day a bit, owing to a bottleneck on the scheduler the way that things are currently implemented), made sure they were ...
    Kris Coward
    Mar 8, 2011 at 10:53 pm
    Mar 10, 2011 at 1:29 am
  • I see that there are a few LoadCaster implementations in pig 0.8. There's the Utf8StorageConverter, the HBaseBinaryConverter, and a couple of others. The HBaseStorage class uses the ...
    Jeremy Hanna
    Mar 24, 2011 at 6:12 pm
    Mar 24, 2011 at 10:25 pm
  • Hello, I have some LZO files, which I a) indexed via DistributedLzoIndexer to create index files b) did not index, so just some LZO files in a directory. Using both approaches, I tried creating a ...
    Saptarshi Guha
    Mar 21, 2011 at 8:11 pm
    Mar 22, 2011 at 4:29 am
  • How can I add comments in my pig script? I tried # but whenever that gets run in the pig console it complains.
    Mark
    Mar 11, 2011 at 6:09 am
    Mar 11, 2011 at 5:43 pm
  • I have a rather large query that took quite a while to execute (11 hours, probably on the order of 70B rows), and while the job tracker website we have seems to indicate that the query finished, here ...
    Jonathan Coveney
    Mar 8, 2011 at 2:51 pm
    Mar 10, 2011 at 9:36 pm
  • Hello, First of all thank you all, I really appreciate the help I get here :) I've now written some UDFs and did some successful local runs. When copying the data to HDFS and trying to run my pig ...
    Lai Will
    Mar 3, 2011 at 2:09 pm
    Mar 3, 2011 at 9:00 pm
  • Is there a standard way to get jline and commons-lang into pig? I work around by copying them into my build/ivy/lib/Pig directory but didn't know if there was a simpler way I was just overlooking. ...
    Jeremy Hanna
    Mar 28, 2011 at 6:33 pm
    Apr 1, 2011 at 2:18 pm
  • Hey all, I am wondering what the dependencies in the lib folder of pig are used for, i.e. automation, hbase, and zookeeper. If we are not working with hbase at all, do we need them to run pig ...
    Jeffrey Wang
    Mar 31, 2011 at 2:49 am
    Mar 31, 2011 at 7:09 pm
  • Hi, I am trying to write a loader which can append the input path as a field, something like this: hadoop fs -ls a/ 1.txt 2.txt 3.txt a = load 'a/*' using MyLoader() as (id, path); dump a; ...
    Xiaomeng Wan
    Mar 28, 2011 at 8:45 pm
    Mar 29, 2011 at 8:11 pm
  • Just trying to run simple example using pig 0.8.0 and Cloudera Hadoop CDH3. I set all my PIG_X variables. Any idea what may be wrong? export PIG_RPC_PORT=9160 export PIG_INITIAL_ADDRESS=localhost ...
    Mark
    Mar 17, 2011 at 3:33 am
    Mar 29, 2011 at 6:21 pm
  • Hi, We've seen a strange problem where some Pig jobs would just run fewer mappers concurrently than the mapper capacity. Specifically we have a 10 node cluster and each is configured to have 12 ...
    Dexin Wang
    Mar 23, 2011 at 8:40 pm
    Mar 24, 2011 at 12:59 am
  • I've got a general question surrounding the output of various Pig scripts and generally where people are storing that data and in what kind of format? I read Dmitriy's article on Apache log ...
    Jonathan Holloway
    Mar 23, 2011 at 7:15 pm
    Mar 23, 2011 at 11:50 pm
  • Hello, The data I want to process is XML. It boils down to <element> ... </element> <element> ... </element> According to what I read in the documentation, when loading the file using the default Slicer, ...
    Lai Will
    Mar 1, 2011 at 8:46 pm
    Mar 23, 2011 at 7:39 pm
  • First off, I am fairly new to both pig and Hadoop. I am having some problems connecting pig to a local hadoop cluster. I am getting the following error in the hadoop namenode logs whenever I try and ...
    Dan Hendry
    Mar 21, 2011 at 9:57 pm
    Mar 22, 2011 at 8:12 pm
  • Hi, I tried to use exec to stop the multiquery optimizer from combining too many actions together, which results in a heap space problem. But it seems multiquery just ignores exec, and still combines ...
    Xiaomeng Wan
    Mar 17, 2011 at 5:48 pm
    Mar 17, 2011 at 8:59 pm
  • Hi, If in a UDF, say in the constructor of the class, I initialize a list (say ArrayList<String> namesList) of objects (say names). And in the exec() method, I do some processing. When I am using ...
    Souri datta
    Mar 17, 2011 at 6:13 pm
    Mar 17, 2011 at 6:49 pm
  • Hey folks, I am still trying to set up elephant bird in pig. I am using the pig-08 branch of dvryaboy. I managed to create my example loader using the pig8.util.ThriftToPig. My pig code looks like this.. ...
    Torben Brodt
    Mar 16, 2011 at 10:24 am
    Mar 17, 2011 at 6:34 pm
  • Hey guys, I'm still seeing references (http://wiki.apache.org/pig/PiggyBank) to PiggyBank being in the contrib module in SVN. What is the official PiggyBank repo at the moment: ...
    Josh Devins
    Mar 13, 2011 at 9:35 am
    Mar 14, 2011 at 6:32 pm
  • Hi all, I'm working with some data at the moment, for which I needed to generate multiple reports for a given grouped set of data by name. I wasn't initially sure about how to do this, I came across ...
    Jonathan Holloway
    Mar 31, 2011 at 4:13 pm
    Mar 31, 2011 at 11:02 pm
  • Is it possible to do a group concat with pig? I've been trying with no success. Basically the data is as follows: 1234|test1 1234|test2 1234|test3 1244|test4 1244|test5 etc etc I'm trying to come up ...
    Mike st. john
    Mar 28, 2011 at 9:40 pm
    Mar 29, 2011 at 1:48 am
  • Hi all, I have a problem where I need to limit the number of results generated by pig script based on some condition. say, if ( $x == 0 ) then do not limit #results else: limited_result = LIMIT ...
    Souri datta
    Mar 27, 2011 at 7:37 pm
    Mar 28, 2011 at 6:48 pm
  • Hi, Has anyone seen the following? I am getting an error when running ORDER: ERROR 1071: Cannot convert a Unknown to a String The error occurs in DataType.java:885. At the end of that switch ...
    Andreas Paepcke
    Mar 25, 2011 at 5:44 pm
    Mar 25, 2011 at 7:06 pm
  • Wondering if someone has reported this bug in pig 0.8 (maybe it's been fixed?) data.txt (tab-separated file, bad site has no canonical_url populated): badsite.com 127.0.0.1 goodsite.com/1?foo=true ...
    Corbin Hoenes
    Mar 24, 2011 at 10:13 pm
    Mar 25, 2011 at 1:34 am
  • Hello, I'm currently encountering the following problem. I have an XML file that gets loaded using a custom LoadFunc. Boiled down, my XML file could look like: <files> <file> <id> 1 </id> <text> This is a ...
    Lai Will
    Mar 22, 2011 at 3:28 pm
    Mar 22, 2011 at 7:23 pm
  • Hello there, I want to write a UDF in Java, so I tried to add Pig to Eclipse, but I always get thousands of errors; most of them are like this one: *The declared package ...
    Baraa Mohamad
    Mar 22, 2011 at 1:52 pm
    Mar 22, 2011 at 6:24 pm
  • Hi, I want to iterate through the fields in a tuple and then pass each field to a FILTER statement. Does anybody know how I would go about doing this? Many thanks, Jon.
    Jonathan Holloway
    Mar 18, 2011 at 7:02 pm
    Mar 18, 2011 at 8:49 pm
  • Hey folks, I am using the pig-0.8 branch from dvryaboy repository and Pig (0.8.0+5-1~maverick-cdh3b4) from the cloudera distribution. I just want to run the example "json_word_count.pig" Well it ...
    Torben Brodt
    Mar 14, 2011 at 6:34 pm
    Mar 15, 2011 at 8:21 am
  • I've been playing with pig this week and I'm running into an issue that seems like it should be trivial. I'm basically reading data from hbase and performing a count of sessions associated with a ...
    Keric Donnelly
    Mar 11, 2011 at 4:36 pm
    Mar 14, 2011 at 1:17 pm
  • Sorry if I butcher the terminology; I'm still new to Pig but I'll try my best. Given a bag of tuples how can I create a flattened version of all the tuples? For example say I have {(1), (2), (3)} how can ...
    Mark
    Mar 12, 2011 at 5:26 am
    Mar 12, 2011 at 6:25 am
  • I thought I read somewhere that Pig has an output format that can write to Cassandra but I am unable to find any documentation on this. Is this possible and if so can someone please point me in the ...
    Mark
    Mar 11, 2011 at 4:18 am
    Mar 11, 2011 at 8:07 pm
  • I ran into an issue tonight with parsing log lines whereby I had to generate a schema in a user defined function. Part of that involved converting various values into their associated data types, but ...
    Jonathan Holloway
    Mar 10, 2011 at 11:04 pm
    Mar 10, 2011 at 11:47 pm
  • Hi, I noticed that Pig 0.8 runs on Hadoop 0.20.2. Is there any plan to upgrade to 0.21? Thanks, Jane
    Jane Chen
    Mar 2, 2011 at 11:25 pm
    Mar 3, 2011 at 12:42 am
Group Overview
group: user @
categories: pig, hadoop
discussions: 79
posts: 321
users: 65
website: pig.apache.org

65 users for March 2011

  • Dmitriy Ryaboy: 46 posts
  • Alan Gates: 23 posts
  • Lai Will: 19 posts
  • Daniel Dai: 16 posts
  • Dexin Wang: 10 posts
  • Jonathan Coveney: 10 posts
  • Jonathan Holloway: 10 posts
  • Kris Coward: 10 posts
  • Baraa Mohamad: 8 posts
  • Mark: 8 posts
  • Thejas M Nair: 8 posts
  • Andreas Paepcke: 7 posts
  • Josh Devins: 7 posts
  • Xiaomeng Wan: 7 posts
  • Charles Gonçalves: 6 posts
  • Deepak kumar v: 6 posts
  • Olga Natkovich: 6 posts
  • Souri datta: 6 posts
  • Sulabh choudhury: 6 posts
  • Xuefu Zhang: 6 posts