45 discussions - 192 posts

  • Anyone have an implementation or ideas towards a StoreFunc for JDBC or MySQL? It looks like I would need to spawn a thread to read the InputStream and reparse the tuples. Overall, it sounds a little ...
    Anthony Urso
    Feb 7, 2010 at 12:24 pm
    Feb 8, 2010 at 8:48 pm
  • Hello, I seem to have broken my Pig install, and I don't know where to look. If I use directly the script (grunt) everything works ok, but every time I try to run a pig script: 'java -cp ...
    Alex Parvulescu
    Feb 12, 2010 at 2:51 pm
    Feb 24, 2010 at 8:10 pm
  • Hello, I ran into an NPE today, which seems to be my fault, but I'm wondering if there is anything that could be done to make the error more clear. What I did is: 'C = FOREACH B GENERATE group, ...
    Alex Parvulescu
    Feb 9, 2010 at 10:22 am
    Feb 11, 2010 at 11:13 am
  • Is there a way to reuse Pig scripts (like def in Python, or function calls, etc.) from inside a calling Pig script? I have a set of basic Pig scripts which I would like to call from a high-level ...
    Prasenjit mukherjee
    Feb 10, 2010 at 2:49 am
    Feb 11, 2010 at 3:07 pm
  • Guys, I know this must be a common use case, but how do you explode and implode in pig? so, I have a file like this... 1, asdf 2, qewrty 3, zcxvb and I want to apply an explode operation to it: 1, a ...
    Hc busy
    Feb 19, 2010 at 6:21 pm
    Feb 22, 2010 at 10:41 pm
  • Is there any way I can have a Pig statement wait for a condition? This is what I am trying to do: I am first creating and storing a relation in Pig, and then I want to upload that relation via ...
    Prasenjit mukherjee
    Feb 11, 2010 at 5:12 pm
    Feb 16, 2010 at 11:05 am
  • Hi All, We have a use-case where we want to automatically register certain jars for command-line users. I tried using -jar, but this switch seems to do absolutely nothing. How do we go about ...
    Chris Riccomini
    Feb 5, 2010 at 7:27 am
    Feb 5, 2010 at 10:15 pm
  • For some reason, I am unable to filter inside my nested foreach. The basic outline of my script is as follows: 1. Load input 1. 2. Load input 2. 3. Join input1 by key1, input2 by key2; 4. foreach ...
    Zaki rahaman
    Feb 25, 2010 at 7:27 pm
    Mar 1, 2010 at 8:50 am
  • Hi, Hope this gets to the right list... I'm fairly new to Pig, been playing around with it for a couple of days. Essentially I'm doing a bit of work to evaluate Pig and its ability to simplify the ...
    Guy Jeffery
    Feb 2, 2010 at 2:24 pm
    Feb 3, 2010 at 5:28 pm
  • Hi, I have this script; how do I achieve the results? ==================================================================================== A = LOAD 'file:///home/hadoop/1.csv' USING PigStorage(',') AS ...
    Feb 25, 2010 at 6:34 am
    Feb 25, 2010 at 8:09 am
  • Hi, I am running a pig script to process some webapp logs, and got this java heap error. The task logs look like: ... 2010-02-24 10:54:52,147 INFO org.apache.hadoop.mapred.ReduceTask: Read 2186251 ...
    Xiaomeng Wan
    Feb 24, 2010 at 6:28 pm
    Feb 24, 2010 at 7:37 pm
  • Excuse me I could have missed important part of PIG document and asked this trivial question here :) What is the best way to find out the total number of tuples (rows) in the bag of data loaded? For ...
    Jiang licht
    Feb 23, 2010 at 11:55 pm
    Feb 24, 2010 at 6:39 am
  • Hey, First off, @Ankur, great work so far on the patch. This probably is not an efficient way of doing mass dumps to DB (but why would you want to do that anyway when you have HDFS?), but it hits the ...
    Zaki rahaman
    Feb 18, 2010 at 5:38 pm
    Feb 19, 2010 at 11:00 am
  • Just wondering if I can use the DEFINE command to write my custom mapper/reducer functions. For the mapper (I believe) I can, but I'm not sure about the reducer. I guess this depends how the define commands ...
    Prasenjit mukherjee
    Feb 18, 2010 at 8:48 am
    Feb 18, 2010 at 10:56 am
  • Any thoughts on this problem? I am using a DEFINE command (in Pig) and hence the actions are not idempotent, because of which duplicate execution does have an effect on my results. Any way to ...
    Prasenjit mukherjee
    Feb 10, 2010 at 2:53 am
    Feb 10, 2010 at 4:50 pm
  • Hi , I have a small doubt in how pig handles queries containing join of more than 2 tables . Suppose we have 3 tables A,B,C .. and the plan is "((AB)C)" .. We can join A,B in a map reduce job and ...
    Bharath v
    Feb 3, 2010 at 6:53 am
    Feb 4, 2010 at 4:15 am
  • hi, i recently found pig, really like it and want to use it for one of our actual projects. getting the basics running was easy, but now i am struggling with a problem. i am trying to get customers ...
    Jan Zimmek
    Feb 25, 2010 at 6:18 pm
    Mar 1, 2010 at 8:22 am
  • Any thoughts on including python-based UDFs like the following : http://arnab.org/blog/baconsnake-inlined-python-udfs-pig This will be a big help indeed. -Thanks, Prasen
    Prasenjit mukherjee
    Feb 26, 2010 at 3:14 am
    Mar 1, 2010 at 8:09 am
  • I've been up and down the docs, and I see people using GZipped files. But when I try to load them, i get garbage. Basically it loads it as raw data from the local file system. test = LOAD ...
    Cory Radcliff
    Feb 27, 2010 at 4:17 am
    Feb 27, 2010 at 6:56 pm
  • Hi, I'm facing what seem to be re-entrance errors when using PIG through the Java API. I know that the PigServer object is not reentrant, so I instantiate several PigServers and run them in separated ...
    Vincent Barat
    Feb 25, 2010 at 3:04 pm
    Feb 25, 2010 at 6:19 pm
  • Generally the stderr goes to the file <hadoop /logs/userlog/attempt_XXXX_XXXX_N/stderr in the hadoop node running that script. But it is not practical as it requires user to go and search all the ...
    Prasenjit mukherjee
    Feb 23, 2010 at 11:52 am
    Feb 23, 2010 at 4:32 pm
  • I had a pig script which reads a folder of ".gz" files and perform some operation on the data. However, here's a problem. The folder contains some corrupted gz files and this causes the hadoop job ...
    Jiang licht
    Feb 21, 2010 at 6:47 am
    Feb 22, 2010 at 8:45 am
  • Could someone please point out the mistake in this UDF? package UDF; import java.io.IOException; import java.util.Map; import org.apache.pig.EvalFunc; import org.apache.pig.data.Tuple; import ...
    Kelvin Moss
    Feb 16, 2010 at 6:39 pm
    Feb 17, 2010 at 12:16 am
  • Hello, I am trying to FILTER and then ORDER an inner bag, for example: A = LOAD ...blah... AS (first, last, age, kids:bag{kid:tuple(name, age)}); B = FOREACH A { filteredkids = FILTER kids BY age != ...
    Rusty Klophaus
    Feb 7, 2010 at 5:02 pm
    Feb 9, 2010 at 5:03 pm
  • I'm having a problem getting the SequenceFileLoader, from the Piggybank, to read sequence files whose values are block compressed (gzip'd). I'm using Pig, and Hadoop hadoop-0.20.1+152, via ...
    Derek Brown
    Feb 19, 2010 at 10:45 pm
    Feb 22, 2010 at 3:09 pm
  • Hi, I have a (hopefully) small request regarding JIRA. I quite like the Road Map feature[1] but unfortunately it doesn't work correctly for Pig as all versions (except 0.0.0) are set to ...
    Lars Francke
    Feb 11, 2010 at 3:22 am
    Feb 11, 2010 at 10:41 pm
  • Hello, I have a problem starting the grunt shell. I think this affects the 0.6 branch and forward. This is the error I get when I try to start the shell or when I try to run any script: ...
    Alex Parvulescu
    Feb 9, 2010 at 9:55 am
    Feb 11, 2010 at 8:39 am
  • Ok, this might sound little weird. my schema is f1, f2, f3 ,f4, f5, f6 when group by f1, f2, f3. I need to drop exactly one tuple when I have more than one tuples by grouping f1,f2,f3. Also the ...
    Felix gao
    Feb 8, 2010 at 7:44 pm
    Feb 9, 2010 at 12:49 pm
  • The Seattle Hadoop/Scalability/NoSQL (yeah, we vary the title) meetup is tonight! We're going to have a guest speaker from MongoDB :) As always, it's at the University of Washington, Allen Computer ...
    Bradford Stephens
    Feb 24, 2010 at 10:16 pm
    Feb 25, 2010 at 8:01 am
  • well... I have this data: [key#'1', b#'2', c#'3', key2#5] [key#'2', b#'i', c#'m', key2#6] [key#'3', b#'j', c#'n', key2#7] [key#'4', b#'k', c#'o', key2#8] and I run A= load 'simple_map.data' as ...
    Hc busy
    Feb 24, 2010 at 8:15 pm
    Feb 24, 2010 at 9:59 pm
  • Hi there I am using the following version of pig: ~/workspace$ pig-test --version Apache Pig version 0.5.0 (r829623) compiled Oct 25 2009, 18:58:38 I expect the following simple script to reduce the ...
    Adil Aijaz
    Feb 23, 2010 at 11:52 pm
    Feb 24, 2010 at 8:19 pm
  • Hi, I would like to welcome Thejas Nair as our newest Pig committer. Thejas has been contributing to Pig for over a year now. He is the main contributor to Pig SQL effort. He also has done ...
    Olga Natkovich
    Feb 23, 2010 at 6:19 pm
    Feb 24, 2010 at 12:59 am
  • I'm writing my own LoadFunc which take parameters. I'm finding the only valid parameter type is String. I can't seem to pass an int. Are the parameter types for LoadFunc restricted to strings? I'm ...
    Robert Goodman
    Feb 19, 2010 at 11:58 pm
    Feb 20, 2010 at 12:33 am
  • Greetings, It's time for another awesome Seattle Hadoop/Lucene/Scalability/NoSQL Meetup! As always, it's at the University of Washington, Allen Computer Science building, Room 303 at 6:45pm. You can ...
    Bradford Stephens
    Feb 17, 2010 at 2:10 am
    Feb 19, 2010 at 8:24 pm
  • I have a range of values that can have an associated gender like 'm', 'f'. I want to include all distinct values that have the same gender across all records. Like if the records are - abc f abc m ...
    Kelvin Moss
    Feb 11, 2010 at 10:25 pm
    Feb 12, 2010 at 12:39 am
  • Hi Folks This note is to let you know that we'll be kicking off the inaugural Austin Hadoop User Group on March the 18th. At present, we have speakers lined up from IBM and Rackspace and will cover ...
    Stephen Watt
    Feb 1, 2010 at 9:44 pm
    Feb 2, 2010 at 9:59 pm
  • Hi, I would like to welcome Dmitriy Ryaboy as yet another committer to Pig project! Dmitriy has been contributing consistently to Pig for the last eight months. He has been very active on the lists ...
    Olga Natkovich
    Feb 23, 2010 at 6:57 pm
    Feb 23, 2010 at 6:57 pm
  • I'm fairly new to Pig and am having a problem with a pig script that works fine in local mode, but fails in Hadoop mode. I'm using Cloudera CDH2, which includes Pig 0.5.0 and Hadoop 0.20.1. The line ...
    Jon Armstrong
    Feb 22, 2010 at 4:20 am
    Feb 22, 2010 at 4:20 am
  • The merge from load-store-redesign branch to trunk is now completed. New commits can now proceed on trunk. The load-store-redesign branch is deprecated with this merge and no more commits should be ...
    Pradeep Kamath
    Feb 19, 2010 at 8:07 pm
    Feb 19, 2010 at 8:07 pm
  • Hi, I will begin this activity now - a request to all committers to not commit to trunk or load-store-redesign till I send an all clear message - I am anticipating this will hopefully be completed by ...
    Pradeep Kamath
    Feb 18, 2010 at 7:21 pm
    Feb 18, 2010 at 7:21 pm
  • Hi, We would like to merge the load-store-redesign branch to trunk tentatively on Thursday. To do this, I would like to request all committers to not commit anything to load-store-redesign branch or ...
    Pradeep Kamath
    Feb 16, 2010 at 7:34 pm
    Feb 16, 2010 at 7:34 pm
  • Hey guys! As some of you know from my blog (and occasional posts here), Drawn to Scale has been building a complete end-to-end platform that makes dealing with data easy and scalable. You can Process, ...
    Bradford Stephens
    Feb 15, 2010 at 10:44 pm
    Feb 15, 2010 at 10:44 pm
  • Hadoop Fans, we have scheduled additional developer sessions in both the bay area and NYC. Also, due to popular demand, we'll be offering a public sysadmin training session immediately following our ...
    Christophe Bisciglia
    Feb 11, 2010 at 2:01 am
    Feb 11, 2010 at 2:01 am
  • Hi I am wondering if there is a way to make Pig stop writing zero byte files to output. Thanks Swaminathan
    P Swaminathan
    Feb 10, 2010 at 4:49 pm
    Feb 10, 2010 at 4:49 pm
  • Hi everybody I have lots of logs in LZMA format. By the API documentation I haven't seen any Storage class that handles compressed files, does anyone know of an LZMA implementation? What would I need ...
    Gustavo Enrique Salazar Torres
    Feb 3, 2010 at 6:07 pm
    Feb 3, 2010 at 6:07 pm
Period: Feb 2010
Group Overview
group: user @
categories: pig, hadoop

49 users for February 2010

  • Dmitriy Ryaboy: 26 posts
  • Ankur Goel: 14 posts
  • Alex Parvulescu: 12 posts
  • Jeff Zhang: 11 posts
  • Prasenjit mukherjee: 11 posts
  • Alan Gates: 9 posts
  • Zaki Rahaman: 8 posts
  • Jiang licht: 7 posts
  • Neil Blue: 7 posts
  • Hc busy: 6 posts
  • Prasenjit mukherjee: 5 posts
  • Bradford Stephens: 4 posts
  • Gerrit van Vuuren: 4 posts
  • Jumping: 4 posts
  • Mridul Muralidharan: 4 posts
  • Pradeep Kamath: 4 posts
  • Rekha Joshi: 4 posts
  • Ashutosh Chauhan: 3 posts
  • Bharath v: 3 posts
  • Chris Riccomini: 3 posts