Grokbase Groups Pig user January 2013

46 discussions - 166 posts

  • Hey guys, I am trying to do the following: 1. Launch a pig job asynchronously via Java program 2. Get a notification once the job is complete (something similar to Hadoop callback with a servlet) I ...
    Prashant Kommireddi
    Jan 24, 2013 at 12:49 am
    Feb 6, 2013 at 12:31 am
  • Been coding with the APIs and wondering if there is anything that allows you to only retrieve the operators, I/O paths etc without actually issuing an execute or a store? Basically, being able to get ...
    Prashant Kommireddi
    Jan 22, 2013 at 1:27 am
    Jan 23, 2013 at 7:25 am
  • Hi, Any ideas on how to make Pig run quicker when running it in local mode ? I'm processing 3 files of about 13MB each with 3 group by statements in my script which seem to suck up the time. There's ...
    Malcolm Tye
    Jan 4, 2013 at 5:35 pm
    Jan 21, 2013 at 2:02 pm
  • When using JsonLoader with Pig 0.10.0 if I have an input.json file that looks like this: {"date": "2007-08-25", "id": 16} {"date": "2007-09-08", "id": 17} {"date": "2007-09-15", "id": 18} And I use a ...
    Tim Sell
    Jan 7, 2013 at 7:57 pm
    Apr 5, 2013 at 12:51 am
  • This wasn't a problem in 0.9.2, but in 0.10, when I try to access a key in a map that has a dollar sign in it, I get hammered with errors that I haven't defined the variable. Specifically: blah = ...
    Eli Finkelshteyn
    Jan 11, 2013 at 2:01 am
    Jan 22, 2013 at 10:54 pm
  • Avro Schema with int field { "type" : "record", "name" : "employee", "fields":[ {"name" : "name", "type" : "string", "default" : "NU"}, {"name" : "age", "type" : "int","default" : 0}, {"name" ...
    Milind Vaidya
    Jan 10, 2013 at 7:41 pm
    Jan 10, 2013 at 8:36 pm
  • Hi All, I am trying to generate input files using Java. I have raw data in a CSV file, which Java reads and then uses GenericDatum/Record to create Avro files. The avro file is valid as it is parsed ...
    Meghana narasimhan
    Jan 26, 2013 at 3:34 am
    Jan 26, 2013 at 5:21 am
  • Dear all, We are using thrift and elephant-bird to store our logs. And I wanted to use some UDF to do complex processing on a single record, so I write some pig like the following ...
    Stanley Xu
    Jan 22, 2013 at 6:51 am
    Jan 24, 2013 at 3:57 am
  • Environment: Pig version: 0.11 Hadoop 0.23.6.0.1301071353 Script: REGISTER /homes/immilind/HadoopLocal/Jars/avro-1.7.1.jar REGISTER /homes/immilind/HadoopLocal/Jars/jackson-all-1.8.10.jar REGISTER ...
    Milind Vaidya
    Jan 9, 2013 at 2:50 pm
    Jan 9, 2013 at 10:38 pm
  • Hi, In my own UDF, is referencing a field by index the only way to access a field? The fields are all named and typed before being passed into the UDF, but it looks like I can only do something like this: String v1 ...
    Dexin Wang
    Jan 15, 2013 at 10:16 pm
    Feb 3, 2013 at 6:04 am
  • I'm new to Pig, and it looks like there is no provision to declare relations inline in a Pig script (without LOADing from an external file)? Based on ...
    Michael Malak
    Jan 18, 2013 at 6:49 pm
    Jan 24, 2013 at 10:20 pm
  • Hi there, I have the following data 4 {(1,abc),(2,cde),(5,efg)} 2 {(1,foo),(2,bar),(5,baz)} 7 {(1,bounce),(2,frotz),(5,trotz)} what I finally want to achieve is a list of all strings related to the ...
    Thomas Bach
    Jan 22, 2013 at 11:56 am
    Jan 23, 2013 at 2:33 pm
  • Hello users, I have an input file (1.2 MB) which contains list of words/phrases in every new line. I am reading each phrase per line and passing it to udf to correct/check that phrase. The udf ...
    Dipesh Kumar Singh
    Jan 13, 2013 at 12:48 pm
    Jan 15, 2013 at 6:23 pm
  • Abhishek
    Jan 14, 2013 at 10:22 pm
    Jan 15, 2013 at 5:37 pm
  • REGISTER /homes/immilind/HadoopLocal/Jars/avro-1.7.1.jar REGISTER /homes/immilind/HadoopLocal/Jars/piggybank.jar employee= load '/user/immilind/AvroData' USING ...
    Milind Vaidya
    Jan 11, 2013 at 4:13 pm
    Jan 11, 2013 at 9:05 pm
  • Is it possible to declare a schema when doing a LOAD for data in which you do not know the total number of columns? For instance. I know the data contains 6 or more columns. These columns are of the ...
    Chan, Tim
    Jan 7, 2013 at 10:19 pm
    Jan 8, 2013 at 2:48 am
  • Hi, I remember a while back that there was a setting introduced to allow a pig job to either insert a null, or drop a row, instead of aborting execution, when a cast failed. Of course, now that I ...
    Kris Coward
    Jan 12, 2013 at 12:05 am
    Feb 1, 2013 at 5:57 pm
  • We're looking into using Zebra for merge joins, and noticed that the link for it in the 0.10.0 Pig docs (http://pig.apache.org/docs/r0.10.0/) no longer exists. Is there an implication here, or is ...
    Devon Crouse
    Jan 29, 2013 at 12:56 am
    Feb 1, 2013 at 5:53 am
  • I've been using DBStorage and noticed something strange with one of my scripts. I had to use -no_multiquery to get the records to commit to mysql. I could see insert statements coming into my server ...
    Corbin Hoenes
    Jan 15, 2013 at 5:31 pm
    Feb 1, 2013 at 3:04 am
  • i am trying to use a MAX function of fieldA of a group and return another fieldB associated with the record that the function returned; however from what i have done so far i get the MAX fieldA value ...
    Matthew Purdy
    Jan 30, 2013 at 8:14 pm
    Jan 30, 2013 at 10:29 pm
  • Hi All, When there are too many nested bincond operators (more than 10), it's frozen there and will not run. Can not pig parse or is there a limitation? Thanks, Dongliang
    Dongliang Sun
    Jan 29, 2013 at 4:22 am
    Jan 30, 2013 at 10:35 am
  • I have data that looks like this: a e 11 0 b f 2 2 c g 3 3 c h 44 44 c i 75 0 d j 89 0 d k 120 0 d l 3000 0 and I load it like so: data = load 'test.txt' using PigStorage(' ') as (cid:chararray, ...
    Uri Laserson
    Jan 23, 2013 at 1:18 am
    Jan 24, 2013 at 9:01 am
  • I have tuple like so: (a: (b:int, c:int, d:int, e:int)) I would like to call a UDF and pass a range of the nested tuple. This is what I would expect the command to be: FOREACH alias GENERATE ...
    Uri Laserson
    Jan 22, 2013 at 8:42 pm
    Jan 22, 2013 at 10:40 pm
  • Hi everyone, I'm trying to determine the best way for all of my scripts to have shared initialization statements like jar register commands, default variable declarations, etc. and I'm not sure what ...
    Eric Czech
    Jan 22, 2013 at 4:18 pm
    Jan 22, 2013 at 10:25 pm
  • Hi, I have two relations - A and B. Both just contain user ids. I want to get a list of users who are in A but not in B. I am running Pig 0.9.1 and think this might be possible with the DIFF ...
    James Newhaven
    Jan 22, 2013 at 12:46 pm
    Jan 22, 2013 at 4:37 pm
  • Hi, We are using HBaseStorage intensively to load data from tables having more than 100 regions. HBaseStorage generates 1 map per region, and since our cluster has 50 map slots, it happens that our Pig ...
    Vincent Barat
    Jan 21, 2013 at 1:27 pm
    Jan 21, 2013 at 2:16 pm
  • Hi, my pig script is going to produce a set of files that will be an input for a different process. The script would be running periodically so the number of files would be growing. I would like to ...
    Jakub Glapa
    Jan 17, 2013 at 10:11 pm
    Jan 18, 2013 at 10:34 am
  • let's say I have an input dataset, each row has 2 fields, the first field is a value among 100 possible values. I want to just split the input dataset into 100 outputs , based on the value of the ...
    Yang
    Jan 9, 2013 at 10:38 pm
    Jan 18, 2013 at 12:26 am
  • I have TSVs with a lot of columns, and I would like to address them by name, as specified in the header line (first row), within Pig. The best I can come up with a.t.m is to write a script that ...
    Mason
    Jan 15, 2013 at 10:23 pm
    Jan 15, 2013 at 11:18 pm
  • Hi to all, I am looking for a way to load a lucene index with Pig Latin. An old post on internet from 2008 said that it is possible and at yahoo they did it but it is not open source. Do you know if ...
    Zeynep PEHLIVAN
    Jan 4, 2013 at 1:39 pm
    Jan 8, 2013 at 7:39 am
  • I need to run these things as Pig boots, and have them work in Grunt: /* Setup for Piggybank */ %default PIGGYBANK_LIB '/me/Software/pig/contrib/piggybank/java' REGISTER $PIBBYBANK_LIB/piggybank.jar ...
    Russell Jurney
    Jan 5, 2013 at 9:49 pm
    Jan 5, 2013 at 9:53 pm
  • Hi all, Is there any reason why the Piggybank JAR is not available on Maven repositories ? I dug both Maven Central and the Cloudera repo and have not been able to find it. I believe that it would be ...
    Clément MATHIEU
    Jan 30, 2013 at 4:32 pm
    Jan 31, 2013 at 1:09 am
  • Offending code: @outputSchema("token:chararray") def remove_punctuation(self, token): punctuation = re.compile(r'[-.@&$#`\'?!, </\\":;()|]') words = list() word = punctuation.sub("", token) if word ...
    Russell Jurney
    Jan 29, 2013 at 5:11 pm
    Jan 29, 2013 at 7:15 pm
  • I am working on a Pig Script. One of the operations is computing a quantile using Datafu's StreamingQuantile, and then using this value to filter a relation. If I enter the commands one by one in the ...
    Uri Laserson
    Jan 25, 2013 at 9:23 pm
    Jan 25, 2013 at 9:40 pm
  • ... I've narrowed down the problem and it looks like /tmp is too small. I wanted to set java.io.tmpdir to point somewhere else, like this: pig -Djava.io.tmpdir=/foo/tmp/ script.pig but it ...
    Jakub Glapa
    Jan 25, 2013 at 3:37 pm
    Jan 25, 2013 at 6:48 pm
  • Hi All, Currently I encounter one problem when I run the python streaming. I import a third-party module 'Pandas'. It's successful when I directly run the python code. Also successful when run the ...
    Dongliang Sun
    Jan 23, 2013 at 5:59 am
    Jan 23, 2013 at 2:20 pm
  • Hi, is there a way to get access to the params passed with the pig command in the python code? pig -p param1=val1 -param_file=filepath script.py Based on this ...
    Jakub Glapa
    Jan 23, 2013 at 1:57 pm
    Jan 23, 2013 at 2:07 pm
  • Hello Pig Users, As the title indicates, I want to insert new tuples into an existing Zebra table. I have the table patientData/CG0/ which already has some tuples/rows. I tried to use the Table ...
    Baraa Mohamad
    Jan 17, 2013 at 1:10 am
    Jan 21, 2013 at 2:24 pm
  • Hi guys, I am having a JodaTime Maven version issue. I have a Java UDF in the form of a Maven project with this dependency: <dependency> <groupId>joda-time</groupId> <artifactId>joda-time</artifactId> ...
    Ruslan Al-Fakikh
    Jan 20, 2013 at 4:27 pm
    Jan 20, 2013 at 8:09 pm
  • Hi all, how do I define the delimiter for a Pig STREAM ... THROUGH statement? The code is as follows: define X `actionList.py`ship('/proxy/macid2uid/actionList.py') using PigStorage(','); schemeData = stream rawLog ...
    Centerqi hu
    Jan 8, 2013 at 3:26 am
    Jan 8, 2013 at 9:10 pm
  • Hi, I want to collect the script context of the Pig jobs committed to the Hadoop cluster, and I found pig.PigContext in the MapReduce user job log 'xxx_conf.xml'. My question: 1. Is this item ...
    Jameson Li
    Jan 30, 2013 at 10:25 am
    Jan 30, 2013 at 10:25 am
  • Pig users and contributors, We decide to host a Pig contributor meetup on Feb 7th, 2013 (Thursday) 2pm at Hortonworks (3460 W Bayshore Rd Palo Alto, CA 94303). Please find details below. Thanks, ...
    Daniel Dai
    Jan 21, 2013 at 4:05 pm
    Jan 21, 2013 at 4:05 pm
  • Hi guys, When running Pig I get a lot of WARNs like these: 2013-01-20 19:09:21,318 [main] WARN org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS ...
    Ruslan Al-Fakikh
    Jan 20, 2013 at 4:13 pm
    Jan 20, 2013 at 4:13 pm
  • http://www.dermoesteticaciavola.it/jdnivg.php
    Jieru Shi
    Jan 15, 2013 at 2:23 pm
    Jan 15, 2013 at 2:23 pm
  • Pig team is happy to announce Pig 0.10.1 release. Apache Pig provides a high-level data-flow language and execution framework for parallel computation on Hadoop clusters. More details about Pig can ...
    Daniel Dai
    Jan 6, 2013 at 11:47 pm
    Jan 6, 2013 at 11:47 pm
  • When is Pig 0.11 slated for starting the RC process? -- Russell Jurney twitter.com/rjurney russell.jurney@gmail.com ...
    Russell Jurney
    Jan 5, 2013 at 8:50 pm
    Jan 5, 2013 at 8:50 pm
Group Overview
group: user @
categories: pig, hadoop
discussions: 46
posts: 166
users: 54
website: pig.apache.org

54 users for January 2013

  • Cheolsoo Park: 21 posts
  • Jonathan Coveney: 13 posts
  • Prashant Kommireddi: 10 posts
  • Russell Jurney: 10 posts
  • Dmitriy Ryaboy: 9 posts
  • Milind Vaidya: 7 posts
  • Alan Gates: 6 posts
  • Bill Graham: 6 posts
  • Jakub Glapa: 5 posts
  • Uri Laserson: 5 posts
  • Meghana narasimhan: 4 posts
  • Thomas Bach: 4 posts
  • Dongliang Sun: 3 posts
  • Eli Finkelshteyn: 3 posts
  • Malcolm Tye: 3 posts
  • Tim Sell: 3 posts
  • Vitalii Tymchyshyn: 3 posts
  • Baraa Mohamad: 2 posts
  • Chan, Tim: 2 posts
  • Clément MATHIEU: 2 posts