58 discussions - 223 posts

  • I have some JSON data with a uniform schema. I want to load it in Pig. JsonStorage doesn't work, because the data has no schema. How can I load JSON data in Pig? -- Russell Jurney twitter.com/rjurney ...
    Russell Jurney
    Nov 17, 2012 at 10:10 pm
    Feb 6, 2013 at 8:47 pm
  • I'd like to call a UDF to evaluate the value of a MACRO parameter, but when I do something like: signals_in = load_recent_signals(TimebucketToDatePartition(1351612800000L), ...
    Timothy Potter
    Nov 20, 2012 at 9:02 pm
    Nov 26, 2012 at 3:50 pm
  • Hi, I'm running Pig 0.10.0 in local mode on some small text files. There is no intention to run it on Hadoop at all. We have a job that runs every 5 minutes and about 3% of the time, the job fails ...
    Malcolm Tye
    Nov 12, 2012 at 3:14 pm
    Nov 21, 2012 at 11:59 am
  • I am trying to use pig 0.11 and pig trunk (currently 0.12) because pig 0.10 seems to be having issues with python udf... According to this ...
    Michał Czerwiński
    Nov 12, 2012 at 4:48 pm
    Nov 13, 2012 at 5:35 pm
  • Hi all, The whole Pig UT run lasts nearly 8 hours, and TestEvalPipeline2 lasts 37 minutes. My questions are: how long does the Pig UT normally take? Do we have Jenkins for the Pig UT? If yes, please ...
    Lulynn_2008
    Nov 14, 2012 at 2:28 am
    Nov 21, 2012 at 1:12 pm
  • Hey all, Very new Pig user here. I think I'm trying to get something very simple done but am getting a few errors. See my script below. Any guidance will be appreciated. Thanks. I get errors such as Error ...
    Ingvay7
    Nov 13, 2012 at 4:12 pm
    Nov 13, 2012 at 6:50 pm
  • Greetings, Is there a way, other than "su", to run pig script as a different user? Thanks, -- Miki
    Miki Tebeka
    Nov 29, 2012 at 12:58 am
    Dec 11, 2012 at 5:07 pm
  • hi all, I'm using Pig 0.9.2 (Apache Pig version 0.9.2-cdh4.0.1, precisely) I got a case today on which I needed to clean up some fields before processing. I will need to do the same for all my ...
    Pablomar
    Nov 16, 2012 at 8:48 pm
    Nov 20, 2012 at 12:38 am
  • looks a more intuitive result should be "something" , right? but on my system it gave null
    Yang
    Nov 2, 2012 at 10:09 pm
    Nov 5, 2012 at 6:02 pm
  • Attached is a tiny testcase illustrating my problem. What I would like to know is how to filter by Pig datatype. e.g. something like: filtered = FILTER some_data BY some_variable IS_MAP_TYPE; Can ...
    Lex H
    Nov 22, 2012 at 1:55 am
    Nov 23, 2012 at 2:15 am
  • Hello, I have just come across a problem with SpillableMemoryManager. I've searched many discussions containing this keyword, but they are all different from my problem. The problem is: when I run a pig ...
    W W
    Nov 1, 2012 at 9:59 am
    Nov 18, 2012 at 4:32 pm
  • I'm trying to play around with Amazon EMR, and I currently have self hosted Cassandra as the source of data. I was going to try to do: Cassandra - S3 - EMR. I've traced my problems to PigStorage. At ...
    William Oberman
    Nov 6, 2012 at 8:20 pm
    Nov 6, 2012 at 10:01 pm
  • Peace be on you, Is there a way to reconstruct the Physical Plan from the PhyPlanVisitor? Or do I have to do it manually by getting its keys (and their entrySets, i.e. edges) and leaves? Any help would ...
    Sarah Mohamed
    Nov 24, 2012 at 12:54 am
    Nov 29, 2012 at 1:17 am
  • I normally deal with very large tuples with many fields. It's a pain to deal with these in python udfs since I can't figure out a way to input schemas into the udf. I have to hard code the column ...
    Martin Goodson
    Nov 14, 2012 at 4:18 pm
    Nov 16, 2012 at 7:21 pm
  • Hi, I used the distributed cache in Hadoop through "setup" and "static" to store a hashset in memory; now I am trying to use the distributed cache in Pig, and I don't know how to store an ...
    Yingnan.ma
    Nov 13, 2012 at 7:46 am
    Nov 16, 2012 at 9:32 am
  • Hi All, In October, I decided not to waste my time any longer reading football news, so I started to read posts from the Apache Pig user mailing lists instead to learn more! ;) My motto was simple ...
    Adam Kawa
    Nov 14, 2012 at 3:46 pm
    Nov 15, 2012 at 8:19 pm
  • Hi, I'm using embedded Pig to implement a graph algorithm. It works fine in local mode, but when I run it on a Hadoop cluster, some error message always pops up, like: (Please see the ...
    Jieru Shi
    Nov 24, 2012 at 3:20 am
    Dec 4, 2012 at 6:26 pm
  • Hi guys, this is not really a question. I want to know: is this book (Programming Pig) <http://www.amazon.com/Programming-Pig-Alan-Gates/dp/1449302645/> outdated? The book says it is based on 0.8 with ...
    Majid Azimi
    Nov 16, 2012 at 9:41 am
    Nov 16, 2012 at 5:23 pm
  • Hi All, Can I pass in a boolean value to Pig UDF constructor with Pig 0.9.2? I have a constructor : public GenStartEndDate(boolean mtdNoGlob) { this.mtdNoGlob = mtdNoGlob; } I am instantiating it in ...
    Meghana narasimhan
    Nov 8, 2012 at 6:30 pm
    Nov 8, 2012 at 6:43 pm
  • Is the Physical Plan a binary tree? (i.e., could any node have more than two Physical Operator children?) -- Regards, Sarah M. Hassan
    Sarah Mohamed
    Nov 18, 2012 at 12:33 am
    Nov 27, 2012 at 12:15 am
  • Hi, We have a scenario where we want a single Hadoop job to create/manage multiple mapper tasks where each mapper task will query a subset of columns in a relational database table. We looked into ...
    Srinivasrajagopalan
    Nov 26, 2012 at 7:04 pm
    Nov 26, 2012 at 9:55 pm
  • Hi, I am using org.apache.pig.backend.hadoop.hbase.HBaseStorage to load from an HBase table in Pig. It works in local mode, but when I tried to do it in mapreduce mode, the mappers got the ...
    Jinyuan Zhou
    Nov 20, 2012 at 9:49 pm
    Nov 25, 2012 at 7:40 am
  • Hi all, I am trying to run pig scripts on my existing hadoop cluster with 6 nodes. I am trying to follow the tutorial in ...
    Satya Sundeep Kambhampati
    Nov 24, 2012 at 7:13 pm
    Nov 24, 2012 at 9:38 pm
  • Hi all, I am new to PIG. I checked out Pig code from svn with svn co http://svn.apache.org/repos/asf/pig/trunk. Then I moved to the trunk directory and ran apt command. I am getting build failed ...
    Satya Sundeep Kambhampati
    Nov 24, 2012 at 4:55 pm
    Nov 24, 2012 at 5:14 pm
  • Hi, I am trying to do some tasks with 'if else' inside the pig script, specifically, if the folder exists, we do some statements and join the data into some table, otherwise, just ignore this step ...
    Sheng Guo
    Nov 20, 2012 at 12:36 am
    Nov 20, 2012 at 2:34 am
  • Greetings, Is there a way to dynamically generate (maybe via UDF) the path to load/store data? (something like "A = LOAD InputPath() USING PigStorage();") Currently we calculate the load/store path ...
    Miki Tebeka
    Nov 14, 2012 at 12:37 am
    Nov 14, 2012 at 12:55 am
  • Dear all, I use pig 0.10.0, hadoop 1.0.3 and hbase 0.94.1. My configuration mode is pseudo-distribution. When I use org.apache.pig.backend.hadoop.hbase.HBaseStorage() method to load data into Hbase, ...
    Yonghu
    Nov 8, 2012 at 3:41 pm
    Nov 8, 2012 at 7:17 pm
  • hadoop@ip-10-245-54-191:~/top50/new$ cat a.pig DEFINE mymacro(blah, zoo) RETURNS foo { x = JOIN $blah BY id, $zoo BY id; y = JOIN x BY $blah::id, $zoo BY id; $foo = foreach y generate x::$blah::id ...
    Yang
    Nov 7, 2012 at 11:45 pm
    Nov 8, 2012 at 4:05 pm
  • Hi, I have data in the form 1,0.2,0.3 1,0.3,0.4 2,0.8,0.2 2,0.9,0.7 and so on, so id, val1, val2 format. This id is already sorted based on val2. I want to select the 2nd element for each id with val2 ...
    Jamal sasha
    Nov 6, 2012 at 2:14 pm
    Nov 7, 2012 at 5:08 am
  • hi, all I think we need to optimize the org.apache.pig.impl.util.ObjectSerializer, because it uses java object serialization, which wastes a lot of space, so that it causes the tasktracker to OOME ...
    Haitao Yao
    Nov 6, 2012 at 4:08 am
    Nov 6, 2012 at 4:36 am
  • Hi I am using pig 0.9.2. Trying to run embedded pig in python but get this error I have all hadoop lib jars, pig-0.9.2-withouthadoop.jar and jython-2.5.2.jar in class path CLASSPATH= /etc/hadoop ...
    Agateaaa
    Nov 3, 2012 at 7:17 am
    Nov 4, 2012 at 1:12 am
  • Hi, Can you help me with the syntax of the natural logarithm (base e) of an expression in Pig? According to Help, the syntax is LOG(expression). I am trying to basically perform the following query ...
    Rajesh Srinivasan
    Nov 15, 2012 at 3:31 am
    Mar 7, 2013 at 8:05 am
  • Hi, I've been compiling some top 25 lists for the frequency with which values appear in certain columns in a relation, and based on some of the counts, am curious to see if some of the values occur ...
    Kris Coward
    Nov 28, 2012 at 4:30 pm
    Dec 3, 2012 at 11:40 pm
  • Hello, According to https://issues.apache.org/jira/browse/PIG-1270 the execution engine can push limit operations into loads. If I am writing a custom LoadFunc in Java, what do I need to implement in ...
    Mike Drob
    Nov 29, 2012 at 3:21 am
    Nov 29, 2012 at 10:22 pm
  • Hey Folks, I'm getting this exception - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias … - in the PigServer code which shows "Caused by ...
    Suresh Saggar
    Nov 26, 2012 at 5:50 pm
    Nov 27, 2012 at 5:50 am
  • Hi, I'm trying to read the Avro file I stored on HDFS, but I seem to be hitting a snag. I'm hoping some of you will be able to shed some light on this and allow me to continue my adventure! REGISTER ...
    Bart Verwilst
    Nov 19, 2012 at 3:59 pm
    Nov 19, 2012 at 6:37 pm
  • Hello, In Alan Gates' Programming Pig, chapter "Making Pig Fly", it was mentioned: In testing we did while developing this feature we saw performance improvements of up to 4x when using LZO, and ...
    W W
    Nov 18, 2012 at 4:26 pm
    Nov 18, 2012 at 7:55 pm
  • You are right Cheolsoo, Indeed, it doesn't make any sense to write a UDF to compare datatypes. I know it's possible, but it doesn't sound like the right way. Maybe it is a bug in the JsonLoader I'm using ...
    Arian Pasquali
    Nov 1, 2012 at 7:48 pm
    Nov 17, 2012 at 5:02 am
  • Hi, I have a dataset of the form f1, f2, ..., fn. Sometimes f1 is empty, sometimes f2, and so on. Basically, what I want is to ignore an entry any time any of its fields is empty. Now one way to do it is using filter ...
    Jamal sasha
    Nov 14, 2012 at 6:03 pm
    Nov 15, 2012 at 6:01 pm
  • Hi, I am trying to replace missing values with a precomputed value. But I am getting an error. So here is my code: Input = LOAD ‘data.txt’ USING PigStorage(‘,) AS (id1:double, id2:double); Ginput = ...
    Jamal sasha
    Nov 15, 2012 at 3:54 pm
    Nov 15, 2012 at 4:38 pm
  • Hi , I am Pooja here. I am working on one of the research projects at University of Houston. I am using PigLatin for my research. I wanted to know is there any way we can find the size of the file ...
    Pooja chitral
    Nov 14, 2012 at 2:32 pm
    Nov 14, 2012 at 3:37 pm
  • involves grouping by some id.. and then using AVG... but since the id is unique.. how do I group them???
    Jamal sasha
    Nov 6, 2012 at 7:19 pm
    Nov 6, 2012 at 7:26 pm
  • I'm trying to run an embedded Pig script (embedded in Python) where I need to take the output/result of the script and feed it back into the script as the input. I'm sure there is an easy way to do this ...
    Jesse Jackson
    Nov 3, 2012 at 4:01 pm
    Nov 3, 2012 at 10:17 pm
  • Hi all, I installed Apache Pig version 0.10.0-SNAPSHOT (rexported) and my shell script cannot run. It gives a notice: 'Not Set JAVA_HOME'. But if I use env to check JAVA_HOME, it is ...
    Yingnan.ma
    Nov 2, 2012 at 2:47 am
    Nov 2, 2012 at 3:04 am
  • Congrats Rohini... -- "...:::Aniket:::... Quetzalco@tl"
    Aniket Mokashi
    Nov 1, 2012 at 4:43 am
    Nov 1, 2012 at 10:15 am
  • Congrats Cheolsoo... -- "...:::Aniket:::... Quetzalco@tl"
    Aniket Mokashi
    Nov 1, 2012 at 4:44 am
    Nov 1, 2012 at 4:53 am
  • Redirecting to Apache pig user list On Thu, Nov 29, 2012 at 1:01 AM, Johnny Kowalski wrote:
    Mark Grover
    Nov 29, 2012 at 6:12 pm
    Nov 29, 2012 at 6:12 pm
  • -------- Forwarding messages -------- From: lulynn_2008 <lulynn_2008@163.com> Date: 2012-11-29 14:52:57 To: ...
    Lulynn_2008
    Nov 29, 2012 at 7:20 am
    Nov 29, 2012 at 7:20 am
  • https://github.com/mozilla-metrics/akela/tree/master/src/main/java/com/mozilla/pig/eval Lots of good UDFs from Mozilla. Russell Jurney http://datasyndrome.com
    Russell Jurney
    Nov 26, 2012 at 6:05 pm
    Nov 26, 2012 at 6:05 pm
  • Hi, I want to set some of the job name policy that one commit job should named as '[adhoc][some word desc the job][date][time][freq]', but when I tested, I found that job.name's content that after ...
    Jameson Li
    Nov 23, 2012 at 1:15 pm
    Nov 23, 2012 at 1:15 pm
Group Overview
period: Nov 2012
group: user @
categories: pig, hadoop
discussions: 58
posts: 223
users: 64
website: pig.apache.org

64 users for November 2012

  • Cheolsoo Park: 20 posts
  • Russell Jurney: 15 posts
  • Jonathan Coveney: 13 posts
  • Bart Verwilst: 10 posts
  • Prashant Kommireddi: 10 posts
  • Pablomar: 9 posts
  • Malcolm Tye: 7 posts
  • Arian Pasquali: 6 posts
  • Jamal sasha: 6 posts
  • Michał Czerwiński: 6 posts
  • Vishwanath: 6 posts
  • W W: 6 posts
  • Lulynn_2008: 5 posts
  • Sarah Mohamed: 5 posts
  • Yang: 5 posts
  • Alan Gates: 4 posts
  • Dmitriy Ryaboy: 4 posts
  • Satya Sundeep Kambhampati: 4 posts
  • William Oberman: 4 posts
  • Yingnan.ma: 4 posts