58 discussions - 234 posts

  • Hey guys, I encounter java.lang.OutOfMemoryError when using TOP udf. It seems that the udf tries to process all data in memory. Is there a workaround for TOP? Or maybe there is some other way of ...
    Ruslan Al-fakikh
    Nov 17, 2011 at 2:14 pm
    Jan 6, 2012 at 4:11 am
  • Hello, First of all, great job creating pig, really a magnificent piece of software. I do have a few questions about UDFs. I have a dataset with a list of URLs I want to fetch. Since an EvalFunc can ...
    Daan Gerits
    Nov 9, 2011 at 1:34 pm
    Nov 10, 2011 at 5:40 pm
  • I have written a custom store function that is primarily based on the multi-storage store function. The way I use it is store load_log INTO '/Users/felix/Documents/pig/multi_store_output/ns_{0}/site_{1}' ...
    Felix gao
    Nov 2, 2011 at 1:08 am
    Nov 8, 2011 at 8:32 pm
  • Until now we were manually copying our Jars to all machines in a Hadoop cluster. This worked while our cluster was small. Now our cluster is getting bigger. What's the best way to start a ...
    Something Something
    Nov 16, 2011 at 6:25 am
    Nov 17, 2011 at 2:52 pm
  • Hi, all. I've got custom log (csv delimited by comma) with iso dates, sometimes log writing lags and I'm having exceptions with wrong iso date format. Here's exception: ...
    Rauan Maemirov
    Nov 8, 2011 at 10:19 am
    Nov 10, 2011 at 2:42 am
  • I'm a newbie running Pig 0.6 on Amazon Elastic Map Reduce. I need to make a change to add additional fields to the log files that I run my pig jobs on and am wondering how do I handle this schema in ...
    B M D Gill
    Nov 13, 2011 at 12:08 am
    Dec 12, 2011 at 1:32 pm
  • I have a pig script that I've translated from an old Python job. The old script worked by reading a bunch of lines of JSON into sqlite and running queries against that. The sqlite DB ended up being about ...
    David King
    Nov 30, 2011 at 6:06 am
    Nov 30, 2011 at 11:57 pm
  • I have a question regarding the pig data types. If I have a UDF, say 'CustomUDF' and I do something like this: REGISTER 'foo.jar'; A = LOAD '/shared/a.dat'; What would be the difference in the data ...
    Prashant Kommireddi
    Nov 24, 2011 at 9:42 pm
    Nov 25, 2011 at 9:28 pm
  • Hello, jython-2.5.0 is included in the pig-0.9.1 directory. This file did not appear in pig-0.8.1. My question is: what is jython-2.5.0 used for in pig-0.9.1? Is the jython-2.5.0 jar file necessary for ...
    Nov 17, 2011 at 10:32 am
    Nov 21, 2011 at 4:38 pm
  • Thanks.
    Nov 17, 2011 at 7:38 am
    Nov 17, 2011 at 8:13 pm
  • Hi, We're trying to run a PIG script using the PigServer API in Java, but we're having a couple issues. It seems to work well in most cases, but in our case we need to use Pig's dynamic invokers. ...
    Charles Menguy
    Nov 16, 2011 at 11:02 pm
    Nov 17, 2011 at 8:06 pm
  • Hi, We are trying to brainstorm on how best to integrate hive queries into pig. All suggestions are greatly appreciated! Note, we are trying to use hcatalog but there are a couple of problems with ...
    Stan Rosenberg
    Nov 14, 2011 at 3:44 pm
    Nov 15, 2011 at 12:10 am
  • I was just working on a pig script to group some data by a field and then generate percentages for each group. Without windowing functions at my disposal, I wound up using a group by on the field for ...
    Doug Daniels
    Nov 10, 2011 at 2:33 am
    Nov 14, 2011 at 4:03 pm
  • Hi, In the past we have for the most part avoided supporting multiple versions of Hadoop with the same version of Pig. This is about to change with release of Hadoop 23. We need to come up with a ...
    Olga Natkovich
    Nov 7, 2011 at 7:16 pm
    Nov 8, 2011 at 6:05 pm
  • It is my pleasure to announce that Apache Board has approved the nomination of Daniel Dai as our new PMC Chair. Congrats Daniel, well deserved! Olga
    Olga Natkovich
    Nov 2, 2011 at 10:09 pm
    Nov 4, 2011 at 6:21 am
  • Hey I am trying to extract performance metrics from some of my logs using Pig and have come up with the following. I feel like I might be performing one too many steps and was wondering if there is a ...
    Cameron Gandevia
    Nov 2, 2011 at 5:18 pm
    Nov 3, 2011 at 12:18 am
  • Hi, I am using a Java UDF with Pig. I want to pass a file name to the UDF. I have done it as follows, but it is not working. Pig script: REGISTER myjar.jar; DEFINE myfun myjar.PigClass('/home/myfile.txt'); --load ...
    Anil Barfa
    Nov 28, 2011 at 7:56 am
    Dec 1, 2011 at 9:12 am
  • I understand Pig supports Amazon S3 storage, however in trying to access an S3 bucket configured as 'requester pays', I get access denied 403, "AWS Request ID: ..., AWS Error Code: AccessDenied, AWS ...
    Dan Brickley
    Nov 9, 2011 at 12:15 am
    Nov 12, 2011 at 4:21 pm
  • Hi, I have a UDF that is: public DataBag exec(Tuple input) throws IOException This bag has tuples with 2 String fields each. How do I tell in Pig to expect a bag{tuple(chararray, chararray)} from ...
    Ayon Sinha
    Nov 29, 2011 at 11:49 pm
    Nov 30, 2011 at 1:32 am
  • Dear pigs: How can I set the job name in a pig script, especially from the command line? Thanks!
    Nov 25, 2011 at 3:29 am
    Nov 27, 2011 at 10:14 am
  • Hello all, I run "ant test" to pig-0.9.1, and the following test cases failed. My questions are: --Is there any jira for these failed test case? --I assume all test cases should be passed for ...
    Nov 17, 2011 at 2:58 am
    Nov 19, 2011 at 4:30 pm
  • I have parsed a json file structured as: {"id":"xyz", "name":"John", "tags":"apples and oranges"} {"id":"xyz", "name":"John", "tags":"\uac38\uc6b0"}...etc and I'd like to filter out the entries that ...
    Kat Huang
    Nov 11, 2011 at 9:46 pm
    Nov 11, 2011 at 11:56 pm
  • Hi I am experiencing the following issues in part of my pig script. *data = FOREACH metricLogLine GENERATE host, REGEX_EXTRACT_ALL(body, '.*gr.perf.metrics.Count\\s*\\-\\s*([A-Za-z\\.]+)\\s*(\\d+)') ...
    Cameron Gandevia
    Nov 3, 2011 at 6:33 pm
    Nov 3, 2011 at 7:00 pm
  • Hi friends, I am new to the Pig library. I need help on how to read data from solr using pig. If you have any code samples, please provide them. Thanks, swami
    Kumar swami
    Nov 30, 2011 at 3:31 pm
    Nov 30, 2011 at 6:33 pm
  • Hi, I am trying to run pig job to read HAR data from S3 and run the job on ec2 cluster and I am getting the following error: Any ideas on what could be running Error before Pig is launched ...
    Gayatri Rao
    Nov 15, 2011 at 1:29 am
    Nov 29, 2011 at 9:38 am
  • Hi all, the Pig Latin script we write is converted to a number of map reduce jobs to be executed by Hadoop. So my assumption was that if we write a number of lines in Pig Latin, it will generate an equivalent ...
    Paritosh sumrao
    Nov 22, 2011 at 5:31 pm
    Nov 22, 2011 at 9:21 pm
  • Dear All, In one of the PoCs, I have to export results generated in a Pig script to mysql. For this I am using DBStorage. While there are no errors during the execution and the output logs show the ...
    Vijaya bhaskar peddinti
    Nov 18, 2011 at 3:26 pm
    Nov 19, 2011 at 6:28 pm
  • I understand Pig supports only Gzip and Bzip compression algorithms. Would it be fine if I set map output compression (between map and reduce) to SnappyCodec? I am guessing this should not be a ...
    Prashant Kommireddi
    Nov 18, 2011 at 5:03 am
    Nov 18, 2011 at 6:59 pm
  • As per Alan F Gates in "Programming Pig" : *Pig does not know whether integer values in baseball are stored as ASCII strings, Java serialized values, binary coded decimal, or some other format. So it ...
    Nov 17, 2011 at 6:52 pm
    Nov 17, 2011 at 8:13 pm
  • I just wondered about the status of hbase storage, specifically the store part of it. Is it something people are using in production - ready for prime time? I seemed to remember a couple of people ...
    Jeremy Hanna
    Nov 11, 2011 at 11:28 pm
    Nov 12, 2011 at 12:20 am
  • Hi, I just installed pig-0.8.1-cdh3u2 on amazon ec2, ran my pig script, and got the following error: ERROR 2998: Unhandled internal error. org/jets3t/service/S3ServiceException ...
    Dan Yi
    Nov 8, 2011 at 7:02 pm
    Nov 9, 2011 at 8:12 pm
  • I have a UDF to output JSON. (PIG v0.9.1, Hadoop 0.20.204) I have tested the setup outside of pig and Jackson will produce a JSON string. However in the UDF I am getting: ERROR 2997: Unable to ...
    Rob parker
    Nov 4, 2011 at 3:33 pm
    Nov 4, 2011 at 11:21 pm
  • Hello, I am pulling data from cassandra into pig which means it ends up like key, bag { (name,value),(name,value) }. The info is logfiles so each column is a field in server logfile (like apache). I ...
    Nov 4, 2011 at 12:51 pm
    Nov 4, 2011 at 5:12 pm
  • Hello everyone, I have some issues with the pig flatten statement as I receive several exceptions when trying to flatten a bag. I read in Jira and on the mailing lists that other people had issues ...
    Daan Gerits
    Nov 29, 2011 at 9:48 am
    Dec 2, 2011 at 12:13 am
  • I'm running into an issue with pig 0.9.1. My top-level data directory contains several files and directories with restricted permissions, and my LoadFunc and input format ignore these directories if ...
    Adam Portley
    Nov 22, 2011 at 3:46 am
    Nov 23, 2011 at 1:33 am
  • Hello, I'm new to Pig :) I'm wondering if there is some way to start up grunt and have my UDFs already registered and ready to go? I tried placing them in the PIG_CLASSPATH. They are available to ...
    Nov 22, 2011 at 3:17 pm
    Nov 22, 2011 at 7:31 pm
  • If need to have a test with jsch-0.1.39, please provide test method. Thank you. -------- Forwarding messages -------- From: lulynn_2008 <lulynn_2008@163.com> Date: 2011-11-20 23:32:39 To: user ...
    Nov 20, 2011 at 3:34 pm
    Nov 21, 2011 at 4:38 pm
  • stringtemplate-3.2 is included in pig*-0.9.1.jar. My question is: could we use stringtemplate-3.1b.1 instead of stringtemplate-3.2? Thank you.
    Nov 19, 2011 at 3:46 pm
    Nov 19, 2011 at 4:37 pm
  • Hallo, I'm a little confused as to how to load avro data into pig using AvroStorage. I have a map-reduce job that writes an AvroKey<Long>/AvroValue<GenericRecord> K/V pair, producing a schema that ...
    Andrew Kenworthy
    Nov 16, 2011 at 2:26 pm
    Nov 18, 2011 at 1:22 pm
  • Hello, I am running on machines with 4G, out of which two are allocated for running the OS in memory. It leaves me with 2G, I will be using 1.5 to run the mappers and reducers (machines are ...
    Keren Ouaknine
    Nov 17, 2011 at 4:56 pm
    Nov 18, 2011 at 6:17 am
  • Hi, When overloading a UDF with getArgToFuncMapping() the parent/root UDF outputSchema() is being called. *LogFieldValue * @Override public List<FuncSpec> getArgToFuncMapping() throws ...
    Nov 17, 2011 at 12:34 am
    Nov 17, 2011 at 2:53 pm
  • hello, is it possible to pass values to a store (or load) function? e.g. %declare JOBID 'JOB-2011-11-15-001'; STORE metrics INTO 'metrics' USING ...
    Geert Van Landeghem
    Nov 15, 2011 at 3:46 pm
    Nov 15, 2011 at 3:59 pm
  • Hi all, I have a big set of projects: one of my projects has all the pig scripts, and another one has all the java files as udfs for the pig scripts. I use eclipse/maven to manage all the projects, just ...
    Dan Yi
    Nov 14, 2011 at 10:40 pm
    Nov 14, 2011 at 11:23 pm
  • Is there any way of storing in Json format in pig? I haven't been able to find any, but thought I'd put this question out there if anyone has come up with a workaround
    Kat Huang
    Nov 14, 2011 at 2:05 pm
    Nov 14, 2011 at 2:09 pm
  • I have parsed a json file structured as: {"id":"xyz", "name":"John", "tags":"apples and oranges"} {"id":"xyz", "name":"John", "tags":"\uac38\uc6b0"}...etc and I'd like to filter out the entries that ...
    Nov 11, 2011 at 9:18 pm
    Nov 11, 2011 at 10:48 pm
  • Hello everyone, Is it possible to update a counter from within a UDF? I know there is some information on updating counters using log messages, but I have never done that before and have no idea if ...
    Daan Gerits
    Nov 9, 2011 at 7:46 pm
    Nov 9, 2011 at 7:59 pm
  • Hi All, I'd like to get the schema of a relation that is used in conjunction with my custom StorageFunc. I found 'checkSchema' to be useful for this case, however, it seems to work only in local ...
    Stan Rosenberg
    Nov 8, 2011 at 3:01 am
    Nov 8, 2011 at 3:42 am
  • Does any one have an example of how to use SequenceFileLoader for writable objects other than IntWritable, LongWritable, Text etc. Thanks Gayatri
    Gayatri Rao
    Nov 1, 2011 at 11:44 pm
    Nov 4, 2011 at 10:08 pm
  • Hi, I have a hadoop archive file of sequence files. I have modified the SequenceFileReader to get the list of files. So I get the list of files that I intend to read. But when pig tries to create pig ...
    Gayatri Rao
    Nov 29, 2011 at 10:59 am
    Nov 29, 2011 at 10:59 am
  • In my PIG script I would like to export the output using STREAM. However, my stream command expects LD_LIBRARY_PATH to be set. In normal Hadoop Streaming, I could pass "-cmdenv". Is there any relevant ...
    Rajesh Balamohan
    Nov 23, 2011 at 12:29 pm
    Nov 23, 2011 at 12:29 pm
Group Overview
group: user @
categories: pig, hadoop

65 users for November 2011

  • Dmitriy Ryaboy: 31 posts
  • Jonathan Coveney: 18 posts
  • Prashant Kommireddi: 14 posts
  • Ashutosh Chauhan: 13 posts
  • Lulynn_2008: 10 posts
  • Alan Gates: 9 posts
  • Pablomar: 8 posts
  • Daan Gerits: 7 posts
  • Stan Rosenberg: 7 posts
  • Cameron Gandevia: 6 posts
  • Daniel Dai: 5 posts
  • Felix gao: 5 posts
  • Ruslan Al-Fakikh: 5 posts
  • Dan Brickley: 4 posts
  • Gayatri Rao: 4 posts
  • Jeremy Hanna: 4 posts
  • Kat Huang: 4 posts
  • Raghu Angadi: 4 posts
  • Rauan Maemirov: 4 posts
  • Something Something: 4 posts