FAQ

18 discussions - 45 posts

  • Hi! I am new to Pig, so pardon my naïve question. I have data like this: (A,1) (A,5) (B,4) (C,22) (C,10) I need to calculate the maximum value for each distinct value of the 1st column: (A,5) (B,4) (C,22) ...
    Vadim Zaliva
    Dec 31, 2008 at 9:52 pm
    Jan 5, 2009 at 10:28 pm
  • Sometimes in reasonably large jobs, the last phase of the reduce will get stuck (leaving the reduce fixed at 66%) in GC mode forever, at which point the logs will fill with lines like 2008-12-15 ...
    Kevin Weil
    Dec 16, 2008 at 4:47 am
    Dec 17, 2008 at 7:04 pm
  • Is there a version of Pig which I can try to use with hadoop-0.19? Maybe some more or less stable SVN branch... Vadim -- "La perfection est atteinte non quand il ne reste rien a ajouter, mais ...
    Vadim Zaliva
    Dec 31, 2008 at 12:48 am
    Jan 5, 2009 at 6:08 pm
  • I upgraded my Hadoop from 0.18 to 0.19, so I want to know when Pig will be able to run on Hadoop 0.19?
    Paradisehi
    Dec 1, 2008 at 7:55 am
    Dec 31, 2008 at 6:42 pm
  • My data is formatted as: domain1, domain2, id, ip delimited by '\t'; I want to filter the records where domain2 is empty, records like this: google.com'\t''\t'ADIUVSF'\t'192.168.0.1 IsEmpty is just ...
    施兴
    Dec 4, 2008 at 5:12 pm
    Dec 4, 2008 at 5:12 pm
  • Hi, Currently the Algebraic interface allows a UDF writer to have an Initial, Intermediate and Final class (each of which should implement EvalFunc). The idea is that the UDF can be called in stages ...
    Pradeep Kamath
    Dec 15, 2008 at 6:52 pm
    Dec 16, 2008 at 7:24 pm
  • Sometimes I use group by key, for example for some analysis of web pages divided by domain. But in practice 60% of the pages come from 20% of the domains, so the nodes that process these keys (domains) will ...
    施兴
    Dec 1, 2008 at 7:25 pm
    Dec 3, 2008 at 9:17 am
  • I am using Pig, and trying to join two files, and having the following problem (attempted the problem in two different ways now) ERROR ...
    Craig Macdonald
    Dec 2, 2008 at 3:56 pm
    Dec 2, 2008 at 4:57 pm
  • Hi, I followed the instructions for making a custom Slicer in http://wiki.apache.org/pig/UDFManual, i.e., I made a class that implements "Slicer" and invoked it via the syntax "load X using ...
    Chris Olston
    Dec 23, 2008 at 4:37 pm
    Jan 8, 2009 at 5:28 pm
  • Hello. I am trying to write a custom group function that places a single tuple into multiple groups so that one row can be counted several times if it belongs to multiple groups. I have read the ...
    Michael Harris
    Dec 19, 2008 at 6:59 pm
    Jan 2, 2009 at 11:04 pm
  • Data file: {(2985671202194220139)} Pig script: a = load 'data' as (list: bag{t: tuple(value: chararray)}); dump a Output: 2008-12-13 09:08:24,831 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ...
    Daga
    Dec 13, 2008 at 5:34 pm
    Dec 16, 2008 at 7:15 pm
  • T2 = LOAD 'input/cookie_sort.0.log' USING PigStorage('\t') AS (cookie, ip, time, type, pn, p, f, rsp, x1, x2, query, url, x3, x4); T3 = FOREACH T2 GENERATE COUNT(TOKENIZE(url)); STORE T3 INTO 't3'; ...
    Paradisehi
    Dec 1, 2008 at 3:17 am
    Dec 1, 2008 at 4:28 am
  • Got an error when I tried to concatenate two pieces of data which contain bags. A simple example: a = load 'a' as (list: bag{t: tuple(item: chararray)}); b = load 'b' as (list: bag{t: tuple(item: ...
    Daga
    Dec 31, 2008 at 6:26 pm
    Dec 31, 2008 at 6:26 pm
  • I have a write up on the functional specification and design of the error handling feature in Pig. Please feel free to comment. Functional specification: ...
    Santhosh Srinivasan
    Dec 8, 2008 at 8:47 pm
    Dec 8, 2008 at 8:47 pm
  • Hi, the Pig team is happy to announce the Pig 0.1.1 release. Pig is a Hadoop subproject which provides a high-level data-flow language and execution framework for parallel computation on Hadoop clusters. More ...
    Olga Natkovich
    Dec 8, 2008 at 7:07 pm
    Dec 8, 2008 at 7:07 pm
  • Dear Users, By popular demand, we put together a manual that provides details and examples of how to use and write functions in Pig: http://wiki.apache.org/pig/UDFManual. Please note that the manual ...
    Olga Natkovich
    Dec 4, 2008 at 11:36 pm
    Dec 4, 2008 at 11:36 pm
  • I have a script that I can run across a varying number of days' worth of data. Over 1 day, the script works perfectly, and runs in about 5 minutes on my cluster. Running it over 7 days, it generally ...
    Kevin Weil
    Dec 3, 2008 at 1:43 am
    Dec 3, 2008 at 1:43 am
  • The documentation mentions that Pig is capable of splitting .bz2 files. Is that done through an InputFileFormat? What is the name of that file format class? Compressed Input Compressed files are ...
    Zheng Shao
    Dec 2, 2008 at 3:37 pm
    Dec 2, 2008 at 3:37 pm
Group Overview
group: user @
categories: pig, hadoop
discussions: 18
posts: 45
users: 17
website: pig.apache.org