Grokbase Groups Pig user March 2009

31 discussions - 124 posts

  • Hi, If you have done some interesting work using Pig, we would like to know! Could you please send a brief description of the kind of work you are doing with Pig and what you have learned from ...
    Olga Natkovich
    Mar 12, 2009 at 1:02 am
    Apr 1, 2009 at 5:25 pm
  • I am considering starting to use compression for data files I process with Pig. I am using the trunk version of Pig on Hadoop-0.18.3. Uncompressed files are about 500 MB each, and I plan to have total few ...
    Vadim Zaliva
    Mar 14, 2009 at 9:00 am
    Mar 18, 2009 at 11:49 am
  • Hi, Is it possible to filter one file by a key not present in the other (similar to NOT IN or LEFT JOIN & IS NULL that can be done in a DB)? Thanks, Tamir
    Tamir Kamara
    Mar 11, 2009 at 1:50 pm
    Mar 16, 2009 at 3:02 pm
  • Hello Pig List, I am now taking my (tested) Pig script that will produce distinct counts and trying to apply it to real data. I am finding, however, that although the map stage completes (100%), the ...
    Avram Aelony
    Mar 19, 2009 at 9:14 pm
    Mar 20, 2009 at 10:33 pm
  • I have recently upgraded to Hadoop-0.19.1 (from 0.19.0). The Pig task that used to work on the previous version no longer works, giving me a cryptic error: ERROR 2998: Unhandled internal error. ...
    Vadim Zaliva
    Mar 13, 2009 at 2:30 am
    Mar 16, 2009 at 3:42 pm
  • Hello, I have a map with a key that has different values, like: Location, Zone, 35, Place ... When I execute my script with a filter on this key I have a cast error. source = LOAD 'thesource' ...
    Mathias Fryde
    Mar 17, 2009 at 2:59 pm
    Mar 19, 2009 at 3:37 pm
  • Hello Pig list, I have looked at the 'distinct' keyword but it does not seem to operate on particular fields (columns). I have a file with several categorical variables a1-a3 and am seeking to ...
    Avram Aelony
    Mar 18, 2009 at 8:44 pm
    Mar 18, 2009 at 11:30 pm
  • Hi, Is there a way to specify or write a custom partitioner in Pig? Not split - partition data in a specific way - for some custom job. Thanks, Mridul
    Mridul Muralidharan
    Mar 12, 2009 at 11:50 am
    Mar 17, 2009 at 11:00 am
  • Hi, I have implemented T. Hofmann's PLSI based on the EM algorithm in Pig. The E/M logic was implemented in Pig in ~ 30-35 lines of Pig Latin statements. The implementation is available in Mahout as a ...
    Prasenjit mukherjee
    Mar 2, 2009 at 4:58 pm
    Mar 3, 2009 at 4:07 pm
  • Hello, I am new to Pig and I am trying to figure out how to do this. I have noticed some examples that look like: Split A into B, C; I understand that these examples are probably about adding such ...
    Iman Elghandour
    Mar 19, 2009 at 4:37 pm
    Mar 19, 2009 at 8:19 pm
  • I'm trying to run Pig from within another Java program using org.apache.pig.PigServer. However, when I instantiate it with: new PigServer("local"); I get the following stack trace: ...
    Gregory Harman
    Mar 4, 2009 at 12:10 am
    Mar 5, 2009 at 4:07 am
  • Pig folks, Is this a cause for concern? 09/03/30 22:15:51 WARN mapReduceLayer.PigHadoopLogger: Unable to spill contents to disk The "WARN" status implies that it's ...
    Chris Olston
    Mar 30, 2009 at 9:40 pm
    Mar 30, 2009 at 11:14 pm
  • Hello, I have just noticed that the implicit split is added in the wrong place in this plan. I am just examining the plan for the Pig script that is available in the jira issue: ...
    Iman Elghandour
    Mar 23, 2009 at 2:41 am
    Mar 24, 2009 at 2:16 pm
  • I have a UDF in which I'd like to use native code via JNI. Is there a way in Pig that I can distribute the shared object libraries to the nodes? I could manually push the .so's to the nodes or ...
    Sean Timm
    Mar 12, 2009 at 2:16 pm
    Mar 20, 2009 at 9:09 pm
  • Hello, I am having trouble checking the execution plans using the explain command. I previously sent this email about errors when using explain in a script: ...
    Iman Elghandour
    Mar 19, 2009 at 4:27 pm
    Mar 19, 2009 at 9:26 pm
  • Hi, I'm trying to generate a sum of an expression like the following: b = GROUP a by domain; r = FOREACH b generate group, SUM(a.x+a.y); This results in an error that DefaultDataBag cannot be cast to ...
    Tamir Kamara
    Mar 1, 2009 at 2:35 pm
    Mar 2, 2009 at 4:29 pm
  • Hi all, I investigated the AVG built-in func, and do not understand why we need the static classes Initial, Intermediate, and Final. What's the difference between an Algebraic UDF and a non-Algebraic one? I think we ...
    Mar 29, 2009 at 6:21 am
    Mar 30, 2009 at 2:46 pm
  • Hi, Following a COGROUP I would like to filter results by one of the fields but I'm getting an error: Operand of Regex can be CharArray only. The relevant lines in my script are: x1 = COGROUP p3 BY ...
    Tamir Kamara
    Mar 26, 2009 at 7:12 am
    Mar 26, 2009 at 4:44 pm
  • I am experiencing a strange problem. I am trying to start a Pig task in mapreduce mode and it just hangs there. The job tracker does not show any tasks running and the last message on the console is: 2009-03-23 ...
    Vadim Zaliva
    Mar 24, 2009 at 2:18 am
    Mar 24, 2009 at 5:57 pm
  • Hello, I have been trying to output pig plans into a file as described in I am using the multiquery branch. When I use the syntax ...
    Iman Elghandour
    Mar 18, 2009 at 11:26 pm
    Mar 19, 2009 at 9:33 pm
  • There was talk on pig-dev about the next stable pig release, and about possibly calling it 1.0 -- what's the final resolution there, and what is the goal date for the release? Thanks! Kevin
    Kevin Weil
    Mar 13, 2009 at 5:50 pm
    Mar 13, 2009 at 5:57 pm
  • Hi, I'm having trouble when trying to filter by a field that is defined as long and my desired value is also long. For example: traffic = load 'traffic.txt' as (domain:chararray, subnet:long); r = ...
    Tamir Kamara
    Mar 3, 2009 at 3:06 pm
    Mar 3, 2009 at 3:26 pm
  • I made a syntax highlighting bundle for editing Pig scripts in TextMate. It's pretty rough, but it's helped me, so I thought others might like it as well. Feedback is welcome, and feel free to ...
    Kevin Weil
    Mar 1, 2009 at 1:20 am
    Mar 2, 2009 at 4:25 pm
  • Hello, I was wondering if anyone has managed to use Pig with an HBase input or output? I would really appreciate a sample of the script and the configuration of Pig-Hadoop-HBase. Thanks in ...
    Mathias Fryde
    Mar 18, 2009 at 1:49 pm
    Mar 18, 2009 at 1:49 pm
  • Hey Hadoop Fans, It's been a crazy week here at Cloudera. Today we launched our Distribution for Hadoop. This is targeted at Hadoop users who want to use the most recent stable version of Hadoop and ...
    Christophe Bisciglia
    Mar 17, 2009 at 12:43 am
    Mar 17, 2009 at 12:43 am
  • The next Bay Area Hadoop User Group meeting is scheduled for Wednesday, March 18th at Yahoo! 2811 Mission College Blvd, Santa Clara, Building 2, Training Rooms 5 & 6 from 6:00-7:30 pm. Agenda: ...
    Ajay Anand
    Mar 12, 2009 at 9:36 pm
    Mar 12, 2009 at 9:36 pm
  • Hi, I started having problems with PigPen after changing computers. I'm using PigPen 0.0.4 with all configuration values set (hadoop-site, pig.properties). I get this error when using the example ...
    Tamir Kamara
    Mar 12, 2009 at 7:56 am
    Mar 12, 2009 at 7:56 am
  • Hi, I'm using a compressed file and the following query: --links = LOAD '/user/hadoop/links/links.txt.bz2' AS (target:int, source:int); links = LOAD '/user/hadoop/links/links-gz/*' AS (target:int, ...
    Tamir Kamara
    Mar 11, 2009 at 8:41 am
    Mar 11, 2009 at 8:41 am
  • Hi, When I am trying to use the "explain" command, I am getting an error ERROR - java.lang.RuntimeException: Serialization error: ...
    Padmashree Ravindra
    Mar 3, 2009 at 10:43 pm
    Mar 3, 2009 at 10:43 pm
  • Hi. I need to create a 'last visited' file where we store the last time a user visited a site (so we can do things like repeat visits). I'd like to be able to grab the latest day's access log file, ...
    Ian Holsman
    Mar 2, 2009 at 4:40 pm
    Mar 2, 2009 at 4:40 pm
Group Overview
group: user @
categories: pig, hadoop

31 users for March 2009

  • Avram Aelony: 14 posts
  • Tamir Kamara: 13 posts
  • Mridul Muralidharan: 12 posts
  • Alan Gates: 11 posts
  • Santhosh Srinivasan: 9 posts
  • Iman Elghandour: 7 posts
  • Olga Natkovich: 7 posts
  • Mathias Fryde: 6 posts
  • Vadim Zaliva: 6 posts
  • Benjamin Reed: 5 posts
  • Gregory Harman: 3 posts
  • Gunther Hagleitner: 3 posts
  • Kevin Weil: 3 posts
  • Prasenjit mukherjee: 3 posts
  • Yiping Han: 3 posts
  • Chris Olston: 2 posts
  • Sean Timm: 2 posts
  • Ted Dunning: 2 posts
  • Daga: 1 post
  • Ajay Anand: 1 post