Grokbase Groups Pig user July 2009

31 discussions - 140 posts

  • Hi, I modified a function that I saw on JIRA that filters based on a (small) list of values present in a file in order to avoid another cogroup followed by a filter. The function gets the dfs path to ...
    Tamir Kamara
    Jul 6, 2009 at 5:55 am
    Jul 7, 2009 at 6:09 am
  • Hi all, Is there any way to use a Counter in Pig? I'd like to know how much data is abnormal, so a Counter would be a good way in my opinion. Thank you, Jeff Zhang
    Zhang jianfeng
    Jul 13, 2009 at 5:35 am
    Jul 14, 2009 at 6:31 am
  • Hello all, new to this list, new to Pig, running into some odd behavior with map[] data types. Please forgive me if these are known issues or problems with my syntax. What am I doing wrong here? ...
    Guy Bayes
    Jul 19, 2009 at 4:36 am
    Jul 19, 2009 at 7:20 pm
  • Hi All, Is there an easy way to clear temp files that Pig creates when a script runs? I tried adding RMF /tmp/tmp* and /tmp/temp*, but it doesn't seem to work (although it doesn't fail either ....). ...
    Chris Riccomini
    Jul 8, 2009 at 5:53 pm
    Jul 9, 2009 at 3:05 pm
  • Dear users, My Pig 0.3.0 and hadoop-0.18.3 worked fine with a single-node cluster, but after I changed to a 2-node cluster, I got this error when I ran Pig: org.apache.hadoop.ipc.RPC$VersionMismatch: ...
    George Pang
    Jul 20, 2009 at 3:27 am
    Jul 20, 2009 at 7:58 am
  • Hi, Recently my cluster configuration changed from 11 reducers to 12. Since then on every job using pig only 3 reducers do actual work and output results while the others quickly finish and output ...
    Tamir Kamara
    Jul 2, 2009 at 11:55 am
    Jul 7, 2009 at 4:05 pm
  • Hi there, I'm working with Pig 2.0. and I have the following problem: 1. One pig script writes a tuple in the hdfs using - PigDump() 1.1 the tuple has the following schema ...
    Xavier Quintuna
    Jul 30, 2009 at 4:52 pm
    Jul 30, 2009 at 10:10 pm
  • As of release 0.3.0, Pig is still on Hadoop 0.18. Hadoop 0.20 was released 3 months ago and appears to be stable. It also seems to be considered a more solid release than 0.19 (at least by the Hadoop ...
    Alan Gates
    Jul 7, 2009 at 11:19 pm
    Jul 26, 2009 at 4:08 pm
  • Hello, I'm a new pig user, and I'm working on a project using pig over the summer. I've read over the archives and couldn't find any answer to my question so I figured I'd ask myself. I've loaded in ...
    Zach Murphy
    Jul 21, 2009 at 4:27 pm
    Jul 21, 2009 at 7:06 pm
  • Hi, I want to know how to get the list of keys in a map, and the list of keys in a set of records which contain a map type; and how to find the count of each key by grouping on the key. For example, I have the data ...
    Venkata ramanaiah anneboina
    Jul 14, 2009 at 3:57 pm
    Jul 14, 2009 at 6:49 pm
  • Hi, I want to query serialized objects stored in Hadoop, or at least read and print them. When I tried dumping the content to a file I got a binary dump. I require plain text content. Is there a way ...
    Ninad Raut
    Jul 23, 2009 at 5:16 am
    Jul 24, 2009 at 3:48 pm
  • Hi, I am filtering records based on the values of an inner bag in the inner loop, as shown in the following sequence of Pig scripts. I am getting weird results based on whether the first row satisfies the ...
    Gururaj S Mayya
    Jul 7, 2009 at 1:55 pm
    Jul 8, 2009 at 3:21 am
  • Hi all, We have a facility in Hadoop where we can specify multiple input paths. Does this exist in Pig? Essentially, is it possible to specify multiple paths in the load command? For example, I have n ...
    Palleti, Pallavi
    Jul 9, 2009 at 3:34 am
    Jul 9, 2009 at 2:48 pm
  • Hi, I've imported data from a MySQL db thanks to sqoop. However when I try to order this data on 2 fields it does not return the same answer as MySQL does (which is the correct result) Here is the ...
    Cyril Scetbon
    Jul 24, 2009 at 2:18 pm
    Jul 28, 2009 at 10:01 am
  • Hello, I am having a problem with PIG storing results from a user defined function. This function takes a chararray, and creates a tuple of 2 values out of it. When this tuple is passed back into pig ...
    Naber, Chad
    Jul 16, 2009 at 5:45 pm
    Jul 16, 2009 at 5:53 pm
  • Hi, I have uneven amounts of elements in tuples in my relation. I know you can reference elements by index by $0, $1, etc., but is there a way to relatively reference them? As in a "get third from ...
    Turner Kunkel
    Jul 29, 2009 at 7:53 pm
    Jul 29, 2009 at 7:58 pm
  • Hi, I am interested in querying my HBase tables using Pig Latin. I have come across the org.apache.pig.backend.hadoop.hbase API... but there is no documentation given for usage of the API. Has anyone ...
    Rakhi Khatwani
    Jul 22, 2009 at 5:56 am
    Jul 27, 2009 at 5:24 pm
  • Dear users, Since the syntax for "join" is alias = JOIN alias BY field_alias, alias BY field_alias If I need to JOIN relations by more than one column (that is, my primary key consists ...
    George Pang
    Jul 20, 2009 at 7:45 am
    Jul 20, 2009 at 12:04 pm
  • Hi, The following script gives an error because split cannot be used in nested statements: x1 = load 'file' as (a, b, c); x2 = group x1 by a; x3 = foreach x2 { split x1 into y1 if b==1 and c==1, y2 ...
    Tamir Kamara
    Jul 19, 2009 at 7:46 am
    Jul 19, 2009 at 3:45 pm
  • Hi all, This might be a naive question but we have a cron job that runs a bunch of nightly PIG jobs and ever since we upgraded to PIG 0.3.0 the jobs started failing because PIG tries to connect to ...
    Shrikrishna Shrin
    Jul 11, 2009 at 12:11 am
    Jul 11, 2009 at 12:19 am
  • Hi all, I found that the hadoop18.jar which Pig uses contains configuration files, such as hadoop-site.xml and hadoop-default.xml. These files are archived into pig.jar when running ant jar. And when I use pig ...
    Zhang jianfeng
    Jul 9, 2009 at 7:17 am
    Jul 10, 2009 at 3:03 pm
  • Hi all, Today, I found there’s a way to set the job name for your pig script. PigServer pig = new PigServer(ExecType.MAPREDUCE); ...
    Zhang jianfeng
    Jul 9, 2009 at 1:22 am
    Jul 9, 2009 at 1:46 pm
  • Hi all, I found that the following script will be converted into 3 mapreduce jobs: A = LOAD '/user/zjffdu/input.txt' USING PigStorage(); B = GROUP A BY $0; B = FOREACH B GENERATE ...
    Zhang jianfeng
    Jul 9, 2009 at 1:24 am
    Jul 9, 2009 at 1:34 am
  • Hadoop Fans, we have had a few requests to extend the submission deadline through the weekend. To be fair, rather than make exceptions, we wanted to let everyone know we are accepting proposals until ...
    Christophe Bisciglia
    Jul 31, 2009 at 12:27 am
    Jul 31, 2009 at 12:27 am
  • Hey Hadoop Fans, after working with the ApacheCon organizers, it's clear there is way more interest in Hadoop than can fit into a single day - even with its own track. Specifically, there were a lot ...
    Christophe Bisciglia
    Jul 30, 2009 at 12:02 am
    Jul 30, 2009 at 12:02 am
  • Dear all, I have some data in the following format: //test a {(1,{(a,1),(a,2),(a,7)}),(2,{(a,20),(a,15),(a,12)}),(3,{(a,9),(a,7),(a,8)})} b ...
    Cagri Balkesen
    Jul 23, 2009 at 5:30 pm
    Jul 23, 2009 at 5:30 pm
  • Alan Gates
    Jul 20, 2009 at 5:17 pm
    Jul 20, 2009 at 5:17 pm
  • I created a simple Emacs mode that highlights PigLatin syntax. It's very basic, but it does make life more pleasant if you are an Emacs user. Apache license, patches accepted :-) ...
    Dmitriy Ryaboy
    Jul 10, 2009 at 5:09 pm
    Jul 10, 2009 at 5:09 pm
  • Hadoop Fans, several of you have asked us to come to LA. We've heard you. We'll be teaming up with our friends at Fox Interactive Media to offer Hadoop Training at what might just be the coolest ...
    Christophe Bisciglia
    Jul 9, 2009 at 1:20 am
    Jul 9, 2009 at 1:20 am
  • Hi, The current implementation of COUNT and AVG in Pig counts null values. This is inconsistent with SQL semantics and also with semantics of other aggregated functions such as SUM, MIN, and MAX. ...
    Olga Natkovich
    Jul 6, 2009 at 5:47 pm
    Jul 6, 2009 at 5:47 pm
  • Another thing to confirm: Did you have cygwin, as mentioned in the Yahoo tutorial? 2009/7/2 mayank yadav <
    George Pang
    Jul 2, 2009 at 7:34 pm
    Jul 2, 2009 at 7:34 pm

35 users for July 2009

  • Dmitriy Ryaboy: 21 posts
  • Alan Gates: 13 posts
  • Tamir Kamara: 12 posts
  • Zjffdu: 11 posts
  • Olga Natkovich: 9 posts
  • Santhosh Srinivasan: 8 posts
  • George Pang: 6 posts
  • Guy Bayes: 5 posts
  • Naber, Chad: 5 posts
  • Parmod Mehta: 5 posts
  • Xavier Quintuna: 3 posts
  • Chris Riccomini: 3 posts
  • Christophe Bisciglia: 3 posts
  • Ted Dunning: 3 posts
  • Zach Murphy: 3 posts
  • Cyril Scetbon: 2 posts
  • Daniel Dai: 2 posts
  • Gururaj S Mayya: 2 posts
  • Kevin Weil: 2 posts
  • Ninad Raut: 2 posts