Grokbase Groups Pig user May 2011

60 discussions - 239 posts

  • Is there a way to store the headers (titles of each column) using the STORE command in a Pig script (STORE out3 INTO '$OUTPUT' USING PigStorage();)? Right now it stores only the data. Somewhere I read ...
    Subhramanian, Deepak
    May 25, 2011 at 12:28 pm
    May 27, 2011 at 1:47 pm
  • I am sorry if this has been asked in the past. I can't seem to find information on it. I have two questions, but they are somewhat related. #1) Let's say you are tracking messages and extracting the ...
    May 6, 2011 at 9:14 pm
    May 10, 2011 at 9:34 pm
  • Hi all, I have the below pig code: register /home/uu/project/lib/pigudfs.jar ruls = load 'testurl' as (url:chararray); b = foreach ruls generate$0,3,0.1); here when ...
    Jameson Li
    May 23, 2011 at 2:07 pm
    Jun 2, 2011 at 7:00 pm
  • Hi Guys, I am running following Pig script in Pig 0.8 version page_events = LOAD '/user/sgehlot/day=2011-05-10' as (event_dt_ht:chararray,event_dt_ut:chararray,event_rec_num:int,event_type:int, ...
    Sonia gehlot
    May 12, 2011 at 5:44 pm
    May 23, 2011 at 11:02 pm
  • i'm trying to run this code on Amazon Elastic MapReduce ... score = FOREACH a GENERATE A, B, C,D, 1*v + 20*c + 30*a + 100*p AS score; gg = group score by (A, B, C); topResults = FOREACH gg { result = ...
    Shai Harel
    May 5, 2011 at 1:46 pm
    May 8, 2011 at 10:59 pm
  • Hello dear Pig users, *I am loading a file with the following format:* *$ cat peoples.txt tom;1234,4567,6 anna;27894* First field is a name, second field is a concatenation of an unknown number of ...
    May 10, 2011 at 9:31 am
    May 13, 2011 at 2:22 pm
  • Hi, I got a cluster with seven Cassandra nodes. The ring is formed using the private ips of each of the nodes. The rpc_address of the nodes is set to private and listen_address of the nodes set to ...
    Badrinarayanan S
    May 6, 2011 at 4:40 pm
    May 13, 2011 at 6:15 am
  • Hi, I have a sequence of jobs which are run daily and usually the logs and results are erased every time they have to be re-run. Now we want to keep those logs and results, but if the results already ...
    Renato Marroquín Mogrovejo
    May 21, 2011 at 1:51 am
    May 23, 2011 at 4:19 pm
  • When I run a pig job the hadoop job tracker gui (the one on port 50030) shows ‘PigLatin:myscript.pig’ as the name of the job. How can I configure that to show a different name than the name of the ...
    William Dowling
    May 26, 2011 at 6:04 pm
    May 26, 2011 at 9:36 pm
  • Hi , I am trying to use pig to aggregate data from an applications log lines. Most of the data in the input file have the following format: A B C D E F I am aggregating the data as follows: A= load ...
    Arun Chandy Thomas
    May 25, 2011 at 10:22 pm
    May 25, 2011 at 11:58 pm
  • Hi, While testing a very simple PIG 0.8.0 script counting the nb of rows of one of my HBase tables, I got a strange result: the nb of rows reported was only half it should have been (compared to a ...
    Vincent Barat
    May 23, 2011 at 2:59 pm
    May 24, 2011 at 7:03 am
  • Hi, Anyone using Twitter's elephantbird library? I was using its JsonLoader and got this error: WARN com.twitter.elephantbird.pig.load.JsonLoader - Could not json-decode string Unexpected character ...
    Dexin Wang
    May 18, 2011 at 6:13 pm
    May 19, 2011 at 2:26 pm
  • Hi, I'm stuck on a query for counting distinct users. Say I have data that looks like this: book, user1 book, user2 book, user1 movie, user1 movie, user2 movie, user3 music, user4 I want to group by ...
    Kim Vogt
    May 6, 2011 at 6:08 pm
    May 6, 2011 at 6:34 pm
  • I am in process of installing and learning pig. I have a hadoop cluster and when I try to run pig in mapreduce mode it errors out: Error before Pig is launched ---------------------------- ERROR ...
    Mohit Anchlia
    May 26, 2011 at 12:06 am
    May 26, 2011 at 2:34 pm
  • Hi, I have a LoadFunc that loads data using a complex schema. I don't want to have to specify the schema every time. LoadFunc used to have a method "determineSchema". The current docs reference this: ...
    Sweet, Nate
    May 20, 2011 at 9:21 pm
    May 23, 2011 at 7:49 pm
  • Hey all, I have one file A with a 'day' column like "2011/3/2" and another B with a column 'timestamp' like "2011/3/2 12:32" ... I want to join on these two field in these records. I do something ...
    Daniel Eklund
    May 17, 2011 at 3:33 pm
    May 17, 2011 at 9:18 pm
  • I'm trying to embed pig into java program. I tried two approaches, none of them works. Approach 1: I followed and then ran into the ...
    Jianting Cao
    May 12, 2011 at 12:46 am
    May 17, 2011 at 7:28 pm
  • Is there a change log for the 0.8.1 release? release notes.txt just mentions "bug fixes"
    Corbin Hoenes
    May 13, 2011 at 7:37 pm
    May 14, 2011 at 3:10 am
  • This is more of a "how can I do this" question. Imagine you have an sql query like the following: select a.f1, a.f2, a.f3, b.f1, b.f2, c.f1, c.f2, d.f1, d.f2 from tableA a, tableB b, tableC c, tableD ...
    Mark Laczin
    May 5, 2011 at 7:06 pm
    May 9, 2011 at 5:06 pm
  • I was wondering if there was any documentation around (or if anyone simply knew) which versions of pig work with which versions of Hadoop? We are now using 0.20.2 CDH3 and it is not compatible with ...
    Jonathan Coveney
    May 3, 2011 at 3:14 pm
    May 5, 2011 at 7:05 pm
  • Our production environment has undergone software upgrades and now I'm working with: Hadoop 0.20.2-cdh3u0 Apache Pig version 0.8.0-cdh3u0 HBase 0.90.1-cdh3u0 My research indicates that these all ...
    Jameson Lopp
    May 25, 2011 at 9:04 pm
    May 26, 2011 at 2:50 pm
  • Hello, I have been trying to set up pig 0.8.1 to work with hadoop 0.20.203 without success. At the moment, if I run pig -x local I correctly get access to the grunt shell but when I try to run the ...
    Rui Miguel Forte
    May 23, 2011 at 4:50 pm
    May 26, 2011 at 2:06 pm
  • Can any of the products within the Hadoop projects be used for data visualization (creating graphs, pivot tables, etc.) from a Pig output CSV file? -- "Please consider the environment before printing ...
    Subhramanian, Deepak
    May 25, 2011 at 7:54 pm
    May 26, 2011 at 11:15 am
  • If I can access the implicit 'group' column from within FOREACH like this: GROUPED = GROUP InputRelVar by (firstDim,secondDim); B = FOREACH GROUPED GENERATE group.firstDim; ... then should I not be ...
    Daniel Eklund
    May 20, 2011 at 7:26 pm
    May 22, 2011 at 9:38 pm
  • Hi all, are there any tools that visualize the data flow in a Pig script, such as the joins, unions etc.? Maybe somewhat like this: A B \ / C I imagine that would be very helpful during development ...
    Thomas Kappler
    May 17, 2011 at 9:13 am
    May 17, 2011 at 3:39 pm
  • Hi, Is there only one way to load data into pig, i.e. using load command to load data from files? Can I load data from memory, for example in embedded code create a table and store data into it? ...
    Jianting Cao
    May 13, 2011 at 5:09 pm
    May 13, 2011 at 5:49 pm
  • Hi Guys, Can anyone please tell me how to read Explain plan in pig? When I do explain plan for any of my pig query it gives me really good flow diagram, but it uses some Pig functions so didn't ...
    Sonia gehlot
    May 12, 2011 at 11:33 pm
    May 13, 2011 at 5:27 pm
  • I am trying to download pig 0.7 versions from the mirror sites. But none of the mirror sites working. Any suggestions. ? Thanks , Deepak -- "Please consider the environment before printing this ...
    Subhramanian, Deepak
    May 12, 2011 at 4:14 pm
    May 13, 2011 at 8:30 am
  • Context: I have a bunch of files living in HDFS, and I think my jobs are failing on one of them... I want to output the files that the job is failing on. I thought that I could just make my own ...
    Jonathan Coveney
    May 31, 2011 at 5:08 pm
    May 31, 2011 at 5:52 pm
  • My goal is to be able to make functions like GREATER(a,b,c...) which can take any number of columns, and for each row will give the greater of them. I also want to detect what type of columns they ...
    Jonathan Coveney
    May 20, 2011 at 5:42 pm
    May 20, 2011 at 6:42 pm
  • Hi, Can we do set difference in pig ? The set difference is defined by: A-B = {x: x element of A and x is not element of B } Thanks Deepak
    Deepak Singh
    May 12, 2011 at 4:11 pm
    May 12, 2011 at 6:38 pm
  • I am having problems getting the simple example Python UDFs from to work. Stack trace: Pig Stack Trace --------------- ERROR 2998: Unhandled ...
    Andrea Leistra
    May 10, 2011 at 7:44 pm
    May 11, 2011 at 4:21 am
  • Hi, I run Hadoop on Grid5000. I would like to know how I can use Pig in Hadoop mode in this case. Thank you.
    Hiba houimli
    May 10, 2011 at 4:18 pm
    May 10, 2011 at 9:35 pm
  • Hello, I am using PigStorageSchema for my output so I can use the .pig_header file that is generated. I've noticed the output header is always tab delimited although my output delimiter is set to ...
    Mads Moeller
    May 3, 2011 at 8:38 pm
    May 4, 2011 at 6:57 pm
  • Hello, I got wrong information when I using pig -x local. My pig-version is 0.81. When I issue the command pig -x local. The error message occurs: 2011-04-30 11:14:31,739 [main] INFO ...
    May 2, 2011 at 6:08 am
    May 2, 2011 at 3:57 pm
  • Hi, I'm trying to run PIG 0.8.1 in local mode from Java. My code used to work with PIG 0.6.1. Now, I got the following exception: File ...
    Vincent Barat
    May 17, 2011 at 3:03 pm
    Jun 27, 2011 at 10:33 am
  • Hello everyone. I’m currently doing some research involving Hadoop and Pig, evaluating the cost of data replication vs. the penalty of node failure with respect to job completion time. I’m current ...
    David A Boyuka, II
    May 31, 2011 at 10:23 pm
    Jun 1, 2011 at 12:24 am
  • I am new to hadoop and from what I understand by default hadoop splits the input into blocks. Now this might result in splitting a line of record into 2 pieces and getting spread across 2 maps. For ...
    Mohit Anchlia
    May 27, 2011 at 4:59 pm
    May 27, 2011 at 7:19 pm
  • Hey, I have a file similar to syslog output. It is 1 tuple per line, space separated, but the tuple can have variable number of arguments if you use the standard PigStorage function to load the file. ...
    Sridhar basam
    May 26, 2011 at 4:10 pm
    May 27, 2011 at 6:05 am
  • FYI, folks: as part of making my branch of EB for pig 0.8 "real" and preparing to merge it into EB master, I will be pushing a change soon that will rename the pig8 package back to pig (so, ...
    Dmitriy Ryaboy
    May 25, 2011 at 12:56 am
    May 25, 2011 at 6:03 pm
  • Alan Gates
    May 24, 2011 at 8:32 pm
    May 25, 2011 at 2:44 pm
  • Please join me in welcoming Aniket Mokashi as a new committer on Pig. Aniket has been contributing to Pig since last summer. He wrote or helped shepherd several major features in 0.8, including the ...
    Alan Gates
    May 19, 2011 at 5:09 pm
    May 22, 2011 at 6:53 pm
  • Hi all, Sorry if this is a trivial question, but i wanted to clarify it before starting to use pig. How exactly does pig exploit the mapreduce framework? Is it through the standard pig commands? and ...
    George Kousiouris
    May 20, 2011 at 12:17 pm
    May 20, 2011 at 2:06 pm
  • Hello sirs, I'm having a bit of trouble with a pig script that loads a data range from hbase using org.apache.pig.backend.hadoop.hbase.HBaseStorage. Functionally it works great, gives me the answer ...
    Young Maeng
    May 14, 2011 at 2:09 pm
    May 15, 2011 at 3:19 am
  • Hello, I'm running into a weird problem that I'm hoping you can help me with. I'm basically just loading a access log, grouping, ordering and then dumping the data. I can load the log, group and ...
    May 13, 2011 at 7:37 pm
    May 13, 2011 at 11:15 pm
  • Is there any tool (or even just a good starting point in the logs) for measuring the amount of cluster used by a pig job (i.e. something that can be checked after a job was run on a cluster that was ...
    Kris Coward
    May 12, 2011 at 1:07 am
    May 12, 2011 at 3:38 am
  • I am looking at the jython UDF function capabilities. Is it fair to say that the jython UDFs are only for filter, and aggregate functions and _not_ load/store UDFs? I am assuming so as JythonFunction ...
    Daniel Eklund
    May 10, 2011 at 7:53 pm
    May 10, 2011 at 8:14 pm
  • I have a pig script that is tested and working in local mode. But when I try to run it in mapreduce mode on a non-local hadoop cluster I get an error with this stack trace: ERROR 2999: Unexpected ...
    William Dowling
    May 6, 2011 at 8:17 pm
    May 6, 2011 at 10:06 pm
  • Hello, I am curious as to how PIG implements sampling for order by: Are there things I could when storing my data which ...
    Brock Noland
    May 4, 2011 at 11:08 pm
    May 6, 2011 at 7:34 pm
  • 1line.txt is simply a 1 line file that has "123" as its only line. A = LOAD '1line.txt' AS (a:chararray); B = FOREACH A GENERATE ( REGEX_EXTRACT(a,'2',1) is null ? 1 : 0 ); DUMP B; [main] ERROR ...
    Jonathan Coveney
    May 4, 2011 at 8:24 pm
    May 4, 2011 at 8:36 pm
Group Overview
group: user @
categories: pig, hadoop

66 users for May 2011

Dmitriy Ryaboy: 30 posts; Alan Gates: 14 posts; Jonathan Coveney: 14 posts; Subhramanian, Deepak: 12 posts; Daniel Dai: 9 posts; Mark Laczin: 9 posts; Jacob Perkins: 8 posts; William Dowling: 7 posts; Daniel Eklund: 7 posts; Dexin Wang: 7 posts; Renato Marroquín Mogrovejo: 7 posts; Sonia gehlot: 7 posts; Thejas M Nair: 6 posts; Vincent: 5 posts; Badrinarayanan S: 4 posts; Jameson Lopp: 4 posts; Kim Vogt: 4 posts; Shai Harel: 4 posts; Souri datta: 4 posts; Xiaomeng Wan: 4 posts