Grokbase Groups Pig user October 2010

46 discussions - 251 posts

  • Hey folks, I've been trying HBaseStorage 0.8.0 trunk with hbase-0.89 and it does not seem to work. It gets stuck at: [...] 2010-10-13 14:58:44,064 [Thread-4] INFO org.apache.zookeeper.ClientCnxn - ...
    George Stathis
    Oct 13, 2010 at 8:09 pm
    Oct 14, 2010 at 5:15 pm
  • Hi All, I'm quite new to Pig/Hadoop. So maybe my cluster size will make you laugh. I wrote a Pig script that handles 1.5GB of logs in less than one hour in Pig local mode on an Intel Core 2 Duo with ...
    Oct 8, 2010 at 6:53 am
    Oct 8, 2010 at 6:31 pm
  • Hi all! I am struggling to find a working solution to load data from HBase directly. I am using Cloudera CDH3b3 which comes with Pig 0.7. What would be the easiest way to load data from HBase? If it ...
    Oct 25, 2010 at 2:01 pm
    Oct 29, 2010 at 3:46 pm
  • I propose that we adopt the bylaws proposed at as the bylaws for the Pig project. In a self referential use of these bylaws I further propose that this vote ...
    Alan Gates
    Oct 7, 2010 at 4:23 pm
    Oct 15, 2010 at 8:23 pm
  • All, I have a solution for writing unit tests in Java to test Pig scripts, including stats and output, if anyone is interested.
    Dave Wellman
    Oct 20, 2010 at 4:19 pm
    Oct 20, 2010 at 11:32 pm
  • My Pig script is roughly like this: A = LOAD input1 USING JsonLoader AS (x:map[]); B = LOAD input2 USING JsonLoader AS (x:map[]); A = FOREACH A GENERATE x, x#'item' AS item:chararray; B = ...
    Rakesh kothari
    Oct 21, 2010 at 6:30 am
    Oct 25, 2010 at 4:10 pm
  • Hi, If I have bags that have a dynamic number of fields that look something like this: ("park", "building", "office") ("store", "school") ("building", "school", "restaurant", "hotel") Is it possible ...
    Kim Vogt
    Oct 14, 2010 at 5:30 pm
    Oct 15, 2010 at 6:05 pm
  • Hi again! :) I am trying to run Pig on a local machine, but I want it to connect to a remote cluster. I can't make it use my settings - whatever I do, I get this: ----- $ pig -x mapreduce 10/10/16 ...
    Oct 16, 2010 at 8:53 pm
    Oct 25, 2010 at 5:57 am
  • Hi, I'm currently working on a simple Cassandra Loader that reads an index and then works on that data. Now whenever I try to work on the retrieved data I get a strange error: ...
    Christian Decker
    Oct 12, 2010 at 7:38 pm
    Oct 15, 2010 at 10:57 am
  • Hey guys - I have a script that loads a list of ~800,000 category hierarchies, filters them a bit and streams them through a PHP script for some quick procedural work. The file contains one column ...
    Rob Wilkerson
    Oct 1, 2010 at 11:33 am
    Oct 1, 2010 at 4:57 pm
  • Hi Pig Users, I am currently writing a UDF loader. In one of my use cases, one line in the input stream results in multiple tuples. Has anyone encountered or solved this issue on their end? The current ...
    John Hui
    Oct 27, 2010 at 10:39 pm
    Oct 28, 2010 at 3:52 pm
  • Hello, I face an issue with PIG temporary files: they are not deleted once a job is terminated. I got my HDFS storage full of PIG temporary files. I use PIG from Java using a PigServer object. Is ...
    Vincent Barat
    Oct 23, 2010 at 11:30 am
    Nov 29, 2010 at 9:59 pm
  • What's the best way to do something like this in PIG: JOIN A with B where (A.property1 = B.property1 OR A.property2 = B.property2) ? Thanks, -Rakesh
    Rakesh kothari
    Oct 18, 2010 at 12:03 am
    Oct 27, 2010 at 9:20 pm
  • Hello, I want to set more heap space to my scripts, but I can't make Pig support this, when I call: "pig" It fails (just prints help), and option --help doesn't show ...
    Wojciech Langiewicz
    Oct 22, 2010 at 4:17 pm
    Oct 25, 2010 at 4:09 pm
  • Hi, I have a pig script that needs certain parameters (passed using "-p" in pig shell) to execute. Is there a way to pass these parameters if I want to execute this script using "PigServer" after ...
    Rakesh kothari
    Oct 7, 2010 at 6:47 pm
    Oct 8, 2010 at 5:05 pm
  • Hi there, I have some doubts about Zebra usage. The thing is that all my data is already in HDFS, and I want to use the Zebra storers and loaders, but I don't want to reprocess all my data just to get ...
    Renato Marroquín Mogrovejo
    Oct 24, 2010 at 8:15 pm
    Oct 28, 2010 at 3:51 am
  • Hi, What's the best way to diagnose which M/R step Pig is executing? I was hoping the name of the Pig job could have some relationship with the operator it is executing. It gets difficult to diagnose ...
    Rakesh kothari
    Oct 27, 2010 at 9:25 pm
    Oct 28, 2010 at 12:41 am
  • I've seen a few threads about counters, PigStats, Elephant-Bird's stats utility class, etc. ...
    Josh Devins
    Oct 17, 2010 at 4:15 pm
    Oct 18, 2010 at 6:15 pm
  • Hi, Couple of Questions: 1. What's the best way to get "Counters" out of a Pig Job execution? (e.g. Counters object exposed by "org.apache.hadoop.mapreduce.Job") 2. Is there anything special needs ...
    Rakesh kothari
    Oct 7, 2010 at 11:48 pm
    Oct 8, 2010 at 8:26 pm
  • Hi all! I'm trying to create a replacement for Pig shell - for running Pig batches from within our control program (dashboard). I'm having problems with calling Pig and would appreciate some help. I ...
    Oct 16, 2010 at 5:58 pm
    Oct 16, 2010 at 7:08 pm
  • Hi, I came across this patch, which supports the multifile input format from Pig 0.8 onwards. A patch is also available for Pig 0.7. I was ...
    Uppuluri, Rohini
    Oct 29, 2010 at 3:46 pm
    Oct 30, 2010 at 5:28 am
  • Hi! I hope this is not too newbie question, but it's driving me crazy... How do you count the records in a relation? Like DUMP, but instead of list of records, I would like their count. Thanks, Anze
    Oct 29, 2010 at 11:01 am
    Oct 29, 2010 at 12:44 pm
  • Hi all, I'm facing a weird problem and wondering if anyone has run into this before. I've been playing with PigServer to programmatically run some simple Pig scripts and it does not seem to be connecting ...
    Zach Bailey
    Oct 27, 2010 at 11:26 pm
    Oct 28, 2010 at 5:53 am
  • Hello, I am having an error that is driving me crazy. Any help will be appreciated. First, I have configured Hadoop and HDFS according to this tutorial (I did not create a hadoop account, used mine ...
    Ruth Garcia
    Oct 25, 2010 at 3:53 pm
    Oct 25, 2010 at 4:17 pm
  • Hi, I've hit a headache using UDFs with many dependencies: adding them with the register command is very painful and not extensible. I can make a self-contained jar for a Hadoop job using Maven (a jar with ...
    Yong-gang Cao
    Oct 22, 2010 at 12:09 am
    Oct 22, 2010 at 3:28 pm
  • Hi everybody, I'm trying to use vanilla Pig 0.7.0 to generate monthly consolidations of log files with relatively long lines: 95 fields and growing, of which I'll be using just 7. Just so I didn't ...
    Marcos Medrado Rubinelli
    Oct 20, 2010 at 2:28 pm
    Oct 22, 2010 at 2:30 am
  • Hi, Our data contain tuples one of whose fields is a tuple containing a bag field and we've seen the following exceptions when we access the bag field: java.lang.ClassCastException: ...
    Lin Guo
    Oct 12, 2010 at 10:38 pm
    Oct 14, 2010 at 10:54 pm
  • Hey everyone! The Pig Journal page says something about getting statistics for Pig's optimizer. Is there any work being done on that? Or are there any other ...
    Renato Marroquín Mogrovejo
    Oct 14, 2010 at 2:47 am
    Oct 14, 2010 at 6:32 pm
  • I'm trying to count N-gram occurrences as a percentage of total tuples, and I'm running into a problem that I assume has a simple solution I'm not thinking of. My script basically looks like: log = ...
    Mark Stetzer
    Oct 8, 2010 at 7:47 pm
    Oct 11, 2010 at 7:04 pm
  • I have a simple schema that contains an inner bag. What I need to essentially do is that for each tuple in the inner bag, I need to create a new tuple in a new outer bag. This is easier shown than ...
    Josh Devins
    Oct 8, 2010 at 4:13 pm
    Oct 8, 2010 at 9:00 pm
  • Hi, I need to integrate existing Pig scripts into a Java application. The scripts have so far been run from the command line. I'm wondering how to do this? I just want to run a Pig file (*.pig) from Java code. Any advice ...
    Oct 7, 2010 at 1:51 am
    Oct 7, 2010 at 7:41 am
  • I have seen a Pig error reported in a .processed file. I have not been able to find the documentation about what a .processed file is. Is it akin to a .substituted file?
    Dave Wellman
    Oct 28, 2010 at 9:06 pm
    Oct 28, 2010 at 9:43 pm
  • Hi All, Is there a need/mechanism to report progress in a storage UDF? Thanks in advance. - Sandesh
    Sandesh Devaraju
    Oct 27, 2010 at 4:41 pm
    Oct 28, 2010 at 4:25 pm
  • Hi, I have this pig script. 1 data = LOAD '$INPUT' USING PigStorage(',') AS (app:chararray, user:chararray , timestamp:int, duration:int); 2 3 appUserIn = FOREACH data GENERATE app, user; 5 ...
    John Hui
    Oct 20, 2010 at 8:13 pm
    Oct 20, 2010 at 9:32 pm
  • I need to use the output of one alias in a future calculation: Suppose I have: C=(5) and then later, I have G=(1,A) (3,B) (5,C) then I want to do a foreach on G where I multiply each G.$0 by C.$0, ...
    Matt Tanquary
    Oct 20, 2010 at 7:41 pm
    Oct 20, 2010 at 7:56 pm
  • Hi, Does anyone know if Bash shell works with Pig streaming the same way as Python? I've been struggling with it without success. Here is the bash code ( #!/usr/bin/env bash command ...
    Alex Wang
    Oct 18, 2010 at 10:22 pm
    Oct 18, 2010 at 10:34 pm
  • $ cat a.out [key1#val1,key2#val2]*[key3#[val31,val32]] grunt> a = load 'a.out' using PigStorage('*') AS (A:[], B:[]); grunt> dump a; here is the output : ([key2#val2,key1#val1],) I guess it is not ...
    Prasenjit mukherjee
    Oct 13, 2010 at 6:06 am
    Oct 14, 2010 at 10:39 pm
  • What would be the right syntax to stream through a python script? This does not seem to work, as pig complains about the syntax: DEFINE force_layout `` SHIP ...
    Sal Uryasev
    Oct 12, 2010 at 2:04 am
    Oct 12, 2010 at 10:04 pm
  • I have a python script defined as import sys for line in sys.stdin: if not line: break sys.stdout.write(line) my data test looks like ...
    Felix gao
    Oct 7, 2010 at 1:09 am
    Oct 8, 2010 at 7:11 pm
  • anyone ever read a pig output file with bags/tuples into a java map reduce program?
    Corbin Hoenes
    Oct 7, 2010 at 7:04 pm
    Oct 8, 2010 at 1:30 am
  • Assume that I would like to write this pig script: REGISTER myudfs.jar; A = LOAD 'hist_data' AS (id: chararray, word: chararray, count : float ); B = GROUP A BY id C = CROSS B, B D = FOREACH C ...
    Paolo D'alberto
    Oct 22, 2010 at 9:49 pm
    Oct 22, 2010 at 9:49 pm
  • Hi, It is my pleasure to welcome Corinne Chandel as our newest Pig committer. Corinne has been responsible for documentation for all Pig releases 0.3.0 and later. We are very happy to have her on ...
    Olga Natkovich
    Oct 22, 2010 at 6:56 pm
    Oct 22, 2010 at 6:56 pm
  • Hi, I haven't been able to stream pig data to a command line script, can someone help out? I want to execute a command line script called GMTFilter (all stdin, stdout, and stderr work) from a pig ...
    Alex Wang
    Oct 20, 2010 at 7:14 pm
    Oct 20, 2010 at 7:14 pm
  • Wrong list...
    Anthony Urso
    Oct 15, 2010 at 1:13 am
    Oct 15, 2010 at 1:13 am
  • Anyone have any pointers on how to test against ZK outside of the source distribution? All the fun classes (e.g. ClientBase) do not make it into the ZK release jar. Right now I am manually running a ...
    Anthony Urso
    Oct 15, 2010 at 1:12 am
    Oct 15, 2010 at 1:12 am
  • I'm using Pig 0.6.0 and a fix for bug PIG-619 is causing a performance issue with some of my Jobs. In Pig 0.3.0 a fix was added to create an empty slice for any file with a zero file length. In some ...
    Robert Goodman
    Oct 12, 2010 at 12:39 am
    Oct 12, 2010 at 12:39 am
Group Overview
group: user @
categories: pig, hadoop

67 users for October 2010

  • Dmitriy Ryaboy: 31 posts
  • Jeff Zhang: 20 posts
  • Anze: 14 posts
  • Rakesh kothari: 13 posts
  • Alan Gates: 9 posts
  • George P. Stathis: 9 posts
  • Olga Natkovich: 9 posts
  • Thejas M Nair: 9 posts
  • Vincent: 9 posts
  • Josh Devins: 7 posts
  • Renato Marroquín Mogrovejo: 7 posts
  • John Hui: 5 posts
  • Kim Vogt: 5 posts
  • Romain Rigaux: 5 posts
  • Ashutosh Chauhan: 4 posts
  • Christian Decker: 4 posts
  • Dave Wellman: 4 posts
  • Gerrit van Vuuren: 4 posts
  • Guy Bayes: 4 posts
  • Rob Wilkerson: 4 posts