Grokbase Groups Pig user August 2012


72 discussions - 249 posts

  • I wrote a Pig tutorial to publish data with Mongo and Node.js. Is it possible to reblog on the Pig blog? Russell Jurney ...
    Russell Jurney
    Aug 16, 2012 at 10:07 pm
    Aug 24, 2012 at 3:37 pm
  • Hi All, In TestJobSubmission, testReducerNumEstimation failed with following error information. Please give a glance to check what the problem is. Thanks. Error information: Testcase ...
    Aug 15, 2012 at 7:57 am
    Sep 13, 2012 at 9:24 am
  • Hi All, Just wanted to follow-up on Chun's question. Several of our Pig users have been experiencing slow start-ups with Pig 0.10.0, when the same script runs fine with 0.9.1. Anyone else facing ...
    Prashant Kommireddi
    Aug 7, 2012 at 10:44 pm
    Aug 13, 2012 at 11:45 pm
  • Hi, I am processing huge dataset and need to aggregate data using on multiple levels ( columns ). for example A,B,C,D,E,F, CalculateDistinctinctOnValue1, CalculateDistinctinctOnValue2, Sum(value3) I ...
    Deepak Tiwari
    Aug 28, 2012 at 8:35 pm
    Sep 28, 2012 at 11:15 pm
  • How do I get count of all the rows? All the examples of COUNT use group by.
    Mohit Anchlia
    Aug 29, 2012 at 10:52 pm
    Sep 4, 2012 at 4:20 pm
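    The usual answer to the count-all-rows question above is to collapse the whole relation with GROUP ... ALL and then apply COUNT. A minimal sketch (the path and schema here are made up for illustration):

    ```pig
    -- hypothetical input with one record per line
    rows = LOAD 'input.txt' AS (id:int, name:chararray);
    -- GROUP ... ALL puts every tuple into a single group, so no key is needed
    all_rows = GROUP rows ALL;
    -- COUNT skips tuples whose first field is null; COUNT_STAR counts every tuple
    total = FOREACH all_rows GENERATE COUNT_STAR(rows);
    DUMP total;
    ```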
  • Has anyone poked around to see if there is there a way to create / increment counters from a Python UDFs? Thanks. Will Duckworth Senior Vice President, Software Engineering | comScore, Inc ...
    Duckworth, Will
    Aug 17, 2012 at 2:04 pm
    Aug 27, 2012 at 9:10 pm
  • Hello! Considering the following two relations... grunt querys = load 'query' as (id:int, token:chararray); grunt dump querys (11,foo) (12,bar) (13,frog) and grunt documents = load 'document' as ...
    Mat Kelcey
    Aug 29, 2012 at 11:56 pm
    Aug 30, 2012 at 12:49 am
  • Hi There, What is the policy on using the Apache Blogs for projects. In the Apache Pig user mailing list we had a discussion on reposting corporate blogs on the Apache Blog for Pig and then link it ...
    Santhosh M S
    Aug 23, 2012 at 2:13 am
    Aug 24, 2012 at 6:01 am
  • The input schema is: *{name:chararray, ids:chararray}*, and the format of *ids* is like: id1,id2,id3,...,idn Now, I want to split *ids* and change the input into the below format: name id1 name id2 ...
    Leon Town
    Aug 9, 2012 at 5:38 am
    Aug 22, 2012 at 6:24 am
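    For the split-ids-into-rows question above, the standard pattern is TOKENIZE plus FLATTEN: TOKENIZE turns the delimited string into a bag, and FLATTEN expands that bag into one output row per element. A sketch, assuming tab-separated input and Pig 0.10+ (where TOKENIZE accepts a custom delimiter):

    ```pig
    -- hypothetical input: name <TAB> id1,id2,...,idn
    data = LOAD 'input.txt' AS (name:chararray, ids:chararray);
    -- TOKENIZE(ids, ',') -> bag of ids; FLATTEN -> one (name, id) row per element
    split_rows = FOREACH data GENERATE name, FLATTEN(TOKENIZE(ids, ','));
    ```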
  • Hi, all, I got an OOME , on org.apache.hadoop.mapreduce.Reducer$Context, here's the snapshot of the heap dump: Well, does pig have to report so many data through the Reducer$Context? Can this be ...
    Haitao Yao
    Aug 20, 2012 at 1:49 am
    Aug 22, 2012 at 2:20 am
  • Hi, I'm trying to do updates of records in hadoop using Pig ( I know this is not ideal but trying out POC ).. data looks like the below: *feed1:* -- here trade key is unique for each order/record -- ...
    Srinivas Surasani
    Aug 28, 2012 at 4:37 am
    Aug 29, 2012 at 11:05 am
  • I'm running pig 0.9.2 and seeing this: grunt describe cxels; cxels: {messageId: chararray,celstart: int,celend: int,notcellabel: chararray,notcelstart: int,notcelend: int} grunt gcxels = group cxels ...
    Lauren Blau
    Aug 24, 2012 at 6:29 pm
    Aug 27, 2012 at 9:14 pm
  • I run into this strange problem when try to load multiple text formatted files and convert them into avro format using pig. However, if I read and convert one file at a time in separated runs, ...
    Danfeng Li
    Aug 21, 2012 at 11:38 pm
    Aug 22, 2012 at 5:43 am
  • I am trying to read records from HBase using HBaseStorage. When I execute simple load I get this error. I think I am missing some property, but I am running pig on the cluster where hadoop and hbase ...
    Mohit Anchlia
    Aug 6, 2012 at 6:32 pm
    Aug 7, 2012 at 5:46 am
  • I have the following foreach: foo := foreach bar {
    Lauren Blau
    Aug 30, 2012 at 10:00 pm
    Sep 4, 2012 at 8:28 pm
  • I am having trouble with bincond in pig 11. Sample input: 1234 1234 Sample pig script: a = LOAD 'input.txt' as (col1:int); b = FOREACH a GENERATE col1, (col1 == null ? 'null' : 'not-null') as col2 ...
    Alex Rovner
    Aug 22, 2012 at 6:28 pm
    Aug 22, 2012 at 9:35 pm
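    The bincond teaser above compares with `== null`, which in Pig Latin evaluates to null rather than true or false; null checks need the `is null` / `is not null` operators. A sketch of the likely intended script:

    ```pig
    a = LOAD 'input.txt' AS (col1:int);
    -- 'col1 == null' always yields null; 'col1 is null' is the correct null test
    b = FOREACH a GENERATE col1, (col1 is null ? 'null' : 'not-null') AS col2;
    ```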
  • Hello everyone, I am coding one pig UDF function, like MyUDF: public class MyUDF extends EvalFunc<Object { public Object exec(Tuple input) throws IOException() public Schema outputSchema(Schema ...
    Zhang Jianfeng
    Aug 15, 2012 at 6:20 pm
    Aug 16, 2012 at 4:59 pm
  • Hi All, I am running pig-0.10.0 e2e test with pig-0.9.2 and hadoop-1.0.3. There are 12 tests failed with "Sort check failed" error. I list Order_6 as a example: Error info: Going to run sort check ...
    Aug 6, 2012 at 1:57 pm
    Aug 15, 2012 at 7:58 am
  • I am running basic pig script but it's failing. There is other job that loads data that works. Pig Stack Trace --------------- ERROR 2017: Internal error creating job configuration ...
    Mohit Anchlia
    Aug 11, 2012 at 1:01 am
    Aug 14, 2012 at 12:18 am
  • When I load a range of data from HBase simply using row key range in HBaseStorageHandler, I find that the speed is acceptable when I'm trying to load some tens of millions rows or more, while the ...
    Aug 28, 2012 at 6:50 am
    Sep 4, 2012 at 11:55 am
  • Hi all, I'm trying to execute the following pig script with pig-0.10.0 and yarn (cdh4.0.0): -- DEFINE AvroStorage; loaded_data = LOAD '$input' ...
    Johannes Schwenk
    Aug 23, 2012 at 3:49 pm
    Sep 3, 2012 at 11:17 am
  • Hi, I'm trying to load the following records using PigStorage(',') as (val:int, m:map[ ])... When I see dump of output I only get first column and empty value.. 151364,[id#812,pref#secondary] ...
    Aug 30, 2012 at 1:50 pm
    Aug 30, 2012 at 5:45 pm
  • Hi there, I do have some user session which look something on the following lines: id:chararray, start:long(unix timestamp), end:long(unix timestamp) xxx,1,3 xxx,4,7 yyy,1,2 yyy,5,7 zzz,6,7 zzz,7,10 ...
    Marco Cadetg
    Aug 30, 2012 at 8:01 am
    Aug 30, 2012 at 4:03 pm
  • I'm getting an error instantiating HBaseStorage ONLY when run on a cluster. Running in local mode with -x local does not produce the error and my pig script runs successfully and the data is properly ...
    Dan Therrien
    Aug 25, 2012 at 4:03 am
    Aug 29, 2012 at 5:47 pm
  • I want to match up tuples from 2 relations. For each key, the 2 relations will always have the same number of tuples and match by position (the first tuple in each are a match, the second tuple in ...
    Lauren Blau
    Aug 14, 2012 at 9:56 am
    Aug 23, 2012 at 8:44 pm
  • I'm having problems with understanding storage structures. Here's what I did: on the cluster I loaded some data and created a relation with one row. I output the row using store relation into '/file' ...
    Lauren Blau
    Aug 15, 2012 at 11:44 am
    Aug 21, 2012 at 12:14 pm
  • I have a big, e.g. A: {(name: chararray,age: int)}, I wrote a udf which adds 1 more field in the tuple inside the bag. E.g. B: {(name: chararray,age: int, rank:int)}. Because the number of fields in ...
    Danfeng Li
    Aug 13, 2012 at 10:43 pm
    Aug 14, 2012 at 1:09 am
  • Hi, I'm seeking for a way to load data from a SQL database. Something like: It seems that the SQLLoader (yet mentioned in the Wiki) no longer exists. Sqoop seems to be a good solution, but it is not ...
    Vincent Barat
    Aug 8, 2012 at 1:31 pm
    Aug 10, 2012 at 1:39 pm
  • Hi All, Is there anybody who tried to run org.apache.pig.Main in Eclipse on windows OS? Can we run pig project in Eclipse on windows OS? If yes, how to run? Thanks.
    Aug 10, 2012 at 7:14 am
    Aug 10, 2012 at 12:13 pm
  • Hi everyone, we are planing to put our aggregations result into an external data base. To handle a connection failure to that external resource properly we currently store the result onto the hdfs ...
    Markus Resch
    Aug 23, 2012 at 11:38 am
    Sep 10, 2012 at 7:32 am
  • Hi all, I have a bag, clickstreams: {clickStream: {pageName: chararray}}, for which each row represents a sequence of pages and events in a single session on a website. The interior bag, clickstream, ...
    Steve Bernstein
    Aug 29, 2012 at 11:28 pm
    Aug 30, 2012 at 4:23 pm
  • I have this simple pig script but when I run I get: 2012-08-28 17:50:24,924 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed! 2012-08-28 ...
    Mohit Anchlia
    Aug 29, 2012 at 12:53 am
    Aug 29, 2012 at 6:51 am
  • hi, all I want to add GeoIP.dat to my pig scripts. Does Pig have the "add file XXX" command like hive? I want to distribute the data file GeoIP.dat with Pig. Or is there any other work around? I ...
    Haitao Yao
    Aug 28, 2012 at 7:21 am
    Aug 28, 2012 at 6:44 pm
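    Pig has no direct equivalent of Hive's "add file", but one workaround for the GeoIP.dat question above is to ship the file through Hadoop's distributed cache by setting job properties from the script. A sketch, assuming the file has already been copied to HDFS (the path is made up):

    ```pig
    -- ship a side file to every task node via the distributed cache;
    -- the '#GeoIP.dat' fragment names the local symlink created on the node
    SET mapred.cache.files 'hdfs:///user/me/GeoIP.dat#GeoIP.dat';
    SET mapred.create.symlink 'yes';
    -- a UDF running on the task can then open the local file 'GeoIP.dat'
    ```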
  • I am trying to use a parameter as the expression in a filter. Assuming: colors_in = load ‘$in_path’ as (color:chararray); flt = filter colors_in by color == ‘blue’ or color == ‘green’; I would like ...
    Duckworth, Will
    Aug 27, 2012 at 8:50 pm
    Aug 28, 2012 at 6:34 am
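    The parameterized-filter question above works because Pig substitutes `$param` textually before parsing, so a whole expression can be passed as a parameter, not just a value. A sketch (parameter names are illustrative):

    ```pig
    -- invoked e.g. as:
    --   pig -p in_path=colors.txt -p color_filter="color == 'blue' or color == 'green'" script.pig
    colors_in = LOAD '$in_path' AS (color:chararray);
    -- $color_filter is spliced in as raw text before the script is parsed
    flt = FILTER colors_in BY $color_filter;
    ```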
  • Hi there, What is the best way to retrieve duplicates from a bag. I basically would like to do something like the opposite of DISTINCT. A: {userid: long,foo: long,bar: long} dump A (1,2,3) (1,2,3) ...
    Marco Cadetg
    Aug 24, 2012 at 9:35 am
    Aug 24, 2012 at 10:26 am
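    An "opposite of DISTINCT", as asked above, can be expressed by grouping on every field and keeping only groups that occur more than once. A sketch using the schema from the teaser:

    ```pig
    a = LOAD 'input.txt' AS (userid:long, foo:long, bar:long);
    -- group on the full tuple so identical rows land in the same group
    g = GROUP a BY (userid, foo, bar);
    -- keep only groups with more than one occurrence
    dups = FILTER g BY COUNT(a) > 1;
    -- flatten back to rows if the duplicate tuples themselves are wanted
    dup_rows = FOREACH dups GENERATE FLATTEN(a);
    ```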
  • Hi, is there anyway to project the last field of a tuple (when you don't know how many fields there are) without creating a UDF? Thanks, Fabian
    Fabian Alenius
    Aug 23, 2012 at 8:53 am
    Aug 23, 2012 at 8:53 pm
  • I'm a new-ish pig user querying data on an hbase cluster. I have a question about accumulator-style functions. When writing an accumulator-style UDF, is all of the data shipped to a single machine ...
    Benjamin Smedberg
    Aug 13, 2012 at 4:06 pm
    Aug 13, 2012 at 11:51 pm
  • Greetings, I am new to pig. I am trying to get to know it on a laptop with hadoop 20.2 installed in local mode. I have prior experience with hadoop, but I figure my error is so weird I blew the pig ...
    Jeremiah rounds
    Aug 13, 2012 at 9:49 pm
    Aug 13, 2012 at 11:30 pm
  • Hi Users, Im new to pig, Can anyone provide pig installation & learning material web links. Thanks, Prabhu.
    Prabhu k
    Aug 11, 2012 at 3:13 am
    Aug 11, 2012 at 3:54 am
  • hi, all I got this while running pig script: 997: Unable to recreate exception from backend error: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory ...
    Haitao Yao
    Aug 10, 2012 at 2:43 am
    Aug 10, 2012 at 6:52 pm
  • Hi - I've been having problems with running Pig(0.9.2) scripts with HBase(0.92.1) as source and target. I'm running a 3 node cluster. Node 1 has the NameNode, JobTracker, Zookeeper server and HBase ...
    Hari Prasanna
    Aug 6, 2012 at 7:20 am
    Aug 7, 2012 at 5:18 am
  • Hi All, I can not run pig command with "-e" parameter, could you please help to figure out what the problem is? Thanks. ./pig -x local -e "a = load '/user/pig/tests/data/singlefile/studenttab10k';" ...
    Aug 7, 2012 at 12:50 pm
    Sep 26, 2012 at 1:21 pm
  • HI, I was wondering if it is possible validate records by checking the tuple length. I expect every record to have 14 fields, but some records might be corrupt. I want to filter those out . I tried ...
    Sam William
    Aug 30, 2012 at 7:40 pm
    Aug 30, 2012 at 8:19 pm
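    One way to approach the record-validation question above is to load each line as a single chararray, count the delimited fields, and keep only well-formed lines before reparsing with the real schema. A sketch, assuming tab-delimited input:

    ```pig
    -- load each line whole so malformed records are not silently padded with nulls
    raw = LOAD 'input.txt' USING TextLoader() AS (line:chararray);
    -- STRSPLIT yields a tuple of fields; SIZE on a tuple returns the field count
    good = FILTER raw BY SIZE(STRSPLIT(line, '\\t')) == 14;
    -- the surviving lines can then be split into the real 14-column schema
    ```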
  • am new to hadoop and all its derivatives. And I am really getting intimidated by the abundance of information available. But one thing I have realized is that to start implementing/using hadoop or ...
    Mohit Singh
    Aug 30, 2012 at 4:21 am
    Aug 30, 2012 at 4:51 am
    Siddharth Tiwari
    Aug 20, 2012 at 7:43 am
    Aug 25, 2012 at 4:06 am
  • if I group records into a huge bag, and hand over to a Udf, would the input tuple actually create a bag with all the records? that way it may generate a OOM ?? if indeed there is such an issue, I ...
    Aug 24, 2012 at 2:01 am
    Aug 24, 2012 at 4:38 am
  • Hello. We are starting to use pig for our data analysis. To be exact, actual work will be performed by amazon elastic map reduce. That's why we are using 0.9.2 for now. Everything works more or less ...
    Віталій Тимчишин
    Aug 15, 2012 at 10:49 am
    Aug 22, 2012 at 4:57 pm
  • I used pig to do some ETL job, but met with a strange bug of the built-in REPLACE function. After I replace '[' with '' in '[02/Aug/2012:05:01:17' , the whole string just went blank. Here I posted ...
    Aug 17, 2012 at 9:06 pm
    Aug 18, 2012 at 12:05 am
  • Hi All, I am using pig 0.10.0. I want to parse a json file and I am having the following error: raws-events = load 'file:/Users/joao.salcedo/Cloudera/test/test.json' using ...
    Joao Salcedo
    Aug 16, 2012 at 6:53 pm
    Aug 17, 2012 at 8:11 am
  • Cross posting in hopes a user has this working... Has anyone gotten JavaScript UDFs working in pig 0.10.0? The hello world example doesn't work. I added debug to the code, and the rhino class doesn't ...
    Russell Jurney
    Aug 15, 2012 at 4:46 pm
    Aug 15, 2012 at 7:10 pm
Group Overview
group: user
categories: pig, hadoop

64 users for August 2012

Jonathan Coveney: 22 posts; Lulynn_2008: 15 posts; Alan Gates: 14 posts; Cheolsoo Park: 14 posts; Dmitriy Ryaboy: 13 posts; Russell Jurney: 12 posts; Lauren Blau: 11 posts; Mohit Anchlia: 11 posts; Haitao Yao: 10 posts; Bill Graham: 9 posts; Santhosh M S: 9 posts; Danfeng Li: 6 posts; Duckworth, Will: 5 posts; Mat Kelcey: 5 posts; Subir S: 5 posts; Chun Yang: 4 posts; Marco Cadetg: 4 posts; Prashant Kommireddi: 4 posts; Srinivas Surasani: 4 posts; Steve Bernstein: 4 posts