90 discussions - 323 posts

  • Hey guys, I am new to Pig. I was wondering whether it is possible to pass a schema in the Pig load statement when loading data for the first time. Suppose I have a huge dataset containing around 100 cols. Is there a ...
    Praveenesh kumar
    Feb 3, 2012 at 12:35 pm
    Feb 6, 2012 at 9:20 pm
  • Hi, I'm trying to use Pig 0.9.2 with HBase 0.93 (i.e. the latest from HBase trunk) and following the tutorial. This line loads the sample file from HDFS successfully: raw = LOAD ...
    Royston Sellman
    Feb 2, 2012 at 8:49 pm
    Feb 7, 2012 at 6:53 pm
  • This is probably easy, but my PigLatin is rusty, and I don't seem to be able to find an answer on Google. If I have a record of the form: 98812 3 {(48567859),(15996334),(15897772)} How can I flatten ...
    Eli Finkelshteyn
    Feb 9, 2012 at 6:27 pm
    Feb 13, 2012 at 6:36 am
  • Running pig 0.9.1 in local mode, STRSPLIT doesn't seem to split on ','. I have the following data user2 hosting9 user1 hosting1,hosting2,hosting3,hosting4 user1 hosting2,hosting4,hosting5 searches = ...
    Flo Leibert
    Feb 23, 2012 at 2:14 am
    Feb 29, 2012 at 8:17 am
  • Hi, Is there any mechanism for retaining state between Pig UDF invocations? Thanks, Shibu Thomas
    Shibu Thomas
    Feb 23, 2012 at 4:58 am
    Feb 23, 2012 at 10:12 pm
  • Consider this scenario: I have a column named City and it takes 3 possible values: A,B,C City A B C A C C I want to convert it into A B C 1 0 0 0 1 0 0 0 1 1 0 0 0 0 1 0 0 1 I am trying to write a ...
    Austin Chungath
    Feb 20, 2012 at 7:44 pm
    Feb 21, 2012 at 4:17 pm
  • Hi, I am doing e2e testing to pig-0.9.1. Here is my reference: https://cwiki.apache.org/confluence/display/PIG/HowToTest Please give your suggestions to following questions: 1. The tests needs a ...
    Lulynn_2008
    Feb 14, 2012 at 2:17 pm
    Feb 16, 2012 at 4:21 pm
  • Hi - Is there a good way to see if a bag contains a tuple? Given a record that has a tuple T and a Bag (a bag could contain 0 to n tuples), how do I filter the records where the bag contains the ...
    Flo Leibert
    Feb 13, 2012 at 5:42 pm
    Feb 14, 2012 at 7:42 am
  • Did ONERROR ever get built? I have a few bad datetimes out of many failing to parse, and I don't want my entire pig script dying because I lost a few rows. ...
    Russell Jurney
    Feb 5, 2012 at 3:12 am
    Feb 7, 2012 at 3:03 am
  • Hi, I am trying to learn how can I store records in tuples ? Suppose I have a txt file $ cat tmp.txt 1,2,3,4 2,3,4,5 4,5,5,6 I am doing this $ pig A = Load 'tmp.txt' using PigStorage(',') AS ...
    Praveenesh kumar
    Feb 2, 2012 at 9:06 am
    Feb 6, 2012 at 7:41 am
  • In my Pig script I have something like this... %default MY_SCHEMA '/user/xyz/my-schema.json'; %default MY_AVRO 'org.apache.pig.piggybank.storage.avro.AvroStorage(\'$MY_SCHEMA\')'; my_files = LOAD ...
    Something Something
    Feb 3, 2012 at 6:08 am
    Feb 3, 2012 at 9:50 pm
  • I'm writing a pig script that will read a file of records and pass them to a custom EvalFunc. This EvalFunc has a side-effect; it updates data in a separate datastore. In the simplest example, my pig ...
    Stuart White
    Feb 27, 2012 at 5:52 pm
    Feb 28, 2012 at 7:37 am
  • I need to write unit tests that start with raw data on HDFS and plumb all the way through to a web browser. Writing Java isn't desirable, so PigUnit isn't right. Anyone have any ideas? I'd like it to ...
    Russell Jurney
    Feb 21, 2012 at 1:00 am
    Feb 22, 2012 at 12:16 am
  • I am trying to use XMLLoader to process the files but it doesn't seem to be quite working. For the first pass I am just trying to dump all the contents but it's saying 0 records found: bash-3.2$ ...
    Mohit Anchlia
    Feb 21, 2012 at 5:32 pm
    Feb 24, 2012 at 6:06 am
  • Environment: Hadoop-0.20.2: 4 nodes(1 namenode+3 datanode) ant: 1.8.2 java: sun 1.6_27 Questions: 1. what is your pig e2e test result? If all passed, please give your environment setting(ant, ...
    Lulynn_2008
    Feb 21, 2012 at 5:52 am
    Feb 24, 2012 at 5:08 am
  • Consider describe A; A: {New York: chararray, Delhi: chararray} B = foreach a generate New York; error: mismatched input 'York' expecting SEMI_COLON Is there an escape sequence for space in pig ...
    Austin Chungath
    Feb 22, 2012 at 4:44 am
    Feb 22, 2012 at 10:47 am
  • Hi, I want to fetch the records from the HBase tables using pig language. Details of HBase table: HBASE TABLE NAME : sample_names COLUMN FAMILY NAME : cf COLUMN NAME : fname ROWKEY VALUES : 1,2,3,4 I ...
    Chethan
    Feb 16, 2012 at 10:38 am
    Feb 17, 2012 at 7:48 am
  • hi, all here's my pig script: A = load 'input' as (b:bag{t:(x:int, y:int)}); B = foreach A generate AVG(b.x); describe B; it works well. if the b.x is char array, the problems arise: A = load 'input' ...
    Haitao Yao
    Feb 15, 2012 at 6:20 am
    Feb 15, 2012 at 4:42 pm
  • I feel like the answer is that it is not safe, but I'd like to make sure. IE is the following ok, and if it is not, why not? public DataBag exec(Tuple input) throws IOException { DataBag bag = ...
    Jonathan Coveney
    Feb 14, 2012 at 2:19 am
    Feb 14, 2012 at 7:39 pm
  • ----- Forwarded Message ----- From: jagaran das <jagaran_das@yahoo.co.in> To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org> Sent: Sunday, 12 February 2012 9:33 PM Subject: Hadoop ...
    Jagaran das
    Feb 13, 2012 at 5:37 am
    Feb 13, 2012 at 8:02 pm
  • Is it possible to specify regex expressions in FOREACH statement to generate only selected columns as specified by the regex ? Suppose I want to generate only those columns that ends with 'XYZ' , Is ...
    Praveenesh kumar
    Feb 10, 2012 at 11:22 am
    Feb 11, 2012 at 6:29 pm
  • Hello, REGEX_EXTRACT uses Matcher.find() instead of Matcher.matches() and so does not work with some non-greedy regular expressions. Is this the intended behavior? Thanks, Romain ...
    Romain Rigaux
    Feb 4, 2012 at 1:30 am
    Feb 7, 2012 at 1:49 am
  • Is there a "show functions" equivalent in Pig that can display all available functions to the user? I couldn't find this with a quick search on JIRA. Thanks, Aniket
    Aniket Mokashi
    Feb 5, 2012 at 2:26 am
    Feb 5, 2012 at 3:01 am
  • Hi All, How do I represent hierarchical information in flat file and process it in Pig? Let’s say I have objects of type A. I want to have a Tree representation with their parent-child relationships. ...
    Prash987 prash987
    Feb 29, 2012 at 2:03 pm
    Mar 2, 2012 at 11:48 am
  • Hi All Is there solution for issue discussed here https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/e99bf7a5ebdfa6fc/6e532aad085799b9?pli=1 -- ...
    JAGANADH G
    Feb 24, 2012 at 7:27 am
    Feb 24, 2012 at 6:22 pm
  • Hi, I want to issue command inside command in PIG Script. command 1 : result = load 'hbase://sample_names' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:fname','-loadKey true -gt 1 -lt ...
    Chethan
    Feb 17, 2012 at 8:19 am
    Feb 21, 2012 at 10:33 am
  • So the current releases of pig are 0.8.1 and 0.9.2. However, in the apache mvn repo (and mirrored repos) there is a pig 0.8.3. I find no release on it, no svn tag for it, and no user mailing list ...
    Jeremy Hanna
    Feb 17, 2012 at 8:06 pm
    Feb 21, 2012 at 7:24 am
  • Hi I need to look for a special string in a field and am using matches operator to do so. Snippet of my pig script where 'matches' is being used: input = LOAD 'test.txt' AS (field1:chararray); *A = ...
    Arun Autuchirayll
    Feb 14, 2012 at 4:13 am
    Feb 16, 2012 at 6:06 pm
  • As I couldn't find any and was figuring them out, I wrote an example of using complex types (bags/tuples) as input and output for Jython UDFs: ...
    Russell Jurney
    Feb 14, 2012 at 2:21 am
    Feb 14, 2012 at 5:03 pm
  • Hi folks, I was wondering if it's possible to submit register and run a script using PigServer in MAPREDUCE mode in an asynchronous manner; compared to how a script is executed right now whereby the ...
    Michael Lok
    Feb 3, 2012 at 8:35 am
    Feb 11, 2012 at 12:54 am
  • Is it possible to generate maps in Pig? Is this type castable in any context? -- Russell Jurney
    Russell Jurney
    Feb 9, 2012 at 3:12 am
    Feb 9, 2012 at 9:02 am
  • Hi all, our data format for maps is Key:Value|Key:Value. How can I load the data into a map type? Can Pig define the map delimiter like Hive? Thanks.
    Haitao Yao
    Feb 6, 2012 at 7:13 am
    Feb 7, 2012 at 7:07 pm
  • Hello guys, I have Cloudera's CDH3U2 package installed on a 3 node cluster and I've added to mapred-site: <property><name>mapred.compress.map.output</name><value>true</value></property> <property> ...
    Marek Miglinski
    Feb 1, 2012 at 8:35 am
    Feb 6, 2012 at 1:14 pm
  • Hi there, I try to retrieve the group of 'rich' userids which are not 'happy' . Something like retrieve all ids which are not in the other bags.ids. Is there a better way to exclude some rows from a ...
    Marco Cadetg
    Feb 28, 2012 at 3:50 pm
    Feb 28, 2012 at 4:05 pm
  • Hi I would like to create a bag of tuples using an eval UDF. I wrote a simple eval method but when I use it pig cannot figure out the schema of the UDF's output. When I call "describe" on the output ...
    Manu
    Feb 25, 2012 at 8:26 pm
    Feb 26, 2012 at 7:32 am
  • I am using STORE A into 'abc'; How can I re-use the 'abc' dir in a second run without having to first delete that directory? Is it possible?
    Mohit Anchlia
    Feb 25, 2012 at 12:55 am
    Feb 26, 2012 at 6:45 am
  • Dear, Maybe it is silly to ask, but I am always writing user defined functions for Pig in Java. Is there a way to write such UDFs in Pig Latin? Thanks. Andy
    Hao Lin
    Feb 23, 2012 at 12:40 am
    Feb 23, 2012 at 1:33 am
  • Hi - I was wondering if there was an easy way to split a bag into multiple records. Let's say I have the following: {(foo), (bar), (bam)}, 5 I'd like to generate foo,5 bar,5 bam,5 A UDF would be easy ...
    Flo Leibert
    Feb 23, 2012 at 1:18 am
    Feb 23, 2012 at 1:33 am
  • I would like to remove the first two columns of data from data with varying column lengths. For example: row1: $0 $1 $2 $3 $4 row2: $0 $1 $2 row3: $0 $1 $2 $3 $4 $5 I would like to get rid of $0 and ...
    Chan, Tim
    Feb 22, 2012 at 11:47 pm
    Feb 23, 2012 at 12:36 am
  • Hi all, I am trying to run one of the examples from the pig 0.9.2 distribution and am running into the following exception: [junit] Unable to open iterator for alias queries_limit [junit] ...
    Joe Gutierrez
    Feb 21, 2012 at 11:22 pm
    Feb 22, 2012 at 11:21 am
  • hi, all I have tens of simple pig scripts to run. While there's no parameter name collision, I merged them into a large pig script which is about 4000 lines. But the merged pig script takes pig a lot ...
    Haitao Yao
    Feb 20, 2012 at 6:59 am
    Feb 21, 2012 at 1:54 am
  • I am unable to cp, ls or do anything against s3:// or s3n:// data. I am unable to LOAD it either. In their Pig, or mine. Is anyone doing this? How? -- Russell Jurney twitter.com/rjurney ...
    Russell Jurney
    Feb 17, 2012 at 3:57 am
    Feb 17, 2012 at 8:19 pm
  • Hi, Please give your suggestion on resolve this error. Thank you. The error log: ERROR 2999: Unexpected internal error. Failed to create DataStorage java.lang.RuntimeException: Failed to create ...
    Lulynn_2008
    Feb 16, 2012 at 11:03 am
    Feb 17, 2012 at 6:31 am
  • Hi, I'm trying to do a pretty simple regex test in PIG right now and getting a weird error. All I'm doing is: orig_set = load '/data/dictionaries/Eng-Spa.dic' USING PigStorage('\t') AS (orig: ...
    Eli Finkelshteyn
    Feb 16, 2012 at 5:51 pm
    Feb 16, 2012 at 6:08 pm
  • Hi - is there a way to do a fs -rmr s3://foo/bar that doesn't result in an error if the given directory doesn't exist? I can imagine this is because the underlying hadoop tool gives some return code ...
    Flo Leibert
    Feb 15, 2012 at 1:19 am
    Feb 15, 2012 at 1:31 am
  • x_grp = group x by $18; How do I output the first tuple in the bag generated by the group?
    Chan, Tim
    Feb 10, 2012 at 10:29 pm
    Feb 10, 2012 at 11:35 pm
  • Hi, In many cases I am close to being able to use a replicated join (100-150 MB of data) but it still blows up despite upping the Java heap to a few GB. e.g. A = LOAD 'bigdata' B = LOAD 'smalldata' C = ...
    Romain Rigaux
    Feb 10, 2012 at 10:39 pm
    Feb 10, 2012 at 11:30 pm
  • Hello, I have a question about pig-0.9.1: Could pig-0.9.1 work with hadoop-1.0.0 and hbase-0.90.5? I planned to verify this by running UT. Please give your suggestions. Besides, I found zookeeper in ...
    Lulynn_2008
    Feb 6, 2012 at 6:37 am
    Feb 6, 2012 at 6:57 am
  • Hi, I've a bunch of [for example] apache logfiles that I'm searching through. I can process them with: logs = load 's3://bucket/directory/*' USING LogLoader as (remoteAddr, remoteLogname, user, time ...
    Ranjan Bagchi
    Feb 3, 2012 at 1:11 am
    Feb 6, 2012 at 6:30 am
  • Why am I having tuple objects in my python udfs? This isn't how the examples work. Error: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error executing function at ...
    Russell Jurney
    Feb 5, 2012 at 5:40 am
    Feb 6, 2012 at 5:04 am
Group Overview
group: user @
categories: pig, hadoop
discussions: 90
posts: 323
users: 65
website: pig.apache.org

65 users for February 2012

Russell Jurney: 32 posts
Dmitriy Ryaboy: 27 posts
Alan Gates: 22 posts
Jonathan Coveney: 20 posts
Prashant Kommireddi: 20 posts
Praveenesh kumar: 20 posts
Lulynn_2008: 12 posts
Bill Graham: 11 posts
Chethan: 11 posts
Haitao Yao: 11 posts
Daniel Dai: 9 posts
Eli Finkelshteyn: 9 posts
Flo Leibert: 9 posts
Austin Chungath: 8 posts
Mohit Anchlia: 5 posts
Norbert Burger: 5 posts
Royston Sellman: 5 posts
Stan Rosenberg: 5 posts
Aniket Mokashi: 4 posts
Chan, Tim: 4 posts