Grokbase Groups Pig user May 2010

53 discussions - 205 posts

  • I am trying the SQL "NVL(city, 'U') city" in pig I am using the bincond operator, "(city is null?'U': city) AS city", which is of chararray type, the result file shows '\N' instead of U. Any ideas ?
    Syed Wasti
    May 16, 2010 at 8:16 pm
    May 17, 2010 at 9:08 pm
  • At the Bay Area HUG on Wednesday someone (Eli I think, though I might be remembering incorrectly) asked if there was a migration guide for moving Pig load and store functions from 0.6 to 0.7. I said ...
    Alan Gates
    May 21, 2010 at 6:34 pm
    Jun 18, 2010 at 7:48 pm
  • Hi, has anyone seen this error before? Normally our script runs fine, but recently it began to throw this exception. Also, it usually goes away if I rerun it. Caused by: ...
    Yonggang Qiao
    May 17, 2010 at 8:58 pm
    May 19, 2010 at 8:05 pm
  • I have this working, so seeking validation and corrections. We have SequentialFiles with various CustomWritables in hadoop and we want to be able to work with them from within pig. I have taken ...
    Vishal Santoshi
    May 24, 2010 at 4:42 pm
    May 24, 2010 at 9:09 pm
  • To: Olga Natkovich Subject: Re: SpillableMemoryManager - low memory handler called I have attached the script... please let me know if you have more questions.
    Olga Natkovich
    May 6, 2010 at 9:18 pm
    Oct 19, 2011 at 7:00 pm
  • Hi, Is there any operator or UDF in Pig similar to the IN operator of SQL? Specifically, given a large bag A and a very small single-column bag B, I want to select tuples in A with a field a1 that ...
    May 31, 2010 at 10:03 am
    Jun 3, 2010 at 10:28 am
  • Hello, I am trying to create a full address and full location field in Pig by combining multiple fields. file = LOAD 'file.txt' USING PigStorage() AS (house:chararray, predir:chararray, ...
    Scott Wine
    May 12, 2010 at 11:00 pm
    May 14, 2010 at 11:11 pm
  • Hi all, Is it possible to specify multiple HDFS directories in 'Load' function. Ex: raw = LOAD '/input_data/dir1', '/input_data/dir2', '/input_data/dir3' USING PigStorage ('\t') AS (.....); Thanks, ...
    Katukuri, Jay
    May 7, 2010 at 10:53 pm
    May 11, 2010 at 12:41 am
  • Hello I need some help to get started with using Pig UDF. I have time series data (time, magA, errA, magB, errB) e.g. (2345.59777,19.875,0.481,20.225,0.482) (2347.59568,19.371,0.3,20.227,0.743) ...
    Asif Jan
    May 28, 2010 at 2:50 pm
    Jun 1, 2010 at 8:29 pm
  • Hi, Is there a way to read a collection (of unknown size) of tab-delimited values into a single data type (tuple?) during the LOAD phase? Here's specifically what I'm looking to do. I have a given ...
    Bill Graham
    May 19, 2010 at 10:37 pm
    May 20, 2010 at 9:29 pm
  • Not sure I am clear on how I can debug stuff on a cluster. I currently have a long running reducer that attempts to run 4 times before finally giving up. I get 4 of these: Task ...
    Corbin Hoenes
    May 11, 2010 at 6:10 pm
    May 14, 2010 at 5:41 am
  • Hi all, I am new to Pig/Hadoop and I am trying to figure out how I can merge two (or more) input files, based on the value in one of the data fields. E.g. from the below input files (INPUT 1 and ...
    Mads Moeller
    May 11, 2010 at 4:50 am
    May 11, 2010 at 6:26 am
  • I have a bunch of grouped datasets that I need to union and store. When I union them, they lose their schema. I need the schema for my output storage function to work. How do I recreate my schema ...
    Russell Jurney
    May 7, 2010 at 12:23 am
    May 7, 2010 at 10:23 pm
  • Hi, I am new to Hadoop and Pig Latin Language. I am trying to convert the below Hive QL to Pig Latin. Any suggestions please. INSERT OVERWRITE TABLE A SELECT id, org_type, dept_type, cnt, ...
    Syed Wasti
    May 4, 2010 at 11:37 pm
    May 6, 2010 at 4:50 pm
  • Does anyone know if rev 909116 can be applied to pig 0.6.0 or is it dependent on 0.7.0? Alan?
    Corbin Hoenes
    May 14, 2010 at 10:55 pm
    May 17, 2010 at 5:13 pm
  • Hi, Trying to make my script execute happily, it won't stop throwing errors. It works if I don't group my data. But once I group, it starts with: ERROR 1000: Error during parsing. Invalid alias: id in ...
    Syed Wasti
    May 7, 2010 at 4:33 am
    May 7, 2010 at 6:05 pm
  • At an intermediate point in my processing, I have these tuples: DUMP X; (A,1L,1L) (A,2L,2L) (A,3L,6L) (A,5L,1L) The middle element of these tuples can have any integer value from 1-5, and the third ...
    Greg Langmead
    May 5, 2010 at 9:08 pm
    May 6, 2010 at 9:25 pm
  • Hi Piggers - Seeing an issue with a particular script where our job is taking 6hrs 42min to complete. syslogs are showing loads of these: INFO : org.apache.pig.impl.util.SpillableMemoryManager - low ...
    Corbin Hoenes
    May 6, 2010 at 5:31 pm
    May 6, 2010 at 9:05 pm
  • Hi all: As indicated in the sigmod'09 paper, there is an "ILLUSTRATE" command from the Pig Shell to generate example data for dataflow programs. But I did not find that either in the tutorial page ...
    Sai Zhang
    May 25, 2010 at 5:13 pm
    May 25, 2010 at 6:44 pm
  • Hi, I often get this error message when executing a Join over big data (~ 160 GB): "Task attempt failed to report status for 602 seconds. Killing!" The job finally finishes but a lot of reduce tasks ...
    Alexander Schätzle
    May 20, 2010 at 8:09 am
    May 20, 2010 at 3:01 pm
  • Right now I have a pig script to rollup timeseries data, The current format of the data is in the following tab separated value list. ts service-uuid service-name type value So the first step is to ...
    Dan Di Spaltro
    May 7, 2010 at 5:14 am
    May 14, 2010 at 9:20 pm
  • okay, I have to blow some steam here, did you know that if describe A; A: {id: int, bad: (a: int,b: int,z: int)} and I do B = foreach A generate id, FLATTEN(bad) as c; That this would actually run ...
    Hc busy
    May 6, 2010 at 6:24 am
    May 7, 2010 at 1:05 am
  • With Pig 0.6, as per , I was able to write to side-files. However, I am unable to find an obvious way to ...
    Sandesh Devaraju
    May 5, 2010 at 5:48 pm
    May 5, 2010 at 7:16 pm
  • Hey all, I've seen this question asked on the mailing list in the past, but not recently. Does anyone know of a way to read data from HDFS within a UDF? I saw some discussion about a year ago that ...
    Mark Stetzer
    May 28, 2010 at 10:04 pm
    Jun 15, 2010 at 4:04 pm
  • I am trying this load statement to load the following data : r = load 'tmp.dat' AS (f1:int, f2:int, B: bag { T: tuple (g1:int, g2:int) }); 1,2,{(3,4),(5,6)} 1,2,{(3,4),(5,6),(10,12)} 7,8,{(3,4)} But ...
    Prasenjit mukherjee
    May 26, 2010 at 5:48 pm
    May 27, 2010 at 3:32 am
  • Hi all, in the exec(Tuple input) method of an EvalFunc I get the tuple to be processed by the Eval Function. Is there any secure possibility to get the Aliases of the fields in the input Tuple? My ...
    Alexander Schätzle
    May 22, 2010 at 4:11 pm
    May 24, 2010 at 9:45 pm
  • I keep seeing this warning message while running my scripts, is this a concern ? Any info please. How can I get rid of this ? WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for ...
    Syed Wasti
    May 11, 2010 at 12:13 am
    May 12, 2010 at 4:33 pm
  • Hello, I am new to Hadoop, Pig and have just been reading whatever I could lay my hands on. If I needed to sort a dataset using Pig is just the ORDER syntax sufficient? For eg here is what I came up ...
    Vijay Rao
    May 10, 2010 at 7:17 pm
    May 10, 2010 at 7:49 pm
  • HI all, I had a Pig script that worked completely fine. I called a memory intensive UDF that brought some 600 MB data into each mapper. However, I was able to process and write results. My mapper ...
    Kelvin Moss
    May 7, 2010 at 8:56 am
    May 10, 2010 at 8:13 am
  • This might be well known already, but I just got kicked in the behind by a temporary file that I generate and load that starts its file name with a period. Apparently PigStorage will not load any ...
    Hc busy
    May 9, 2010 at 7:03 am
    May 9, 2010 at 12:54 pm
  • Hi, some months ago Paul proposed this script: log = LOAD 'request*' AS (ts:chararray, vid:chararray, rh:chararray, rid:chararray); group_req = GROUP log BY vid PARALLEL 12; group_sort_req = FOREACH ...
    Jordi Deu-Pons
    May 6, 2010 at 10:30 am
    May 6, 2010 at 11:13 am
  • Hi PIG users! I was reading about PIG and PNUTS and started wondering how these two are related. I mean, are there any applications where these technologies are used together? Or any project on how ...
    Renato Marroquín Mogrovejo
    May 28, 2010 at 3:29 pm
    May 28, 2010 at 4:30 pm
  • Hi, I’m porting our Load/Store funcs from pig 0.6 to 0.7. Currently we’re storing data in serialized binary JSON. The format requires that the meta data for the schema is stored in the header of the ...
    Richard Park
    May 26, 2010 at 2:08 am
    May 26, 2010 at 2:29 am
  • Hi, does anybody know where to find the source code of the flatten() function of Pig? I can't find it in the package "org.apache.pig.builtin" of pig.jar. Thx in advance, Alex
    Alexander Schätzle
    May 22, 2010 at 10:48 am
    May 24, 2010 at 4:24 pm
  • Hi all, I'm playing around with pig 0.7 and am getting an error trying to "register file:/piggybank.jar". I built piggybank.jar successfully from the contrib folder, but I'm getting this error when I ...
    Kim Vogt
    May 20, 2010 at 5:24 pm
    May 20, 2010 at 5:36 pm
  • I get this error message: 2010-05-18 16:20:30,490 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:8020 2010-05-18 ...
    Brian Donaldson
    May 18, 2010 at 11:23 pm
    May 18, 2010 at 11:36 pm
  • It doesn't surprise me, but the fact that it doesn't scream an error or a very loud warning is annoying. consider this sequence of changes timestamp 1: describe A; A: {id: int, bad: (a: int,b: int)} ...
    Hc busy
    May 6, 2010 at 7:30 pm
    May 6, 2010 at 7:35 pm
  • Hi, where can I find information about how the logging of PIG works? Does PIG generate log files showing, for example, how long the execution of a query took, and where can I find them? Thx in advance. Alex
    Alexander Schätzle
    May 4, 2010 at 12:48 pm
    May 5, 2010 at 5:49 pm
  • Hello all, I've reached an impasse in my attempts to learn Pig Latin. When running my script in local mode I get the results I expect. However, when I run the same script in mapreduce mode the resulting ...
    Mark Church
    May 1, 2010 at 3:10 pm
    May 3, 2010 at 4:22 pm
  • Found this little gem in the tests: private Schema parseSchema(String schemaString) throws ParseException { ByteArrayInputStream stream = new ByteArrayInputStream(schemaString.getBytes()) ; ...
    Corbin Hoenes
    May 28, 2010 at 9:52 pm
    May 28, 2010 at 9:52 pm
  • Hello, Greetings from IIIT, Hyderabad, We are delighted to inform you that IIIT-H is proudly conducting the first ever nationwide Workshop on Embedded Systems and its application in Robotics and ...
    IIIT-H Robotics
    May 27, 2010 at 4:17 pm
    May 27, 2010 at 4:17 pm
  • Hey, guys, how are Bags passed to EvalFunc stored? I was looking at the Accumulator interface and it says that the reason why this is needed for COUNT and SUM is because EvalFunc always gives you the ...
    Hc busy
    May 26, 2010 at 6:59 pm
    May 26, 2010 at 6:59 pm
  • I am wanting to make a writer that will, using the field names, look into an HBase table and compare fields using field names: example: ------------- REGISTER ./myUDF.jar; raw = LOAD 'My_File' USING ...
    Nathan Hoult
    May 24, 2010 at 4:09 pm
    May 24, 2010 at 4:09 pm
  • Hi out-there! Is there any other documentation like papers or articles about Zebra and / or its use? Thanks in advance. Renato M.
    Renato Marroquín Mogrovejo
    May 21, 2010 at 1:16 pm
    May 21, 2010 at 1:16 pm
  • The Travel Assistance Committee is now taking in applications for those wanting to attend ApacheCon North America (NA) 2010, which is taking place between the 1st and 5th November in Atlanta. The ...
    Alan Gates
    May 17, 2010 at 6:05 pm
    May 17, 2010 at 6:05 pm
  • I am trying the SQL "NVL(city, 'U') city" in pig I am using the bincond operator, "(city is null?'U': city) AS city", which is of chararray type, the result file shows '\N' instead of U. Any ideas ?
    Wasti, Syed
    May 17, 2010 at 4:58 pm
    May 17, 2010 at 4:58 pm
  • Hi folks, we proudly present the Berlin Buzzwords talks and presentations. There are tracks specific to the three tags search, store and scale. We have a fantastic mixture of developers and users of ...
    Isabel Drost
    May 14, 2010 at 4:14 pm
    May 14, 2010 at 4:14 pm
  • Pig team is happy to announce Pig 0.7.0 release. Pig is Hadoop subproject which provides high-level data-flow language and execution framework for parallel computation on Hadoop clusters. More ...
    Daniel Dai
    May 14, 2010 at 4:13 pm
    May 14, 2010 at 4:13 pm
  • We've heard your feedback from the last meetup: we're having less speakers and more discussion. Yay! We're expecting: 1. Facebook will talk ...
    Bradford Stephens
    May 13, 2010 at 11:47 pm
    May 13, 2010 at 11:47 pm
  • After more than one year since previous release I am proud to announce a new version of HAMAKE. Based on our experience of using we rewrote it in Java, added support for Amazon EMR. We also ...
    May 13, 2010 at 6:53 pm
    May 13, 2010 at 6:53 pm
Group Overview
group: user @
categories: pig, hadoop

59 users for May 2010

Dmitriy Ryaboy: 29 posts
Syed Wasti: 17 posts
Corbin Hoenes: 13 posts
Alan Gates: 11 posts
Russell Jurney: 10 posts
Hc busy: 9 posts
Jeff Zhang: 8 posts
Richard Ding: 7 posts
Vishal Santoshi: 6 posts
Yonggang Qiao: 6 posts
Alexander Schätzle: 5 posts
Ashutosh Chauhan: 5 posts
Mridul Muralidharan: 5 posts
Olga Natkovich: 5 posts
Katukuri, Jay: 4 posts
Bill Graham: 3 posts
Jordi Deu-Pons: 3 posts
Rekha Joshi: 3 posts
Scott Carey: 3 posts
Asif Jan: 2 posts