-
Sent for Renato, since Apache's mail system has decided it doesn't like him. Alan. I am getting an error while trying to execute a simple fragment replicated join on two files (one of 77MB and the ...
Alan Gates
Apr 26, 2011 at 4:24 pm
May 2, 2011 at 6:01 pm -
I noticed that there is a Pig JSON Loader (which might or might not be in piggybank). Could anyone confirm the existence or absence of a JSONToTuple UDF? (not a loader) I am inspired by the UDF ...
Daniel Eklund
Apr 19, 2011 at 5:09 pm
Apr 19, 2011 at 7:20 pm -
This question might be better diagnosed as an Hbase issue, but since it's ultimately a Pig script I want to use, I figure someone on this group could help me out. I tried asking the IRC channel, but ...
Daniel Eklund
Apr 12, 2011 at 1:54 pm
Apr 12, 2011 at 11:59 pm -
Is it possible to dereference a column that is part of a nested bag? In the schema given below, I am trying to dereference the columns Key and Value, which are part of the visit bag, which is part of the visits bag. ...
Badrinarayanan S
Apr 8, 2011 at 10:10 am
Apr 11, 2011 at 5:15 pm -
Hello ... I have a cluster with 11 nodes, each with 16 GB RAM, a 6-core CPU, and a 1 TB HDD, and I use the Cloudera distribution CHD4b with Pig. I have two Pig Join queries which are a Parallel and a ...
Byambajargal
Apr 17, 2011 at 1:03 pm
Aug 23, 2011 at 7:08 am -
Hi, I have a Pig UDF. My requirement is that, on meeting certain criteria, I want to return from the Pig UDF. Is there any way I can exit early from a Pig UDF? Also, how can it be done in a Map/Reduce job? ...
Souri datta
Apr 29, 2011 at 12:02 pm
May 26, 2011 at 10:27 am -
I'm trying to do something like this: (if 'data' is a set of tuples loaded from a file containing fields a, b and c) (if 'M' is another set of tuples loaded from a file) data = FOREACH data GENERATE ...
Mark Laczin
Apr 20, 2011 at 1:27 pm
Apr 25, 2011 at 4:09 pm -
I'm trying to replace a couple of fields in a relation with values looked up in another relation. Here's an example; let's say I have a relation mapping each integer to its square: -----map.txt----- ...
Jay Hacker
Apr 15, 2011 at 8:46 pm
Apr 22, 2011 at 10:48 pm -
Hi, All. When I do a pig query on Cassandra, and the Cassandra is updated by application at the same time, what will happen? I may get inconsistent results, right? -- Bing Graduate Student Computer ...
Bing Wei
Apr 20, 2011 at 11:00 pm
Apr 21, 2011 at 4:04 pm -
I am a new pig user and have run into “Internal error 2999” . 2011-04-05 15:59:57,445 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. null Details at logfile: ...
William Dowling
Apr 5, 2011 at 8:10 pm
Apr 6, 2011 at 7:51 pm -
Hello, I have installed Pig 0.8.0 and Cassandra 0.7.4 and I'm not able to read data from Cassandra. I wrote a simple query just to test: grunt> A = LOAD 'cassandra://msg_keyspace/messages' USING ...
Fabio Souto
Apr 5, 2011 at 1:42 pm
Apr 6, 2011 at 2:29 pm -
Currently using Pig 0.8 and Hadoop 0.20.2. I'm able to run things in local mode as well as run fs -* commands from within Grunt running in MapReduce mode. I can't seem to be able to execute any Pig ...
Dimitris Iliopoulos
Apr 1, 2011 at 9:17 pm
Apr 4, 2011 at 10:26 pm -
So I've recently added a protocol/schema to a collection I got from someone else, recompiled it, and added it to my scripts and am having problems. More specifically, it built just fine, and when ...
Kris Coward
Apr 29, 2011 at 6:38 pm
May 2, 2011 at 8:43 am -
Hi, How can I get the actual time spent in all the map-reduce operations while executing a Pig script? It should exclude the time spent waiting for the scheduler and any other waiting time. Please help. ...
Sumit ghosh
Apr 22, 2011 at 5:19 am
Apr 28, 2011 at 10:33 am -
Hello guys, I am running the Cloudera distribution cdh3u0 on my cluster with Pig and HBase. I can read data from HBase using the following Pig query: my_data = LOAD 'hbase://table1' using ...
Byambajargal
Apr 25, 2011 at 12:33 pm
Apr 27, 2011 at 3:34 pm -
Hi. I am looking for a way to get the result of top ordered. Is it possible ? Example: A = LOAD 'datatest' USING PigStorage(';') as (first: chararray, second: int); D = GROUP A BY first; topResults = ...
Ugo jardonnet
Apr 26, 2011 at 1:11 pm
Apr 26, 2011 at 5:46 pm -
Hi, When I run the Pig code below: a = load '/logs/2011-03-31'; b = filter a by $1=='a' and $2=='b'; store b into '20110331-ab'; It runs an M/R job that has thousands of maps, and then creates an output ...
Jameson Li
Apr 1, 2011 at 7:58 am
Apr 2, 2011 at 8:26 pm -
Hi I have the following input relation: Name Score Jack 25 Jimmy 30 Sam 20 Hick 35 Tampa 22 My goal is to rank the tuples by score. Pig script: sample_data = LOAD 'sample.txt' USING PigStorage() AS ...
Arun A K
Apr 27, 2011 at 2:08 am
Apr 27, 2011 at 4:15 am -
Hi all, I have a pig script that produces a complex nested data structure: result: {child: chararray, childTraces: {action: int, time: long}, legacy: {parent: chararray, parentTraces: {action: int, ...
Gianmarco
Apr 15, 2011 at 4:19 pm
May 25, 2011 at 5:32 pm -
Hi Folks I've done a load of a dataset and I am attempting to filter out unwanted records by checking that one of my tuple fields contains a particular string. I've distilled this issue down to the ...
Steve Watt
Apr 22, 2011 at 9:26 pm
Apr 23, 2011 at 1:07 am -
Hi, First, I group 2 tables using a key (named sid): rich_sessions = GROUP sessions BY sid, activities BY sid; After this operation, all the tuples in the bag "activities" start with the same "sid" ...
Vincent Barat
Apr 20, 2011 at 4:05 pm
Apr 21, 2011 at 7:23 am -
Hi, What would be the best way to write this script? I have two datasets - huge (hkey, hdata), small(skey). I want to filter all the data from huge dataset for which F(hdata, skey) is true. Please ...
Aniket Mokashi
Apr 15, 2011 at 3:21 am
Apr 15, 2011 at 4:13 pm -
Hi, I am trying to run a filter against a column which is the result of a flatten operation. But the filter clause throws an exception as org.apache.pig.data.DataByteArray cannot be cast to ...
Badrinarayanan S
Apr 7, 2011 at 12:28 pm
Apr 8, 2011 at 6:50 pm -
Hi, I have a file which has records having fixed length fields (and spaces appended to fill the field length) How can I load these records using Pig specifying field lengths and also auto trimming ...
Shantian Purkad
Apr 5, 2011 at 6:19 am
Apr 6, 2011 at 7:41 pm -
I wish to have a Hadoop cluster with DR (Disaster Recovery), for which I need to have the data backup at a geographically different location. (if there's an earthquake or tsunami that hits one ...
Deepak N85
Apr 5, 2011 at 1:14 pm
Apr 5, 2011 at 9:24 pm -
If I have a tuple of values, is there a way to eliminate duplicate values per tuple? Example: (5,5,4,7,2,3,4,9) = (5,4,7,2,3,9) Thanks
Mark
Apr 3, 2011 at 2:45 pm
Apr 3, 2011 at 8:38 pm -
Hi Badri, http://mail-archives.apache.org/mod_mbox/pig-user/201104.mbox/%3C009a01cbf5ba$232d61d0$69882570$@com%3E Did you get a resolution on the above email post yet? Thanks Himanshu Garg +1 203 308 ...
Himanshu
Apr 28, 2011 at 8:58 pm
Apr 29, 2011 at 6:54 pm -
Hi all, A little while back, I started a project called pygmalion for example scripts and UDFs for people using Pig with Cassandra. Currently there are a few handy UDFs in there like: ...
Jeremy Hanna
Apr 27, 2011 at 6:57 pm
Apr 27, 2011 at 10:00 pm -
Hello, is it possible to return a bag from a UDF? When I define my Python UDF like this, it simply doesn't work: @outputSchema("y:bag{key:int, t:tuple(len:int,word:chararray)}") def toTuple(bag): ...
Pob
Apr 24, 2011 at 2:06 pm
Apr 24, 2011 at 4:01 pm -
Hi there, I'm planning to do some performance measurements of my hadoop pig code in order to see how it scales. Does anyone have some suggestions on how to do that? I thought of measuring the time ...
Lai Will
Apr 20, 2011 at 7:58 pm
Apr 20, 2011 at 10:05 pm -
Hi, I recently got Pig to work with LZO compression, with Pig loaders from Elephant Bird. But, from my understanding, my workflow is turning out to be: Step 1 : lzo-compress the raw input file. Step ...
Chaitanya Sharma
Apr 19, 2011 at 7:45 pm
Apr 19, 2011 at 9:22 pm -
Hi, I am trying to get LZO support for my little pig-0.8 project. I'm using https://github.com/gerritjvv/elephant-bird.git for the pig-lzo-loaders, and https://github.com/kevinweil/hadoop-lzo for ...
Chaitanya Sharma
Apr 18, 2011 at 6:24 pm
Apr 18, 2011 at 8:07 pm -
Hi, is it possible to create an aggregating function with 2 parameters one of which is bag and another one is not? In particular, i want to use that to work around lack of function invocation ...
Dmitriy Lyubimov
Apr 16, 2011 at 1:27 am
Apr 17, 2011 at 7:07 pm -
Hi there, I have some questions about how PIG performs joins. The site says there are three types of specialized joins: Replicated, skew, and merge joins. I wanted to know these implementations. For ...
Renato Marroquín Mogrovejo
Apr 13, 2011 at 4:49 am
Apr 13, 2011 at 1:02 pm -
I am having trouble getting Pig to see my Hadoop configuration files despite following the "Classpath in MapReduce Mode" instructions in the ...
W.P. McNeill
Apr 12, 2011 at 8:39 pm
Apr 13, 2011 at 12:21 am -
I have a relation built by grouping the join (TCRaw) of a pair of basic relations (SrcFuid and NewCitationRel): grunt> describe TCGroupedByFuid; TCGroupedByFuid: { group: (SrcFuid::citingdocid: int, ...
William Dowling
Apr 7, 2011 at 3:33 pm
Apr 7, 2011 at 11:09 pm -
No matter what I try, I end up losing the tuples after the initial flatten. I'm using some auto-generated test data with firstn, last and a concatenation for the key. The script and outputs. . . rows ...
Bob
Apr 6, 2011 at 10:40 pm
Apr 6, 2011 at 11:20 pm -
Hi there. As part of our start-up product, we need to compute a "similar user" feature, and we've decided to go with Pig for it. I've been learning Pig for a few days now and understand how it works. So ...
Diallo Mamadou Bobo
Apr 4, 2011 at 4:12 pm
Apr 5, 2011 at 5:01 pm -
Hi there Let's say I have DUMP A (user1, date1, {(item1), (item2)}, {(skill1), (skill2)}) (user1, date2, {(item3), (item4), (item5)}, {(skill1), (skill3)}) (user2, date2, {(item2), (item5)}, ...
Lai Will
Apr 28, 2011 at 4:06 pm
Apr 28, 2011 at 6:48 pm -
I'm sure this is well known, I'm just curious why it is documented as such... perhaps I am missing something obvious, but I see: /** * A load function that parses a line of input into fields using a ...
Jonathan Coveney
Apr 28, 2011 at 3:31 pm
Apr 28, 2011 at 3:51 pm -
Hey, is there a way to write unit tests for Pig scripts?
Shai Harel
Apr 26, 2011 at 8:06 am
Apr 27, 2011 at 6:57 am -
So I'm running into something strange. Consider the following code: tfidf_all = LOAD '$TFIDF' AS (doc_id:chararray, token:chararray, weight:double); grouped = GROUP tfidf_all BY doc_id; vectors = ...
Jacob Perkins
Apr 24, 2011 at 5:41 pm
Apr 26, 2011 at 2:44 pm -
I have a bag of items (a result after a group operation) (key1, {1, 2, 3}) (key2, {1, 4, 5}) ... etc. I want to generate a CROSS product on each entry (key1, {(1,1), (1,2), (1,3), (2,2), (2,3), ...
Shai Harel
Apr 26, 2011 at 12:04 pm
Apr 26, 2011 at 2:13 pm -
Hello guys, I am running the Cloudera distribution cdh3u0 on my cluster and I am trying to connect Pig with HBase. I have 11 nodes on my cluster, so I have configured one machine as HBaseMaster and rest ...
Byambajargal
Apr 24, 2011 at 5:40 pm
Apr 24, 2011 at 8:04 pm -
Hello, I installed jython + jruby on debian. export PIG_CLASSPATH=/path/cassandra-0.7/contrib/pig:/usr/share/java/jython.jar /path/cassandra-0.7/contrib/pig - here is my udf, myFunc.py ...
Pob
Apr 23, 2011 at 9:46 pm
Apr 23, 2011 at 10:37 pm -
Hi there, I have a Pig script that has some hardcoded input/output file paths as well as some parameters. It's relatively tedious to change these... Is there a way to define constants at the ...
Lai Will
Apr 18, 2011 at 9:29 am
Apr 18, 2011 at 10:35 am -
I have been getting strange errors in my pig script and narrowed it down a bit and found that when I do a COUNT, sometimes it returns a float, but most of the time it returns a long. Some example ...
Jeremy Hanna
Apr 15, 2011 at 9:44 pm
Apr 16, 2011 at 1:22 am -
Wondering whether this is a bug or by design. When I register my Python UDF file like this: Register 'a/b/mypyudfs.py' using jython as mypyudfs; I got an error saying "could not ...
Xiaomeng Wan
Apr 14, 2011 at 7:59 pm
Apr 14, 2011 at 10:48 pm -
I am going through a lot of processing with my data and then I reformat it to go back into my data store using the storefunc. I store it out to hdfs and it visually looks just fine. However when I ...
Jeremy Hanna
Apr 8, 2011 at 4:31 pm
Apr 8, 2011 at 10:39 pm
Group Overview
group | user |
categories | pig, hadoop |
discussions | 73 |
posts | 341 |
users | 70 |
website | pig.apache.org |
70 users for April 2011