-
In partnership with Berkeley, Twitter employees are giving a number of lectures to a class on big data. I did one on Pig. I thought some might find it useful. Slides: http://db.tt/VJfy1Ixz Full ...
Jon Coveney
Sep 5, 2012 at 6:34 am
Sep 6, 2012 at 11:30 pm -
Hi, I am trying to parse this data using the Pig parser org.apache.pig.piggybank.storage.JsonLoader ...
Deepak Tiwari
Sep 24, 2012 at 7:04 pm
Apr 9, 2013 at 10:53 am -
Hello Everyone, I am wondering if anyone has run into an issue that I am having using SAMPLE in a Pig script to create a subsample of 0.001% from the original relation. Assume the relation "A" ...
Brian Choi
Sep 11, 2012 at 10:37 pm
Sep 19, 2012 at 4:15 am -
Hi, I have a requirement to access HBase in UDF. But the HBase is configured to be secure, which needs a credential when being connected in a mapreduce job. I see you have added support of secure ...
Ray
Sep 25, 2012 at 8:52 am
Nov 5, 2012 at 9:49 pm -
Hi, I've encountered a problem: the job failed because POSplit retained too much memory in the reducer. How can I specify more reducers for the spill? Here's the screen snapshot of the heap dump ...
Haitao Yao
Sep 16, 2012 at 2:09 am
Sep 17, 2012 at 9:27 am -
Is there a way to read BytesWritable using sequence file loader from piggybank? If not then how should I go about implementing one?
Mohit Anchlia
Sep 11, 2012 at 6:22 pm
Oct 22, 2012 at 4:04 pm -
Our input path is something like YYYY/MM/DD/HH/input and we would like to write to YYYY/MM/DD/HH/output. Is it possible to get the input path as a String and convert it to YYYY/MM/DD/HH/output that I can ...
Mohit Anchlia
Sep 10, 2012 at 11:11 pm
Sep 17, 2012 at 5:12 am -
Hi, I am trying to give multiple paths to a pig script using path globbing in HAR file format and it does not seem to work. I wanted to know if this is expected or a bug / feature request. Command ...
Mohnish Kodnani
Sep 25, 2012 at 7:44 pm
Sep 27, 2012 at 4:38 pm -
I have a Json something like: { user{ id : 1 name: user1 } product { id: 1 name: product1 } } I want to be able to read this file and create 2 files as follows: user file: key,1,user1 product file ...
Mohit Anchlia
Sep 5, 2012 at 3:38 am
Sep 13, 2012 at 2:01 pm -
Hello, I'm having some trouble doing something I thought would be easy: I'd like to use matches to generate a boolean flag but this seems to not compile: FOREACH html_pages GENERATE portal_id, html ...
James Kebinger
Sep 27, 2012 at 3:54 pm
Sep 28, 2012 at 9:53 pm -
Hi all, I'm currently working with Pig 0.10.0. I'd like to load some data from an HBase table, but I encountered some problems. When I try to load the data it seems to work: grunt> raw = LOAD ...
Alberto Cordioli
Sep 12, 2012 at 9:36 am
Sep 13, 2012 at 9:09 am -
Hello, My setup is Pig + Hadoop + Cassandra for my "big data" and MySql for my "relational/meta data". Up until now that has been fine, but now I need to start creating metrics that "cross the ...
William Oberman
Sep 11, 2012 at 3:18 pm
Sep 12, 2012 at 2:42 pm -
Hi, I have a huge text file of the form shown below; the data is saved in directory data/ as data1.txt, data2.txt and so on: merchant_id, user_id, amount 1234, 9123, 299.2 1233, 9199, 203.2 1234, 0124, 230 and so on.. What I ...
Jamal sasha
Sep 26, 2012 at 1:36 am
Sep 26, 2012 at 8:53 am -
Forgive me for asking a FAQ but what is the current IDE of choice for Pig? I used to use a text editor and command line. I understand that PigPen (eclipse plugin) is no longer supported and does not ...
Alex McLintock
Sep 25, 2012 at 8:24 am
Sep 25, 2012 at 6:26 pm -
Hi All, I would like to share my slides from the presentation about Apache Pig that I gave at the 3rd meeting of WHUG (Warsaw Hadoop User Group) a couple of months ago. Here is a link ...
Adam Kawa
Sep 15, 2012 at 10:16 am
Sep 19, 2012 at 1:48 pm -
Probably an easy one but... After processing a file through a series of groupings, aggregations and projections using flatten I end up with long concatenated names for each field shown in this ...
Robert Yerex
Sep 17, 2012 at 11:00 pm
Sep 18, 2012 at 2:58 pm -
Is it ok to reuse the same Tuple and List of inputs from RecordReader across all getNext calls in a LoadFunc? I notice that PigStorage creates a new List, mProtoTuple, for every record along with a ...
Jim Donofrio
Sep 17, 2012 at 4:35 am
Sep 17, 2012 at 1:15 pm -
Hi, I'm a little bit puzzled about REPLACE when there is a backslash involved. I want to replace all the "dir" in the string with "\\test\sub". After a lot of trial and error, I finally got it done, but ...
Danfeng Li
Sep 7, 2012 at 9:08 pm
Sep 9, 2012 at 5:28 pm -
I recently met this problem in my work; it's about Pig flatten. I use a simple example to express it. Two files: ===file1=== 1_a 2_b 4_d ===file2 (tab separated)=== 1 a 2 b 3 c I tried three scripts ...
Huo Zhu
Sep 4, 2012 at 11:17 am
Sep 6, 2012 at 8:15 am -
Hello list, I have a file in my Hdfs and I am reading this file and trying to store the data into an HBase table through Pig Shell. Here are the commands I am using :i z = load ...
Mohammad Tariq
Sep 3, 2012 at 7:31 am
Sep 3, 2012 at 2:45 pm -
Hi all, I'm defining a udf to store Library description information. Here is the pig script: register fasta.jar; register /usr/lib/zookeeper/zookeeper-3.3.5-cdh3u4.jar; register ...
HAJIHASHEMI, ZAHRA (AG/1000)
Sep 25, 2012 at 5:31 pm
Sep 29, 2012 at 2:20 am -
I have two strings that I want to concatenate. The first one holds a number and comes from this set of commands: library = LOAD 'discovery_library' USING ...
HAJIHASHEMI, ZAHRA (AG/1000)
Sep 26, 2012 at 8:56 pm
Sep 27, 2012 at 2:41 am -
Hi all, I forgot the keyword which forces Pig to finish the job and then continue the following script. My job failed because of an OOME, so I want to split the jobs into smaller ones but still written ...
Haitao Yao
Sep 16, 2012 at 2:53 am
Sep 24, 2012 at 10:35 pm -
Looking for an elegant way to do this: Suppose there is a bag with names { James, John, Lisa, Larry, Amanda, Amanda, John, James, Lisa, John} I'd like to get something back along the lines of a tuple ...
Arun Ahuja
Sep 19, 2012 at 5:11 pm
Sep 21, 2012 at 6:13 pm -
I need to pass two or more arguments in my udf to process the data in those arguments. I am unable to map the input data schema and getting the *ERROR 1045: Could not infer the matching function ...
Dipesh Kumar Singh
Sep 19, 2012 at 6:13 pm
Sep 20, 2012 at 1:54 am -
Is there any way within a LoadFunc to access the schema that a user defines after AS in a LOAD statement? Is there some property I can access in the UDFContext or ? pushProjection provides the schema ...
Jim Donofrio
Sep 15, 2012 at 5:14 am
Sep 17, 2012 at 5:07 am -
Hi, I have a very simple script that tries to import a Pig file: set pig.import.search.path '/tmp' import 'event.pig'; Even if the file /tmp/event.pig exists, it cannot be found. It seems that the ...
Vincent Barat
Sep 5, 2012 at 10:38 am
Sep 10, 2012 at 5:28 pm -
Hi all, I have this data, having fields (Date, symbol, rate), and I want it to be grouped by month, and to find out the maximum rate value for each month. like: for month (08, 36.3), (09, 36.4), (10, ...
Yogesh dhari
Sep 29, 2012 at 10:02 pm
Sep 29, 2012 at 11:33 pm -
I have a requirement to propagate field values from one row to another given type of record for example my raw input is 1,firefox,p 1,,q 1,,r 1,,s 2,ie,p 2,,s 3,chrome,p 3,,r 3,,s 4,netscape,p the ...
Richipal Singh
Sep 27, 2012 at 8:50 pm
Sep 27, 2012 at 9:15 pm -
Hi, I recently found this problem when using parameter substitution in a Pig script. My test.pig: a = load 'data' as (ch:chararray, num:int) I execute this script with, (I need to use positional ...
Huo Zhu
Sep 25, 2012 at 2:52 am
Sep 25, 2012 at 6:07 am -
Hi, during execution of the following Pig script I ran into the class cast exception mentioned in the title of this mail. The log indicates that the error is happening in the reduce process and I ...
Björn-Elmar Macek
Sep 19, 2012 at 4:52 pm
Sep 20, 2012 at 9:29 am -
Hello All, Is there any built-in Load function for loading ".Z" files? Regards, Srini
Srini
Sep 19, 2012 at 7:23 am
Sep 20, 2012 at 3:02 am -
Hi all, I have run the script and dumped it. 1.) Bcount = foreach Bgroup generate group, COUNT(Btop) as number ; 2.) Border = order Bcount by number desc; 3.) Dump Border ; and its output is ...
Yogesh Kumar13
Sep 19, 2012 at 8:26 pm
Sep 20, 2012 at 12:19 am -
I'm trying to group tuples by a key, sort by another key within each group, and then pass the sorted list of tuples for each group to a perl script. I need to use the perl script because I need to ...
Kannan Shah
Sep 17, 2012 at 7:55 pm
Sep 18, 2012 at 10:28 pm -
Hey all, I've started using SequenceFiles more and more (in particular the elephant bird load and storage functions) and am wondering what the best approach is for marshaling between a schema from ...
Mat Kelcey
Sep 16, 2012 at 12:16 am
Sep 16, 2012 at 3:27 am -
Hi all, I'm wondering if anyone has experience with the following scenario: I have an HBase table loaded with millions of records. I want to load these records into Pig, process each batch of 1000 by ...
Terry Siu
Sep 12, 2012 at 6:55 pm
Sep 13, 2012 at 4:39 pm -
I'm trying to do a UNION on two datasets with identical schemas (k:bytearray, v:chararray). When using the UNION operator like so: combined_data = UNION dataset1, dataset2; I get the following ...
Xavier Stevens
Sep 4, 2012 at 3:53 pm
Sep 11, 2012 at 5:27 pm -
I am trying to store field in a bag command but it fails with store b.page into '/flume_vol/flume/input/page.dat'; store b.network into '/flume_vol/flume/input/network.dat'; B: {b: {(page ...
Mohit Anchlia
Sep 10, 2012 at 5:53 pm
Sep 11, 2012 at 1:57 am -
http://hortonworks.com/blog/twitter-analytics-presents-hadoop-and-pig-at-uc-berkeley/ I think these lectures were posted before, but I thought Pig users might find this as amusing as me :) You can ...
Russell Jurney
Sep 10, 2012 at 9:10 pm
Sep 10, 2012 at 11:12 pm -
Hi, I'd appreciate if anyone has some ideas/pointers regarding a pig script and custom UDF I have written. I've found it runs too slowly on my hadoop cluster to be useful....... I have two million ...
James Newhaven
Sep 3, 2012 at 4:33 pm
Sep 3, 2012 at 8:32 pm -
Hello. I found 26 errors on "Pig Latin Basics" while translating the document to Japanese. (https://github.com/miyakawataku/pig/issues) Should I report each of them as an individual issue? Or should ...
Miyakawa Taku
Sep 1, 2012 at 5:02 am
Sep 1, 2012 at 5:44 am -
Hi, Is it possible to use a regular expression as a delimiter to load data, say something like A = load 'data' using PigStorage('\s+'); However, by checking the doc, it seems that only one character is ...
Lei tang
Sep 28, 2012 at 11:06 pm
Sep 28, 2012 at 11:27 pm -
Hi I have two files.. File 1 contains following data. Id, amount 1234, 22.7 1158,88 1234, 280 File 2 contains following data Id, min, max 1234, 8, 150 Now I want to calculate the mean (avg) but ...
Jamal sasha
Sep 28, 2012 at 4:52 pm
Sep 28, 2012 at 5:41 pm -
Hi, I am new to Pig. In Pig, I want to load multiple files with date variables in their names. If I load files between 2012/02/12 to 2012/02/19, the following works $START = "12" $END = "19" raw_data ...
Jerry Jiang
Sep 28, 2012 at 1:23 pm
Sep 28, 2012 at 3:39 pm -
Hi all, I am not sure how to use distinct for this, or how else we can perform this. I have a file DATAFILE with content like: Date, time, ssitename, sip, csusername, cip ...
Yogesh Kumar13
Sep 26, 2012 at 5:56 am
Sep 26, 2012 at 8:46 am -
Hello, Need help with finding the distinct count. Would appreciate if you could please help. Here's my data file: id , dept, budget 1, Marketing, 9000 2, Marketing, 1000 3, Finance, 9000 4, Sales, ...
Hadoop Learner
Sep 26, 2012 at 1:55 am
Sep 26, 2012 at 2:20 am -
Hi, I have a quick question. I have two types of files. dirA/ -- file_a , file_b, file_c dirB/ -- another_file_a, another_file_b... Files in directory A contain transaction information. So something ...
Fraz man
Sep 26, 2012 at 12:39 am
Sep 26, 2012 at 1:08 am -
I have input files that I need to split into multiple rows but with the same key. For example: 1,a\nb\n 2,a\nc\n After the split this looks like: 1,a 1,b 2,a 2,c Is this possible without writing an ...
Mohit Anchlia
Sep 25, 2012 at 10:47 pm
Sep 25, 2012 at 11:10 pm -
Hi, I have a text file which has my hbase table information. It is comma separated. The first is attribute name (which I want it to be as column qualifier) and the second is attribute value. The file ...
HAJIHASHEMI, ZAHRA (AG/1000)
Sep 24, 2012 at 2:31 pm
Sep 24, 2012 at 5:06 pm -
Hi all, I'm new to Pig and need to format my file. I have a FASTA file with this format: CGACACGACTCTCGGCAACGGATA CGACACGACTCTCGGCAACGGATAC GACACGACTCTCGGCAACGGATA CGACACGACTCTCGGCAACGGA ...
HAJIHASHEMI, ZAHRA (AG/1000)
Sep 23, 2012 at 6:42 pm
Sep 23, 2012 at 11:19 pm
Group Overview
group | user |
categories | pig, hadoop |
discussions | 70 |
posts | 275 |
users | 83 |
website | pig.apache.org |
83 users for September 2012