70 discussions - 275 posts

  • In partnership with Berkeley, Twitter employees are giving a number of lectures to a class on big data. I did one on Pig. I thought some might find it useful. Slides: http://db.tt/VJfy1Ixz Full ...
    Jon Coveney
    Sep 5, 2012 at 6:34 am
    Sep 6, 2012 at 11:30 pm
  • Hi, I am trying to parse this data using the Pig parser org.apache.pig.piggybank.storage.JsonLoader ...
    Deepak Tiwari
    Sep 24, 2012 at 7:04 pm
    Apr 9, 2013 at 10:53 am
  • Hello Everyone, I am wondering if anyone has run into an issue that I am having using SAMPLE in a Pig script to create a subsample of 0.001% from the original relation. Assume the relation "A" ...
    Brian Choi
    Sep 11, 2012 at 10:37 pm
    Sep 19, 2012 at 4:15 am
  • Hi, I have a requirement to access HBase in UDF. But the HBase is configured to be secure, which needs a credential when being connected in a mapreduce job. I see you have added support of secure ...
    Sep 25, 2012 at 8:52 am
    Nov 5, 2012 at 9:49 pm
  • Hi, I've encountered a problem: the job failed because POSplit retained too much memory in the reducer. How can I specify more reducers for the spill? Here's the screen snapshot of the heap dump ...
    Haitao Yao
    Sep 16, 2012 at 2:09 am
    Sep 17, 2012 at 9:27 am
  • Is there a way to read BytesWritable using sequence file loader from piggybank? If not then how should I go about implementing one?
    Mohit Anchlia
    Sep 11, 2012 at 6:22 pm
    Oct 22, 2012 at 4:04 pm
  • Our input path is something like YYYY/MM/DD/HH/input and we'd like to write to YYYY/MM/DD/HH/output. Is it possible to get the input path as a String and convert it to YYYY/MM/DD/HH/output that I can ...
    Mohit Anchlia
    Sep 10, 2012 at 11:11 pm
    Sep 17, 2012 at 5:12 am
  • Hi, I am trying to give multiple paths to a pig script using path globbing in HAR file format and it does not seem to work. I wanted to know if this is expected or a bug / feature request. Command ...
    Mohnish Kodnani
    Sep 25, 2012 at 7:44 pm
    Sep 27, 2012 at 4:38 pm
  • I have JSON something like: { user{ id : 1 name: user1 } product { id: 1 name: product1 } } I want to be able to read this file and create 2 files as follows: user file: key,1,user1 product file ...
    Mohit Anchlia
    Sep 5, 2012 at 3:38 am
    Sep 13, 2012 at 2:01 pm
  • Hello, I'm having some trouble doing something I thought would be easy: I'd like to use matches to generate a boolean flag but this seems to not compile: FOREACH html_pages GENERATE portal_id, html ...
    James Kebinger
    Sep 27, 2012 at 3:54 pm
    Sep 28, 2012 at 9:53 pm
  • Hi all, I'm currently working with Pig 0.10.0. I'd like to load some data from an HBase table, but I encountered some problems. When I try to load the data it seems to work: grunt raw = LOAD ...
    Alberto Cordioli
    Sep 12, 2012 at 9:36 am
    Sep 13, 2012 at 9:09 am
  • Hello, My setup is Pig + Hadoop + Cassandra for my "big data" and MySql for my "relational/meta data". Up until now that has been fine, but now I need to start creating metrics that "cross the ...
    William Oberman
    Sep 11, 2012 at 3:18 pm
    Sep 12, 2012 at 2:42 pm
  • Hi, I have a huge text file of the form below; the data is saved in a directory as data/data1.txt, data2.txt and so on: merchant_id, user_id, amount 1234, 9123, 299.2 1233, 9199, 203.2 1234, 0124, 230 and so on. What I ...
    Jamal sasha
    Sep 26, 2012 at 1:36 am
    Sep 26, 2012 at 8:53 am
  • Forgive me for asking a FAQ but what is the current IDE of choice for Pig? I used to use a text editor and command line. I understand that PigPen (eclipse plugin) is no longer supported and does not ...
    Alex McLintock
    Sep 25, 2012 at 8:24 am
    Sep 25, 2012 at 6:26 pm
  • Hi All, I would like to share my slides from the presentation about Apache Pig that I gave at the 3rd meeting of WHUG (Warsaw Hadoop User Group) a couple of months ago. Here is a link ...
    Adam Kawa
    Sep 15, 2012 at 10:16 am
    Sep 19, 2012 at 1:48 pm
  • Probably an easy one but... After processing a file through a series of groupings, aggregations and projections using flatten I end up with long concatenated names for each field shown in this ...
    Robert Yerex
    Sep 17, 2012 at 11:00 pm
    Sep 18, 2012 at 2:58 pm
  • Is it ok to reuse the same Tuple and List of inputs from RecordReader across all getNext calls in a LoadFunc? I notice that PigStorage creates a new List, mProtoTuple, for every record along with a ...
    Jim Donofrio
    Sep 17, 2012 at 4:35 am
    Sep 17, 2012 at 1:15 pm
  • Hi, I'm a little bit puzzled about REPLACE when there is a backslash involved. I want to replace all the "dir" in the string with "\\test\sub". After a lot of trial and error, I finally got it done, but ...
    Danfeng Li
    Sep 7, 2012 at 9:08 pm
    Sep 9, 2012 at 5:28 pm
  • I recently ran into this problem in my work; it's about Pig flatten. I'll use a simple example to express it. Two files: ===file1=== 1_a 2_b 4_d ===file2 (tab separated)=== 1 a 2 b 3 c I tried three scripts ...
    Huo Zhu
    Sep 4, 2012 at 11:17 am
    Sep 6, 2012 at 8:15 am
  • Hello list, I have a file in my HDFS and I am reading this file and trying to store the data into an HBase table through the Pig shell. Here are the commands I am using :i z = load ...
    Mohammad Tariq
    Sep 3, 2012 at 7:31 am
    Sep 3, 2012 at 2:45 pm
  • Hi all, I'm defining a udf to store Library description information. Here is the pig script: register fasta.jar; register /usr/lib/zookeeper/zookeeper-3.3.5-cdh3u4.jar; register ...
    Sep 25, 2012 at 5:31 pm
    Sep 29, 2012 at 2:20 am
  • I have two strings that I want to concatenate. The first one holds a number and coming from this set of commands: library = LOAD 'discovery_library' USING ...
    Sep 26, 2012 at 8:56 pm
    Sep 27, 2012 at 2:41 am
  • Hi all, I forgot the keyword which forces Pig to finish the job and then continue with the rest of the script. My job failed because of an OOME, so I want to split the jobs into smaller ones but still written ...
    Haitao Yao
    Sep 16, 2012 at 2:53 am
    Sep 24, 2012 at 10:35 pm
  • Looking for an elegant way to do this: Suppose there is a bag with names { James, John, Lisa, Larry, Amanda, Amanda, John, James, Lisa, John} I'd like to get something back along the lines of a tuple ...
    Arun Ahuja
    Sep 19, 2012 at 5:11 pm
    Sep 21, 2012 at 6:13 pm
  • I need to pass two or more arguments in my udf to process the data in those arguments. I am unable to map the input data schema and getting the *ERROR 1045: Could not infer the matching function ...
    Dipesh Kumar Singh
    Sep 19, 2012 at 6:13 pm
    Sep 20, 2012 at 1:54 am
  • Is there any way within a LoadFunc to access the schema that a user defines after AS in a LOAD statement? Is there some property I can access in the UDFContext or ? pushProjection provides the schema ...
    Jim Donofrio
    Sep 15, 2012 at 5:14 am
    Sep 17, 2012 at 5:07 am
  • Hi, I have a very simple script that tries to import a Pig file: set pig.import.search.path '/tmp' import 'event.pig'; Even if the file /tmp/event.pig exists, it cannot be found. It seems that the ...
    Vincent Barat
    Sep 5, 2012 at 10:38 am
    Sep 10, 2012 at 5:28 pm
  • Hi all, I have this data, with fields (Date, symbol, rate), and I want to group it by month and find the maximum rate value for each month, like: for month (08, 36.3), (09, 36.4), (10, ...
    Yogesh dhari
    Sep 29, 2012 at 10:02 pm
    Sep 29, 2012 at 11:33 pm
  • I have a requirement to propagate field values from one row to another given type of record for example my raw input is 1,firefox,p 1,,q 1,,r 1,,s 2,ie,p 2,,s 3,chrome,p 3,,r 3,,s 4,netscape,p the ...
    Richipal Singh
    Sep 27, 2012 at 8:50 pm
    Sep 27, 2012 at 9:15 pm
  • Hi, I recently found this problem when using parameter substitution in a Pig script. My test.pig: a = load 'data' as (ch:chararray, num:int) I execute this script with, (I need to use positional ...
    Huo Zhu
    Sep 25, 2012 at 2:52 am
    Sep 25, 2012 at 6:07 am
  • Hi, during execution of the following Pig script I ran into the class cast exception mentioned in the title of this mail. The log indicates that the error is happening in the reduce process and I ...
    Björn-Elmar Macek
    Sep 19, 2012 at 4:52 pm
    Sep 20, 2012 at 9:29 am
  • Hello All, Is there any built-in load function for loading ".Z" files? Regards, Srini
    Sep 19, 2012 at 7:23 am
    Sep 20, 2012 at 3:02 am
  • Hi all, I have run the script and dumped it. 1.) Bcount = foreach Bgroup generate group, COUNT(Btop) as number ; 2.) Border = order Bcount by number desc; 3.) Dump Border ; and its output is ...
    Yogesh Kumar13
    Sep 19, 2012 at 8:26 pm
    Sep 20, 2012 at 12:19 am
  • I'm trying to group tuples by a key, sort by another key within each group, and then pass the sorted list of tuples for each group to a perl script. I need to use the perl script because I need to ...
    Kannan Shah
    Sep 17, 2012 at 7:55 pm
    Sep 18, 2012 at 10:28 pm
  • Hey all, I've started using SequenceFiles more and more (in particular the elephant bird load and storage functions) and am wondering what the best approach is for marshaling between a schema from ...
    Mat Kelcey
    Sep 16, 2012 at 12:16 am
    Sep 16, 2012 at 3:27 am
  • Hi all, I'm wondering if anyone has experience with the following scenario: I have an HBase table loaded with millions of records. I want to load these records into Pig, process each batch of 1000 by ...
    Terry Siu
    Sep 12, 2012 at 6:55 pm
    Sep 13, 2012 at 4:39 pm
  • I'm trying to do a UNION on two datasets with identical schemas (k:bytearray, v:chararray). When using the UNION operator like so: combined_data = UNION dataset1, dataset2; I get the following ...
    Xavier Stevens
    Sep 4, 2012 at 3:53 pm
    Sep 11, 2012 at 5:27 pm
  • I am trying to store field in a bag command but it fails with store b.page into '/flume_vol/flume/input/page.dat'; store b.network into '/flume_vol/flume/input/network.dat'; B: {b: {(page ...
    Mohit Anchlia
    Sep 10, 2012 at 5:53 pm
    Sep 11, 2012 at 1:57 am
  • http://hortonworks.com/blog/twitter-analytics-presents-hadoop-and-pig-at-uc-berkeley/ I think these lectures were posted before, but I thought Pig users might find this as amusing as me :) You can ...
    Russell Jurney
    Sep 10, 2012 at 9:10 pm
    Sep 10, 2012 at 11:12 pm
  • Hi, I'd appreciate if anyone has some ideas/pointers regarding a pig script and custom UDF I have written. I've found it runs too slowly on my hadoop cluster to be useful....... I have two million ...
    James Newhaven
    Sep 3, 2012 at 4:33 pm
    Sep 3, 2012 at 8:32 pm
  • Hello. I found 26 errors on "Pig Latin Basics" while translating the document to Japanese. (https://github.com/miyakawataku/pig/issues) Should I report each of them as an individual issue? Or should ...
    Miyakawa Taku
    Sep 1, 2012 at 5:02 am
    Sep 1, 2012 at 5:44 am
  • Hi, Is it possible to use a regular expression as a delimiter to load data, say something like A = load 'data' using PigStorage('\s+'); However, by checking the doc, it seems that only one character is ...
    Lei tang
    Sep 28, 2012 at 11:06 pm
    Sep 28, 2012 at 11:27 pm
  • Hi I have two files.. File 1 contains following data. Id, amount 1234, 22.7 1158,88 1234, 280 File 2 contains following data Id, min, max 1234, 8, 150 Now I want to calculate the mean (avg) but ...
    Jamal sasha
    Sep 28, 2012 at 4:52 pm
    Sep 28, 2012 at 5:41 pm
  • Hi, I am new to Pig. In Pig, I want to load multiple files with date variables in their names. If I load files from 2012/02/12 to 2012/02/19, the following works $START = "12" $END = "19" raw_data ...
    Jerry Jiang
    Sep 28, 2012 at 1:23 pm
    Sep 28, 2012 at 3:39 pm
  • Hi all, I can't work out how to use DISTINCT for this, or how else we can do it. I have a file DATAFILE with content like: Date, time, ssitename, sip, csusername, cip ...
    Yogesh Kumar13
    Sep 26, 2012 at 5:56 am
    Sep 26, 2012 at 8:46 am
  • Hello, Need help with finding the distinct count. Would appreciate if you could please help. Here's my data file: id , dept, budget 1, Marketing, 9000 2, Marketing, 1000 3, Finance, 9000 4, Sales, ...
    Hadoop Learner
    Sep 26, 2012 at 1:55 am
    Sep 26, 2012 at 2:20 am
  • Hi, I have a quick question. I have two types of files. dirA/ -- file_a , file_b, file_c dirB/ -- another_file_a, another_file_b... Files in directory A contain transaction information. So something ...
    Fraz man
    Sep 26, 2012 at 12:39 am
    Sep 26, 2012 at 1:08 am
  • I have input files that I need to split into multiple rows but with the same key. For example: 1,a\nb\n 2,a\nc\n After the split this looks like: 1,a 1,b 2,a 2,c Is this possible without writing an ...
    Mohit Anchlia
    Sep 25, 2012 at 10:47 pm
    Sep 25, 2012 at 11:10 pm
  • Hi, I have a text file which has my hbase table information. It is comma separated. The first is attribute name (which I want it to be as column qualifier) and the second is attribute value. The file ...
    Sep 24, 2012 at 2:31 pm
    Sep 24, 2012 at 5:06 pm
Group Overview
group: user @
period: Sep 2012
categories: pig, hadoop

83 users for September 2012

  • Dmitriy Ryaboy: 30 posts
  • Mohit Anchlia: 18 posts
  • Cheolsoo Park: 17 posts
  • Bill Graham: 13 posts
  • Russell Jurney: 11 posts
  • Alan Gates: 10 posts
  • Ruslan Al-Fakikh: 10 posts
  • Haitao Yao: 8 posts
  • Deepak Tiwari: 7 posts
  • Brian Choi: 5 posts
  • HAJIHASHEMI, ZAHRA (AG/1000): 5 posts
  • Jim Donofrio: 5 posts
  • Mohnish Kodnani: 5 posts
  • Adam Kawa: 4 posts
  • Alberto Cordioli: 4 posts
  • Huo Zhu: 4 posts
  • William Oberman: 4 posts
  • Yogesh Kumar13: 3 posts
  • Anurag Gulati: 3 posts
  • Arun Ahuja: 3 posts