Pig user mailing list (Grokbase archive): April 2011

73 discussions - 341 posts

  • 17

    SUM

    x = foreach g2 generate group, data.(size); dump x; ((drm,0),{(464868)}) ((drm,1),{(464868)}) ((snezz,0),{(8073),(8073)}) but: x = foreach g2 generate group, SUM(data.size); 2011-04-24 18:02:18,910 ...
    Pob
    Apr 24, 2011 at 4:03 pm
    Apr 25, 2011 at 9:41 pm
  • Sent for Renato, since Apache's mail system has decided it doesn't like him. Alan. I am getting an error while trying to execute a simple fragment replicated join on two files (one of 77MB and the ...
    Alan Gates
    Apr 26, 2011 at 4:24 pm
    May 2, 2011 at 6:01 pm
  • I noticed that there is a Pig JSON Loader (which might or might not be in piggybank). Could anyone confirm the existence or absence of a JSONToTuple UDF? (not a loader) I am inspired by the UDF ...
    Daniel Eklund
    Apr 19, 2011 at 5:09 pm
    Apr 19, 2011 at 7:20 pm
  • This question might be better diagnosed as an HBase issue, but since it's ultimately a Pig script I want to use, I figure someone on this group could help me out. I tried asking the IRC channel, but ...
    Daniel Eklund
    Apr 12, 2011 at 1:54 pm
    Apr 12, 2011 at 11:59 pm
  • Is it possible to dereference a column that is part of a nested bag? In the schema given below, I am trying to dereference the columns Key and Value, which are part of the visit bag, which is part of the visits bag. ...
    Badrinarayanan S
    Apr 8, 2011 at 10:10 am
    Apr 11, 2011 at 5:15 pm
  • Hello ... I have a cluster with 11 nodes, each with 16 GB RAM, a 6-core CPU and a 1 TB HDD, and I use the Cloudera distribution CHD4b with Pig. I have two Pig Join queries which are a Parallel and a ...
    Byambajargal
    Apr 17, 2011 at 1:03 pm
    Aug 23, 2011 at 7:08 am
  • Hi, I have a Pig UDF. My requirement is that, on meeting certain criteria, I want to return from the Pig UDF. Is there any way I can exit early from a Pig UDF? Also, how can it be done in a Map/Reduce job? ...
    Souri datta
    Apr 29, 2011 at 12:02 pm
    May 26, 2011 at 10:27 am
  • I'm trying to do something like this: (if 'data' is a set of tuples loaded from a file containing fields a, b and c) (if 'M' is another set of tuples loaded from a file) data = FOREACH data GENERATE ...
    Mark Laczin
    Apr 20, 2011 at 1:27 pm
    Apr 25, 2011 at 4:09 pm
  • I'm trying to replace a couple of fields in a relation with values looked up in another relation. Here's an example; let's say I have a relation mapping each integer to its square: -----map.txt----- ...
    Jay Hacker
    Apr 15, 2011 at 8:46 pm
    Apr 22, 2011 at 10:48 pm
  • Hi, all. When I run a Pig query on Cassandra while Cassandra is being updated by an application at the same time, what will happen? I may get inconsistent results, right? -- Bing Graduate Student Computer ...
    Bing Wei
    Apr 20, 2011 at 11:00 pm
    Apr 21, 2011 at 4:04 pm
  • I am a new Pig user and have run into “Internal error 2999”. 2011-04-05 15:59:57,445 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. null Details at logfile: ...
    William Dowling
    Apr 5, 2011 at 8:10 pm
    Apr 6, 2011 at 7:51 pm
  • Hello, I have installed Pig 0.8.0 and Cassandra 0.7.4 and I'm not able to read data from Cassandra. I wrote a simple query just to test: grunt> A = LOAD 'cassandra://msg_keyspace/messages' USING ...
    Fabio Souto
    Apr 5, 2011 at 1:42 pm
    Apr 6, 2011 at 2:29 pm
  • Currently using Pig 0.8 and Hadoop 0.20.2. I'm able to run things in local mode as well as run fs -* commands from within Grunt running in MapReduce mode. I can't seem to be able to execute any Pig ...
    Dimitris Iliopoulos
    Apr 1, 2011 at 9:17 pm
    Apr 4, 2011 at 10:26 pm
  • So I've recently added a protocol/schema to a collection I got from someone else, recompiled it, and added it to my scripts and am having problems. More specifically, it built just fine, and when ...
    Kris Coward
    Apr 29, 2011 at 6:38 pm
    May 2, 2011 at 8:43 am
  • Hi, how can I get the actual time spent doing all the map-reduce operations while executing a Pig script? It should exclude the time spent waiting for the scheduler and any other waiting time. Please help. ...
    Sumit ghosh
    Apr 22, 2011 at 5:19 am
    Apr 28, 2011 at 10:33 am
  • Hello guys, I am running the Cloudera distribution CDH3u0 on my cluster with Pig and HBase. I can read data from HBase using the following Pig query: my_data = LOAD 'hbase://table1' using ...
    Byambajargal
    Apr 25, 2011 at 12:33 pm
    Apr 27, 2011 at 3:34 pm
  • Hi. I am looking for a way to get the top results in order. Is it possible? Example: A = LOAD 'datatest' USING PigStorage(';') as (first: chararray, second: int); D = GROUP A BY first; topResults = ...
    Ugo jardonnet
    Apr 26, 2011 at 1:11 pm
    Apr 26, 2011 at 5:46 pm
  • Hi, when I run the Pig code below: a = load '/logs/2011-03-31'; b = filter a by $1=='a' and $2=='b'; store b into '20110331-ab'; it runs an M/R job that has thousands of maps, and then creates an output ...
    Jameson Li
    Apr 1, 2011 at 7:58 am
    Apr 2, 2011 at 8:26 pm
  • Hi I have the following input relation: Name Score Jack 25 Jimmy 30 Sam 20 Hick 35 Tampa 22 My goal is to rank the tuples by score. Pig script: sample_data = LOAD 'sample.txt' USING PigStorage() AS ...
    Arun A K
    Apr 27, 2011 at 2:08 am
    Apr 27, 2011 at 4:15 am
  • Hi all, I have a pig script that produces a complex nested data structure: result: {child: chararray, childTraces: {action: int, time: long}, legacy: {parent: chararray, parentTraces: {action: int, ...
    Gianmarco
    Apr 15, 2011 at 4:19 pm
    May 25, 2011 at 5:32 pm
  • Hi Folks I've done a load of a dataset and I am attempting to filter out unwanted records by checking that one of my tuple fields contains a particular string. I've distilled this issue down to the ...
    Steve Watt
    Apr 22, 2011 at 9:26 pm
    Apr 23, 2011 at 1:07 am
  • Hi, First, I group 2 tables using a key (named sid): rich_sessions = GROUP sessions BY sid, activities BY sid; After this operation, all the tuples in the bag "activities" start with the same "sid" ...
    Vincent Barat
    Apr 20, 2011 at 4:05 pm
    Apr 21, 2011 at 7:23 am
  • Hi, What would be the best way to write this script? I have two datasets - huge (hkey, hdata), small(skey). I want to filter all the data from huge dataset for which F(hdata, skey) is true. Please ...
    Aniket Mokashi
    Apr 15, 2011 at 3:21 am
    Apr 15, 2011 at 4:13 pm
  • Hi, I am trying to run a filter against a column which is the result of a flatten operation. But the filter clause throws an exception as org.apache.pig.data.DataByteArray cannot be cast to ...
    Badrinarayanan S
    Apr 7, 2011 at 12:28 pm
    Apr 8, 2011 at 6:50 pm
  • Hi, I have a file which has records having fixed length fields (and spaces appended to fill the field length) How can I load these records using Pig specifying field lengths and also auto trimming ...
    Shantian Purkad
    Apr 5, 2011 at 6:19 am
    Apr 6, 2011 at 7:41 pm
  • I wish to have a Hadoop cluster with DR (Disaster Recovery), for which I need to have the data backed up at a geographically different location (if there's an earthquake or tsunami that hits one ...
    Deepak N85
    Apr 5, 2011 at 1:14 pm
    Apr 5, 2011 at 9:24 pm
  • If I have a tuple of values, is there a way to eliminate duplicate values per tuple? Example: (5,5,4,7,2,3,4,9) = (5,4,7,2,3,9) Thanks
    Mark
    Apr 3, 2011 at 2:45 pm
    Apr 3, 2011 at 8:38 pm
  • Hi Badri, http://mail-archives.apache.org/mod_mbox/pig-user/201104.mbox/%3C009a01cbf5ba$232d61d0$69882570$@com%3E Did you get a resolution on the above email post yet? Thanks Himanshu Garg +1 203 308 ...
    Himanshu
    Apr 28, 2011 at 8:58 pm
    Apr 29, 2011 at 6:54 pm
  • Hi all, A little while back, I started a project called pygmalion for example scripts and UDFs for people using Pig with Cassandra. Currently there are a few handy UDFs in there like: ...
    Jeremy Hanna
    Apr 27, 2011 at 6:57 pm
    Apr 27, 2011 at 10:00 pm
  • Hello, is it possible to return a bag from a UDF? When I define my Python UDF like this, it simply doesn't work: @outputSchema("y:bag{key:int, t:tuple(len:int,word:chararray)}") def toTuple(bag): ...
    Pob
    Apr 24, 2011 at 2:06 pm
    Apr 24, 2011 at 4:01 pm
  • Hi there, I'm planning to do some performance measurements of my hadoop pig code in order to see how it scales. Does anyone have some suggestions on how to do that? I thought of measuring the time ...
    Lai Will
    Apr 20, 2011 at 7:58 pm
    Apr 20, 2011 at 10:05 pm
  • Hi, I recently got Pig to work with LZO compression, using Pig loaders from Elephant Bird. But, from my understanding, my workflow is turning out to be: Step 1: lzo-compress the raw input file. Step ...
    Chaitanya Sharma
    Apr 19, 2011 at 7:45 pm
    Apr 19, 2011 at 9:22 pm
  • Hi, I am trying to get LZO support for my little Pig 0.8 project. I'm using https://github.com/gerritjvv/elephant-bird.git for the pig-lzo-loaders, and https://github.com/kevinweil/hadoop-lzo for ...
    Chaitanya Sharma
    Apr 18, 2011 at 6:24 pm
    Apr 18, 2011 at 8:07 pm
  • Hi, is it possible to create an aggregating function with 2 parameters one of which is bag and another one is not? In particular, i want to use that to work around lack of function invocation ...
    Dmitriy Lyubimov
    Apr 16, 2011 at 1:27 am
    Apr 17, 2011 at 7:07 pm
  • Hi there, I have some questions about how Pig performs joins. The site says there are three types of specialized joins: replicated, skew, and merge joins. I wanted to know how these are implemented. For ...
    Renato Marroquín Mogrovejo
    Apr 13, 2011 at 4:49 am
    Apr 13, 2011 at 1:02 pm
  • I am having trouble getting Pig to see my Hadoop configuration files despite following the "Classpath in MapReduce Mode" instructions in the ...
    W.P. McNeill
    Apr 12, 2011 at 8:39 pm
    Apr 13, 2011 at 12:21 am
  • I have a relation built by grouping the join (TCRaw) of a pair of basic relations (SrcFuid and NewCitationRel): grunt> describe TCGroupedByFuid; TCGroupedByFuid: { group: (SrcFuid::citingdocid: int, ...
    William Dowling
    Apr 7, 2011 at 3:33 pm
    Apr 7, 2011 at 11:09 pm
  • No matter what I try, I end up losing the tuples after the initial flatten. I'm using some auto-generated test data with firstn, last and a concatenation for the key. The script and outputs. . . rows ...
    Bob
    Apr 6, 2011 at 10:40 pm
    Apr 6, 2011 at 11:20 pm
  • Hi there. As part of our start-up product, we need to compute a "similar user feature", and we've decided to go with Pig for it. I've been learning Pig for a few days now and understand how it works. So ...
    Diallo Mamadou Bobo
    Apr 4, 2011 at 4:12 pm
    Apr 5, 2011 at 5:01 pm
  • Hi there Let's say I have DUMP A (user1, date1, {(item1), (item2)}, {(skill1), (skill2)}) (user1, date2, {(item3), (item4), (item5)}, {(skill1), (skill3)}) (user2, date2, {(item2), (item5)}, ...
    Lai Will
    Apr 28, 2011 at 4:06 pm
    Apr 28, 2011 at 6:48 pm
  • I'm sure this is well known, I'm just curious why it is documented as such... perhaps I am missing something obvious, but I see: /** * A load function that parses a line of input into fields using a ...
    Jonathan Coveney
    Apr 28, 2011 at 3:31 pm
    Apr 28, 2011 at 3:51 pm
  • Hey, is there a way to write unit tests for Pig scripts?
    Shai Harel
    Apr 26, 2011 at 8:06 am
    Apr 27, 2011 at 6:57 am
  • So I'm running into something strange. Consider the following code: tfidf_all = LOAD '$TFIDF' AS (doc_id:chararray, token:chararray, weight:double); grouped = GROUP tfidf_all BY doc_id; vectors = ...
    Jacob Perkins
    Apr 24, 2011 at 5:41 pm
    Apr 26, 2011 at 2:44 pm
  • I have a bag of items (the result of a group operation): (key1, {1, 2, 3}) (key2, {1, 4, 5}) ... etc. I want to generate a CROSS product on each entry (key1, {(1,1), (1,2), (1,3), (2,2), (2,3), ...
    Shai Harel
    Apr 26, 2011 at 12:04 pm
    Apr 26, 2011 at 2:13 pm
  • Hello guys, I am running the Cloudera distribution CDH3u0 on my cluster and I am trying to connect Pig with HBase. I have 11 nodes in my cluster, so I have configured one machine as HBaseMaster and the rest ...
    Byambajargal
    Apr 24, 2011 at 5:40 pm
    Apr 24, 2011 at 8:04 pm
  • Hello, I installed Jython + JRuby on Debian. export PIG_CLASSPATH=/path/cassandra-0.7/contrib/pig:/usr/share/java/jython.jar /path/cassandra-0.7/contrib/pig - here is my UDF, myFunc.py ...
    Pob
    Apr 23, 2011 at 9:46 pm
    Apr 23, 2011 at 10:37 pm
  • Hi there, I have a pig script and that has hardcoded some input/output file paths as well as some parameters. It's relatively tedious to change these... Is there a way to define constants at the ...
    Lai Will
    Apr 18, 2011 at 9:29 am
    Apr 18, 2011 at 10:35 am
  • I have been getting strange errors in my pig script and narrowed it down a bit and found that when I do a COUNT, sometimes it returns a float, but most of the time it returns a long. Some example ...
    Jeremy Hanna
    Apr 15, 2011 at 9:44 pm
    Apr 16, 2011 at 1:22 am
  • Wondering whether this is a bug or by design. When I register my Python UDF file like this: Register 'a/b/mypyudfs.py' using jython as mypyudfs; I got an error saying "could not ...
    Xiaomeng Wan
    Apr 14, 2011 at 7:59 pm
    Apr 14, 2011 at 10:48 pm
  • I am going through a lot of processing with my data and then I reformat it to go back into my data store using the storefunc. I store it out to hdfs and it visually looks just fine. However when I ...
    Jeremy Hanna
    Apr 8, 2011 at 4:31 pm
    Apr 8, 2011 at 10:39 pm
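Mark's April 3 question about removing duplicate values within a tuple, turning (5,5,4,7,2,3,4,9) into (5,4,7,2,3,9), amounts to keeping the first occurrence of each value in its original order. The following is a minimal sketch in plain Python, not the answer given in the thread. It assumes the logic would later be wrapped as a Jython UDF (registered with `register ... using jython`, as in the threads above); the function name `dedupe_tuple` is hypothetical, and the Pig-side `@outputSchema` declaration is left out because it depends on the tuple's field types.

```python
def dedupe_tuple(values):
    """Return a tuple with duplicate values removed, keeping first occurrences in order.

    Hypothetical helper for illustration; assumes every value is hashable.
    """
    seen = set()      # values already emitted
    result = []       # output values, in first-seen order
    for v in values:
        if v not in seen:
            seen.add(v)
            result.append(v)
    return tuple(result)


print(dedupe_tuple((5, 5, 4, 7, 2, 3, 4, 9)))  # (5, 4, 7, 2, 3, 9)
```

Using a set for the membership check keeps the pass linear in the tuple length while the list preserves order; sorting or converting the whole tuple to a set would lose the original ordering shown in the example.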
Group Overview
group: user @
categories: pig, hadoop
discussions: 73
posts: 341
users: 70
website: pig.apache.org

70 users for April 2011

  • Dmitriy Ryaboy: 35 posts
  • Jeremy Hanna: 21 posts
  • Pob: 19 posts
  • Alan Gates: 18 posts
  • Daniel Dai: 16 posts
  • Mridul Muralidharan: 16 posts
  • Xiaomeng Wan: 10 posts
  • Daniel Eklund: 9 posts
  • Jacob Perkins: 9 posts
  • Bill Graham: 8 posts
  • Byambajav byambajargal: 8 posts
  • Thejas M Nair: 8 posts
  • William F. Dowling: 8 posts
  • Badrinarayanan S: 7 posts
  • Mark Laczin: 7 posts
  • Renato Marroquín Mogrovejo: 7 posts
  • Sumit ghosh: 7 posts
  • Lai Will: 6 posts
  • Mark: 6 posts
  • Aniket Mokashi: 5 posts