Grokbase Groups Pig user July 2012

48 discussions - 219 posts

  • Hi all, I used to use the following pig script to do the counting of the records. m_skill_group = group m_skills_filter by member_id; grpd = group m_skill_group all; cnt = foreach grpd generate ...
    Sheng Guo
    Jul 2, 2012 at 6:42 pm
    Aug 9, 2012 at 2:52 pm
  • I have a problem where I can't join a relation to itself on a different field. describe pairs pairs: {from: chararray,to: chararray,message_id: chararray,in_reply_to: chararray} pairs2 = pairs ...
    Russell Jurney
    Jul 20, 2012 at 2:35 am
    Jul 24, 2012 at 8:11 am
  • Hi All, I am new to Pig and am trying to store data in HDFS as comma-separated values using the command store RECORDS into 'hadoop/pig/records' using PigStorage(','); If I do dump RECORDS; it shows (YogeshKumar ...
    Yogesh Kumar13
    Jul 25, 2012 at 10:49 am
    Jul 26, 2012 at 2:36 am
  • The email package is a part of Jython, I believe: http://www.jython.org/docs/library/email.html However, when I 'import email' in udfs.py, I get this error: 2012-07-23 17:32:51,027 [main] ERROR ...
    Russell Jurney
    Jul 24, 2012 at 12:34 am
    Jan 29, 2013 at 7:49 pm
  • Hi all, I loaded a file from HDFS into Pig with the command A=load '/HADOOP/Yogesh/demo.txt' as (name:chararray, roll:int); It gets loaded, and when I do dump A: it shows (Yogesh 12,) (Aashi 13,) (mohit 14,) ...
    Yogesh Kumar13
    Jul 25, 2012 at 6:06 pm
    Jul 25, 2012 at 7:00 pm
  • Hello, I have the following code: A = LOAD '$file1' USING AvroStorage(); B = LOAD '$file2' USING AvroStorage(); C = JOIN A BY id LEFT OUTER, B BY id; SPLIT C INTO D IF B::id IS NULL, E OTHERWISE ...
    Florian Zumkeller-Quast
    Jul 25, 2012 at 9:49 am
    Jul 27, 2012 at 3:34 pm
  • Hi Pig users, I have coded my own algebraic UDF in Java, and it seems that Pig does not use the algebraic interface at all. (I put some log messages in my Initial, Intermed and Final functions, and they ...
    Benoit Mathieu
    Jul 25, 2012 at 4:32 pm
    Jul 25, 2012 at 5:33 pm
  • Hi all, I am a newcomer here. I encountered a problem today: Pig version: 0.10.0 temp2 = LOAD '/pig/procedure/tzone' USING PigStorage(';'); zone = FOREACH temp2 { a = STRSPLIT($0,'#',3); product = ...
    Cdy Chen
    Jul 10, 2012 at 2:04 pm
    Jul 17, 2012 at 1:27 pm
  • Hi, I wrote a Pig script in which one of the reducers always hits OOM no matter how I change the parallelism. Here's the script snippet: Data = group SourceData all; Result = foreach Data generate group, ...
    Haitao Yao
    Jul 6, 2012 at 6:44 am
    Jul 10, 2012 at 3:30 pm
  • Hi, I am new to Pig scripting. I would like to generate multiple tuples from a single tuple. What I mean is: I have a file with the following data in it. ID | ColumnName1:Value1 | ColumnName2:Value2 so I load it ...
    Naresh
    Jul 2, 2012 at 2:34 am
    Jul 5, 2012 at 6:29 pm
  • I created a UDF that returns a Bag of Tuples. The syntax is all fine, but when I run it in Pig, Pig gives an error: 2/07/17 16:51:58 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with ...
    Yang
    Jul 18, 2012 at 12:06 am
    Jul 19, 2012 at 10:14 am
  • Hey everyone, in the thread "Downgrade CDH4 to CDH3" of the cloudera mailing list I talked about issues we had with pig while testing cdh4 and that we had trouble in switching back to cdh3. After I ...
    Markus Resch
    Jul 17, 2012 at 11:26 am
    Jul 26, 2012 at 4:51 pm
  • Gentlemen, We have recently attempted to compile and use the latest trunk code and have encountered a rather strange issue. Our job which is attached, has been working fine on V11 of pig that we have ...
    Alex Rovner
    Jul 26, 2012 at 2:41 pm
    Aug 9, 2012 at 10:42 pm
  • Hello, I'm running a pretty simple Pig job, but despite my best efforts to disable compression, the output parts are written in .lzo_deflate form like ...
    James Kebinger
    Jul 30, 2012 at 6:21 pm
    Jul 30, 2012 at 7:24 pm
  • Can someone explain this script to me? It is freaking me out. When did Pig start spitting out 'None' in place of null? register /me/pig/build/ivy/lib/Pig/avro-1.5.3.jar register ...
    Russell Jurney
    Jul 24, 2012 at 5:50 am
    Jul 24, 2012 at 8:44 pm
  • Do I need to create a UDF for this, or is there something out there? Thanks, Ben
    Benjamin Juhn
    Jul 27, 2012 at 5:58 pm
    Jul 29, 2012 at 7:32 pm
  • Hi All, I am new to Pig. Is Pig 0.10.0 compatible with hadoop-0.20.2? Please suggest. Regards, Yogesh Kumar
    Yogesh Kumar13
    Jul 25, 2012 at 5:36 am
    Jul 25, 2012 at 1:57 pm
  • Hi Guys, I want to join 2 tables in Hive on a couple of columns, and one of the conditions is that the timestamp in one column is greater than the other one. In SQL I could have written it this way: table_a a ...
    Sonia gehlot
    Jul 5, 2012 at 7:22 pm
    Jul 6, 2012 at 3:09 pm
  • Hi Guys, I have a use case where I need to generate a data feed using a Pig script. The data feed in total is about 12 GB. I want the Pig script to generate 1 file, and the data in that file should be sorted as ...
    Sonia gehlot
    Jul 3, 2012 at 12:06 am
    Jul 3, 2012 at 7:19 pm
  • Since ComparisonFunc is now deprecated, what is its replacement? I can't find any information in the Javadoc. Is it safe to continue to extend ComparisonFunc for custom ordering? Thanks. Calvin
    Calvin Cheung
    Jul 31, 2012 at 1:06 am
    Aug 9, 2012 at 5:59 pm
  • Hi all, I have an idea for a new Pig feature: defining global constants in a Pig script. Here's an example: -- define a global constant for storage define store_location ...
    Haitao Yao
    Jul 27, 2012 at 3:32 am
    Jul 30, 2012 at 6:39 am
  • Hi All, I have a structure AA: {roll: bytearray, job: bytearray,name: bytearray} (12,yahoo,Yogesh) (13,google,Mohit) (14,L.K.G ,Aashi) (15,School,Renu) (16,tcs,Rajat) (12,yahoo, Vishu) (13,google, ...
    Yogesh Kumar13
    Jul 26, 2012 at 1:36 pm
    Jul 26, 2012 at 2:13 pm
  • Hi all, I've been using Apache Pig to do some ETL work, but ran into a weird problem today when trying Python UDFs. I borrowed an example from ...
    MiaoMiao
    Jul 20, 2012 at 7:12 am
    Jul 24, 2012 at 5:57 am
  • Hi all, Sorry to bother you guys, I used rm FILE_PATH to delete the file in pig script, but sometimes if the file does not exist, or I am not sure if it exists, this statement will give some error ...
    Sheng Guo
    Jul 23, 2012 at 9:13 am
    Jul 23, 2012 at 8:22 pm
  • Is there a reason that piggybank isn't compiled and put into the apache tar releases as a jar?
    David Capwell
    Jul 12, 2012 at 4:37 pm
    Jul 16, 2012 at 11:28 pm
  • Hi There, I'm trying to concat the first tag string with type string for all records. Could someone advise on syntax? records: {meta:(type: chararray, tags_bag: {t: (tags_tuple: (tags: chararray))} ...
    Benjamin Juhn
    Jul 8, 2012 at 5:57 pm
    Jul 10, 2012 at 4:44 am
  • Hi all, I'm walking through a pig script in grunt, but I am getting stuck with some issues using nested foreach. I'm using Pig version 0.9.2 I'm trying to find the number of unique users from a bag ...
    Chun Yang
    Jul 6, 2012 at 7:42 pm
    Jul 6, 2012 at 9:41 pm
  • Cutting this over from #hadoop-pig IRC: hi Pig people. I have some TV viewing logs in a text format - example http://pastebin.com/raw.php?i=HS4zy2pP - ... unfortunately it has some nesting/list ...
    Dan Brickley
    Jul 5, 2012 at 8:21 pm
    Jul 6, 2012 at 2:14 pm
  • Hi, Is there any restriction on using Pig on Windows? For my development I want to use only local mode on my laptop. I am facing issues in getting Linux installed, so I thought if I could just ...
    Subir S
    Jul 5, 2012 at 6:09 pm
    Jul 6, 2012 at 1:20 am
  • Normally the job tracker and task tracker are on different nodes. When I submit a Pig script using a UDF, I think the UDF constructor is first run (several times, don't know why) on the job tracker, and ...
    Yang
    Jul 3, 2012 at 4:58 pm
    Jul 3, 2012 at 5:06 pm
  • Hi, does anyone happen to have a sample of how to load an Avro record from HDFS given a location? In my case the schema is just "binary". I'm working on a custom loader and I've been playing around ...
    Fabian Alenius
    Jul 2, 2012 at 10:50 pm
    Jul 2, 2012 at 11:45 pm
  • Hi all, I got problems compiling zebra for use with pig in CDH4. ----------- /usr/lib/pig/contrib/zebra$ ant jar Buildfile: /usr/lib/pig/contrib/zebra/build.xml javacc-exists: BUILD FAILED ...
    Benoit Mathieu
    Jul 20, 2012 at 10:58 am
    Jul 20, 2012 at 6:55 pm
  • Hi, Not sure what is going on here. Trying to get simple pig script to work when put on hdfs:// I am able to "hadoop fs -cat" the file but get an error when trying to read it in pig? Any ideas of ...
    John Morrison
    Jul 13, 2012 at 3:19 pm
    Jul 18, 2012 at 1:15 am
  • I have the following PIG script. In the beginning, I set the mapred.map.tasks but when the job is launched, I see from jobtracker UI that the job.xml shows that the mapred.map.tasks param is set to ...
    Yang
    Jul 17, 2012 at 12:16 am
    Jul 17, 2012 at 3:50 am
  • Hi there, I have a bag with the following schema and I'm having trouble accessing the first chararray element. {{someString: chararray}} Can someone advise on the correct syntax? Thanks, Ben
    Benjamin Juhn
    Jul 7, 2012 at 10:37 pm
    Jul 8, 2012 at 12:41 am
  • Hi, I am using the built-in org.apache.pig.builtin.AVG function. I have a set of 100,000 items that I want to average. The relevant pig latin is below: L = FOREACH K GENERATE AVG(I.productcost), ...
    James Newhaven
    Jul 4, 2012 at 5:38 pm
    Jul 4, 2012 at 9:05 pm
  • Hi, I need to determine the number of days between dates on a running list of records. The records associated with each key will be small (less than 100), so I should be able to do it in one reducer. The ...
    Bob Briski
    Jul 3, 2012 at 3:21 pm
    Jul 3, 2012 at 4:58 pm
  • Hortonworks will be hosting the next Pig Hackathon on August 24th. http://www.meetup.com/PigUser/events/75286212/ The agenda: - Help newcomers get started on their first UDF or patch and walk through ...
    Alan Gates
    Jul 30, 2012 at 4:43 pm
    Jul 30, 2012 at 4:43 pm
  • Hi all, I'm trying to move from Pig 0.9.1 to Pig 0.10.0. When I try to run the same script using the two Pig versions, 0.9.1 starts off fast and almost immediately submits the job to the cluster. On ...
    Chun Yang
    Jul 26, 2012 at 10:33 pm
    Jul 26, 2012 at 10:33 pm
  • Hi, I have written a python udf which imports the datetime module. However, when I registered the python udf in pig, I encountered the following error: 2012-07-25 15:14:34,335 [main] ERROR ...
    Lei tang
    Jul 25, 2012 at 10:48 pm
    Jul 25, 2012 at 10:48 pm
  • Hi all, I'm having a problem with Python Pig embedding where I can run a Pig script fine by itself, but I get errors when I try to embed it into python using Pig.compileFromFile(), like so: init = ...
    Chun Yang
    Jul 21, 2012 at 12:20 am
    Jul 21, 2012 at 12:20 am
  • In piggybank there is org.apache.pig.piggybank.evaluation.stats.COR, but it's deprecated. There are also COR.Final, COR.Initial and COR.Intermed. I tried to find some examples, but was unsuccessful ...
    Danfeng Li
    Jul 19, 2012 at 6:23 pm
    Jul 19, 2012 at 6:23 pm
  • I see the following while running Pig Main in local mode: The archive: /pig-trunk/build/ivy/lib/Pig/jython-2.5.0.jar which is referenced by the classpath, does not exist. I see this jython lib ...
    Prashant Kommireddi
    Jul 12, 2012 at 7:00 pm
    Jul 12, 2012 at 7:00 pm
  • Can anyone using AvroStorage with ILLUSTRATE verify that it works? When I switched to pig-0.10 from my janky branch it stopped working :( -- Russell Jurney twitter.com/rjurney ...
    Russell Jurney
    Jul 5, 2012 at 11:36 pm
    Jul 5, 2012 at 11:36 pm
  • I have the following code in my LoadFunc public class Loader extends PigStorage implements LoadMetadata { ... @Override public ResourceSchema getSchema(String location, Job job) throws IOException { ...
    David Capwell
    Jul 5, 2012 at 5:32 pm
    Jul 5, 2012 at 5:32 pm
  • hi, all I encountered an Exception like this: ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. null at ...
    Haitao Yao
    Jul 4, 2012 at 3:18 am
    Jul 4, 2012 at 3:18 am
  • Does the replace function replace adjacent occurrences of the string or does one need to specify it using regex? Thanks, Ranjith
    Ranjith
    Jul 3, 2012 at 3:44 am
    Jul 3, 2012 at 3:44 am
  • When I upload 2 files like -file myfile1 -file myfile2, only the first one is actually uploaded, but according to ...
    Yang
    Jul 2, 2012 at 6:30 pm
    Jul 2, 2012 at 6:30 pm
Group Overview
group: user @
categories: pig, hadoop
discussions: 48
posts: 219
users: 57
website: pig.apache.org

57 users for July 2012

  • Russell Jurney: 19 posts
  • Jonathan Coveney: 16 posts
  • Haitao Yao: 14 posts
  • Dmitriy Ryaboy: 12 posts
  • Alan Gates: 11 posts
  • Yogesh Kumar13: 10 posts
  • Yang: 9 posts
  • Mohammad Tariq: 8 posts
  • Alex Rovner: 7 posts
  • Subir S: 6 posts
  • Benoit Mathieu: 5 posts
  • Chun Yang: 5 posts
  • Sheng Guo: 5 posts
  • Cheolsoo Park: 4 posts
  • Naresh: 4 posts
  • Pablomar: 4 posts
  • Robert Yerex: 4 posts
  • Ruslan Al-Fakikh: 4 posts
  • Ruslan Al-Fakikh: 4 posts
  • Sonia gehlot: 4 posts