Grokbase Groups Pig user June 2012

87 discussions - 371 posts

  • Hey everyone, We're doing some aggregation. The result contains a key where we want to have a single output file for each key. Is it possible to store files like this? Especially adjusting the path ...
    Markus Resch
    Jun 22, 2012 at 11:55 am
    Jul 5, 2012 at 3:01 pm
  • Is it possible to pass a bag to a Pig UDF constructor? Basically in the constructor I want to initialize some hash map so that on every exec operation, I can use the hashmap to do a lookup and find ...
    Dexin Wang
    Jun 26, 2012 at 5:28 pm
    Jun 29, 2012 at 5:26 pm
  • Dear all, in SQL there is an IN clause, which is used to check whether a value is in a set. Does Pig have a similar IN clause? Such as: B = filter A by A1 in C; A,B,C are relation names ...
    Jun 25, 2012 at 9:51 am
    Jul 4, 2012 at 2:02 pm
  • Hi, I just realized that one of my large scale pig jobs that has 100K map jobs actually only has one reduce task. Reading the documentation I see that the number of reduce tasks is defined by the ...
    Pankaj Gupta
    Jun 1, 2012 at 6:46 pm
    Jun 18, 2012 at 2:18 pm
  • Hi everyone, Sorry to bother. How can I configure the number of mappers for my pig script? Thanks a lot! Sheng
    Sheng Guo
    Jun 23, 2012 at 2:27 am
    Jun 26, 2012 at 11:48 pm
  • my UDF returns a bag of tuples : mybag:bag{ mytuple: tuple ( x: int, y:int)} in my pig script: I do K = foreach blah generate UDF( xxx); M = foreach K generate x; here PIG 0.8.1 says x can not be ...
    Jun 24, 2012 at 10:41 am
    Jul 17, 2012 at 7:52 pm
  • I'm using ivy to download dependencies for pig, but after updating the version to 0.10.0 I am getting errors in my unit tests: [testng] java.lang.NoClassDefFoundError ...
    Matthew Hayes
    Jun 8, 2012 at 7:08 pm
    Jul 5, 2012 at 10:06 pm
  • Hi there, I'm trying to write a group by statement, only returning the top 100 records from each group. Does pig support this? Thanks, Ben
    Benjamin Juhn
    Jun 29, 2012 at 11:19 pm
    Jun 30, 2012 at 4:48 am
  • I agree that pig does not have loop probably for a good reason. but currently I need to write a code to find the transitive closures of many edges in a graph. so I need to iterate a code snippet ...
    Jun 21, 2012 at 2:12 am
    Jun 21, 2012 at 5:12 am
  • I am trying to parse URL using map type of pig. My query string is: My very simple script for testing is this. But when I look at ...
    Mohit Anchlia
    Jun 18, 2012 at 6:20 pm
    Jun 19, 2012 at 7:34 pm
  • Dear All, How can I define UDF load function to load the bag field? Such as A = LOAD 'location' as (filed_name : bag {}). Can anyone show me an example code? Regards! Yong
    Jun 11, 2012 at 4:07 pm
    Jun 13, 2012 at 8:03 am
  • let's say my pig script generates 2 MR jobs. it seems that currently pig parser won't try to parse the second part until it finishes running the first MR. by that time 1 hour may have passed and now ...
    Jun 29, 2012 at 2:19 am
    Jun 29, 2012 at 5:30 pm
  • hi, How can I distinct only one field of a relation? here's the demo: A = LOAD 'data' AS (a1:int,a2:int,a3:int); B = distinct A by a1; how can I do this? Haitao Yao ...
    Haitao Yao
    Jun 27, 2012 at 2:54 am
    Jun 27, 2012 at 2:09 pm
  • Hi, Is it possible to implement transpose operation of rows into columns and vice versa... i.e. col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 col11 col12 can this be converted to col1 col4 col7 ...
    Subir S
    Jun 21, 2012 at 8:17 am
    Jun 25, 2012 at 7:56 pm
  • Hi all, is it possible to have an input path (as parameter to a LOAD statement) that contains several files in *different formats* - say serialized Avro data and tab separated values and make pig ...
    Johannes Schwenk
    Jun 15, 2012 at 12:14 pm
    Jun 15, 2012 at 4:05 pm
  • Hi everyone, The following is my test environment: node 1: namenode, secondarynamenode, jobtracker, hbase master; node 2: datanode, tasktracker. On node 1, I run the following commands in the pig shell, but I ...
    Jun 13, 2012 at 7:05 am
    Jun 13, 2012 at 9:35 am
  • I ran the following simple pig script a = load 'a'; b = foreach a generate [222#1]; dump b; but it gave the following error $ pig -x local a.pig 2012-06-07 20:49:13,039 [main] INFO ...
    Jun 8, 2012 at 3:52 am
    Jun 27, 2012 at 6:59 pm
  • This has to be something obvious but I can’t seem to get python parameters once I add in a main(). Thanks. ~/pig-0.10/bin/pig haha ... 2012-06-18 17:11:50,312 [main] INFO ...
    Duckworth, Will
    Jun 18, 2012 at 9:19 pm
    Jun 19, 2012 at 11:25 pm
  • I am looking at how to parse URL with query parameters to process clickstream data. Are there any examples I can look at? My steps that I envision are: 1) Read lines and convert query parameters into ...
    Mohit Anchlia
    Jun 11, 2012 at 5:55 pm
    Jun 18, 2012 at 5:36 pm
  • this is what happened with my pig script. why would it generate 2 map-only jobs? wouldn't the optimization process chain together both mappers and keep only 1 mapper stage? thanks Yang
    Jun 12, 2012 at 6:22 am
    Jun 18, 2012 at 2:01 am
  • Hi Guys, Is there any way in Pig to check whether a field is an integer or not? I have a pig script with a field coming in as a string and I am expecting it to always have an integer value, but due to some ...
    Sonia gehlot
    Jun 7, 2012 at 10:19 pm
    Jun 8, 2012 at 3:27 pm
  • Hello, We have written a HiveLoader that loads data from a Hive warehouse (HCatalog had roadblocks at the time and we decided against using it). We have one minor issue that would be great to solve ...
    Alex Rovner
    Jun 1, 2012 at 3:49 pm
    Jun 3, 2012 at 3:00 am
  • Hi list, I have written two UDFs that run fine with Hadoop 0.20.2 and Pig 0.10.0. I am trying to switch from Hadoop 0.20.2 to 0.23 and am encountering some errors on running my tests ...
    Johannes Schwenk
    Jun 20, 2012 at 11:39 am
    Jul 3, 2012 at 4:13 pm
  • We found the following simple logic will cause very long compiling time for pig 0.10.0, while using pig 0.8.1, everything is fine. A = load 'A.txt' using PigStorage() AS (m: int); B = FOREACH A { ...
    Danfeng Li
    Jun 26, 2012 at 10:12 pm
    Jun 26, 2012 at 11:21 pm
  • Hi all, I found in pig latin a 'matches' operator for pattern matching. I didn't find it in documentation but maybe there exists something similar but for searching? Basically in java world I would ...
    Jakub Glapa
    Jun 18, 2012 at 11:15 am
    Jun 19, 2012 at 10:57 am
  • Hi, I am getting an out of memory error while running Pig. I am running a pretty big job with one master node and over 100 worker nodes. Pig divides the execution in two map-reduce jobs. Both the ...
    Pankaj Gupta
    Jun 18, 2012 at 6:16 am
    Jun 19, 2012 at 3:48 am
  • Hey all, we're currently testing a switch over from CDH3 to CDH4. When I try to read my Avro input data I get a Schema unknown error: bash-3.2$ pig 12/06/15 08:48:08 WARN pig.Main: Cannot write to ...
    Markus Resch
    Jun 15, 2012 at 8:50 am
    Jun 18, 2012 at 5:11 pm
  • Hello, I'm wondering why I cannot specify the output file name, for example: C = store user_results into 'tables/user.txt'; this command creates a *folder* with the name *user.txt* and inside it I find ...
    Baraa Mohamad
    Jun 14, 2012 at 4:47 pm
    Jun 14, 2012 at 5:38 pm
  • How do I store the pig console output to a file. pig -x mapred -l logs -param $xyz=1000 pqr.pig a.txt does not work for me. Are there any tricks to make this work? Or is it available somewhere else ...
    Shan s
    Jun 13, 2012 at 4:06 pm
    Jun 14, 2012 at 5:23 am
  • how do i go about writing simple " CASE " statement in apache pig. example ------------ case when a1 = b1 then c1 when a = b2 then c2 end any inputs appreciated.
    Srini Naravatla
    Jun 7, 2012 at 2:59 pm
    Jun 7, 2012 at 6:08 pm
  • Hi all, I'm new to Pig and Hadoop. Can you tell me how I can: 1. append to an existing file on HDFS with pig 2. update a file with pig, if that is possible. 10x. -- -- Michael G. --
    Michael G.
    Jun 1, 2012 at 6:55 pm
    Jun 3, 2012 at 6:11 am
  • Pig has this CSVExcelStorage [1] and CSVLoader [2] as part of PiggyBank. It may help. [1] [2] ...
    Subir S
    Jun 27, 2012 at 11:21 am
    Jun 28, 2012 at 6:50 pm
  • I'm trying to run the following code, ---- I had another piece of code that looks exactly the same , just var name differences, that one runs fine, but this one gave errors: 2012-06-20 20:14:38,285 ...
    Jun 21, 2012 at 3:16 am
    Jun 27, 2012 at 8:22 pm
  • Hi All, I've yet to be able to successfully import the datetime module in a Jython UDF when running my Pig script on our cluster. Having perused past message traffic, it seems the answer was to put ...
    Chris Diehl
    Jun 20, 2012 at 12:16 am
    Jun 20, 2012 at 1:43 am
  • Hi All, I recently downloaded and installed Pig 0.10 on our Hadoop cluster. After configuring things as I've done before to use Jython UDFs, I'm seeing deserialization errors. I've verified that my ...
    Chris Diehl
    Jun 8, 2012 at 7:36 pm
    Jun 19, 2012 at 11:30 pm
  • Hello, The example of DIFF lacks the GENERATE keyword. It should be X = FOREACH A GENERATE DIFF(B1,B2); Regards! Yong
    Jun 13, 2012 at 12:24 pm
    Jun 18, 2012 at 2:11 am
  • In production I use short Pig scripts and schedule them with Azkaban with dependencies setup, so that I can use Azkaban to restart long data pipelines at the point of failure. I edit the failing pig ...
    Russell Jurney
    Jun 16, 2012 at 2:36 am
    Jun 16, 2012 at 6:56 pm
  • Tuesday, Pig Meetup Alan Gates - upcoming improvements in operators/backend physical plan. Desphagetification. Reworking UDF interface, keep backward compatibility. Hadoop 2 coming, will be slow ...
    Russell Jurney
    Jun 13, 2012 at 4:46 am
    Jun 16, 2012 at 2:10 am
  • I need to set mapred.min.split.size for one part of my pig script because the mapper job corresponding to the first part of the script takes much longer time per input record than other parts of the ...
    Jun 11, 2012 at 2:07 am
    Jun 15, 2012 at 7:41 am
  • Hello, I checked out branch-0.10, and I am trying to run e2e RubyUDFs tests in MR mode. But I am getting the following error: java.lang.IllegalStateException: *Could not initialize interpreter (from ...
    Cheolsoo Park
    Jun 8, 2012 at 12:31 am
    Jun 11, 2012 at 5:20 pm
  • I want to copy 26,000 HDFS files generated by a pig script to Amazon S3. I am using the copyToLocal command, but I noticed the copy throughput is only one file per second - so it is going to take ...
    James Newhaven
    Jun 8, 2012 at 11:40 am
    Jun 9, 2012 at 3:14 pm
  • Hello, I'm having a lot of null entries in my data. Due to later processing it would be very helpful if I could set a default value for null to be the string "other". I couldn't find a way to do this ...
    Mario Lassnig
    Jun 7, 2012 at 11:38 am
    Jun 8, 2012 at 11:30 pm
  • According to the documentation it looks like the only way to share macros is through a file. What about importing a macros file from a resource stored in a jar? Has this been considered? I was ...
    Matthew Hayes
    Jun 8, 2012 at 1:14 am
    Jul 30, 2012 at 10:29 pm
  • Hi ! I've a bag of map... ([k3#v13,k1#v11,k2#v12]) ([k1#v12,k2#v22]) ([k4#v31]) ... and would like to extract all key names: (k3) (k1) (k2) (k1) (k2) (k4) I cannot figure out how to do this (except ...
    Vincent Barat
    Jun 29, 2012 at 12:06 pm
    Jun 29, 2012 at 6:56 pm
  • pigUnit compares the serialized string of an output result against a pre-defined expected output string. this is a problem with DataBag output, cuz the order of elements in a bag is not guaranteed ...
    Jun 22, 2012 at 9:30 pm
    Jun 27, 2012 at 11:24 pm
  • if I run hadoop jar myjar.jar and myjar.jar contains some hadoop classes, where would these hadoop classes be loaded from ? the system hadoop installation or from myjar.jar ? Thanks Yang
    Jun 25, 2012 at 9:19 pm
    Jun 27, 2012 at 4:04 pm
  • if I have an assert line: assertOutput('myinput' , input_data, 'myoutput', output_data); and in my test pig script, I dump out the var myoutput: dump myoutput; the UDF used in the script fails to get ...
    Jun 23, 2012 at 12:34 am
    Jun 24, 2012 at 5:52 am
  • Hello, I am building from $PIG_HOME under Hadoop 23: ant eclipse-files and I get the following error: BUILD FAILED /home/kereno/hadoop23/pig10/build.xml:301 ...
    Keren Ouaknine
    Jun 21, 2012 at 8:11 am
    Jun 22, 2012 at 4:10 pm
  • My script is simple: /* Avro */ register /home/hadoop/pig-0.10.0/build/ivy/lib/Pig/avro-1.5.3.jar register /home/hadoop/pig-0.10.0/build/ivy/lib/Pig/json-simple-1.1.jar register ...
    Russell Jurney
    Jun 22, 2012 at 1:57 am
    Jun 22, 2012 at 2:03 am
  • It can even be a bytearray. Basically I have a bunch of files, and I want one file - one row. Is there an easy way to do this? Or will I need to provide a special fileinputformat etc?
    Jonathan Coveney
    Jun 21, 2012 at 9:35 pm
    Jun 21, 2012 at 10:35 pm

88 users for June 2012

  • Yang: 32 posts
  • Jonathan Coveney: 23 posts
  • Dmitriy Ryaboy: 18 posts
  • Russell Jurney: 17 posts
  • Alan Gates: 16 posts
  • Subir S: 16 posts
  • Daniel Dai: 12 posts
  • Thejas Nair: 11 posts
  • Johannes Schwenk: 10 posts
  • Prashant Kommireddi: 10 posts
  • Yonghu: 9 posts
  • Markus Resch: 8 posts
  • Mohit Anchlia: 8 posts
  • Norbert Burger: 8 posts
  • Pankaj Gupta: 8 posts
  • Ruslan Al-Fakikh: 8 posts
  • Alex Rovner: 7 posts
  • Duckworth, Will: 6 posts
  • Bill Graham: 5 posts
  • Danfeng Li: 5 posts