Grokbase Groups Pig user August 2011

71 discussions - 271 posts

  • I'm currently testing the PIG 0.9.x branch. Several of my jobs that used to work correctly with PIG 0.8.1 now fail due to a cast error returning a null pointer in one of my UDFs. Apparently, PIG ...
    Vincent Barat
    Aug 29, 2011 at 3:06 pm
    Oct 6, 2011 at 9:36 pm
  • I'm analyzing a daily apache log file. I'd like to get the number of requests and of visits by hour. I managed to get the requests, but how do I get the visits? grunt RAW_LOGS = LOAD '<log-file ' ...
    David Riccitelli
    Aug 19, 2011 at 2:13 pm
    Aug 19, 2011 at 4:52 pm
  • I think this is similar to the 'merge' join issue not being automatically supported. If we have done a GROUP in the past, this data should have been mapped, then handed off to the reducers and ...
    Kevin Burton
    Aug 30, 2011 at 8:21 pm
    Aug 31, 2011 at 11:00 pm
  • This just bit me. I can do: STORE data INTO '/tmp/brokenfs.out'; but fs -ls '/tmp/brokenfs.out'; won't work because it can't be quoted. fs -ls /tmp/brokenfs.out; works though. ... I'm pretty sure ...
    Kevin Burton
    Aug 24, 2011 at 8:28 pm
    Aug 25, 2011 at 6:26 pm
  • All, I have some data that I would like to store into a file and then load it in a UDF to do some operations in the next pig statement. For example, doc_ids = FOREACH docs GENERATE doc_id; STORE ...
    Eshwaran Vijaya Kumar
    Aug 10, 2011 at 4:10 pm
    Aug 11, 2011 at 12:51 am
  • Hello, I found that pig-0.8.1 includes classes from Hbase-0.90.0. My questions are: 1. If I replace Hbase-0.90.3 with Hbase-0.90.0, could pig-0.8.1 work normally? 2. Why are Hbase class files included in ...
    Aug 7, 2011 at 3:58 pm
    Jun 12, 2012 at 1:22 am
  • I'm optimizing a somewhat large pig job. One of the intermediate steps is a group which we use moving forward. The data right now looks like: 0 {(1),(2),(3),(4)} which has a second column of a bag of ...
    Kevin Burton
    Aug 20, 2011 at 8:48 am
    Aug 23, 2011 at 2:37 am
  • Hi Eric, Thanks for your response. Brisk sounds nice, but I feel that disregarding HDFS and totally switching to Cassandra is not the right thing to do. Just my opinion there. I feel we are not using ...
    Tharindu Mathew
    Aug 30, 2011 at 5:07 pm
    Aug 31, 2011 at 5:31 am
  • I have a Jython UDF I've written that works fine in local mode but bombs out when I run it on my cluster. I'm running 0.8.0, and my stack trace and environment variables are below. ...
    Mark Roddy
    Aug 30, 2011 at 12:15 am
    Aug 30, 2011 at 6:00 pm
  • Hello, I ran the TestDataModel test case with the IBM JDK and got the following error: Testcase: testTupleToString took 0.002 sec FAILED toString expected:< a little ...
    Aug 17, 2011 at 2:56 am
    Aug 26, 2011 at 3:28 am
  • I'm looking at BinStorage, which, if I've read correctly, is used for all Pig intermediate files. … so any optimizations here would be transparent to the user. I just did a simple STORE using ...
    Kevin Burton
    Aug 20, 2011 at 10:44 am
    Aug 22, 2011 at 3:15 pm
  • OK... I still can't get this to work. I've read the documentation and I still get the same error on 0.9.0 … Here's my code. I think it's implying that I need to have the predecessor as a LOAD and ...
    Kevin Burton
    Aug 20, 2011 at 9:03 pm
    Aug 22, 2011 at 5:18 am
  • I was reading about USING 'merge' with JOIN when relations are already sorted. I actually was just looking through some code and realized that one of my JOINs was on two relations that were *already* ...
    Kevin Burton
    Aug 20, 2011 at 5:52 am
    Aug 21, 2011 at 4:59 pm
  • Hi, I'm running pig jobs using Amazon pig support, where you submit jobs with comma-concatenated parameters like this: elastic-mapreduce --pig-script --args myscript.pig --args ...
    Dexin Wang
    Aug 17, 2011 at 11:22 pm
    Aug 18, 2011 at 7:44 pm
  • Hi, I have some metrics stored in a Cassandra supercolumn, and the subcolumns are the timestamps of each metric. I'm loading the metrics in pig with this line: all_metrics = LOAD ...
    Fabio Souto
    Aug 17, 2011 at 3:10 pm
    Aug 18, 2011 at 6:40 am
  • Despite any amount of finagling I do with the classpath, I can't get pig to connect to my local pseudo-distributed hadoop instance NOR my cluster on EC2. My EC cluster is 20.2 CDH, local ...
    Chris Allen
    Aug 15, 2011 at 10:38 pm
    Aug 17, 2011 at 8:59 pm
  • Hi all, I'm curious if it's possible to migrate Apache Pig to other MR runtimes instead of Hadoop? I assume it requires tons of work to do so, right? Thanks, Yuduo
    Yuduo Zhou
    Aug 29, 2011 at 8:18 pm
    Sep 1, 2011 at 1:10 am
  • Hey everyone! I've run into a problem: I need to pass a null or empty -param to pig, but I can't figure out how to do it. The following does not work: /pig/bin/pig -param rootPath= ...
    Marek Miglinski
    Aug 22, 2011 at 10:13 am
    Aug 30, 2011 at 3:03 pm
  • I'm reading the documentation and it says: "*Regular Join Optimizations* Optimization for regular joins ensures that the last table in the join is not brought into memory but streamed through ...
    Kevin Burton
    Aug 28, 2011 at 7:12 pm
    Aug 29, 2011 at 9:43 pm
  • My apologies if this is in the docs somewhere, I was unable to find anything, but I might be calling it the wrong name. I'm doing a full outer join in Pig - as such, one or the other join keys may be ...
    James Kebinger
    Aug 29, 2011 at 6:15 pm
    Aug 29, 2011 at 7:18 pm
  • Hi, I run PIG jobs from a Java process (using PigServer), most of which use HBaseStorage to load data from HBase. Each job is run using a new PigServer object, and I correctly call ...
    Vincent Barat
    Aug 26, 2011 at 12:30 pm
    Aug 29, 2011 at 1:43 pm
  • Hi, Of the requests I run using PIG 0.8.1, the heaviest one is the following: /* load session data from HBase */ start_sessions = load ... (start of sessions) end_sessions = load ... (end ...
    Vincent Barat
    Aug 23, 2011 at 4:28 pm
    Aug 26, 2011 at 12:11 pm
  • Hello, On, one can find the scripts to generate data. But the data seems generic, meaning no relation to the pigmix scripts 1-17 published on ...
    Keren Ouaknine
    Aug 26, 2011 at 1:48 am
    Aug 26, 2011 at 4:33 am
  • Hi Folks, I want to delete a file "xyz.tmp" from my hdfs location below: hdfs://MASTER/user/test/xyz.tmp I have embedded the following statement in my pig script: --a.pig fs -rmr 'xyz.tmp'; Every time ...
    Ipshita chatterji
    Aug 25, 2011 at 8:31 am
    Aug 25, 2011 at 5:03 pm
  • Here's an explain I'm trying to grok. The last Load is frustrating because the file isn't descriptive at all. I have to scroll up and find out which file it was from which mapred job. If the file had ...
    Kevin Burton
    Aug 23, 2011 at 8:19 pm
    Aug 23, 2011 at 11:46 pm
  • Hi, I am able to compile a pig UDF against pig-0.8.0, but I get an error when I try compiling against pig-0.8.1. The error message is: cannot access ...
    Aug 19, 2011 at 6:33 pm
    Aug 23, 2011 at 5:10 pm
  • Hello, I'm trying to generate a tuple from a very wide data set, but running into problems. I'm running Pig 0.9.0 r1148983 in local mode. Because the data set is so wide, I'd prefer not to ...
    Aug 16, 2011 at 10:17 pm
    Aug 18, 2011 at 6:02 pm
  • Hi All, I am trying to perform a join of some hbase tables in pig and I am using HBaseStorage to load the data from hbase in pig. I was able to load my data using HBaseStorage but I have one ...
    Gayatri Rao
    Aug 15, 2011 at 5:58 pm
    Aug 15, 2011 at 6:38 pm
  • Hi folks, We have a ~35 GB Hbase table that's split across several hundred regions. I'm using the Pig version bundled with CDH3u1, which is 0.8.1 plus a few patches. In particular, it includes ...
    Norbert Burger
    Aug 15, 2011 at 4:20 pm
    Aug 15, 2011 at 6:14 pm
  • org.apache.pig.PigCounters PROACTIVE_SPILL_COUNT_RECS 2,372,598 2,372,598 SPILLABLE_MEMORY_MANAGER_SPILL_COUNT 64 64 PROACTIVE_SPILL_COUNT_BAGS I was checking my jobtracker and I have no idea what ...
    Sean Barry
    Aug 2, 2011 at 6:43 pm
    Aug 3, 2011 at 6:30 pm
  • How is UNION implemented? Does it read from two source files or does it create a temporary file by reading the N source files/relations and then writing a new temp file which is then read from? I ...
    Kevin Burton
    Aug 29, 2011 at 8:32 pm
    Aug 29, 2011 at 9:45 pm
  • Hi I read on the wiki that further developments will be carried out allowing users to write their UDFs in other languages. I am specifically interested in being able to use R functions in Pig. Also, ...
    Asif Jan
    Aug 29, 2011 at 2:25 pm
    Aug 29, 2011 at 6:36 pm
  • The COUNT_STAR thing bites people a lot -- clearly, even the most advanced Pig users mess this up once in a while. It's a really hard bug to track down. We should reconsider our decision to make ...
    Dmitriy Ryaboy
    Aug 1, 2011 at 7:18 pm
    Aug 22, 2011 at 5:09 pm
  • Hi, please see the code snippet below: register pig.jar; register piggybank.jar; o1 = load 'observations.csv' as (obs_id, encounter_id, sub_form_id, observed_by, verified_by, remark); oc1 = load ...
    Ipshita chatterji
    Aug 19, 2011 at 2:42 pm
    Aug 19, 2011 at 6:27 pm
  • I have a need within a larger Pig script to pull just a few records from an Hbase table. I know the exact key, so it'd be trivial with a get() from a UDF. Another alternative is to use a custom ...
    Norbert Burger
    Aug 19, 2011 at 4:17 pm
    Aug 19, 2011 at 4:31 pm
  • Hi Folks, I am very new to PIG. I am having problems using the DiffDate function in org.apache.pig.piggybank.evaluation.datetime. How do I pass 2 dates in a tuple format? I get an error. This ...
    Ipshita chatterji
    Aug 19, 2011 at 4:06 am
    Aug 19, 2011 at 6:05 am
  • Hey, I wanted to see if the following is possible in pig-0.8.1. a = load '/logs/apache/*/today/access.log.txt' USING PigStorage() AS ('.... tuple') I want to add to the existing tuple a chararray ...
    Sridhar basam
    Aug 18, 2011 at 7:19 pm
    Aug 18, 2011 at 7:53 pm
  • Hello, In the pig-0.8.1/src directory, I did not find any java file that imports javax.servlet.jsp... So my question is: why is jsp-api-2.1-6.1.14 included in pig-0.8.1-SNAPSHOT.jar? Can I replace it with ...
    Aug 12, 2011 at 7:03 am
    Aug 17, 2011 at 6:44 pm
  • Hi group, Can I use a nested group in foreach? For example: A = load data ... as (a1:..., a2:..., a3:..., ...) B = group A by a1; C = foreach B { * inner_group = group A by a2;* generate group, ...
    Aug 16, 2011 at 11:12 pm
    Aug 17, 2011 at 6:20 pm
  • Hi dear pigs, I have a problem: when I use the UNION command to combine some results into one relation at the end of a pig script, it sometimes misses some results from the UNION. For example: union_all_res = ...
    Aug 17, 2011 at 7:15 am
    Aug 17, 2011 at 5:54 pm
  • Hi all, first, sorry if this has been asked before, but I could not find any reference in the list archives. I have tried to run the PigUnit example (top_queries.pig) provided on ...
    Sotiris Matzanas
    Aug 11, 2011 at 8:53 am
    Aug 16, 2011 at 7:14 am
  • I am trying to use the PIG SUM function to sum a group of integers created by a UDF and I am getting Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer A = ...
    Rob parker
    Aug 11, 2011 at 3:16 pm
    Aug 11, 2011 at 6:27 pm
  • Hello, I found that pig-0.8.1 includes the junit-4.5 class files. Could you please help with my questions: Can pig-0.8.1 work with junit 4.3.1, 4.8.1, or 4.8.2? Why are the classes included in ...
    Aug 7, 2011 at 3:53 pm
    Aug 9, 2011 at 10:16 am
  • Hi, I have been stuck on this exception: config() at org.apache.hadoop.conf.Configuration.( at ...
    Jagaran das
    Aug 6, 2011 at 3:23 am
    Aug 6, 2011 at 5:00 am
  • I have been very excited to give Pig 0.9 a try and run it against our Cloudera CDH3U0 hadoop cluster and I need to point Pig to the cloudera hadoop libraries to make it work. I tried re-building pig ...
    Andy Sautins
    Aug 4, 2011 at 7:58 pm
    Aug 4, 2011 at 11:20 pm
  • Fang Fang FF Chen
    Aug 1, 2011 at 3:08 pm
    Aug 1, 2011 at 3:18 pm
  • Hi pigs: Can I distinct by multiple columns? For example: A = load ... as (a1:int, a2:int, a3:int); B = DISTINCT A; -- It's OK. -- But can I distinct by a1 and a2? C = DISTINCT A.a1, A.a2; -- It's ...
    Aug 31, 2011 at 6:32 am
    Aug 31, 2011 at 4:07 pm
  • I'm trying to spend more time understanding EXPLAIN so I can see what optimizations pig is doing under the hood. I was actually trying to answer my own question using EXPLAIN and avoid sending a ...
    Kevin Burton
    Aug 30, 2011 at 8:03 pm
    Aug 30, 2011 at 9:34 pm
  • Hello Thejas Nair, While running the ant -Dtestcase=TestMergeJoinOuter test with a JDK other than the SUN JDK, I found that my output is different from the one under the SUN JDK: SUN JDK: passed Other JDK: failed ...
    Aug 23, 2011 at 8:45 am
    Aug 25, 2011 at 12:58 am
  • I seem to have a need for precompiler directives in pig. These aren't part of the compiled map reduce job ... For example: if file_exists( "/foo" ): run prepare.pig run execute.pig ... the prepare.pig ...
    Kevin Burton
    Aug 24, 2011 at 7:28 am
    Aug 24, 2011 at 5:07 pm
Group Overview
group: user @
categories: pig, hadoop

58 users for August 2011

  • Dmitriy Ryaboy: 33 posts
  • Kevin Burton: 31 posts
  • Daniel Dai: 21 posts
  • Thejas Nair: 21 posts
  • David Riccitelli: 13 posts
  • Lulynn_2008: 12 posts
  • Ashutosh Chauhan: 10 posts
  • Alan Gates: 8 posts
  • Vincent Barat: 8 posts
  • Jeremy Hanna: 7 posts
  • Fang Fang FF Chen: 6 posts
  • Bill Graham: 5 posts
  • Ipshita chatterji: 5 posts
  • Eshwaran Vijaya Kumar: 4 posts
  • Jagaran das: 4 posts
  • Norbert Burger: 4 posts
  • Ggrambo: 3 posts
  • Byambajav byambajargal: 3 posts
  • Chris Allen: 3 posts
  • Dexin Wang: 3 posts