Grokbase Groups Pig user June 2011

65 discussions - 249 posts

  • Hi guys, can anyone please tell me how to read the Explain plan in Pig? When I run explain on any of my Pig queries it gives me a really good flow diagram, but it uses some Pig functions so didn't ...
    Jagaran das
    Jun 10, 2011 at 6:58 pm
    Jun 15, 2011 at 6:25 pm
  • Hello, I just tried the example from the Pig UDF manual step by step, but I got an error message. Can anyone tell me how to solve it? grunt> REGISTER /home/huyong/test/myudfs.jar; grunt> A = LOAD ...
    Jun 18, 2011 at 9:23 am
    Jun 20, 2011 at 7:16 am
  • Hi, This is probably not directly a Pig question. Anyone running Pig on amazon EC2 instances? Something's not making sense to me. I ran a Pig script that has about 10 mapred jobs in it on a 16 node ...
    Dexin Wang
    Jun 13, 2011 at 6:55 pm
    Jun 16, 2011 at 4:17 am
  • Hi So I checked out the pig version from, because the 0.8 from the website wouldn't build on my machine and I needed to build the project for my UDFs. So ...
    Marian Condurache
    Jun 30, 2011 at 1:40 pm
    Jul 4, 2011 at 5:53 am
  • Hi, At the last contributors meeting we discussed the need to balance fast turn-around of features while maintaining stability of release branches. The conclusion we came to is to do time based ...
    Olga Natkovich
    Jun 2, 2011 at 9:10 pm
    Jun 10, 2011 at 10:46 pm
  • Does anyone have followup to this problem? I am getting: Caused by: Deserialization error: could not instantiate ...
    Daniel Eklund
    Jun 2, 2011 at 10:57 pm
    Jun 7, 2011 at 2:53 pm
  • Hi folks, We've migrated to pig 0.8.1 and everything went pretty smoothly except for one oddity involving how we generate schemas for complex Thrift structures; namely, it seems like we get into ...
    Dmitriy Ryaboy
    Jun 18, 2011 at 2:42 pm
    Jun 20, 2011 at 5:16 pm
  • Hello, I would like to experiment with PigUnit and cannot find a jar file that I need to reference in order to use the framework. Anyone? Thanks Alex R
    Alex Rovner
    Jun 17, 2011 at 11:06 pm
    Jun 18, 2011 at 3:29 pm
  • Hi all, I am receiving the following exception: org.apache.pig.backend.executionengine.ExecException: ERROR 2078: Caught error from UDF: org.apache.pig.piggybank.evaluation.math.DoubleMax [Caught ...
    Lakshminarayana Motamarri
    Jun 16, 2011 at 10:36 am
    Jun 18, 2011 at 1:48 pm
  • Hello all- I've got a quick question and Google isn't proving to be much help. I've got a big file, that has a few lines in it prefaced with a pound sign (#) to indicate they are to be ignored. I ...
    Moore, Michael A.
    Jun 7, 2011 at 7:04 pm
    Jun 8, 2011 at 2:44 pm
  • Hi, I am using Pig for number crunching on data that has a large number of columns (~300 or so). The script has around 25 operators and all I am doing in the script is group bys and SUMs. The script ...
    Shubham Chopra
    Jun 15, 2011 at 5:10 pm
    Jun 15, 2011 at 10:59 pm
  • Howdy, I'm coming from Cassandra, and I'm trying to count all columns in a column family. I believe that is similar to counting the number of tuples in a bag, in the lingo of the Pig manual. It ...
    William Oberman
    Jun 3, 2011 at 7:54 pm
    Jun 8, 2011 at 9:32 pm
  • Hi, I have some files in hdfs://path/load/ like this: file_29_00001 file_47_00001 file_16_00001 ... These files are generated by other M/R jobs. The files contain only one column, and the ...
    Jameson Li
    Jun 13, 2011 at 11:08 am
    Jun 17, 2011 at 9:47 am
  • I think I'm stuck on typing issues trying to store data in cassandra. To verify, cassandra wants (key, {tuples}) My pig script is fairly brief: raw = LOAD 'cassandra://test_in/test_cf' USING ...
    William Oberman
    Jun 15, 2011 at 6:18 pm
    Jun 15, 2011 at 7:25 pm
  • Hi, I'm using Pig in local mode during development, and I'm tired of the traces output by local jobs (like the one attached). Is there a way to get rid of them? Thanks for your help. 2011-06-27 ...
    Vincent Barat
    Jun 27, 2011 at 10:46 am
    Aug 26, 2011 at 4:14 pm
  • Hi All, I have a few sample log lines: - - [10/Apr/2007:10:40:54 +0300] "GET /favicon.ico HTTP/1.1" 200 766 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv: Gecko/20061201 Firefox/ ...
    Abh not
    Jun 27, 2011 at 7:50 am
    Jun 27, 2011 at 8:41 pm
  • Hi, I have columns in HBase which contain names with long numbers and are stored as bytes. I'm using pig 0.8.1 and I'm trying to load the data stored in them. When scanning the table in the HBase ...
    Juan Martin Pampliega
    Jun 23, 2011 at 6:25 pm
    Jun 27, 2011 at 5:52 pm
  • Hi all, allows storing the Pig output into different directories, taken from a given field in a relation, so that the output is partitioned by the unique values of that ...
    Thomas Kappler
    Jun 16, 2011 at 7:39 am
    Jun 17, 2011 at 4:44 pm
  • Hi, My pig query is roughly the following: register some_lib.jar a = load 'somefile' using CustomUDF(); b = foreach a generate CustomProjectionUDF(); c = foreach b generate var1, var2, var3; d = ...
    Shubham Chopra
    Jun 16, 2011 at 6:13 pm
    Jun 16, 2011 at 7:48 pm
  • Recently I uncovered a nasty situation in my data that caused an IndexOutOfBoundsException. I am including a sample pig script and data (at the bottom) that illuminate the concern. Succinctly: ...
    Daniel Eklund
    Jun 9, 2011 at 11:53 am
    Jun 9, 2011 at 11:27 pm
  • I'm trying to load data from HBase and process it with Pig. I'm running Pig and HBase in local mode and they both run fine on their own. I'm using the following script: REGISTER ...
    Juan Martin Pampliega
    Jun 3, 2011 at 8:18 pm
    Jun 9, 2011 at 1:02 pm
  • I am getting an error while running a Pig script on a 400MB compressed file, but the script works fine with a sample input file of 1000 lines. The error details are given below. Any thoughts? ...
    Subhramanian, Deepak
    Jun 1, 2011 at 1:00 pm
    Jun 1, 2011 at 6:05 pm
  • I was wondering why assertOutput in PigTest calls registerScript twice? Once in assertOutput and then again in getAlias? I added a mv to the end of my Pig script and it's getting called each time ...
    Jennie Cochran-Chinn
    Jun 29, 2011 at 7:09 pm
    Jul 15, 2011 at 11:03 pm
  • Hi All, I want to tokenize a string and then do some processing on each token. For example: line = "Today is first day of the week and its Monday" p = load 'file.txt' as (line: chararray); t = ...
    Sonia gehlot
    Jun 27, 2011 at 9:14 pm
    Jun 27, 2011 at 11:29 pm
  • Hi all, I'm getting the exception (at the end) from the following using Pig: eLine = FOREACH logLine GENERATE FLATTEN( REGEX_EXTRACT_ALL( $0, '.*Output.Count\\s*\\-\\s*([A-Za-z\\.]+)\\s*(\\d+)' ) ) ...
    Jonathan Holloway
    Jun 23, 2011 at 4:42 pm
    Jun 25, 2011 at 12:21 am
  • Hi all, Does anybody have a list of the features for the Pig 0.9 release? I noticed from SVN that control flow structures have been added. How would these work with 0.9? Many thanks, Jon.
    Jonathan Holloway
    Jun 20, 2011 at 3:04 pm
    Jun 20, 2011 at 5:35 pm
  • Hi, In our production Cassandra systems we are observing that the time taken by the same Pig script keeps increasing every day. The Pig script reads data for a day at a time from a Cassandra Column ...
    Badrinarayanan S
    Jun 17, 2011 at 6:32 pm
    Jun 18, 2011 at 3:20 am
  • Hello, When a LOAD clause has a bag as one of its members, what input file structure is expected? Can the default PigStorage() function handle that? e.g. in A = LOAD 'data.txt' AS (B: bag {T: ...
    Saumitra Shahapure
    Jun 14, 2011 at 7:47 pm
    Jun 15, 2011 at 6:38 pm
  • Hi, can anyone let me know the configuration steps required to read data from Cassandra using Pig with a consistency level of LOCAL_QUORUM? Thanks, badri
    Badrinarayanan S
    Jun 14, 2011 at 4:23 am
    Jun 14, 2011 at 2:56 pm
  • Hi, I have a really weird problem... I am new to Pig so I don't really understand this SUM function error: ERROR 1045: Could not infer the matching function for org.apache.pig.builtin.SUM as multiple ...
    Marian Condurache
    Jun 29, 2011 at 3:37 pm
    Jun 29, 2011 at 4:36 pm
  • Hello, I've written a custom storage for pig that mostly inherits PigStorage. However even if I have a public constructor with 2 arguments like a = load '...' using MyStorage('some', 1) Pig yields a ...
    Marius Danciu
    Jun 27, 2011 at 8:02 am
    Jun 27, 2011 at 8:53 pm
  • Hello All, I'm having an issue where I get a 'ClassCastException: cannot be cast to java.lang.String' when passing in something of type chararray to REGEX_EXTRACT. ...
    Michael May
    Jun 22, 2011 at 10:59 pm
    Jun 23, 2011 at 8:37 pm
  • Hi, Previously we had a cassandra-0.7.6-2 setup installed in a cluster and used Pig scripts to read data from the Cassandra DB. We have now changed our Cassandra version from Cassandra-0.7.6-2 to ...
    Jun 23, 2011 at 11:27 am
    Jun 23, 2011 at 5:11 pm
  • I was wondering what a good approach would be to the following: On each node in a Hadoop cluster I have the same directory with different log files in them (in the local filesystem, not hdfs). I'd ...
    Dylan Scott
    Jun 18, 2011 at 10:25 pm
    Jun 19, 2011 at 9:46 pm
  • Hi, I tried to use a Python UDF in Pig, but I get the error 'Could not initialize class org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter'. I register jython.jar in my Pig script as ...
    Bing Wei
    Jun 17, 2011 at 8:26 pm
    Jun 18, 2011 at 3:30 pm
  • Hello, I'm having an issue with regex in pig. Specifically, I'm loading an apache access log and trying to break out the bits from the query string: logs = LOAD '$input' using logloader as ...
    Jun 18, 2011 at 12:44 am
    Jun 18, 2011 at 1:14 am
  • I have log files like this: #timestamp (ms), server, user, action, domain , x, y , z 1262332800008, 7, 50817, 2,, 31, blahblah, foobar 1262332800017, 2, 373168, 0,, 67, blahblah, ...
    Sujee Maniyam
    Jun 17, 2011 at 10:38 pm
    Jun 18, 2011 at 12:36 am
  • Hi all, I was wondering whether somebody could explain how Pig deals with nested directories of log files. Something like: /logs/2011-01-01/a.log /logs/2011-01-01/b.log /logs/2011-01-01/c.log I'm ...
    Jonathan Holloway
    Jun 16, 2011 at 1:58 am
    Jun 17, 2011 at 10:02 am
  • We started doing this recently and thought it might be useful to others. Pig (and Hive) have a sample function that allows you to sample data from your data store. In pig it looks something like ...
    Jeremy Hanna
    Jun 15, 2011 at 5:36 pm
    Jun 15, 2011 at 7:01 pm
  • Hi, I'm looking to perform a sum normalization (divide a score by the sum of scores of my data) with pig. 1) My first problem is I can't find a great way to do that. Any suggestion? I have an answer ...
    Tristan Croiset
    Jun 14, 2011 at 1:59 pm
    Jun 14, 2011 at 6:44 pm
  • I looked through the help and the docs pages but couldn't find anything that did this. Is there any way to show a list of current relations loaded while on the grunt shell? It would seem that the ...
    Jeremy Hanna
    Jun 11, 2011 at 4:19 pm
    Jun 14, 2011 at 5:23 pm
  • I have a pig script that is working well for small test data sets but fails on a run over realistic-sized data. Logs show INFO ...
    William Dowling
    Jun 10, 2011 at 6:16 pm
    Jun 10, 2011 at 7:57 pm
  • I'm currently trying to write a pig script to output a feature index. Is there a built-in function for converting an unknown length tuple to output once for each item in the tuple? Example code: raw ...
    Xavier Stevens
    Jun 2, 2011 at 6:39 pm
    Jun 2, 2011 at 6:58 pm
  • As far as I know, the older version of PigStorage only supports a single-character delimiter. Has this changed in the latest version? Thanks, Michael
    Jiang licht
    Jun 29, 2011 at 3:50 am
    Jul 1, 2011 at 12:59 pm
  • I was just wondering if the following was a common scenario for others and whether things could be done in a more debug friendly way under the covers. Currently we've found that developing with pig ...
    Jeremy Hanna
    Jun 25, 2011 at 4:57 pm
    Jun 30, 2011 at 3:55 am
  • Hello, I have created a new Maven project in Eclipse and have added the following dependencies: <dependency> <groupId>org.apache.pig</groupId> <artifactId>pig</artifactId> <version ...
    Alex Rovner
    Jun 23, 2011 at 4:08 pm
    Jun 24, 2011 at 12:51 am
  • I'm having trouble trying to flatten a bag to a tuple of ints in Pig, e.g. {(12),(4),(7),(190)} to: (12,4,7,190) It seems like it should be trivial to do, but I'm not quite sure how to do it. Can this ...
    Jonathan Holloway
    Jun 22, 2011 at 4:29 pm
    Jun 22, 2011 at 10:04 pm
  • Hello all, I'm trying to run pig 0.8.1 jobs on top of HBase with a custom LoadFunc. This has worked in the past for us but for some reason it's now not working and I can't quite tell why. With ...
    Erik Onnen
    Jun 20, 2011 at 6:08 pm
    Jun 20, 2011 at 8:40 pm
  • Does anyone know the current status of the project Howl? Did anyone have any luck using it with Pig? Thanks Alex
    Alex Rovner
    Jun 18, 2011 at 2:37 am
    Jun 18, 2011 at 8:12 am
  • Hello Pig mailing list, I have around 10 TB of Apache log files (1 TB as .gz compressed files) and analyze these files with Pig. Obviously Apache log files compress pretty well with gzip, so ...
    Dirk Mst
    Jun 17, 2011 at 8:45 am
    Jun 18, 2011 at 2:31 am
Group Overview
group: user @
categories: pig, hadoop

65 users for June 2011

  • Dmitriy Ryaboy: 30 posts
  • Daniel Dai: 18 posts
  • Alan Gates: 12 posts
  • Jeremy Hanna: 10 posts
  • Daniel Eklund: 9 posts
  • Jonathan Coveney: 9 posts
  • Shubham Chopra: 9 posts
  • Thejas M Nair: 9 posts
  • William Oberman: 9 posts
  • Dexin Wang: 8 posts
  • Jonathan Holloway: 8 posts
  • 勇胡: 8 posts
  • Alex Rovner: 7 posts
  • Jameson Li: 6 posts
  • Juan Martin Pampliega: 6 posts
  • William Dowling: 4 posts
  • Badrinarayanan S: 4 posts
  • Kim Vogt: 4 posts
  • Moore, Michael A.: 4 posts
  • Abh not: 3 posts