Howl is a table management system built to provide metadata and storage management across data processing tools in Hadoop (Pig, Hive, MapReduce, ...). You can learn more details at ...
Alan Gates
Feb 2, 2011 at 9:19 pm
Feb 8, 2011 at 6:11 pm -
Hi guys, I am getting a weird error while running my Pig script. case_state = FOREACH join_pe_pre GENERATE f1, f2, f3, f4, ( (f5 '.*.facebook..*') ? f10 : null ) as facebook_referrals, ...
Sonia gehlot
Feb 19, 2011 at 11:39 pm
Feb 23, 2011 at 12:35 am -
Hey, I have a bunch of files where the filename is significant. I'm loading the files by supplying the top level directory that contains the files. Is there a way to capture the filename of the file ...
Kim Vogt
Feb 3, 2011 at 11:53 pm
Feb 4, 2011 at 7:30 pm -
So I finally got a couple of test scripts running on my cluster to take a sample data file, load it, do a little processing, store it, load it, do a little more processing, and dump the results. Once ...
Kris Coward
Feb 28, 2011 at 3:48 am
Mar 3, 2011 at 7:01 pm -
I tried to process a large number of small files with Pig and ran into a strange problem. 2011-02-27 00:00:58,746 [Thread-15] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths ...
Charles Gonçalves
Feb 27, 2011 at 3:26 am
Mar 1, 2011 at 10:07 pm -
Hey all, we are running into a problem storing results from a Pig script into HBase. We are getting the following error: java.lang.NullPointerException at ...
Matt Davies
Feb 14, 2011 at 9:57 pm
Feb 16, 2011 at 12:02 am -
Hey all, I wanted to know if the patch from https://issues.apache.org/jira/browse/PIG-200 is safe for Pig 0.8, and whether it is applied the same way as shown in the JIRA. Thanks. Renato M.
Renato Marroquín Mogrovejo
Feb 3, 2011 at 5:58 pm
Feb 8, 2011 at 6:46 am -
I've written a simple UDF that parses a chararray (which looks like ...[a].....[b]...[a]...) to capture the content inside brackets and return it as a String a=2;b=1; and so on. The input chararray are ...
Aniket Mokashi
Feb 24, 2011 at 3:50 am
Feb 25, 2011 at 1:26 am -
I ran into a problem that I have spent quite some time on, and I am starting to think Pig is probably doing some optimization that makes this hard. This is my pseudo code: raw = LOAD ... then ...
Dexin Wang
Feb 18, 2011 at 9:42 pm
Feb 22, 2011 at 6:59 pm -
I have been working my way through Pig recently with a lot of help from the folks in #hadoop-pig on Freenode. The problem I am having is with reading any gzip'd files from anywhere (either locally or ...
Eric Lubow
Feb 21, 2011 at 11:47 pm
Feb 22, 2011 at 2:23 pm -
Guys, I'm working on my MSc now using pig/hadoop to process logs. I'm basically using it to do some characterizations on a traffic analysis from some of the greatest Media groups from Brazil. One of ...
Charles Gonçalves
Feb 19, 2011 at 9:13 pm
Feb 21, 2011 at 1:36 am -
So in the interest of being a little less i/o bound, and saving a whole mess of disk, I've started using com.twitter.elephantbird.pig.store.LzoTokenizedStorage for storage... or more accurately, will ...
Kris Coward
Feb 11, 2011 at 6:48 pm
Feb 18, 2011 at 10:33 am -
Hi all, if I have the following XML file: <attr tag="00020000" vr="UL" len="4">180</attr> <attr tag="00020001" vr="OB" len="2">00\01</attr> How can I read it using XMLLoader? I mean, how can I read for ...
Baraa Mohamad
Feb 22, 2011 at 2:59 pm
Mar 1, 2011 at 9:59 am -
Hi, I have a custom loader that creates and returns a tuple of id, bags. I want to open these bags and get their contents. For example- data = load 'loc' using myLoader() as (id, bag1, bag2); ...
Aniket Mokashi
Feb 17, 2011 at 12:59 am
Feb 24, 2011 at 12:16 am -
Hi folks, Is anyone who uses HBaseStorage in Pig still on hbase 0.20.6? There are a number of tickets outstanding to improve HBaseStorage and I've suggested that we should add a shim layer to work ...
Dmitriy Ryaboy
Feb 14, 2011 at 1:42 am
Feb 14, 2011 at 7:03 pm -
Trying to write a simple storefunc that makes use of the input data's field names. Is there a way to gain access to this inside of the call to putNext? Ostensibly you could set a variable with the ...
Jacob Perkins
Feb 1, 2011 at 4:47 am
Feb 1, 2011 at 4:42 pm -
Hello everyone I am using CDH3 Beta 4 distribution on 11 node cluster machines. I successfully installed Hadoop. However, Pig seems to be giving us some trouble. I installed Pig according to the ...
Byambajargal
Feb 23, 2011 at 12:46 pm
Feb 23, 2011 at 11:14 pm -
Hi all, I am new to Hadoop and started exploring Pig last month. I have a few questions. I have to replicate some SQL queries in Pig that have left joins, for example: select blah, blah From ...
Sonia gehlot
Feb 16, 2011 at 10:10 pm
Feb 16, 2011 at 11:49 pm -
Is it possible to use a PARALLEL statement inside a nested FOREACH block, as in: E = GROUP B ALL PARALLEL 100; edge_breakdown = FOREACH E { dist_cIps = DISTINCT B.cIp *PARALLEL X*; ...
Charles Gonçalves
Feb 11, 2011 at 4:58 pm
Feb 11, 2011 at 6:27 pm -
Guys, does Pig read the _log directories from a script's output? What I want is to read a Pig output dir (or several) from Pig scripts. But I just want the part-XXXX files, not the .part-crc or ...
Charles Gonçalves
Feb 17, 2011 at 11:12 pm
Feb 17, 2011 at 11:28 pm -
Hello all, I've been scratching my head over a problem with a pig script I'm having, and hoping another set of eyeballs will help. I'm using pig 0.8, in local mode Here's my simplified use case: I ...
James Kebinger
Feb 16, 2011 at 11:57 pm
Feb 17, 2011 at 7:02 pm -
Hi, I have been using Pig for a few months now, was using 0.7 earlier and recently migrated to 0.8. In a script I am working on right now I hit a snag where the script failed. I investigated some ...
Amramesh
Feb 14, 2011 at 9:41 pm
Feb 16, 2011 at 9:45 pm -
Last week I sent an email proposing that we, the Pig project, sponsor Howl as an incubator project. You can see the thread at http://tinyurl.com/4acfut4 . However, in proposing this I did not realize ...
Alan Gates
Feb 9, 2011 at 2:02 am
Feb 9, 2011 at 6:55 pm -
Howdy, I am curious about how algebraic UDFs are invoked. I know about how to write them, but in the case where your initial step is to create a costly datastructure, how can you ensure that this is ...
Jonathan Coveney
Feb 3, 2011 at 8:30 pm
Feb 4, 2011 at 12:10 am -
I'm looking at Pig's TupleSize implementation and wondering if it's implemented correctly: @Override public Long exec(Tuple input) throws IOException { try{ if (input == null) return null; return ...
Eric Tschetter
Feb 1, 2011 at 8:33 pm
Feb 3, 2011 at 6:52 pm -
Hi guys, I have a UDF to which I want to pass a long in a timestamp representation and get a Date formatted with the SimpleDateFormat class. I will pass to the UDF constructor the string format to ...
Charles Gonçalves
Feb 1, 2011 at 2:13 pm
Feb 3, 2011 at 2:06 am -
Hi, I'd like to be able to flatten a map, so each K-V is flattened into its own row. Basically something like this: cat map_data.txt 32 [123#bill,222#joe] 77 [977#mary,987#jane] 44 ...
Bill Graham
Feb 28, 2011 at 6:17 am
Feb 28, 2011 at 7:29 am -
Hi, I am trying to use the REPLACE function in Pig to clean a string in the following way: REPLACE(REPLACE(REPLACE('string','.',' '),'-',' '),' ',' ') So if the string has . or - or spaces, it removes ...
Sanjeev Shrivastava
Feb 22, 2011 at 6:16 pm
Feb 22, 2011 at 6:32 pm -
Hi guys, I am trying to build a web interface that uses Pig to do "on demand" batch processing, but JSP (or more specifically, Tomcat) does not seem to want to know Pig. I keep receiving ...
Robert Waddell
Feb 18, 2011 at 2:40 pm
Feb 18, 2011 at 3:14 pm -
This may be better asked on one of the other hadoop lists, but as the job in question is done with Pig I thought I would start here. I have a nightly job that runs against around 1000 gzip log files. ...
Kester, Scott
Feb 16, 2011 at 2:52 pm
Feb 17, 2011 at 11:25 am -
Hi all, please, I need your help. I'm a newbie with Pig but I really would like to use it. I tried to use XMLLoader within this small program: register H:/apps/pig-0.7.0/piggybank.jar; A = load ...
Baraa Mohamad
Feb 11, 2011 at 6:07 pm
Feb 11, 2011 at 6:20 pm -
I'm just trying to do a breakdown for all my logs, but every time I use an operation like: FILTER alias BY some_udf(alias); I get a problem. First I got: ERROR 0: Scalar has more than one row in the ...
Charles Gonçalves
Feb 11, 2011 at 1:43 am
Feb 11, 2011 at 1:25 pm -
I am trying to implement a maxmind call where I do not have to put the maxmind file on the nodes. I referred to this http://web.archiveorange.com/archive/v/3inw3FVtG19NUTr25Yra ...
Jonathan Coveney
Feb 9, 2011 at 11:22 pm
Feb 10, 2011 at 3:07 pm -
Can anyone give me any hints on why a JOIN may be failing with this weird error.... I can DESCRIBE the two tables it is joining justurls: {tweetid: bytearray,userid: bytearray,url: bytearray} userdb: ...
Alex McLintock
Feb 7, 2011 at 7:39 pm
Feb 8, 2011 at 6:57 am -
A) Am I right in thinking that no UDF can turn (1, (2,3,4) ) into (1, 2 ) (1, 3 ) (1, 4 ) because you always get out the same number of tuples as you put in? B) Would FLATTEN ($1) do that - if the ...
Alex McLintock
Feb 7, 2011 at 7:31 pm
Feb 7, 2011 at 11:11 pm -
Hey Guys, I am trying to optimize my Pig jobs as much as possible and wanted to know a little about how Pig handles its loading of data. When I have: var1 = LOAD .... local_var1 = FOREACH local_var1 ...
Robert Waddell
Feb 6, 2011 at 8:11 pm
Feb 6, 2011 at 10:41 pm -
Hi guys, I noticed that concatenated gzipped files do not work on Hadoop: https://issues.apache.org/jira/browse/HADOOP-6335 So, has anyone run into this ...
Charles Gonçalves
Feb 3, 2011 at 3:36 am
Feb 3, 2011 at 4:21 pm -
I was just curious if anyone might have a decent code example of using the distributed cache with Pig? I have a file, let's call it file.dat, and I want to make it available locally to all of my ...
Jonathan Coveney
Feb 28, 2011 at 10:43 pm
Feb 28, 2011 at 11:11 pm -
I have a jar with UDFs called squeal.jar...I know it works, because I can access methods in the jar. However, when I try this: register /home/jcoveney/udfs/squeal/jar/squeal.jar; A = LOAD 'test.txt' ...
Jonathan Coveney
Feb 28, 2011 at 4:24 pm
Feb 28, 2011 at 5:08 pm -
This seems like it should work: register '/tmp/test-udfs.jar'; /* package test.udfs; import java.io.IOException; import org.apache.pig.EvalFunc; import org.apache.pig.data.DataBag; import ...
Ryan Tecco
Feb 23, 2011 at 2:11 am
Feb 23, 2011 at 8:04 pm -
Hi All, I want to do this in Pig. "row_number() over (partition by col1 order by col2)" Any suggestions how I can do this? I know I can do group by instead of partition by and order by in Pig. But is ...
Sonia gehlot
Feb 22, 2011 at 6:27 am
Feb 22, 2011 at 2:53 pm -
Hi, I am new to Hbase/Hadoop concept. Following is the scenario -: 1) Our Hadoop is installed in a remote system. Data is loaded in HBase through HBase writer. 2) I am trying to install pig on my ...
Rashmi behera
Feb 22, 2011 at 11:43 am
Feb 22, 2011 at 1:08 pm -
Hi I am passing a parameter to my Pig script. Its value is a date string in the form of MM/DD/YYYY. I am trying to have this parameter's value as an additional column to the final output of the ...
Arun A K
Feb 13, 2011 at 7:00 am
Feb 15, 2011 at 9:33 am -
Hi folks, For those of you who are using Elephant-Bird, I just wanted to let you know about the compatibility roadmap so you can plan accordingly. We've tried to inject a bit of versioning into EB, ...
Dmitriy Ryaboy
Feb 14, 2011 at 6:14 am
Feb 15, 2011 at 1:50 am -
I'm trying to understand the best way of setting up repeated processing of continuously generated data - like logs. I can manually copy files from normal FS to HDFS and kick off pig scripts but ...
Alex McLintock
Feb 6, 2011 at 10:38 pm
Feb 8, 2011 at 7:05 am -
I am developing a new UDF for loading Json data. It differs from those currently available because it tries to construct the supplied nested maps and arrays as Pig data structures rather than a ...
Alex McLintock
Feb 5, 2011 at 1:13 pm
Feb 8, 2011 at 6:53 am -
Hi I am doing an inner join on two relations say A, B. A has fields - Word1:chararray, Word2:chararray, Word3:chararray, Metric1:long, Metric2:long B has fields - UniqueWord1:chararray, UniqueID:long ...
Arun A K
Feb 5, 2011 at 10:12 pm
Feb 7, 2011 at 5:17 pm -
Hey there, I've just started butting heads against a problem where I'm trying to cast bytearrays in customer-provided data to integers. The overwhelming majority of the time, we seem to get actual ...
Kris Coward
Feb 3, 2011 at 8:51 pm
Feb 3, 2011 at 11:42 pm -
Can anyone point me to a Loader UDF which creates nested tuples - ie tuples with bags/other tuples within them? I believe you couldn't do this before about Pig 0.7.0 but I can't see any examples of ...
Alex McLintock
Feb 1, 2011 at 7:56 pm
Feb 2, 2011 at 11:44 pm -
Hi all, please consider submitting to the Fourth IEEE International Scalable Computing Challenge (SCALE 2011), sponsored by the IEEE Computer Society Technical Committee on Scalable Computing ...
Viraj Bhat
Feb 1, 2011 at 9:04 pm
Feb 1, 2011 at 9:04 pm
Group Overview
group | user |
categories | pig, hadoop |
discussions | 62 |
posts | 276 |
users | 57 |
website | pig.apache.org |
57 users for February 2011