62 discussions - 276 posts

  • Howl is a table management system built to provide metadata and storage management across data processing tools in Hadoop (Pig, Hive, MapReduce, ...). You can learn more details at ...
    Alan Gates
    Feb 2, 2011 at 9:19 pm
    Feb 8, 2011 at 6:11 pm
  • Hi guys, I'm getting a weird error while running my Pig script. case_state = FOREACH join_pe_pre GENERATE f1, f2, f3, f4, ((f5 '.*.facebook..*') ? f10 : null) as facebook_referrals, ...
    Sonia gehlot
    Feb 19, 2011 at 11:39 pm
    Feb 23, 2011 at 12:35 am
  • Hey, I have a bunch of files where the filename is significant. I'm loading the files by supplying the top level directory that contains the files. Is there a way to capture the filename of the file ...
    Kim Vogt
    Feb 3, 2011 at 11:53 pm
    Feb 4, 2011 at 7:30 pm
  • So I finally got a couple of test scripts running on my cluster to take a sample data file, load it, do a little processing, store it, load it, do a little more processing, and dump the results. Once ...
    Kris Coward
    Feb 28, 2011 at 3:48 am
    Mar 3, 2011 at 7:01 pm
  • I tried to process a big number of small files on pig and I got a strange problem. 2011-02-27 00:00:58,746 [Thread-15] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths ...
    Charles Gonçalves
    Feb 27, 2011 at 3:26 am
    Mar 1, 2011 at 10:07 pm
  • Hey All, Running into a problem storing data from a pig script storing results into HBase. We are getting the following error: java.lang.NullPointerException at ...
    Matt Davies
    Feb 14, 2011 at 9:57 pm
    Feb 16, 2011 at 12:02 am
  • Hey all, I wanted to know if the patch from https://issues.apache.org/jira/browse/PIG-200 is safe for Pig 0.8, and whether it is applied the same way as shown in the JIRA. Thanks. Renato M.
    Renato Marroquín Mogrovejo
    Feb 3, 2011 at 5:58 pm
    Feb 8, 2011 at 6:46 am
  • I've written a simple UDF that parses a chararray (which looks like ...[a].....[b]...[a]...) to capture stuff inside brackets and return them as String a=2;b=1; and so on. The input chararray are ...
    Aniket Mokashi
    Feb 24, 2011 at 3:50 am
    Feb 25, 2011 at 1:26 am
  • I ran into a problem that I have spent quite some time on, and I'm starting to think Pig is probably doing some optimization that makes this hard. This is my pseudo code: raw = LOAD ... then ...
    Dexin Wang
    Feb 18, 2011 at 9:42 pm
    Feb 22, 2011 at 6:59 pm
  • I have been working my way through Pig recently with a lot of help from the folks in #hadoop-pig on Freenode. The problem I am having is with reading any gzip'd files from anywhere (either locally or ...
    Eric Lubow
    Feb 21, 2011 at 11:47 pm
    Feb 22, 2011 at 2:23 pm
  • Guys, I'm working on my MSc now using pig/hadoop to process logs. I'm basically using it to do some characterizations on a traffic analysis from some of the greatest Media groups from Brazil. One of ...
    Charles Gonçalves
    Feb 19, 2011 at 9:13 pm
    Feb 21, 2011 at 1:36 am
  • So in the interest of being a little less i/o bound, and saving a whole mess of disk, I've started using com.twitter.elephantbird.pig.store.LzoTokenizedStorage for storage... or more accurately, will ...
    Kris Coward
    Feb 11, 2011 at 6:48 pm
    Feb 18, 2011 at 10:33 am
  • Hi all, if I have the following XML file <attr tag="00020000" vr="UL" len="4">180</attr> <attr tag="00020001" vr="OB" len="2">00\01</attr> how can I read it using xmlloader, I mean how can I read for ...
    Baraa Mohamad
    Feb 22, 2011 at 2:59 pm
    Mar 1, 2011 at 9:59 am
  • Hi, I have a custom loader that creates and returns a tuple of id, bags. I want to open these bags and get their contents. For example- data = load 'loc' using myLoader() as (id, bag1, bag2); ...
    Aniket Mokashi
    Feb 17, 2011 at 12:59 am
    Feb 24, 2011 at 12:16 am
  • Hi folks, Is anyone who uses HBaseStorage in Pig still on hbase 0.20.6? There are a number of tickets outstanding to improve HBaseStorage and I've suggested that we should add a shim layer to work ...
    Dmitriy Ryaboy
    Feb 14, 2011 at 1:42 am
    Feb 14, 2011 at 7:03 pm
  • Trying to write a simple storefunc that makes use of the input data's field names. Is there a way to gain access to this inside of the call to putNext? Ostensibly you could set a variable with the ...
    Jacob Perkins
    Feb 1, 2011 at 4:47 am
    Feb 1, 2011 at 4:42 pm
  • Hello everyone I am using CDH3 Beta 4 distribution on 11 node cluster machines. I successfully installed Hadoop. However, Pig seems to be giving us some trouble. I installed Pig according to the ...
    Feb 23, 2011 at 12:46 pm
    Feb 23, 2011 at 11:14 pm
  • Hi all, I am new to Hadoop and I started exploring Pig last month. I have a few questions. I have to replicate some SQL queries in Pig that have a left join, for example: select blah, blah From ...
    Sonia gehlot
    Feb 16, 2011 at 10:10 pm
    Feb 16, 2011 at 11:49 pm
  • Is it possible to use a PARALLEL statement inside a nested FOREACH block, like in: E = GROUP B ALL PARALLEL 100; edge_breakdown = FOREACH E { dist_cIps = DISTINCT B.cIp PARALLEL X; ...
    Charles Gonçalves
    Feb 11, 2011 at 4:58 pm
    Feb 11, 2011 at 6:27 pm
  • Guys, does Pig read the _log directories from a script's output? What I want is to read a Pig output dir (or several) from Pig scripts. But I just want the part-XXXX files, not the .part-crc or ...
    Charles Gonçalves
    Feb 17, 2011 at 11:12 pm
    Feb 17, 2011 at 11:28 pm
  • Hello all, I've been scratching my head over a problem with a pig script I'm having, and hoping another set of eyeballs will help. I'm using pig 0.8, in local mode Here's my simplified use case: I ...
    James Kebinger
    Feb 16, 2011 at 11:57 pm
    Feb 17, 2011 at 7:02 pm
  • Hi, I have been using Pig for a few months now, was using 0.7 earlier and recently migrated to 0.8. In a script I am working on right now I hit a snag where the script failed. I investigated some ...
    Feb 14, 2011 at 9:41 pm
    Feb 16, 2011 at 9:45 pm
  • Last week I sent an email proposing that we, the Pig project, sponsor Howl as an incubator project. You can see the thread at http://tinyurl.com/4acfut4 . However, in proposing this I did not realize ...
    Alan Gates
    Feb 9, 2011 at 2:02 am
    Feb 9, 2011 at 6:55 pm
  • Howdy, I am curious about how algebraic UDFs are invoked. I know about how to write them, but in the case where your initial step is to create a costly datastructure, how can you ensure that this is ...
    Jonathan Coveney
    Feb 3, 2011 at 8:30 pm
    Feb 4, 2011 at 12:10 am
  • I'm looking at Pig's TupleSize implementation and wondering if it's implemented correctly: @Override public Long exec(Tuple input) throws IOException { try{ if (input == null) return null; return ...
    Eric Tschetter
    Feb 1, 2011 at 8:33 pm
    Feb 3, 2011 at 6:52 pm
  • Hi guys, I have a UDF to which I want to pass a long in a timestamp representation and get a Date formatted with the SimpleDateFormat class. I will pass to the UDF constructor the string format to ...
    Charles Gonçalves
    Feb 1, 2011 at 2:13 pm
    Feb 3, 2011 at 2:06 am
  • Hi, I'd like to be able to flatten a map, so each K-V is flattened into its own row. Basically something like this: cat map_data.txt 32 [123#bill,222#joe] 77 [977#mary,987#jane] 44 ...
    Bill Graham
    Feb 28, 2011 at 6:17 am
    Feb 28, 2011 at 7:29 am
  • Hi I am trying to use REPLACE function in PIG to clean the string in following way.. REPLACE(REPLACE(REPLACE('string','.',' '),'-',' '),' ',' ') So if string has got . or - or spaces , it removes ...
    Sanjeev Shrivastava
    Feb 22, 2011 at 6:16 pm
    Feb 22, 2011 at 6:32 pm
  • Hi Guys, I am trying to build a web interface that can use pig to do "on demand" batch processing using pig, but JSP (or more specifically, tomcat) does not seem to want to know pig. I keep receiving ...
    Robert Waddell
    Feb 18, 2011 at 2:40 pm
    Feb 18, 2011 at 3:14 pm
  • This may be better asked on one of the other hadoop lists, but as the job in question is done with Pig I thought I would start here. I have a nightly job that runs against around 1000 gzip log files. ...
    Kester, Scott
    Feb 16, 2011 at 2:52 pm
    Feb 17, 2011 at 11:25 am
  • Hi all, please, I need your help. I'm a newbie with Pig but I really would like to use it. I tried to use xmlloader within this small program: register H:/apps/pig-0.7.0/piggybank.jar; A = load ...
    Baraa Mohamad
    Feb 11, 2011 at 6:07 pm
    Feb 11, 2011 at 6:20 pm
  • I'm just trying to do a breakdown for all my logs, but every time I use an operation like: FILTER alias BY some_udf(alias); I get a problem. First I got: ERROR 0: Scalar has more than one row in the ...
    Charles Gonçalves
    Feb 11, 2011 at 1:43 am
    Feb 11, 2011 at 1:25 pm
  • I am trying to implement a maxmind call where I do not have to put the maxmind file on the nodes. I referred to this http://web.archiveorange.com/archive/v/3inw3FVtG19NUTr25Yra ...
    Jonathan Coveney
    Feb 9, 2011 at 11:22 pm
    Feb 10, 2011 at 3:07 pm
  • Can anyone give me any hints on why a JOIN may be failing with this weird error.... I can DESCRIBE the two tables it is joining justurls: {tweetid: bytearray,userid: bytearray,url: bytearray} userdb: ...
    Alex McLintock
    Feb 7, 2011 at 7:39 pm
    Feb 8, 2011 at 6:57 am
  • A) Am I right in thinking that no UDF can turn (1, (2,3,4) ) into (1, 2 ) (1, 3 ) (1, 4 ) because you always get out the same number of tuples as you put in? B) Would FLATTEN ($1) do that - if the ...
    Alex McLintock
    Feb 7, 2011 at 7:31 pm
    Feb 7, 2011 at 11:11 pm
  • Hey Guys, I am trying to optimize my Pig jobs as much as possible and wanted to know a little about how Pig handles its loading of data. When I have: var1 = LOAD .... local_var1 = FOREACH local_var1 ...
    Robert Waddell
    Feb 6, 2011 at 8:11 pm
    Feb 6, 2011 at 10:41 pm
  • Hi guys, I noticed that concatenated gzipped files do not work on Hadoop: https://issues.apache.org/jira/browse/HADOOP-6335 So, has anyone passed by this ...
    Charles Gonçalves
    Feb 3, 2011 at 3:36 am
    Feb 3, 2011 at 4:21 pm
  • I was just curious if anyone might have a decent code example of using the distributed cache with Pig? I have a file, let's call it file.dat, and I want to make it available locally to all of my ...
    Jonathan Coveney
    Feb 28, 2011 at 10:43 pm
    Feb 28, 2011 at 11:11 pm
  • I have a jar with UDFs called squeal.jar... I know it works, because I can access methods in the jar. However, when I try this: register /home/jcoveney/udfs/squeal/jar/squeal.jar; A = LOAD 'test.txt' ...
    Jonathan Coveney
    Feb 28, 2011 at 4:24 pm
    Feb 28, 2011 at 5:08 pm
  • This seems like it should work: register '/tmp/test-udfs.jar'; /* package test.udfs; import java.io.IOException; import org.apache.pig.EvalFunc; import org.apache.pig.data.DataBag; import ...
    Ryan Tecco
    Feb 23, 2011 at 2:11 am
    Feb 23, 2011 at 8:04 pm
  • Hi All, I want to do this in Pig. "row_number() over (partition by col1 order by col2)" Any suggestions how I can do this? I know I can do group by instead of partition by and order by in Pig. But is ...
    Sonia gehlot
    Feb 22, 2011 at 6:27 am
    Feb 22, 2011 at 2:53 pm
  • Hi, I am new to Hbase/Hadoop concept. Following is the scenario -: 1) Our Hadoop is installed in a remote system. Data is loaded in HBase through HBase writer. 2) I am trying to install pig on my ...
    Rashmi behera
    Feb 22, 2011 at 11:43 am
    Feb 22, 2011 at 1:08 pm
  • Hi I am passing a parameter to my Pig script. Its value is a date string in the form of MM/DD/YYYY. I am trying to have this parameter's value as an additional column to the final output of the ...
    Arun A K
    Feb 13, 2011 at 7:00 am
    Feb 15, 2011 at 9:33 am
  • Hi folks, For those of you who are using Elephant-Bird, I just wanted to let you know about the compatibility roadmap so you can plan accordingly. We've tried to inject a bit of versioning into EB, ...
    Dmitriy Ryaboy
    Feb 14, 2011 at 6:14 am
    Feb 15, 2011 at 1:50 am
  • I'm trying to understand the best way of setting up repeated processing of continuously generated data - like logs. I can manually copy files from normal FS to HDFS and kick off pig scripts but ...
    Alex McLintock
    Feb 6, 2011 at 10:38 pm
    Feb 8, 2011 at 7:05 am
  • I am developing a new UDF for loading Json data. It differs from those currently available because it tries to construct the supplied nested maps and arrays as Pig data structures rather than a ...
    Alex McLintock
    Feb 5, 2011 at 1:13 pm
    Feb 8, 2011 at 6:53 am
  • Hi I am doing an inner join on two relations say A, B. A has fields - Word1:chararray, Word2:chararray, Word3:chararray, Metric1:long, Metric2:long B has fields - UniqueWord1:chararray, UniqueID:long ...
    Arun A K
    Feb 5, 2011 at 10:12 pm
    Feb 7, 2011 at 5:17 pm
  • Hey there, I've just started butting heads against a problem where I'm trying to cast bytearrays in customer-provided data to integers. The overwhelming majority of the time, we seem to get actual ...
    Kris Coward
    Feb 3, 2011 at 8:51 pm
    Feb 3, 2011 at 11:42 pm
  • Can anyone point me to a Loader UDF which creates nested tuples - ie tuples with bags/other tuples within them? I believe you couldn't do this before about Pig 0.7.0 but I can't see any examples of ...
    Alex McLintock
    Feb 1, 2011 at 7:56 pm
    Feb 2, 2011 at 11:44 pm
  • Hi all, Please consider submitting to the: The Fourth IEEE International Scalable Computing Challenge (SCALE 2011), sponsored by the IEEE Computer Society Technical Committee on Scalable Computing ...
    Viraj Bhat
    Feb 1, 2011 at 9:04 pm
    Feb 1, 2011 at 9:04 pm
Group Overview
group: user @
categories: pig, hadoop

57 users for February 2011

  • Dmitriy Ryaboy: 39 posts
  • Charles Gonçalves: 23 posts
  • Daniel Dai: 18 posts
  • Jonathan Coveney: 17 posts
  • Alan Gates: 15 posts
  • Jacob Perkins: 11 posts
  • Sonia gehlot: 10 posts
  • Renato Marroquín Mogrovejo: 9 posts
  • Kim Vogt: 8 posts
  • Dexin Wang: 7 posts
  • Alex McLintock: 6 posts
  • Baraa Mohamad: 6 posts
  • John Sichi: 6 posts
  • Kris Coward: 6 posts
  • Ramesh, Amit: 6 posts
  • Aniket Mokashi: 5 posts
  • Eric Lubow: 5 posts
  • Matt Davies: 5 posts
  • Robert Waddell: 5 posts
  • Thejas M Nair: 5 posts