Grokbase Groups Pig user July 2011


58 discussions - 253 posts

  • Hi, I'm using Pig 0.8.1 with HBase 0.90 and the following script sometimes returns an empty set, and sometimes works! start_sessions = LOAD 'startSession' USING ...
    Vincent Barat
    Jul 26, 2011 at 5:40 pm
    Aug 26, 2011 at 4:16 pm
  • I have been trying to store data in HBase using the HBaseStorage class. While I can store the original read data, it fails when I try to store the processed data. Which means I might be messing up the ...
    Sulabh choudhury
    Jul 15, 2011 at 7:41 pm
    Jul 17, 2011 at 4:25 am
  • The Pig team is happy to announce the Pig 0.9.0 release. Apache Pig provides a high-level data-flow language and execution framework for parallel computation on Hadoop clusters. More details about Pig can be ...
    Olga Natkovich
    Jul 29, 2011 at 8:26 pm
    Aug 24, 2011 at 8:07 pm
  • Hey all, I've been trying to query Cassandra using my Pig script, so I used the contrib jar from Cassandra, and I'm getting the following error... some Thrift failure... ERROR 2998: Unhandled ...
    Shai Harel
    Jul 31, 2011 at 2:49 pm
    Sep 28, 2011 at 5:04 pm
  • The expectation from PigStorage.getInputFormat() is that it is an InputFormat&lt;Writable, Text&gt;, and PigStorage handles converting Text to Tuple. This is very useful and easy for users to use some other ...
    Raghu Angadi
    Jul 21, 2011 at 6:12 pm
    Jul 22, 2011 at 11:48 pm
  • We have a UDF that introspects the output schema, gets the field names there, and uses them in the exec method. The UDF is found here: ...
    Jeremy Hanna
    Jul 6, 2011 at 4:43 pm
    Jul 9, 2011 at 6:05 pm
  • Is it possible to do a conditional and more than one generate inside a foreach? For example, I have tuples like this (name, days_ago): (a,0) (b,1) (c,9) (d,40). b shows up 1 day ago, so it belongs to all of ...
    Dexin Wang
    Jul 22, 2011 at 11:42 pm
    Jul 25, 2011 at 4:49 pm
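For questions like the one above, a common workaround in Pig is one FILTER per bucket followed by UNION, since a single FOREACH cannot emit a variable number of rows per input tuple; a minimal sketch (the 'events' input path and the bucket boundaries are illustrative assumptions):

```pig
-- Route each (name, days_ago) row into every time bucket it qualifies for.
events = LOAD 'events' AS (name:chararray, days_ago:int);

-- One FILTER per bucket; a single row can pass several of them.
f1  = FILTER events BY days_ago <= 1;
f7  = FILTER events BY days_ago <= 7;
f30 = FILTER events BY days_ago <= 30;

d1  = FOREACH f1  GENERATE name, '1_day'   AS bucket;
d7  = FOREACH f7  GENERATE name, '7_days'  AS bucket;
d30 = FOREACH f30 GENERATE name, '30_days' AS bucket;

buckets = UNION d1, d7, d30;
DUMP buckets;
```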
  • Hi all, I'm trying to do a map-side only merge join [1] in Pig using Zebra's TableLoader. (My data allows merge join.) But I'm unable to use the TableLoader. Even a simple script that loads a ...
    Ankur Jain
    Jul 19, 2011 at 9:29 pm
    Jul 20, 2011 at 10:48 pm
  • Trying to join two sets and generate a set from the join and I am getting a $ hadoop fs -cat DIM/\* 2011,01,31 2011,02,28 2011,03,31 2011,04,30 2011,05,31 2011,06,30 2011,07,31 2011,08,31 2011,09,30 ...
    Rob parker
    Jul 29, 2011 at 4:47 pm
    Jul 29, 2011 at 8:32 pm
  • Hello, I want to use a foreach statement to filter the tuples in a bag, but it didn't work. My Pig code is as follows: A = LOAD '/home/test/student.txt' AS (name:chararray, no:int, score: int); B = ...
    Jul 19, 2011 at 1:01 pm
    Jul 20, 2011 at 6:00 pm
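For the bag-filtering question above, the usual idiom is a nested FOREACH block over a grouped relation; a sketch against the student data from the post (the pass mark of 60 is an illustrative assumption):

```pig
A = LOAD '/home/test/student.txt' AS (name:chararray, no:int, score:int);
G = GROUP A BY name;
-- Inside the nested block, FILTER operates on the bag of A tuples per group.
B = FOREACH G {
        passed = FILTER A BY score >= 60;
        GENERATE group AS name, COUNT(passed) AS passed_count;
};
DUMP B;
```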
  • I am not able to assign a value with spaces to param on command line. $ pig -p cond='x == 1' test.pig results in command line parser error. other attempts like 'pig -p cond='"x == 1"' test.pig' ...
    Raghu Angadi
    Jul 11, 2011 at 9:09 pm
    Jul 13, 2011 at 10:44 pm
  • Hello, I run the Pig test cases with the command: ant test. I am wondering which parts are the failing ones: tests with Failures or with Errors? And please help to check why these tests result in Failures or ...
    Jul 29, 2011 at 3:30 am
    Aug 1, 2011 at 6:48 pm
  • Hi, I'd like to make PIG load only a subset of an HBase table, based on the timestamp of the records, or on the key of the rows. As an example, I'd like to load only records that have a timestamp N, ...
    Vincent Barat
    Jul 28, 2011 at 10:19 am
    Jul 28, 2011 at 5:27 pm
  • Hi, I have googled a lot about whether I can have Pig interact with an RDBMS. Is there any way to have Pig load data from an RDBMS, perform some operations, and then store the data on Hadoop? Thanks, Mance
    Mance Rylan
    Jul 26, 2011 at 10:40 am
    Jul 27, 2011 at 8:32 am
  • I tried out hadoop/pig in my test environment using tar.gz's. Before I roll out to production, I thought I'd try the cdh3 packages, as that might be easier to maintain (since I'm not a sysadmin). ...
    William Oberman
    Jul 8, 2011 at 6:58 pm
    Jul 12, 2011 at 6:12 pm
  • Hello all, Is there an accepted way to use the GeoIP database with Pig? I've found some people have tried to write UDFs with their Java API. Others say to use the ...
    Ross Nordeen
    Jul 11, 2011 at 6:58 pm
    Jul 12, 2011 at 12:50 am
  • Hello people, I am new to Pig. Currently I am using Hadoop and HBase together. Since hadoop-0.20-append supports HBase in production, I am currently using hadoop-0.20-append jar files. Now I am ...
    Praveenesh kumar
    Jul 4, 2011 at 5:38 am
    Jul 6, 2011 at 2:09 pm
  • Hi, I'm trying to run the data sampler tool from the penny library, and am getting a ClassNotFoundException for a netty class. I'm using the trunk version of pig, with the patch from PIG-2013 ...
    Doug Daniels
    Jul 26, 2011 at 5:05 pm
    Aug 10, 2011 at 8:26 pm
  • Hi, Any ideas how to convert a Unix timestamp to some readable date format like YYYY-MM-DD? Is there a built-in function? I tried UnixToISO... but it expects the Unix time as a long... or is there a ...
    Marian Condurache
    Jul 29, 2011 at 1:07 pm
    Jul 29, 2011 at 2:42 pm
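For the timestamp question above: Pig of this era has no built-in date formatter, but piggybank's UnixToISO produces an ISO 8601 string; a sketch, assuming the input timestamps are in seconds (UnixToISO expects milliseconds as a long, hence the * 1000L; the 'logs' input is illustrative):

```pig
REGISTER piggybank.jar;
DEFINE UnixToISO org.apache.pig.piggybank.evaluation.datetime.convert.UnixToISO();

logs = LOAD 'logs' AS (ts:long);
-- Seconds -> milliseconds, then to an ISO 8601 string.
iso = FOREACH logs GENERATE UnixToISO(ts * 1000L) AS isodate;
```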
  • Hi, I just posted a new Pig Editor for Eclipse: The goal is to have it help you like Eclipse for Java (autocomplete, show errors in red....). Still a lot to ...
    Romain Rigaux
    Jul 26, 2011 at 1:06 am
    Jul 27, 2011 at 3:47 am
  • Hi all I have 2 CSV files a shown below: *File 1: File2: col1 col2 col1 col2 col3 col4 1234 2 1000 1999 2222 3 2000 2999 3333 5 3000 3999 4444 6 4000 4999* Now I need to JOIN these 2 files in such a ...
    Lakshminarayana Motamarri
    Jul 15, 2011 at 8:23 am
    Jul 17, 2011 at 7:55 pm
  • I'm trying to join together several different sources of synonyms using Pig. For example: A = LOAD '/tmp/synonyms.txt' USING PigStorage() AS (id:chararray, label:chararray); DUMP A; (12,synonym1) ...
    Mike Hugo
    Jul 12, 2011 at 7:46 pm
    Jul 13, 2011 at 4:11 pm
  • Hi, I've got a pretty simple transform of data I need to do and I can't for the life of me work it out. I feel like I'm missing something trivial... I want to go from this... person key value bob age ...
    Mat Kelcey
    Jul 11, 2011 at 5:47 am
    Jul 12, 2011 at 5:17 am
  • I keep getting an ERROR 2044: The type BYTE cannot be collected as a Key type I am not totally sure about the fix for this. Any help would be appreciated.
    Brian Adams
    Jul 25, 2011 at 5:28 pm
    Jul 26, 2011 at 12:19 am
  • Hi, I have some code that looks like this: top_hits = foreach regrouped { result = TOP(1, 6, projected_joined_albums); -- field 6 = score generate flatten(result); }; I'm not too keen on the TOP ...
    Andrew Clegg
    Jul 21, 2011 at 4:18 pm
    Jul 22, 2011 at 2:29 pm
  • Hello, Is there a way to specify the zookeeper quorum for a pig job writing out to HBase using HBaseStorage? For instance, jobA - zkquorum 1 jobB - zkquorum 2 I know that it reads from the hbase ...
    Matt Davies
    Jul 18, 2011 at 5:42 pm
    Jul 19, 2011 at 6:06 am
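One avenue for the per-job quorum question above is Pig's `set` command, which pushes a property into the job configuration; whether HBaseStorage honours it over the hbase-site.xml on the classpath depends on the Pig/HBase versions involved, so treat this only as a sketch (table, column, and host names are illustrative):

```pig
-- Point this job at a specific ZooKeeper quorum.
set hbase.zookeeper.quorum 'zk1.example.com,zk2.example.com,zk3.example.com';

rows = LOAD 'hbase://mytable'
       USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:col');
```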
  • According to here: Inline definition of scripted UDFs is not supported, and there is a reference to a jira issue on this feature: ...
    Mark Roddy
    Jul 21, 2011 at 7:33 pm
    Sep 2, 2011 at 5:02 pm
  • I've been doing the following to count rows: x = foreach (group foo all) generate COUNT($1); Is that the current best practice? If so, would there be interest in a patch that simply did: x = ...
    Grant Ingersoll
    Jul 26, 2011 at 2:18 pm
    Jul 29, 2011 at 9:05 pm
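The row-count idiom from the post above in full; COUNT(foo) is equivalent to COUNT($1) here because the bag of original tuples is the second field of the grouped relation (note that COUNT skips tuples whose first field is null, while COUNT_STAR counts all of them):

```pig
foo = LOAD 'data' AS (f1:chararray);
-- GROUP ... ALL collapses the whole relation into a single group.
x = FOREACH (GROUP foo ALL) GENERATE COUNT(foo) AS cnt;
DUMP x;
```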
  • Hi, I would like to create a Hadoop failover system, mainly for the master node in my Hadoop cluster. I can't use the Linux HA, so I have been trying another approach. I thought ZooKeeper would be a ...
    Thiago Veiga
    Jul 19, 2011 at 7:23 pm
    Jul 19, 2011 at 8:35 pm
  • I'm attempting to get the PigStorageWithInputPath example working, but I must be missing something. It works fine if I specify a single file, but ...
    CJ Niemira
    Jul 15, 2011 at 7:42 pm
    Jul 15, 2011 at 9:11 pm
  • I have a Hadoop job running through Pig for which I would like to limit the number of concurrently running mappers per task tracker. The property seems to be just ...
    Dylan Scott
    Jul 8, 2011 at 5:09 pm
    Jul 12, 2011 at 10:09 pm
  • Hi, Consider the following script: a = load 'a' as (x:chararray, y:double); b = foreach a generate *, ABS(y - 2*y) as test; dump b; Are functions like (y-2*y) not supported inside ABS? The weird ...
    Shubham Chopra
    Jul 8, 2011 at 2:28 pm
    Jul 8, 2011 at 5:55 pm
  • grunt> describe policies_by_type; policies_by_type: {group: chararray,policies: {columns::udm_field: chararray,columns::udm_value: chararray}} grunt> dump policies_by_type; ...
    Colin Taylor
    Jul 6, 2011 at 2:09 pm
    Jul 6, 2011 at 3:49 pm
  • Hello My dataset has five fields, I want to select DISTINCT lines based upon the first four fields and then append the fifth field from the first common line (based on the first four fields). Is this ...
    Tony Burton
    Jul 1, 2011 at 10:12 am
    Jul 6, 2011 at 1:28 pm
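For the question above, plain DISTINCT cannot carry along a fifth field, but grouping on the four key fields and taking one row per group can; a sketch (field names are illustrative; nested LIMIT needs a sufficiently recent Pig, and an ORDER before the LIMIT would make the kept row deterministic):

```pig
data = LOAD 'input' AS (a:chararray, b:chararray, c:chararray, d:chararray, e:chararray);
g = GROUP data BY (a, b, c, d);
dedup = FOREACH g {
        one = LIMIT data 1;  -- keeps one (arbitrary) row per distinct key
        GENERATE FLATTEN(one);
};
```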
  • I have a doubt: sometimes when I run the Pig code: c = stream b through `grep "spider"`; it returns the error message: Received Error while processing the map plan: 'grep "spider" ' failed ...
    Jameson Li
    Jul 5, 2011 at 3:01 am
    Jul 6, 2011 at 8:36 am
  • Hi, I have the latest Pig build from trunk. I have configured it to run on a 12-node Hadoop cluster. I am trying to access an HBase table; the map job runs fine for a while... but after some time ...
    Praveenesh kumar
    Jul 5, 2011 at 8:45 am
    Jul 6, 2011 at 4:37 am
  • I have a few questions on running Pig scripts / map-reduce jobs. 1. I know that Pig creates *logical, physical and then execution plans* before it really starts executing the map/reduce job; I am ...
    Prabhu Dhakshina Murthy
    Jul 4, 2011 at 3:39 am
    Jul 4, 2011 at 5:09 am
  • We have our data in folders partitioned by day, i.e. /user/pig/logs/2011/06/30. Is there any way to select the last x days to use as input? Thanks
    Jul 1, 2011 at 5:06 am
    Jul 1, 2011 at 10:21 am
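For the partitioned-folders question above: LOAD accepts Hadoop glob syntax in the path, so a fixed window can be named directly; for a rolling window, the date list is usually built by a wrapper script and passed in with -param. A sketch (the delimiter and the $DAYS parameter name are illustrative):

```pig
-- Last three days of June via a path glob:
logs = LOAD '/user/pig/logs/2011/06/{28,29,30}' USING PigStorage('\t');

-- Rolling-window variant, with the glob supplied as e.g. -param DAYS='{28,29,30}':
logs2 = LOAD '/user/pig/logs/2011/06/$DAYS' USING PigStorage('\t');
```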
  • Hi, when I run my Pig script I get the error message below: ERROR 2017: Internal error creating job configuration. Does someone know this error message? Thanks a lot, Thiago
    Thiago Veiga
    Jul 28, 2011 at 8:28 pm
    Jul 28, 2011 at 11:21 pm
  • Hi, I'm trying to run a simple AvroStorage example to read from a tsv file via PigStorage and write to Avro, but the job fails with the following exception: java.lang.ClassCastException: ...
    Bill Graham
    Jul 25, 2011 at 9:12 pm
    Jul 28, 2011 at 10:35 pm
  • Hello, I have a custom loader function to read in a parsed schema from some log files, but it seems there is a problem with some of the log files and I need to detect if the end of a line in the log ...
    Jul 26, 2011 at 6:12 pm
    Jul 26, 2011 at 6:21 pm
  • I have data in an HBase table stored in the following format: rowkey group_id:1 group_id:2 ... group_id:n 2fcab50712467eab4004583eb8fb7f89 1 0 1 085125e8f7cdc99fd91dbd7280373c5b 0 1 0 ...
    Juan Martin Pampliega
    Jul 25, 2011 at 6:01 pm
    Jul 25, 2011 at 7:51 pm
  • Hi, I tried including the UDFs in my Pig script by adding them to the classpath as well as registering them, but it is still showing NoClassDefFoundError. pig -x local -classpath ...
    Jayesh Bharadwaj
    Jul 25, 2011 at 9:19 am
    Jul 25, 2011 at 4:33 pm
  • Hello again, I have a relation with the following schema: regrouped: {group: (artistid: int,country: int,week: chararray),projected_joined_albums: {key: (artistid: int,country: int,week: ...
    Andrew Clegg
    Jul 22, 2011 at 2:30 pm
    Jul 22, 2011 at 10:51 pm
  • Hello, I know that Pig can support nested data. I want to know how many levels of nesting Pig can support: is it unlimited, or is there a limit? Thanks! Yong
    Jul 19, 2011 at 9:42 am
    Jul 19, 2011 at 9:21 pm
  • Greetings, I'm trying to upgrade from 0.7.0 to 0.8.1, but am having some trouble with existing scripts. The basic problem I'm trying to solve is storing an alias into HDFS, overwriting data that may ...
    Chris Rosner
    Jul 19, 2011 at 7:28 am
    Jul 19, 2011 at 7:43 am
  • I'm loading sequence files, of which each row's 'value' is a tab delimited set of columns. I'm exploding the values out so that I can work with them separately, but pig's syntax parser is giving me a ...
    Jameson Lopp
    Jul 18, 2011 at 7:32 pm
    Jul 19, 2011 at 6:03 am
  • Does Pig work with 0.20.203? I'm getting this error when running Pig and can't figure out why: ERROR 2999: Unexpected internal error. Failed to create DataStorage. Will it work if I use version 0.9.0? ...
    Ross Nordeen
    Jul 15, 2011 at 9:35 pm
    Jul 15, 2011 at 9:53 pm
  • How do I log Pig exceptions at run time? I am running the Pig script as a cron job in Ubuntu. Thanks, Venkat
    Jul 14, 2011 at 12:57 pm
    Jul 15, 2011 at 1:01 am
Group Overview
Group: user
Categories: pig, hadoop

73 users for July 2011

Dmitriy Ryaboy: 28 posts · Daniel Dai: 18 posts · Raghu Angadi: 15 posts · Thejas Nair: 10 posts · Vincent Barat: 10 posts · Jeremy Hanna: 8 posts · Norbert Burger: 8 posts · Marian Condurache: 7 posts · Alan Gates: 6 posts · Bill Graham: 6 posts · Jacob Perkins: 5 posts · Jagaran das: 5 posts · Ankur Jain: 4 posts · Ashutosh Chauhan: 4 posts · Jameson Li: 4 posts · Praveenesh kumar: 4 posts · Raghu Angadi: 4 posts · Romain Rigaux: 4 posts · Ross Nordeen: 4 posts · Sulabh choudhury: 4 posts