Grokbase Groups Pig user July 2010

48 discussions - 198 posts

  • I am running my Pig scripts on our QA cluster (with 4 datanodes, see below), which has the Cloudera CDH2 release installed and a global heap max of -Xmx4096m. I am constantly getting OutOfMemory errors (see ...
    Syed Wasti
    Jul 7, 2010 at 9:10 pm
    Jul 29, 2010 at 7:40 pm
  • Hello everybody, I have a simple table containing sessions. Each session has a unique key (the sid, which is actually a uuid). But a session can be present several times in my input table. I want ...
    Vincent Barat
    Jul 12, 2010 at 1:27 pm
    Jul 16, 2010 at 9:25 pm
  • I do this in a static block of the udf class, or by initializing static variables ... Maybe there is a better way, but I don't know which one. Dave Viner <dave@vinertech.com> wrote:
    Vincent Barat
    Jul 1, 2010 at 4:37 pm
    Jul 7, 2010 at 8:41 pm
  • Is there a way to set the mapred.min.split.size property in pig? I set it, but it doesn't seem to have changed the mapper's HDFS_BYTES_READ counter. My mappers are finishing in ~10 secs. I have ~20,000 of ...
    Corbin Hoenes
    Jul 27, 2010 at 9:09 pm
    Apr 18, 2011 at 8:41 pm
  • I thought I would ping pig-user about this to get comments. I put a proposal for builtin date functions I can have ready for Pig 0.8 at http://issues.apache.org/jira/browse/PIG-1430 There are so many ...
    Russell Jurney
    Jul 2, 2010 at 6:28 am
    Jul 7, 2010 at 12:03 am
  • I have built a bag of tuples where the tuples contain fields. I am reading SequenceFiles and have written MyLoader to do this. I created a subset of all the fields, "isValid" to make the example ...
    Rodriguez, John
    Jul 30, 2010 at 10:11 pm
    Aug 3, 2010 at 5:17 pm
  • So I asked a question earlier, but figured it wasn't very clear and thus less likely to get answered, so here goes. I have 2 sources with separate ngrams and counts and after doing a full outer join I ...
    Brian Adams
    Jul 6, 2010 at 9:47 pm
    Jul 6, 2010 at 10:29 pm
  • Whenever I start up pig from the commandline, I see the same message from both -x local and -x mapreduce: [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to ...
    Dave Viner
    Jul 1, 2010 at 6:00 am
    Jul 4, 2010 at 6:19 am
  • Hey everybody, Does anybody know how I can sort a tuple's content? For example, I have (770001,880001,990001,770001) and I would like to obtain (770001,770001,880001,990001). I tried doing a group ...
    Renato Marroquín Mogrovejo
    Jul 22, 2010 at 1:34 am
    Jul 26, 2010 at 5:36 am
  • It seems the Pig 0.7.0 JAR contains Jetty classes. It's causing some classloader problems for a webapp of mine that happens to include the Pig JAR. Is there some reason why this has to be this way? ...
    Xavier Stevens
    Jul 30, 2010 at 4:30 pm
    Aug 9, 2010 at 12:50 am
  • I am new to PIG and running into a fairly basic problem. I have a UDF which depends on some other 3rd party jars & libraries. I can call the UDF from my PIG script either from grunt or by running ...
    Kaluskar, Sanjay
    Jul 26, 2010 at 6:30 pm
    Aug 6, 2010 at 2:03 pm
  • What are some strategies to have pig and java mapreduce jobs exchange data? E.g. we find a particular pig script in a chain is too slow and we could optimize with a custom mapreduce job we'd want pig ...
    Corbin Hoenes
    Jul 25, 2010 at 1:53 pm
    Jul 28, 2010 at 7:49 pm
  • Hello - I'd like to use pig to process log files containing BigDecimals. I'm loading my data as JSON via a custom LoadFunc. One approach seems to be to represent the BigDecimal fields as ...
    ToddG
    Jul 9, 2010 at 12:10 am
    Aug 16, 2010 at 10:06 am
  • Hi all, I was wondering if it would be possible to process images on a low level using PIG. I want to be able to write a pig script that can differentiate between two images.
    Ifeanyichukwu Osuji
    Jul 26, 2010 at 8:20 pm
    Jul 27, 2010 at 2:44 am
  • All, I am using pig embedded in Java and need to use matches in my pig job. However when I try to use escape characters in the pig line, the compiler complains. How do I use complex regex while ...
    Matthew Smith
    Jul 20, 2010 at 8:58 pm
    Jul 20, 2010 at 10:31 pm
  • Hi, I would greatly appreciate somebody's help with the following pig error: during MR, all mappers fail with the following stack trace: java.lang.ClassCastException: java.lang.Integer cannot be cast to ...
    Dmitriy Lyubimov
    Jul 19, 2010 at 10:01 pm
    Jul 20, 2010 at 5:04 am
  • if my table is: t = load 'data' as (v:int, c:int); and I want to do u = foreach t generate c/v as cov; how do I get fraction out of this? because u = foreach t generate (double)c / v; crashes with a ...
    Hc busy
    Jul 15, 2010 at 10:37 pm
    Jul 19, 2010 at 11:31 pm
  • I've got A = FOREACH ... B = FOREACH ... C = FOREACH ... ... X = UNION A, B, C,... Each of the A, B, C data is a single tuple. I want X ordered by the order specified in the UNION. The data in A, B, ...
    Elein
    Jul 28, 2010 at 9:46 pm
    Jul 29, 2010 at 4:01 pm
  • Hi all, I am trying to load the Zebra file generated from BasicTableOutputFormat in the MapReduce code. The code is similar to org.apache.hadoop.zebra.mapred.TableMapReduceExample. But it throws ...
    Yuting Lin
    Jul 6, 2010 at 4:04 am
    Jul 6, 2010 at 6:13 pm
  • Hi all, I have a strange issue with data types. We have a custom loader which loads data from logs similar to apache logs. It sets the type of a field(var) to be int. When I examine its type, it ...
    Uppuluri, Rohini
    Jul 29, 2010 at 7:18 pm
    Jul 30, 2010 at 7:39 am
  • Hello Everyone, I am trying to execute the script below, but it throws an error. Script is: A = load 'ex_groupby' USING PigStorage(',') as (a1:int,a2:int,a3:int); G1 = GROUP A by (a1,a2); ...
    Swati Jain
    Jul 28, 2010 at 7:48 pm
    Jul 29, 2010 at 7:10 pm
  • Hi All, This is my first mail in the apache mailing list... please bear with me as I am absolutely new to Hadoop and its family. This is my question... I have some data on my hdfs in the following ...
    Preethi vinayak sunny
    Jul 22, 2010 at 9:05 pm
    Jul 26, 2010 at 8:47 pm
  • Hi INFRA == pig 0.6 OVER hadoop 0.20.2 Here is the scenario A = LOAD 'file:///home/hadoop/a' using PigStorage(',') AS (c1:int,c2:double); DUMP A (2,0.0060) (3,0.0050) (3,0.0060) B = FOREACH A ...
    Rohan Rai
    Jul 22, 2010 at 12:17 pm
    Jul 22, 2010 at 4:42 pm
  • Greetings. I'm trying to query HBase using Pig but am doing something wrong and cannot figure out what exactly. 1. First, I create a table in HBase: hbase(main):001:0> create 'test_table', 'test_family' and ...
    Dmitry Demeshchuk
    Jul 21, 2010 at 5:42 pm
    Jul 21, 2010 at 7:33 pm
  • Hello, I am working on a dataset which has relations of the type: data: {a: (a1: chararray,a2_bag: {a2_tuple: (a21: chararray,a22: chararray)}, a3_bag: {a3_tuple: (a3: long)})} What this means is that, ...
    Sparsh Gupta
    Jul 8, 2010 at 9:58 am
    Jul 9, 2010 at 11:10 pm
  • Hi all, Is there a way to use the built-in functions of Pig (or has someone already written a UDF) to create a similar result to SQL's GROUP_CONCAT? The idea is that I have a long list of book ISBN ...
    Raviv M-G
    Jul 6, 2010 at 8:32 pm
    Jul 7, 2010 at 5:56 pm
  • I call a pig script pig -param DATE=$date ~/bin/script.pig inside of the pig script I filter records thru a perl udf DEFINE parser `parser.pl` SHIP ( 'parser.pl' ); ...
    Kochis, Allan
    Jul 27, 2010 at 1:18 pm
    Jul 27, 2010 at 6:45 pm
  • Top UDF in piggybank isn't sorted. What do people think about allowing a user to pass some kind of ordering to it?
    Corbin Hoenes
    Jul 26, 2010 at 9:19 pm
    Jul 26, 2010 at 10:26 pm
  • I want to know whether it's possible to loop in pig and end the loop based on some feedback variable. More specifically 1. I want to read a set of files/directories with different names, and process them ...
    Yong-gang Cao
    Jul 24, 2010 at 2:14 am
    Jul 24, 2010 at 2:31 am
  • Hi, I am seeing the below two errors consistently while running my script. This does not stop my script, but throws an error on datanode and successfully completes the same task on a different ...
    Syed Wasti
    Jul 22, 2010 at 6:46 pm
    Jul 22, 2010 at 8:54 pm
  • Hi all, We're building an application that starts multiple pig jobs in parallel by using PigServer. However, Pig doesn't seem to be thread-safe. And since we're running a Java application, I'm not ...
    Wouter de Bie
    Jul 21, 2010 at 5:44 pm
    Jul 22, 2010 at 9:03 am
  • Hi, I would like to put a file of key-value pairs in the distributed cache. Mridul pointed out that to do this, set mapred.cached.archives=hdfs://host:port/mypath/file#link mapred.create.symlink=yes My question is how ...
    Kochis, Allan
    Jul 20, 2010 at 7:02 pm
    Jul 21, 2010 at 10:36 pm
  • All, I am using pig embedded in Java and need to use matches in my pig job. However when I try to use escape characters in the pig line, the compiler complains. How do I use complex regex while ...
    Matthew Smith
    Jul 20, 2010 at 8:57 pm
    Jul 21, 2010 at 5:28 am
  • Hi, I don't know how to include my hadoop/conf directory in my classpath. Can someone help? Thanks. Ifeanyichukwu Osuji
    Ifeanyichukwu Osuji
    Jul 20, 2010 at 5:21 pm
    Jul 20, 2010 at 5:24 pm
  • Hello everyone, I am having problems running pig with hadoop. Every time I try to run pig in hadoopmode/mapreducemode I get this: pig-0.7.0$ bin/pig 10/07/20 13:03:11 INFO pig.Main: Logging error ...
    Ifeanyichukwu Osuji
    Jul 20, 2010 at 5:08 pm
    Jul 20, 2010 at 5:14 pm
  • The default is 1 GB for pig to run. Has anyone had any success running with less, say 512? We have a lot of jobs running at the same time, so we are hoping to be able to run them with less RAM.
    Corbin Hoenes
    Jul 8, 2010 at 5:17 pm
    Jul 9, 2010 at 10:42 pm
  • Hey John, I think pig-user@hadoop.apache.org is what you had in mind. I believe Zebra ( http://hadoop.apache.org/pig/docs/r0.7.0/zebra_overview.html) may be of interest to you. Thanks, Jeff
    Jeff Hammerbacher
    Jul 8, 2010 at 1:43 am
    Jul 8, 2010 at 1:44 am
  • Whenever I start up pig from the commandline, I see the same message from both -x local and -x mapreduce: [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to ...
    Dave Viner
    Jul 1, 2010 at 6:00 am
    Jul 1, 2010 at 8:41 am
  • Hi All, We are planning to hold the next Hadoop India User Group meet up on 31st July 2010 in Noida, India. The registration and event details are available at - ...
    Sanjay Sharma
    Jul 29, 2010 at 5:52 pm
    Jul 29, 2010 at 5:52 pm
  • Please see http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html . You can use 'set default_parallel 10' to ask the query to use 10 reducers for all MR jobs, or specify 'parallel x' in the pig ...
    Thejas M Nair
    Jul 29, 2010 at 5:19 pm
    Jul 29, 2010 at 5:19 pm
  • Hey guys, I'm trying to determine if a value in one data set is present in a range of 2 values in another data set. Here is my thought process so far: StartData1 = Load data1 as (number1:long, ...
    Matthew Smith
    Jul 26, 2010 at 11:34 pm
    Jul 26, 2010 at 11:34 pm
  • What are some strategies to have pig and java mapreduce jobs exchange data? E.g. we find a particular pig script in a chain is too slow and we could optimize with a custom mapreduce job we'd want pig ...
    Corbin
    Jul 23, 2010 at 5:21 pm
    Jul 23, 2010 at 5:21 pm
  • I keep getting this error when I try to compile a java program using pig... can anyone help? Thanks pig-0.7.0$ javac -cp pig.jar idmapreduce.java idmapreduce.java:2: package org.apache.pig does not ...
    Ifeanyichukwu Osuji
    Jul 22, 2010 at 4:46 pm
    Jul 22, 2010 at 4:46 pm
  • Is there a good way to "un-nest" a json string in pig? It doesn't look like I can assign a map value to a map. Is it best practice just to "flatten" the json in a custom loadfunc? -Kim
    Kim Vogt
    Jul 16, 2010 at 1:57 am
    Jul 16, 2010 at 1:57 am
  • Hi All, We are planning to hold the next Hadoop India User Group meet up on 31st July 2010 in Noida, India. The registration and event details are available at - ...
    Sanjay Sharma
    Jul 15, 2010 at 3:22 pm
    Jul 15, 2010 at 3:22 pm
  • I'm excited to announce the first Seattle Hadoop Day, on August 14th! Get your tickets at http://hadoopday2010.eventbrite.com. Hadoop Day is a day-long community-organized event where we gather to ...
    Bradford Stephens
    Jul 12, 2010 at 9:57 pm
    Jul 12, 2010 at 9:57 pm
  • Alan Gates
    Jul 12, 2010 at 5:53 pm
    Jul 12, 2010 at 5:53 pm
  • ROOM CHANGE TO 211 ROOM CHANGE TO 211 Hello Fellow Hadoopists, We are meeting at 7:15 pm on July 15th at the University Heights Community Center 5031 University Way NE Seattle WA 98105 Room #211 note ...
    Sean Jensen-Grey
    Jul 4, 2010 at 1:52 am
    Jul 4, 2010 at 1:52 am
Group Overview
group: user @
categories: pig, hadoop
discussions: 48
posts: 198
users: 52
website: pig.apache.org

52 users for July 2010

Dmitriy Ryaboy: 28 posts
Corbin Hoenes: 12 posts
Mridul Muralidharan: 12 posts
Ashutosh Chauhan: 10 posts
Hc busy: 9 posts
Scott Carey: 9 posts
Syed Wasti: 9 posts
Thejas M Nair: 8 posts
Brian Adams: 7 posts
Dave Viner: 7 posts
Matthew Smith: 7 posts
Renato Marroquín Mogrovejo: 6 posts
Russell Jurney: 5 posts
Vincent Barat: 5 posts
Dmitriy Lyubimov: 4 posts
Ifeanyichukwu Osuji: 4 posts
Alan Gates: 3 posts
Harsh J: 3 posts
Jeff Zhang: 3 posts
ToddG: 3 posts