-
In Java, I am trying to convert a DataBag from its String representation with its schema String to a valid DataBag Object: String databag_string = "{(apples,1024)}"; String schema_string = ...
Dan DeCapria, CivicScience
Mar 18, 2013 at 8:19 pm
Mar 21, 2013 at 3:52 pm -
Hi, I'm using Hadoop 1.0.4, Cassandra 1.2.2 and Pig 0.11.0. Can anyone help me with an example of how to use Pig either for storing to Cassandra from Pig using CassandraStorage, or loading rows ...
Mohammed Abdelkhalek
Mar 18, 2013 at 3:15 pm
Mar 18, 2013 at 5:41 pm -
Hi, I executed the Pig commands below. X= LOAD '/user/lnindrakrishna/input/ExpTag.txt' AS (line:chararray); Y=foreach data { generate STRSPLIT(line,',') ;}; And I get the error below. What is wrong in my ...
Mix Nin
Mar 5, 2013 at 10:49 pm
Mar 5, 2013 at 11:51 pm -
When I try to run pig 0.12.0, I got the following error $ pig12 -param input="t" -param output="s" -c b224G_1.pig log4j:ERROR Could not find value for key log4j.appender.NullAppender log4j:ERROR ...
Danfeng Li
Mar 12, 2013 at 9:50 pm
Mar 13, 2013 at 5:28 pm -
If I define and set tuple like this: Tuple t1 = mTupleFactory.newTuple(2); t1.set(0, "Hello"); t1.set(1, NULL); and have schema like: b:bag{t:tuple(a:chararray, b:chararray) and then in the pig ...
Mohit Anchlia
Mar 7, 2013 at 12:59 am
Mar 7, 2013 at 8:34 pm -
I can start a grunt shell just fine: -bash-3.2$ pwd /home/rfcompton/Downloads/pig-0.11.0-src -bash-3.2$ ./bin/pig 2013-03-21 12:55:00,048 [main] INFO org.apache.pig.Main - Apache Pig version ...
Ryan Compton
Mar 21, 2013 at 8:06 pm
Mar 21, 2013 at 11:17 pm -
How do I remove the last item in a bag. For example: (group_1,{(2012-12-15,a),(2012-12-17,a),(2012-12-23,c)}) I would like to remove the last item so that the following is the result ...
Chan, Tim
Mar 12, 2013 at 11:33 pm
Mar 15, 2013 at 7:46 pm -
I am writing a loader for a storage format, which partitions by a particular field in the record. So I would like to implement something which can push down filters on the partitioned field so that ...
Jeff Yuan
Mar 14, 2013 at 8:31 pm
Mar 15, 2013 at 10:17 am -
Hello All, I have a dataset like 0, 10.1, 20.1, 30, 40, 50, 60, 70, 80.1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2, 3, 4, 5, 56, 6, 7, 8, 9, 9, 9, 9, 12, 1, 3, 14, 1, 5, 6, 7, 8, 8, ...
Preeti Gupta
Mar 4, 2013 at 11:19 pm
Mar 5, 2013 at 10:49 pm -
Sorry for posting the same issue multiple times. I wrote a Pig script as follows and stored it in an x.pig file: Data = LOAD '/....' as (,,,, ) NoNullData= FILTER Data by qe is not null; STORE (foreach (group ...
Mix Nin
Mar 27, 2013 at 9:58 pm
Mar 28, 2013 at 4:20 pm -
Hello, can I compute SUM or AVG without using GROUP BY or FILTER?
Preeti Gupta
Mar 4, 2013 at 11:50 pm
Mar 5, 2013 at 10:06 pm -
The JsonLoader works, but problem is I'm not loading a JSON file, but just trying to parse a json string as part of a bigger data set. That's why I needed to use JsonStringToMap.
Eli Finkelshteyn
Mar 1, 2013 at 8:24 pm
Mar 4, 2013 at 5:05 pm -
We have some very long pig scripts that run several times per day. We believe that the script parsing process takes very long (about 1h). During this time, the pig command just hangs before any ...
Patrick Salami
Mar 28, 2013 at 7:51 pm
Apr 3, 2013 at 8:28 pm -
Hi, I am unable to typecast fields loaded from my hbase to anything other than default bytearray. I tried both during the LOAD statement and using typecast after loading. Neither works. The script ...
Praveen Bysani
Mar 27, 2013 at 8:30 am
Apr 1, 2013 at 2:43 am -
Hi all, Could anyone be kind enough to point me to some examples of using the COVARIANCE and CORRELATION UDFs described here? [1] Renato M. [1] https://issues.apache.org/jira/browse/PIG-277
Renato Marroquín Mogrovejo
Mar 26, 2013 at 10:29 pm
Mar 28, 2013 at 9:42 pm -
Hi there, I have an EvalFunc which uses an internal class that opens up connections to a Redis and MongoDB server. This class has a close() method which closes connections to both Redis and MongoDB ...
Mike Sukmanowsky
Mar 14, 2013 at 9:05 pm
Mar 26, 2013 at 2:48 pm -
Hi, I am trying to run a simple Pig script that uses the HBaseStorage class to load data from an HBase table. The Pig script runs perfectly fine when run standalone in mapreduce mode. But when I submit it ...
Praveen Bysani
Mar 14, 2013 at 9:29 am
Mar 19, 2013 at 8:46 pm -
Hello I'm trying to find a SUM of a range of fields, and am having difficulty. I have the following data structure (from the movielens public dataset) where there's a "fixed" field of "Name" and ...
Nathan Neff
Mar 10, 2013 at 2:45 pm
Mar 19, 2013 at 8:17 pm -
Hi! I am using Pig version 0.10 and I have a question about mapping nested JSON objects from HBase. For example, the command below loads the field family from HBase: fields = load ...
Kiran chitturi
Mar 14, 2013 at 3:38 am
Mar 14, 2013 at 3:09 pm -
I have a file with the data below: xxxxx 11,22,33 44,55,66 77,88,99 I wrote the Pig script below: X= LOAD '/user/lnindrakrishna/tmp/ExpTag.txt' AS (id :chararray,qc :chararray ,qt :chararray ,qe :chararray ) ...
Mix Nin
Mar 7, 2013 at 12:42 am
Mar 7, 2013 at 8:43 pm -
Suppose my data has 100 columns or fields, and I want to impose a schema. Is there a way I can create a separate file describing the schema of these fields, and let Pig read the schema from that ...
Vadi Hombal
Mar 27, 2013 at 1:31 pm
Mar 28, 2013 at 4:15 pm -
Hi there, In our system, we have multiple pig scripts that run against a particular HDFS directory. The pig scripts can run at different times, and are scheduled to run regularly. Is there a way to ...
John Farrelly
Mar 27, 2013 at 10:25 am
Mar 27, 2013 at 3:33 pm -
Since there is no date datatype, how do I filter on a date column? I've been setting the date column as a chararray. I would like to do something like: a = filter b by date_col < '2013-01-01';
Tim Chan
Mar 21, 2013 at 10:11 pm
Mar 22, 2013 at 6:04 am -
Hi there, I would like to do something very similar to a nested foreach using order by and then limit. But I would like to limit on a relation to the total number of records. users = load ...
Marco Cadetg
Mar 18, 2013 at 10:23 am
Mar 19, 2013 at 7:49 am -
Hi, Can we define a UDF in pig that takes a bag as an input and returns another bag as output? How can this be done? Thanks, -- regards Pranjal
Pranjal rajput
Mar 18, 2013 at 9:27 am
Mar 18, 2013 at 3:58 pm -
Hi! I am using Pig 0.10.0 with Hbase in distributed mode to read the records and I have used this command below. fields = load 'hbase://documents' using ...
Kiran chitturi
Mar 13, 2013 at 2:49 pm
Mar 15, 2013 at 3:17 am -
Fellow Hadoopers, We'd like to introduce a joint project between Twitter and Cloudera engineers -- a new columnar storage format for Hadoop called Parquet ( http://parquet.github.com). We created ...
Dmitriy Ryaboy
Mar 12, 2013 at 5:30 pm
Mar 13, 2013 at 7:39 pm -
If I have a bag and would like to remove dupes, while saving the first occurrence, is this possible? For example, for the following bag: (group_1,{(2012-12-15,a),(2012-12-17,a),(2012-12-23,c)}) I ...
Chan, Tim
Mar 8, 2013 at 10:01 pm
Mar 8, 2013 at 11:22 pm -
Hello, I have a file of size 9GB and having approximately 109.5 million records. I execute a pig script on this file that is doing: 1. Group by on a field of the file 2. Count number of records in ...
Panshul Whisper
Mar 6, 2013 at 2:29 pm
Mar 8, 2013 at 2:48 am -
I have a couple of questions regarding job result and schema. The context is that I'm trying to create a custom entry point for Pig that takes a script, executes it, and always stores the last ...
Jeff Yuan
Mar 5, 2013 at 7:18 pm
Mar 5, 2013 at 10:09 pm -
Hi guys, I'm running pig from the command line in local mode, and trying to pass in some properties, for example: pig -x local ... -p mapred.map.tasks=2 -p mapred.reduce.tasks=1 ... I'm getting ...
Jeff Yuan
Mar 2, 2013 at 12:04 am
Mar 3, 2013 at 6:12 am -
Hi guys, I have a quick question about configuring Pig correctly when used in an embedded Java program, i.e. my code instantiates PigServer and registers queries to it. How do I set the directory to ...
Jeff Yuan
Mar 29, 2013 at 7:50 pm
Mar 29, 2013 at 8:17 pm -
Is there an interface to get the standard out and standard error streams for a pig execution? I'm using the Java interface and directly calling PigServer.executeBatch() for example and getting back ...
Jeff Yuan
Mar 20, 2013 at 9:00 pm
Mar 22, 2013 at 8:51 pm -
I'm using parameter passing to pass an input path to my pig script. This does not seem to work: -param input=/path1/{08,09,10,11,12}/*/data/,/path2/{01,02,03}/*/data/
Tim Chan
Mar 20, 2013 at 11:15 pm
Mar 21, 2013 at 7:20 am -
Hi, I am new to Pig. I have a dataset from a time-tracker application. It records the time that users spend on various activities. For example: UserId | Activity | Tool | BeginTime | EndTime | ...
Pranjal rajput
Mar 15, 2013 at 5:04 pm
Mar 17, 2013 at 7:33 pm -
All, Is there an easy way to read Hive LazySimpleSerde encoded files in Pig? I did some research and found support for Hive's columnar format and for SequenceFiles, but did not see anything for ...
Shawn Hermans
Mar 12, 2013 at 6:17 pm
Mar 13, 2013 at 3:39 pm -
Hello! I successfully read from HBase table using: table = load 'hbase://temp' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:c1, cf:c2', '-loadKey true') as (key:chararray, c1:bytearray, ...
Byte Array
Mar 11, 2013 at 11:29 am
Mar 11, 2013 at 5:04 pm -
Hi, I'm trying to use the following statement in Pig to parse out my data. B = FOREACH A GENERATE FLATTEN( REGEX_EXTRACT_ALL(line, '^(.+?)\\-(.+?)\\s(.+?)\\-(.)(.)\\s(.+)$')) AS ...
John Meek
Mar 10, 2013 at 2:58 am
Mar 10, 2013 at 2:38 pm -
Hello! I have a script that gives me the following result: time_grouped = GROUP joined BY (ip, hour); counts = FOREACH time_grouped GENERATE group.ip as ip, group.hour as hour, COUNT(joined) as count ...
Eugene Morozov
Mar 6, 2013 at 3:21 pm
Mar 8, 2013 at 7:22 pm -
Hello, is it possible to use hadoop fs commands in a Pig script? What I want to do exactly is, at the end of my Pig script, after the execution of the store-in-a-file command, I want the Pig script to ...
Panshul Whisper
Mar 6, 2013 at 11:34 am
Mar 6, 2013 at 9:16 pm -
I am trying to upload to S3 using pig but I get: grunt store A into 's3://BBBBBCCKIAJV5KGMZVA:KKKKxmw5F7I4AWd6rDRA@ /bucket/1/2/a'; 2013-03-04 18:24:39,475 [main] INFO ...
Mohit Anchlia
Mar 4, 2013 at 11:32 pm
Mar 5, 2013 at 5:45 pm -
Does anyone know of any storefunc/loadfunc for AWS S3 that is available?
Mohit Anchlia
Mar 2, 2013 at 7:51 pm
Mar 3, 2013 at 4:58 am -
Hi, I have a file that has data as follows: AA:11,22,33;BB:144,244,344;CC:yny;DD:11,33;EE:144,344 ; 11111 I need output as follows: Event key AA BB CC 11111 11 144 y 11111 22 244 ...
Mix Nin
Mar 1, 2013 at 10:29 pm
Mar 2, 2013 at 8:39 am -
Downloaded pig from http://download.nextag.com/apache/pig/pig-0.11.0/pig-0.11.0.tar.gz Running pig-0.11.0/bin/pig I see ERROR 2998: Unhandled internal error ...
Arun Ahuja
Mar 28, 2013 at 6:36 pm
Mar 28, 2013 at 8:55 pm -
I understand in the traditional map/reduce paradigm that each key will get sent to the same reducer sorted but in pig there is no such thing as a "key". I'm curious to know how pig knows to which ...
Mark
Mar 27, 2013 at 6:46 pm
Mar 28, 2013 at 4:23 pm -
Dear pig users, What does it mean when pig [Cloudera Pig version 0.10.0-cdh4.1.2] reports 2013-03-25 14:46:31,186 [main] INFO org.apache.pig.Main - Logging error messages to ...
William Dowling
Mar 25, 2013 at 7:42 pm
Mar 25, 2013 at 9:48 pm -
Hello all, When I first saw Pig, I was under the impression that it generated Java code for a series of map/reduce jobs and then submitted that to Hadoop. I have since seen messages that indicate the ...
Gardner Pomper
Mar 17, 2013 at 11:26 pm
Mar 21, 2013 at 6:55 pm -
I'm trying to test a custom LOAD class, which also contains the code for STORE. I put a STORE in my PigUnit script, but the resulting file is never created. Is STORE always skipped in PigUnit? in ...
Yang
Mar 15, 2013 at 10:23 pm
Mar 17, 2013 at 5:01 am -
1. How do I display the column names in Pig in a console? 2. When using dump, can we get just the top 10 rows rather than all rows? Your help is appreciated. Thanks Sai
Sai Sai
Mar 16, 2013 at 6:38 am
Mar 16, 2013 at 8:34 am
Group Overview
group | user |
categories | pig, hadoop |
discussions | 96 |
posts | 362 |
users | 95 |
website | pig.apache.org |
95 users for March 2013