Grokbase Groups Pig user March 2013

96 discussions - 362 posts

  • In Java, I am trying to convert a DataBag from its String representation with its schema String to a valid DataBag Object: String databag_string = "{(apples,1024)}"; String schema_string = ...
    Dan DeCapria, CivicScience
    Mar 18, 2013 at 8:19 pm
    Mar 21, 2013 at 3:52 pm
  • Hi, I'm using Hadoop 1.0.4, Cassandra 1.2.2 and Pig 0.11.0. Can anyone help me with an example of how to use Pig either for storing to Cassandra from Pig using CassandraStorage, or loading rows ...
    Mohammed Abdelkhalek
    Mar 18, 2013 at 3:15 pm
    Mar 18, 2013 at 5:41 pm
  • Hi, I executed below PIG commands. X= LOAD '/user/lnindrakrishna/input/ExpTag.txt' AS (line:chararray); Y=foreach data { generate STRSPLIT(line,',') ;}; And I get below error. What is wrong in my ...
    Mix Nin
    Mar 5, 2013 at 10:49 pm
    Mar 5, 2013 at 11:51 pm
  • When I try to run pig 0.12.0, I got the following error $ pig12 -param input="t" -param output="s" -c b224G_1.pig log4j:ERROR Could not find value for key log4j.appender.NullAppender log4j:ERROR ...
    Danfeng Li
    Mar 12, 2013 at 9:50 pm
    Mar 13, 2013 at 5:28 pm
  • If I define and set tuple like this: Tuple t1 = mTupleFactory.newTuple(2); t1.set(0, "Hello"); t1.set(1, NULL); and have schema like: b:bag{t:tuple(a:chararray, b:chararray) and then in the pig ...
    Mohit Anchlia
    Mar 7, 2013 at 12:59 am
    Mar 7, 2013 at 8:34 pm
  • I can start a grunt shell just fine: -bash-3.2$ pwd /home/rfcompton/Downloads/pig-0.11.0-src -bash-3.2$ ./bin/pig 2013-03-21 12:55:00,048 [main] INFO org.apache.pig.Main - Apache Pig version ...
    Ryan Compton
    Mar 21, 2013 at 8:06 pm
    Mar 21, 2013 at 11:17 pm
  • How do I remove the last item in a bag. For example: (group_1,{(2012-12-15,a),(2012-12-17,a),(2012-12-23,c)}) I would like to remove the last item so that the following is the result ...
    Chan, Tim
    Mar 12, 2013 at 11:33 pm
    Mar 15, 2013 at 7:46 pm
  • I am writing a loader for a storage format, which partitions by a particular field in the record. So I would like to implement something which can push down filters on the partitioned field so that ...
    Jeff Yuan
    Mar 14, 2013 at 8:31 pm
    Mar 15, 2013 at 10:17 am
  • Hello All, I have dataset like 0, 10.1, 20.1, 30, 40, 50, 60, 70, 80.1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2, 3, 4, 5, 56, 6, 7, 8, 9, 9, 9, 9, 12, 1, 3, 14, 1, 5, 6, 7, 8, 8, ...
    Preeti Gupta
    Mar 4, 2013 at 11:19 pm
    Mar 5, 2013 at 10:49 pm
  • Sorry for posting same issue multiple times I wrote a pig script as follows and stored it in x.pig file Data = LOAD '/....' as (,,,, ) NoNullData= FILTER Data by qe is not null; STORE (foreach (group ...
    Mix Nin
    Mar 27, 2013 at 9:58 pm
    Mar 28, 2013 at 4:20 pm
  • Hello, Can I compute SUM or AVG without using GROUPBY OR FILTER?
    Preeti Gupta
    Mar 4, 2013 at 11:50 pm
    Mar 5, 2013 at 10:06 pm
  • The JsonLoader works, but problem is I'm not loading a JSON file, but just trying to parse a json string as part of a bigger data set. That's why I needed to use JsonStringToMap.
    Eli Finkelshteyn
    Mar 1, 2013 at 8:24 pm
    Mar 4, 2013 at 5:05 pm
  • We have some very long pig scripts that run several times per day. We believe that the script parsing process takes very long (about 1h). During this time, the pig command just hangs before any ...
    Patrick Salami
    Mar 28, 2013 at 7:51 pm
    Apr 3, 2013 at 8:28 pm
  • Hi, I am unable to typecast fields loaded from my hbase to anything other than default bytearray. I tried both during the LOAD statement and using typecast after loading. Neither works. The script ...
    Praveen Bysani
    Mar 27, 2013 at 8:30 am
    Apr 1, 2013 at 2:43 am
  • Hi all, Could anyone be kind enough to point me to some examples of using the COVARIANCE and CORRELATION UDFs described here?[1] Renato M. [1] https://issues.apache.org/jira/browse/PIG-277
    Renato Marroquín Mogrovejo
    Mar 26, 2013 at 10:29 pm
    Mar 28, 2013 at 9:42 pm
  • Hi there, I have an EvalFunc which uses an internal class that opens up connections to a Redis and MongoDB server. This class has a close() method which closes connections to both Redis and MongoDB ...
    Mike Sukmanowsky
    Mar 14, 2013 at 9:05 pm
    Mar 26, 2013 at 2:48 pm
  • Hi, I am trying to run a simple pig script that uses HbaseStorage class to load data from a hbase table. The pig script runs perfectly fine when run standalone in mapreduce mode. But when i submit it ...
    Praveen Bysani
    Mar 14, 2013 at 9:29 am
    Mar 19, 2013 at 8:46 pm
  • Hello I'm trying to find a SUM of a range of fields, and am having difficulty. I have the following data structure (from the movielens public dataset) where there's a "fixed" field of "Name" and ...
    Nathan Neff
    Mar 10, 2013 at 2:45 pm
    Mar 19, 2013 at 8:17 pm
  • Hi! I am using Pig 0.10 version and I have a question about mapping nested JSON objects from Hbase. For example: The below commands loads the field family from Hbase. fields = load ...
    Kiran chitturi
    Mar 14, 2013 at 3:38 am
    Mar 14, 2013 at 3:09 pm
  • I have a file with below data xxxxx 11,22,33 44,55,66 77,88,99 I wrote below PIG script X= LOAD '/user/lnindrakrishna/tmp/ExpTag.txt' AS (id :chararray,qc :chararray ,qt :chararray ,qe :chararray ) ...
    Mix Nin
    Mar 7, 2013 at 12:42 am
    Mar 7, 2013 at 8:43 pm
  • suppose my data has 100 columns or fields, and i want to impose a schema. is there a way i can create a separate file describing the schema of these fields, and let PIG read the schema from that ...
    Vadi Hombal
    Mar 27, 2013 at 1:31 pm
    Mar 28, 2013 at 4:15 pm
  • Hi there, In our system, we have multiple pig scripts that run against a particular HDFS directory. The pig scripts can run at different times, and are scheduled to run regularly. Is there a way to ...
    John Farrelly
    Mar 27, 2013 at 10:25 am
    Mar 27, 2013 at 3:33 pm
  • Since there is no date datatype, how do I filter on a date column? I've been setting the date column as a chararray. I would like to do something like: a = filter b by date_col < '2013-01-01';
    Tim Chan
    Mar 21, 2013 at 10:11 pm
    Mar 22, 2013 at 6:04 am
  • Hi there, I would like to do something very similar to a nested foreach with using order by and then limit. But I would like to limit on a relation to the total number of records. users = load ...
    Marco Cadetg
    Mar 18, 2013 at 10:23 am
    Mar 19, 2013 at 7:49 am
  • Hi, Can we define a UDF in pig that takes a bag as an input and returns another bag as output? How can this be done? Thanks, -- regards Pranjal
    Pranjal rajput
    Mar 18, 2013 at 9:27 am
    Mar 18, 2013 at 3:58 pm
  • Hi! I am using Pig 0.10.0 with Hbase in distributed mode to read the records and I have used this command below. fields = load 'hbase://documents' using ...
    Kiran chitturi
    Mar 13, 2013 at 2:49 pm
    Mar 15, 2013 at 3:17 am
  • Fellow Hadoopers, We'd like to introduce a joint project between Twitter and Cloudera engineers -- a new columnar storage format for Hadoop called Parquet ( http://parquet.github.com). We created ...
    Dmitriy Ryaboy
    Mar 12, 2013 at 5:30 pm
    Mar 13, 2013 at 7:39 pm
  • If I have a bag and would like to remove dupes, while saving the first occurrence, is this possible? For example, for the following bag: (group_1,{(2012-12-15,a),(2012-12-17,a),(2012-12-23,c)}) I ...
    Chan, Tim
    Mar 8, 2013 at 10:01 pm
    Mar 8, 2013 at 11:22 pm
  • Hello, I have a file of size 9GB and having approximately 109.5 million records. I execute a pig script on this file that is doing: 1. Group by on a field of the file 2. Count number of records in ...
    Panshul Whisper
    Mar 6, 2013 at 2:29 pm
    Mar 8, 2013 at 2:48 am
  • I have a couple of questions regarding job result and schema. The context is that I'm trying to create a custom entry point for Pig that takes a script, executes it, and always stores the last ...
    Jeff Yuan
    Mar 5, 2013 at 7:18 pm
    Mar 5, 2013 at 10:09 pm
  • Hi guys, I'm running pig from the command line in local mode, and trying to pass in some properties, for example: pig -x local ... -p mapred.map.tasks=2 -p mapred.reduce.tasks=1 ... I'm getting ...
    Jeff Yuan
    Mar 2, 2013 at 12:04 am
    Mar 3, 2013 at 6:12 am
  • Hi guys, I have a quick question about configuring Pig correctly when used in a embedded java program: ie my code instantiates PigServer and registers queries to it. How do I set the directory to ...
    Jeff Yuan
    Mar 29, 2013 at 7:50 pm
    Mar 29, 2013 at 8:17 pm
  • Is there an interface to get the standard out and standard error streams for a pig execution? I'm using the Java interface and directly calling PigServer.executeBatch() for example and getting back ...
    Jeff Yuan
    Mar 20, 2013 at 9:00 pm
    Mar 22, 2013 at 8:51 pm
  • I'm using parameter passing to pass an input path to my pig script. This does not seem to work: -param input=/path1/{08,09,10,11,12}/*/data/,/path2/{01,02,03}/*/data/
    Tim Chan
    Mar 20, 2013 at 11:15 pm
    Mar 21, 2013 at 7:20 am
  • Hi, I am new to Pig. I have a dataset from a time-tracker application. It records the time that users spend on various activities. For example: UserId | Activity | Tool | BeginTime | EndTime | ...
    Pranjal rajput
    Mar 15, 2013 at 5:04 pm
    Mar 17, 2013 at 7:33 pm
  • All, Is there an easy way to read Hive LazySimpleSerde encoded files in Pig? I did some research and found support for Hive's columnar format and for SequenceFiles, but did not see anything for ...
    Shawn Hermans
    Mar 12, 2013 at 6:17 pm
    Mar 13, 2013 at 3:39 pm
  • Hello! I successfully read from HBase table using: table = load 'hbase://temp' using org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:c1, cf:c2', '-loadKey true') as (key:chararray, c1:bytearray, ...
    Byte Array
    Mar 11, 2013 at 11:29 am
    Mar 11, 2013 at 5:04 pm
  • Hi, I'm trying to use the following statement in Pig to parse out my data. B = FOREACH A GENERATE FLATTEN( REGEX_EXTRACT_ALL(line, '^(.+?)\\-(.+?)\\s(.+?)\\-(.)(.)\\s(.+)$')) AS ...
    John Meek
    Mar 10, 2013 at 2:58 am
    Mar 10, 2013 at 2:38 pm
  • Hello! I have a script that gives me following result: time_grouped = GROUP joined BY (ip, hour); counts = FOREACH time_grouped GENERATE group.ip as ip, group.hour as hour, COUNT(joined) as count ...
    Eugene Morozov
    Mar 6, 2013 at 3:21 pm
    Mar 8, 2013 at 7:22 pm
  • Hello, Is it possible to use hadoop fs commands in a pig script? What i exactly want to do is, at the end of my pig script, after the execution of store in a file command, I want the pig script to ...
    Panshul Whisper
    Mar 6, 2013 at 11:34 am
    Mar 6, 2013 at 9:16 pm
  • I am trying to upload to S3 using pig but I get: grunt store A into 's3://BBBBBCCKIAJV5KGMZVA:KKKKxmw5F7I4AWd6rDRA@ /bucket/1/2/a'; 2013-03-04 18:24:39,475 [main] INFO ...
    Mohit Anchlia
    Mar 4, 2013 at 11:32 pm
    Mar 5, 2013 at 5:45 pm
  • Does anyone know of any storefunc/loadfunc for AWS S3 that is available?
    Mohit Anchlia
    Mar 2, 2013 at 7:51 pm
    Mar 3, 2013 at 4:58 am
  • Hi, I have a file that has data as follows: AA:11,22,33;BB:144,244,344;CC:yny;DD:11,33;EE:144,344 ; 11111 I need output as follows Event key AA BB CC 11111 11 144 y 11111 22 244 ...
    Mix Nin
    Mar 1, 2013 at 10:29 pm
    Mar 2, 2013 at 8:39 am
  • Hi, I might be a little bit late. I come up with a new idea for the last minute. Currently I'm working on social graph processing. I think we can implement a solution for pig. With this idea I'm ...
    Burakkk
    Mar 28, 2013 at 8:28 pm
    Mar 30, 2013 at 5:13 pm
  • Downloaded pig from http://download.nextag.com/apache/pig/pig-0.11.0/pig-0.11.0.tar.gz Running pig-0.11.0/bin/pig I see ERROR 2998: Unhandled internal error ...
    Arun Ahuja
    Mar 28, 2013 at 6:36 pm
    Mar 28, 2013 at 8:55 pm
  • I understand in the traditional map/reduce paradigm that each key will get sent to the same reducer sorted but in pig there is no such thing as a "key". I'm curious to know how pig knows to which ...
    Mark
    Mar 27, 2013 at 6:46 pm
    Mar 28, 2013 at 4:23 pm
  • Dear pig users, What does it mean when pig [Cloudera Pig version 0.10.0-cdh4.1.2] reports 2013-03-25 14:46:31,186 [main] INFO org.apache.pig.Main - Logging error messages to ...
    William Dowling
    Mar 25, 2013 at 7:42 pm
    Mar 25, 2013 at 9:48 pm
  • Hello all, When I first saw pig, I was under the impression that it generated java code for a series of map/reduce jobs and then submitted that to hadoop. I have since seen messages that indicate the ...
    Gardner Pomper
    Mar 17, 2013 at 11:26 pm
    Mar 21, 2013 at 6:55 pm
  • I'm trying to test a custom LOAD class, which also contains the code for STORE. I put in a STORE in my pigUnit script. but the resulting file is never created. is STORE always skipped in pigUnit? in ...
    Yang
    Mar 15, 2013 at 10:23 pm
    Mar 17, 2013 at 5:01 am
  • 1. How to display the column names in pig in a console. 2. When using dump, can we just get the top 10 rows rather than all other rows. Your help is appreciated. Thanks Sai
    Sai Sai
    Mar 16, 2013 at 6:38 am
    Mar 16, 2013 at 8:34 am
Group Overview
group: user @
categories: pig, hadoop
discussions: 96
posts: 362
users: 95
website: pig.apache.org

95 users for March 2013

  • Jonathan Coveney: 20 posts
  • Mix Nin: 19 posts
  • Dan DeCapria, CivicScience: 18 posts
  • Johnny Zhang: 17 posts
  • Harsha ch: 16 posts
  • Prashant Kommireddi: 15 posts
  • Jeff Yuan: 14 posts
  • Mohit Anchlia: 12 posts
  • Dmitriy Ryaboy: 10 posts
  • Bill Graham: 9 posts
  • Tim Chan: 9 posts
  • Inelu nagamallikarjuna: 8 posts
  • Rohini Palaniswamy: 8 posts
  • Kiran chitturi: 7 posts
  • Mike Sukmanowsky: 7 posts
  • Preeti Gupta: 7 posts
  • Danfeng Li: 6 posts
  • Eli Finkelshteyn: 6 posts
  • Panshul Whisper: 6 posts
  • Praveen Bysani: 6 posts