33 discussions - 115 posts

  • Hi all, I have a use case which is implemented in hive with partitions. Say Customer_data/2012-12-18/.... /2012-12-17/.... /2012-12-16/.... / / I want to implement this in pig. How will partitions work ...
    Abhishek
    Dec 18, 2012 at 10:39 pm
    Dec 19, 2012 at 4:34 am
  • Hi folks, I am new to pig, and I am trying to get the basic pig + cassandra samples working. I have created the PigTest Keyspace, and I am trying to run some of the commands in test_storage.pig, but I ...
    Schappet, James C
    Dec 5, 2012 at 8:06 pm
    Dec 12, 2012 at 6:08 pm
  • Hi, I have used SequenceFileLoader for loading a sequence file. A = load 'part-m-0000' using SequenceFileLoader() as (key:long,value:chararray) "value" is the chararray which consists of 10 fields which ...
    Srini
    Dec 24, 2012 at 5:24 am
    Jan 11, 2013 at 3:38 am
  • Hello, I'm using HBaseStorage and I want to change the layout of the schema before storage. Specifically I want to group some values into a tuple (thus reducing the number of repetitions of the row ...
    Yaboulna
    Dec 11, 2012 at 3:31 am
    Dec 13, 2012 at 7:08 am
  • Hi all, How can I achieve the above hive query in pig: Create table x as select y.col1,y.col2,y.col3,count(*) as count from tab1 y group by y.col1,y.col2,y.col3 Regards, Abhishek
    Abhishek
    Dec 26, 2012 at 7:06 pm
    Dec 27, 2012 at 8:37 pm
  • I have read a lot about how pig can ship a tar file and untar it before execution. However, I couldn't find any example. Can someone provide an example? What I would like to do is to ship a python module, ...
    Danfeng Li
    Dec 20, 2012 at 6:02 pm
    Dec 27, 2012 at 5:41 pm
  • Hi, I have around 4 million time series. ~1000 of them had a special occurrence at some point. Now, I want to draw 10 samples for each special time-series based on a similarity comparison. What I ...
    Thomas Bach
    Dec 18, 2012 at 8:00 pm
    Dec 25, 2012 at 12:47 pm
  • PhysicalPlan.getLeaves() returns a list of leaves. In most cases there is only one, "the root". Are there any cases where the physical plan will have more than one leaf? Thanks, Sarah
    Sarah Mohamed
    Dec 17, 2012 at 1:54 am
    Dec 21, 2012 at 6:29 pm
  • I want to extend the existing XMLLoader to go beyond capturing the text inside a tag and to actually create a Pig mapping of the Document Object Model the XML represents. This would be similar to ...
    Russell Jurney
    Dec 24, 2012 at 7:24 am
    Dec 29, 2012 at 11:00 pm
  • I was working on a LoadFunc and needed some ideas/second opinion on the best way to do this: 1. We use an API to download data from database as flat-files. - A query is given with table name and ...
    Prashant Kommireddi
    Dec 11, 2012 at 9:11 am
    Dec 11, 2012 at 11:07 pm
  • After many joins, my relation's schema becomes very verbose. For example: e::d::c::b::a::column1:bytearray, e::d::c::b::a::column2:bytearray Is there a way to simplify the schema back to ...
    Chan, Tim
    Dec 8, 2012 at 6:48 pm
    Dec 10, 2012 at 2:48 pm
  • Hi All, I just noticed that Pig Committer DaiJianYong has mentioned a "Cost based optimizer" for pig performance optimization. My questions are: Do we have any plan for this new feature? Like which ...
    Lulynn_2008
    Dec 5, 2012 at 2:32 am
    Dec 6, 2012 at 5:21 pm
  • I am trying to replace string in the input parameter. Is something like this possible? I am passing comma separated list of dirs and I have several sub dirs that I need to read from individually in ...
    Mohit Anchlia
    Dec 26, 2012 at 3:36 am
    Dec 27, 2012 at 6:34 pm
  • Is it possible to load multiple files in the same load command? I have files in different paths that I need to load, is that possible?
    Mohit Anchlia
    Dec 22, 2012 at 6:38 am
    Dec 27, 2012 at 5:15 pm
  • I am getting the following error while executing the pig script. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias eventData. Backend error : Unable to ...
    Milind Vaidya
    Dec 13, 2012 at 12:23 am
    Dec 22, 2012 at 6:07 am
  • Hi, Say I have three files `data1`, `data2` and `assocs`: $ cat data1 key1,foo key2,bar $ cat data2 key3,braz key4,froz $ cat assoc key1,key3 key2,key4 I load these files via $ pig -b -p debug=WARN ...
    Thomas Bach
    Dec 14, 2012 at 10:12 am
    Dec 14, 2012 at 6:55 pm
  • hi, all Can pig support in operator, like this: A = load 'test_data' as (value); B = filter A by value in (1,2,3,4,5); I think this is really useful. thanks. Haitao Yao ...
    Haitao Yao
    Dec 14, 2012 at 2:07 am
    Dec 14, 2012 at 7:03 am
  • Here is a snippet of how schema is applied to tuples String serializedSchema = p.getProperty(signature + SCHEMA_FILE); if (serializedSchema != null) { try { resourceSchema = new ...
    Prashant Kommireddi
    Dec 12, 2012 at 7:48 am
    Dec 13, 2012 at 7:52 am
  • mf0 = LOAD 'max.txt' AS (maxi:double); data = LOAD '/axp/rimimsat/userdata/msing137/hadoop_streaming/final_anomaly_detection/step3/output' USING PigStorage(',') AS (id:long, f0:double) gruped = group ...
    Jamal sasha
    Dec 12, 2012 at 7:23 pm
    Dec 13, 2012 at 1:52 am
  • Hi, We're using the date and time functions of PiggyBank, and were slightly surprised to see all of the timezones set to UTC, most of them not configurable. Given we want to get it working with IST ...
    Ramakrishna Nalam
    Dec 10, 2012 at 4:13 am
    Dec 11, 2012 at 5:55 pm
  • Hi, I am trying to load some external resources within my jython udf functions, e.g: @outputSchema(....) def test(): f = open('test.txt.') text = f.read() f.close() return text I have placed the ...
    Young Ng
    Dec 9, 2012 at 8:53 pm
    Dec 9, 2012 at 11:19 pm
  • sorry to ask you if possible could you pls advice on below points In general in Real time how we will write PIG scripts. 1. PIG scripts alone in Eclipse 2. Java + PIG scripts 3 Or both possible / ...
    Kshiva Kps
    Dec 25, 2012 at 7:11 am
    Feb 1, 2013 at 5:42 am
  • If I self cross a relation, I got the original relation, which is not expected. The input: A.txt 1 2 3 The code: A = load 'A.txt' as (id:chararray); B = cross A, A; dump B; (1) (2) (3) C = foreach A ...
    Danfeng Li
    Dec 29, 2012 at 1:44 am
    Jan 1, 2013 at 10:41 pm
  • hi, all , here's an error that has no line numbers attached: ERROR 1200: org.apache.pig.newplan.logical.expression.ScalarExpression cannot be cast to ...
    Haitao Yao
    Dec 24, 2012 at 8:37 am
    Dec 24, 2012 at 8:38 am
  • Hi, I am trying to dig deep on the workings of pig libraries. So can someone help me understand what happens when someone does: in = load 'in.txt' using PigStorage(',') as (foo:int); dump in; what ...
    Jamal sasha
    Dec 20, 2012 at 12:24 am
    Dec 20, 2012 at 12:35 am
  • Soliciting comments... tired of naming things. https://issues.apache.org/jira/browse/PIG-3089 Russell Jurney twitter.com/rjurney
    Russell Jurney
    Dec 11, 2012 at 2:41 am
    Dec 11, 2012 at 3:28 am
  • Hi,
    L N
    Dec 9, 2012 at 5:23 pm
    Dec 10, 2012 at 6:34 pm
  • We are currently working on a flow driver using normal ESB engine. I wanted some suggestion on how one can launch pig jobs from Java Applications into the hadoop cluster.
    Mohit Anchlia
    Dec 4, 2012 at 8:16 pm
    Dec 4, 2012 at 8:18 pm
  • I have files in multiple directories like /dir1/a1../dir2/aN. Is there a way to merge these files into a different directory /dir1/a1../dir2/aN - merge into one file /dir1/b1../dir2/bN - merge into ...
    Mohit Anchlia
    Dec 19, 2012 at 5:43 am
    Dec 19, 2012 at 5:43 am
  • Hi, I'm working on a pig script (and some associated UDFs) to do a little cleaning on some data, stored as JSON objects, being used by other scripts down the pipe. The JSON objects contain dates, ...
    Kris Coward
    Dec 18, 2012 at 9:19 pm
    Dec 18, 2012 at 9:19 pm
  • Hi All, So we have data in S3 partitioned by hour in UTC : 2012/10/11/00 2012/10/11/01 .... 2012/10/12/00 2012/10/12/01 We need to now load data in Pacific time so we need to load for 2012/10/12 data ...
    Meghana narasimhan
    Dec 14, 2012 at 12:55 am
    Dec 14, 2012 at 12:55 am
  • Hello All, I have 4 datasets Dataset1 uid, metric, key1 , key2, key3 Dataset2 key1 , key1Category Dataset3 key2 , key2Category Dataset4 key3 , key3Category JoinedRecord Dataset looks like uid, ...
    Richipal Singh
    Dec 13, 2012 at 12:36 am
    Dec 13, 2012 at 12:36 am
  • I'm having trouble loading data into Pig from a composite column family in Cassandra. A google search suggests using CassandraStorage.getNext(), but I'm not sure if that's optimal. Any suggestions or ...
    Anita Mehrotra
    Dec 10, 2012 at 9:00 pm
    Dec 10, 2012 at 9:00 pm
Group Overview
group: user @
categories: pig, hadoop
discussions: 33
posts: 115
users: 42
website: pig.apache.org

42 users for December 2012

  • Jonathan Coveney: 12 posts
  • Russell Jurney: 9 posts
  • Abhishek: 8 posts
  • Bill Graham: 6 posts
  • Prashant Kommireddi: 6 posts
  • Mohit Anchlia: 5 posts
  • Cheolsoo Park: 4 posts
  • Rohini Palaniswamy: 4 posts
  • Thomas Bach: 4 posts
  • Yaboulna: 3 posts
  • Alan Gates: 3 posts
  • Danfeng Li: 3 posts
  • Jamal sasha: 3 posts
  • James Schappet: 3 posts
  • Kshiva Kps: 3 posts
  • Ramakrishna Nalam: 3 posts
  • Sarah Mohamed: 3 posts
  • Young Ng: 3 posts
  • Dmitriy Ryaboy: 2 posts
  • Haitao Yao: 2 posts