Grokbase Groups Pig user March 2012
Hi all,
How can I store multiple results using one store function?
For example: store Result1, Result2 into '/tmp/result' using PigStorage(',');

The default store function does not accept multiple relations as input.

Thanks

Name: Haitao Yao (姚海涛)
Email: yao.erix@gmail.com
Sina Weibo: @haitao_yao


  • Prashant Kommireddi at Mar 2, 2012 at 3:07 am
    Can you merge Result1 and Result2 using "UNION" before STORE?
    http://pig.apache.org/docs/r0.9.1/basic.html#union
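
    A minimal sketch of that suggestion, assuming Result1 and Result2 are the relation aliases from the question and that their schemas are compatible (UNION expects that):

    ```pig
    -- Merge the two relations, then write them with a single STORE.
    -- Result1 and Result2 are the aliases from the question.
    Merged = union Result1, Result2;
    store Merged into '/tmp/result' using PigStorage(',');
    ```

    Note that UNION does not preserve tuple order or remove duplicates, so the output may interleave rows from both relations.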

  • Haitao Yao at Mar 2, 2012 at 3:48 am
    Yeah, union can do this.

    But my real goal is to reduce the number of MapReduce jobs.

    Although I union the 2 result sets into one, it still submits 2 MapReduce jobs and reads the data twice. Here's my script:


    register '/home/hadoop/pig/matrix-pig.jar';
    RawData = load '/data/' using PigStorage(',') as (gid:long, payload:bytearray, ts:long, type:int);
    RawData = filter RawData by type == 1000 and ts >= 20120302090000L and ts <= 20120302100000L;

    -- First result: distinct gid count per 'object' value for event 217.
    FormattedData = foreach RawData {
        payload = he.HEDataConverter(payload);
        generate gid, ts, type, payload#'_event_id' as p__event_id, payload#'object' as p_object;
    }
    FilteredData = filter FormattedData by (int) p__event_id == 217;
    ResultSet = group FilteredData by p_object;
    Result = foreach ResultSet {
        Value = FilteredData.gid;
        Value = distinct Value;
        generate '217', CONCAT(CONCAT('object', ':'), group), he.HECOUNT(Value);
    }

    -- Second result: distinct gid count per 'result' value for event 217.
    FormattedData = foreach RawData {
        payload = he.HEDataConverter(payload);
        generate gid, ts, type, payload#'_event_id' as p__event_id, payload#'result' as p_result;
    }
    FilteredData = filter FormattedData by (int) p__event_id == 217;
    ResultSet = group FilteredData by p_result;
    Result1 = foreach ResultSet {
        Value = FilteredData.gid;
        Value = distinct Value;
        generate '217', CONCAT(CONCAT('result', ':'), group), he.HECOUNT(Value);
    }

    -- Destination and storage function taken from the first post in this thread.
    A = union Result, Result1;
    store A into '/tmp/result' using PigStorage(',');


    How can I do this work with 1 MapReduce job? I do not want to read the data twice; it puts a heavy load on HDFS.

    Thanks!
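
    One possible way to read the input only once, offered as an untested sketch and not something proposed in this thread: extract both map fields in a single foreach, turn each row into one (kind, value) pair per field, and use a single GROUP over both. The aliases Tagged and Grouped, the field names kind and value, and the chararray casts are assumptions made for illustration; TOBAG and TOTUPLE are Pig built-ins (available from 0.8), and the he.* UDFs are used exactly as in the script above.

    ```pig
    -- Assumes RawData is loaded and filtered as in the script above.
    FormattedData = foreach RawData {
        payload = he.HEDataConverter(payload);
        generate gid, payload#'_event_id' as p__event_id,
                 payload#'object' as p_object, payload#'result' as p_result;
    }
    FilteredData = filter FormattedData by (int) p__event_id == 217;

    -- Emit two rows per input row: one tagged 'object', one tagged 'result'.
    -- The casts to chararray are an assumption about the map values.
    Tagged = foreach FilteredData generate gid,
        flatten(TOBAG(TOTUPLE('object', (chararray) p_object),
                      TOTUPLE('result', (chararray) p_result)))
        as (kind:chararray, value:chararray);

    -- A single GROUP now covers both dimensions.
    Grouped = group Tagged by (kind, value);
    Result = foreach Grouped {
        Value = Tagged.gid;
        Value = distinct Value;
        generate '217', CONCAT(CONCAT(group.kind, ':'), group.value), he.HECOUNT(Value);
    }
    store Result into '/tmp/result' using PigStorage(',');
    ```

    With only one GROUP in the plan, this should generally compile to a single MapReduce job; running `explain Result;` shows how many jobs Pig actually generates. The trade-off is that twice as many rows go through the shuffle.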



Discussion Overview
group: user @
categories: pig, hadoop
posted: Mar 2, '12 at 3:04a
active: Mar 2, '12 at 3:48a
posts: 3
users: 2
website: pig.apache.org

2 users in discussion

Haitao Yao: 2 posts Prashant Kommireddi: 1 post
