Hello,

I have a question regarding the treatment of dates in Pig.

My input files contain a timestamp field in 'yyyymmdd hh:mm:ss' format (e.g. 20090201 14:42:00) within a comma-delimited file. I want to aggregate to day level by extracting only the date portion of the timestamp (yyyymmdd, e.g. 20090201). I have been experimenting with the tokenize function, but I am unclear how to accomplish an aggregation by date.

What am I doing wrong? How can I get a date-level aggregation?
Is there a 'Date' data type?


Here are the details:


Input Data:

4,20090201 23:59:56,8,1
3,20090202 23:59:56,101,1
4,20090201 23:59:56,114,1
5,20090202 23:59:56,29,1

Desired Output:
20090201, 122
20090202, 130

--My attempt in Pig:
A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
describe A;
B = foreach A generate group, tokenize(A.v2) as (date,time); --fails here.
describe B;
C = group B by B.date;
describe C;
D = foreach C generate B.date, SUM(A.v3);
dump D;


grunt> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
grunt> describe A;
A: (v1, v2, v3, v4 )
grunt> B = foreach A generate group, tokenize(A.v2) as (date,time);
2009-02-18 15:11:44,278 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: group in A: (v1, v2, v3, v4 )
at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
at org.apache.pig.Main.main(Main.java:270)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: group in A: (v1, v2, v3, v4 )
at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:3301)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:3225)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2236)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
... 5 more

2009-02-18 15:11:44,279 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: group in A: (v1, v2, v3, v4 )
grunt>


Thanks in advance,
Avram


  • Alan Gates at Feb 19, 2009 at 5:50 pm
    Date is not a separate type in Pig.

    If you want to group on date, I think what you want is this:

    A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
    B = foreach A generate tokenize(A.v2) as (date,time), v3;
    C = foreach B generate date, v3;
    D = group C by date;
    E = foreach D generate group, SUM(C.v3);
    dump E;

    This script will first tokenize the datestamp into date and time, then
    project just the date and data you're going to sum, and then do the
    grouping.

    Alan.
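
As a cross-check of the expected numbers, the three steps described above (split the timestamp into date and time, keep only the date and the value to sum, then group and sum) can be mirrored outside Pig. The following is a minimal Python sketch, not Pig Latin; the function name `sum_by_day` and the inline `SAMPLE` constant are illustrative assumptions, with the rows copied from the input data in the question.

```python
from collections import defaultdict

# The four sample rows from the question (v1, v2, v3, v4).
SAMPLE = """4,20090201 23:59:56,8,1
3,20090202 23:59:56,101,1
4,20090201 23:59:56,114,1
5,20090202 23:59:56,29,1"""


def sum_by_day(lines):
    """Sum the third field per yyyymmdd date taken from the second field."""
    totals = defaultdict(int)
    for line in lines:
        v1, v2, v3, v4 = line.split(",")
        date, _time = v2.split(" ")  # "20090201 23:59:56" -> date, time
        totals[date] += int(v3)      # keep date and v3, accumulate per date
    return dict(totals)


if __name__ == "__main__":
    for date, total in sorted(sum_by_day(SAMPLE.splitlines()).items()):
        print(f"{date}, {total}")
```

On the four sample rows this gives 122 for 20090201 and 130 for 20090202, which matches the desired output in the question.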
  • Avram Aelony at Feb 19, 2009 at 6:51 pm
    Unfortunately, step B of the solution you proposed fails for me. Any thoughts on how to remedy this?


    grunt> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
    grunt> describe A;
    A: (v1, v2, v3, v4 )
    grunt> B = foreach A generate tokenize(A.v2) as (date,time), v3;
    2009-02-19 10:47:11,142 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Cannot instantiate:tokenize
    at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
    at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
    at org.apache.pig.Main.main(Main.java:270)
    Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Cannot instantiate:tokenize
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFunction(QueryParser.java:2818)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FuncEvalSpec(QueryParser.java:2354)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2230)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
    at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
    ... 5 more
    Caused by: java.lang.RuntimeException: Cannot instantiate:tokenize
    at org.apache.pig.impl.PigContext.instantiateFunc(PigContext.java:456)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:506)
    at org.apache.pig.impl.PigContext.instantiateFuncFromAlias(PigContext.java:528)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFunction(QueryParser.java:2815)
    ... 21 more
    Caused by: java.io.IOException: Could not resolve tokenize using imports: [, org.apache.pig.builtin., com.yahoo.pig.yst.sds.ULT., org.apache.pig.impl.builtin.]
    at org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOException.java:34)
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:421)
    at org.apache.pig.impl.PigContext.instantiateFunc(PigContext.java:453)
    ... 24 more
    Caused by: java.lang.ClassNotFoundException: Could not resolve tokenize using imports: [, org.apache.pig.builtin., com.yahoo.pig.yst.sds.ULT., org.apache.pig.impl.builtin.]
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:420)
    ... 25 more

    2009-02-19 10:47:11,143 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Cannot instantiate:tokenize
    grunt>


    thanks,
    Avram


  • Olga Natkovich at Feb 19, 2009 at 6:56 pm
    Function names in Pig are case-sensitive. The function name is TOKENIZE.
    Please refer to the PigLatin Manual for details:
    http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm

    Olga
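
The "Could not resolve tokenize using imports: [...]" message in the earlier stack trace suggests that the function name is appended to each import prefix and looked up as a class name, which is why letter case matters. The Python sketch below is only a simplified model of that lookup, not Pig's actual code: the `resolve` function and the `KNOWN_CLASSES` set are hypothetical stand-ins, while the import prefixes are copied from the error message in this thread.

```python
# Import prefixes as listed in the error message in this thread.
IMPORTS = [
    "",
    "org.apache.pig.builtin.",
    "com.yahoo.pig.yst.sds.ULT.",
    "org.apache.pig.impl.builtin.",
]

# Stand-in for the classpath (assumption): only this one class is modeled.
KNOWN_CLASSES = {"org.apache.pig.builtin.TOKENIZE"}


def resolve(func_name):
    """Return the first prefix + name that matches a known class exactly."""
    for prefix in IMPORTS:
        candidate = prefix + func_name
        if candidate in KNOWN_CLASSES:  # exact, case-sensitive match
            return candidate
    raise LookupError(
        f"Could not resolve {func_name} using imports: {IMPORTS}"
    )
```

In this model `resolve("TOKENIZE")` finds `org.apache.pig.builtin.TOKENIZE`, while `resolve("tokenize")` raises an error because no prefix produces an exact match.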
  • Avram Aelony at Feb 19, 2009 at 6:59 pm
    I tried the capitalized version, but that still leads to an error. Now it appears to be a problem with the alias.



    grunt> B = foreach A generate TOKENIZE(A.v2) as (date,time), v3;
    2009-02-19 10:56:05,075 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: A in A: (v1, v2, v3, v4 )
    at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
    at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
    at org.apache.pig.Main.main(Main.java:270)
    Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: A in A: (v1, v2, v3, v4 )
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:3301)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:3225)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2236)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalArgsItem(QueryParser.java:2456)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalArgs(QueryParser.java:2397)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FuncEvalSpec(QueryParser.java:2356)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2230)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
    at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
    ... 5 more


    -----Original Message-----
    From: Olga Natkovich
    Sent: Thursday, February 19, 2009 10:54 AM
    To: pig-user@hadoop.apache.org
    Subject: RE: date treatment & date level aggregations

    Functions in pig are case sensitive. The function name is TOKENIZE.
    Please, refer to PigLatin Manula for details:
    http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.h
    tm.

    Olga
    -----Original Message-----
    From: Avram Aelony
    Sent: Thursday, February 19, 2009 10:51 AM
    To: pig-user@hadoop.apache.org
    Subject: RE: date treatment & date level aggregations

    Unfortunately, step B of the solution you proposed fails for
    me. Any thoughts on how to remedy?


    grunt> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
    grunt> describe A;
    A: (v1, v2, v3, v4 )
    grunt> B = foreach A generate tokenize(A.v2) as (date,time), v3;
    2009-02-19 10:47:11,142 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Cannot instantiate:tokenize
    at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
    at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
    at org.apache.pig.Main.main(Main.java:270)
    Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Cannot instantiate:tokenize
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFunction(QueryParser.java:2818)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FuncEvalSpec(QueryParser.java:2354)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2230)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
    at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
    ... 5 more
    Caused by: java.lang.RuntimeException: Cannot instantiate:tokenize
    at org.apache.pig.impl.PigContext.instantiateFunc(PigContext.java:456)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:506)
    at org.apache.pig.impl.PigContext.instantiateFuncFromAlias(PigContext.java:528)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFunction(QueryParser.java:2815)
    ... 21 more
    Caused by: java.io.IOException: Could not resolve tokenize using imports: [, org.apache.pig.builtin., com.yahoo.pig.yst.sds.ULT., org.apache.pig.impl.builtin.]
    at org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOException.java:34)
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:421)
    at org.apache.pig.impl.PigContext.instantiateFunc(PigContext.java:453)
    ... 24 more
    Caused by: java.lang.ClassNotFoundException: Could not resolve tokenize using imports: [, org.apache.pig.builtin., com.yahoo.pig.yst.sds.ULT., org.apache.pig.impl.builtin.]
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:420)
    ... 25 more

    2009-02-19 10:47:11,143 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Cannot instantiate:tokenize
    grunt>


    thanks,
    Avram


    -----Original Message-----
    From: Alan Gates
    Sent: Thursday, February 19, 2009 9:49 AM
    To: pig-user@hadoop.apache.org
    Subject: Re: date treatment & date level aggregations

    Date is not a separate type in pig.

    If you want to group on date, I think what you want is this:

    A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
    B = foreach A generate tokenize(A.v2) as (date,time), v3;
    C = foreach B generate date, v3;
    D = group C by date;
    E = foreach D generate group, SUM(C.v3);
    dump E;

    This script will first tokenize the datestamp into date and
    time, then project just the date and data you're going to
    sum, and then do the grouping.

    Alan.
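
The data flow described in this message (split the timestamp, keep only the date, group, then sum) can be sketched outside Pig. The following is a plain-Python illustration, not Pig Latin and not the way Pig executes the script; the function name `sum_by_date` is hypothetical, and the sample rows are the ones from the original post.

```python
from collections import defaultdict

# Sample rows from the original post: v1, v2 (timestamp), v3, v4
ROWS = [
    "4,20090201 23:59:56,8,1",
    "3,20090202 23:59:56,101,1",
    "4,20090201 23:59:56,114,1",
    "5,20090202 23:59:56,29,1",
]


def sum_by_date(lines):
    """Split each timestamp on whitespace, keep the date, and sum v3 per date."""
    totals = defaultdict(int)
    for line in lines:
        v1, v2, v3, v4 = line.split(",")  # comma-delimited load
        date, _time = v2.split()          # split 'yyyymmdd hh:mm:ss' into two tokens
        totals[date] += int(v3)           # group by date and sum v3
    return dict(sorted(totals.items()))


result = sum_by_date(ROWS)
for date, total in result.items():
    print(f"{date}, {total}")
```

For the four sample rows this yields 122 for 20090201 and 130 for 20090202, which matches the desired output in the original question.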
  • Santhosh Srinivasan at Feb 19, 2009 at 7:02 pm
    Hi Avram,

    A few things to note:

    1. The builtin functions in Pig are Java UDFs, which makes their names
    case sensitive. You should use TOKENIZE instead of tokenize.
    2. It looks like the builtin TOKENIZE has to be fixed to support your
    current usage. I have filed a bug report to track this: PIG-683
    (https://issues.apache.org/jira/browse/PIG-683)
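
A toy model may help explain the "Could not resolve tokenize using imports" error quoted in this thread. The sketch below is plain Python and only an assumption about the general mechanism: each import prefix from the error message is prepended to the function name, and the result is looked up as an exact, case-sensitive class name. `KNOWN_CLASSES` and `resolve` are hypothetical stand-ins for Java class loading, not Pig's actual code.

```python
# Hypothetical registry standing in for classes available on the Java classpath.
KNOWN_CLASSES = {
    "org.apache.pig.builtin.TOKENIZE",
    "org.apache.pig.builtin.SUM",
}

# Import prefixes as listed in the error message quoted in this thread.
IMPORTS = [
    "",
    "org.apache.pig.builtin.",
    "com.yahoo.pig.yst.sds.ULT.",
    "org.apache.pig.impl.builtin.",
]


def resolve(name):
    """Return the first prefix + name that matches a known class exactly."""
    for prefix in IMPORTS:
        candidate = prefix + name
        if candidate in KNOWN_CLASSES:  # exact match, so case matters
            return candidate
    raise LookupError(f"Could not resolve {name} using imports: {IMPORTS}")


print(resolve("TOKENIZE"))
try:
    resolve("tokenize")
except LookupError as err:
    print(err)
```

Under this model the upper-case name resolves and the lower-case name does not, which is consistent with the first error in the thread. It says nothing about the separate alias problem tracked in PIG-683.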

    When PIG-683 is fixed, you should then be able to do the following:


    A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
    B = foreach A generate flatten(TOKENIZE(v2)) as (date,time), v3;
    C = foreach B generate date, v3;
    D = group C by date;
    E = foreach D generate group, SUM(C.v3);
    dump E;

    Thanks,
    Santhosh

  • Avram Aelony at Feb 19, 2009 at 7:08 pm
    Thanks for identifying that the TOKENIZE builtin needs a fix and filing the bug report.
    I should have noted in my original email that I had tried uppercase and that uppercase had also failed.

    Thanks for everyone's help & I look forward to the fix.

    Regards,
    -Avram


    -----Original Message-----
    From: Santhosh Srinivasan
    Sent: Thursday, February 19, 2009 11:01 AM
    To: pig-user@hadoop.apache.org
    Subject: RE: date treatment & date level aggregations

    Hi Avram,

    A few things to note:

    1. The builtin functions in Pig are Java UDFs, making them case
    sensitive. You should use TOKENIZE instead of tokenize
    2. It looks like the builtin TOKENIZE has to be fixed to support your
    current usage. I have a filed a bug report to track this : PIG-683
    (https://issues.apache.org/jira/browse/PIG-683)

    When PIG-683 is fixed, you should then be able to do the following:


    A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
    B = foreach A generate flatten(TOKENIZE(v2)) as (date,time), v3;
    C = foreach B generate date, v3;
    D = group C by date;
    E = foreach D generate group, SUM(C.v3);
    dump E;

    Thanks,
    Santhosh

    -----Original Message-----
    From: Avram Aelony
    Sent: Thursday, February 19, 2009 10:59 AM
    To: pig-user@hadoop.apache.org
    Subject: RE: date treatment & date level aggregations

    I tried the capitalized version, that still leads to an error. Now it
    appears to be a problem with the alias.



    grunt> B = foreach A generate TOKENIZE(A.v2) as (date,time), v3;
    2009-02-19 10:56:05,075 [main] ERROR
    org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid
    alias: A in A: (v1, v2, v3, v4 )
    at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
    at
    org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
    at
    org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptPar
    ser.java:233)
    at
    org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java
    :91)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
    at org.apache.pig.Main.main(Main.java:270)
    Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException:
    Invalid alias: A in A: (v1, v2, v3, v4 )
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(Que
    ryParser.java:3301)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParse
    r.java:3225)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryPa
    rser.java:2236)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParse
    r.java:2175)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(Q
    ueryParser.java:2106)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryPa
    rser.java:2038)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParse
    r.java:2006)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalArgsItem(QueryPa
    rser.java:2456)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalArgs(QueryParser
    .java:2397)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.FuncEvalSpec(QueryPa
    rser.java:2356)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryPa
    rser.java:2230)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParse
    r.java:2175)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(Q
    ueryParser.java:2106)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryPa
    rser.java:2038)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParse
    r.java:2006)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateIte
    m(QueryParser.java:1955)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateIte
    mList(QueryParser.java:1894)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(Qu
    eryParser.java:1862)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryPar
    ser.java:1604)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryP
    arser.java:1569)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser
    .java:711)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.jav
    a:512)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.ja
    va:362)
    at
    org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBui
    lder.java:47)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
    ... 5 more


    -----Original Message-----
    From: Olga Natkovich
    Sent: Thursday, February 19, 2009 10:54 AM
    To: pig-user@hadoop.apache.org
    Subject: RE: date treatment & date level aggregations

    Functions in pig are case sensitive. The function name is TOKENIZE.
    Please, refer to PigLatin Manula for details:
    http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.h
    tm.

    Olga
    -----Original Message-----
    From: Avram Aelony
    Sent: Thursday, February 19, 2009 10:51 AM
    To: pig-user@hadoop.apache.org
    Subject: RE: date treatment & date level aggregations

    Unfortunately, step B of the solution you proposed fails for
    me. Any thoughts on how to remedy?


    grunt> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
    grunt> describe A;
    A: (v1, v2, v3, v4 )
    grunt> B = foreach A generate tokenize(A.v2) as (date,time), v3;
    2009-02-19 10:47:11,142 [main] ERROR
    org.apache.pig.tools.grunt.GruntParser - java.io.IOException:
    Cannot instantiate:tokenize
    at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
    at
    org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.
    java:475)
    at
    org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(Pi
    gScriptParser.java:233)
    at
    org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntP
    arser.java:91)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
    at org.apache.pig.Main.main(Main.java:270)
    Caused by:
    org.apache.pig.impl.logicalLayer.parser.ParseException:
    Cannot instantiate:tokenize
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFuncti
    on(QueryParser.java:2818)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.FuncEvalSp
    ec(QueryParser.java:2354)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSp
    ec(QueryParser.java:2230)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(
    QueryParser.java:2175)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.Multiplica
    tiveExpr(QueryParser.java:2106)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveEx
    pr(QueryParser.java:2038)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(
    QueryParser.java:2006)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedG
    enerateItem(QueryParser.java:1955)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedG
    enerateItemList(QueryParser.java:1894)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateSt
    atement(QueryParser.java:1862)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBloc
    k(QueryParser.java:1604)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachCla
    use(QueryParser.java:1569)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(Q
    ueryParser.java:711)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(Query
    Parser.java:512)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(Quer
    yParser.java:362)
    at
    org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(Logi
    calPlanBuilder.java:47)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
    ... 5 more
    Caused by: java.lang.RuntimeException: Cannot instantiate:tokenize
    at
    org.apache.pig.impl.PigContext.instantiateFunc(PigContext.java:456)
    at
    org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigCont
    ext.java:506)
    at
    org.apache.pig.impl.PigContext.instantiateFuncFromAlias(PigCon
    text.java:528)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFuncti
    on(QueryParser.java:2815)
    ... 21 more
    Caused by: java.io.IOException: Could not resolve tokenize
    using imports: [, org.apache.pig.builtin.,
    com.yahoo.pig.yst.sds.ULT., org.apache.pig.impl.builtin.]
    at
    org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOExce
    ption.java:34)
    at
    org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:421)
    at
    org.apache.pig.impl.PigContext.instantiateFunc(PigContext.java:453)
    ... 24 more
    Caused by: java.lang.ClassNotFoundException: Could not
    resolve tokenize using imports: [, org.apache.pig.builtin.,
    com.yahoo.pig.yst.sds.ULT., org.apache.pig.impl.builtin.]
    at
    org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:420)
    ... 25 more

    2009-02-19 10:47:11,143 [main] ERROR
    org.apache.pig.tools.grunt.GruntParser - java.io.IOException:
    Cannot instantiate:tokenize
    grunt>


    thanks,
    Avram


    -----Original Message-----
    From: Alan Gates
    Sent: Thursday, February 19, 2009 9:49 AM
    To: pig-user@hadoop.apache.org
    Subject: Re: date treatment & date level aggregations

    Date is not a separate type in pig.

    If you want to group on date, I think what you want is this:

    A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
    B = foreach A generate tokenize(A.v2) as (date,time), v3; C =
    foreach B generate date, v3; D = group C by date; E = foreach
    D generate group, SUM(C.v3); dump E;

    This script will first tokenize the datestamp into date and
    time, then project just the date and data you're going to
    sum, and then do the grouping.

    Alan.
    On Feb 18, 2009, at 3:19 PM, Avram Aelony wrote:

    Hello,

    I have a question regarding treatment of dates with PIG.

    My input files contain a timestamp field in 'yyyymmdd hh:mm:ss'
    format (e.g. 20090201 14:42:00 ) within a comma delimited file. I
    want to aggregate to day-level relying on extracting the date portion
    (e.g. yyyymmdd, so the 20090201 ) of the timestamp only. I have been
    experimenting with the tokenize function but I am unclear how to
    accomplish an aggregation by date.

    What am I doing wrong? How can I get a date-level aggregation?
    Is there a 'Date' data type?


    Here are the details:


    Input Data:

    4,20090201 23:59:56,8,1
    3,20090202 23:59:56,101,1
    4,20090201 23:59:56,114,1
    5,20090202 23:59:56,29,1

    Desired Output:
    20090201, 122
    20090202, 130

    --My attempt in Pig:
    A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
    describe A;
    B = foreach A generate group, tokenize(A.v2) as (date,time); --fails here.
    describe B;
    C = group B by B.date;
    describe C;
    D = foreach C generate B.date, SUM(A.v3);
    dump D;


    grunt> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
    grunt> describe A;
    A: (v1, v2, v3, v4 )
    grunt> B = foreach A generate group, tokenize(A.v2) as (date,time);
    2009-02-18 15:11:44,278 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: group in A: (v1, v2, v3, v4 )
    at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
    at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
    at org.apache.pig.Main.main(Main.java:270)
    Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: group in A: (v1, v2, v3, v4 )
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:3301)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:3225)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2236)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
    at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
    ... 5 more

    2009-02-18 15:11:44,279 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: group in A: (v1, v2, v3, v4 )
    grunt>


    Thanks in advance,
    Avram
  • Olga Natkovich at Feb 19, 2009 at 7:29 pm
    TOKENIZE is not broken. It has particular semantics that might not work
    for this query but are used in other contexts.

    If a function with different semantics is needed, it can be written and
    contributed to piggybank.

    Olga
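
    To make the point about semantics concrete, here is a rough Java model of a bag-returning tokenizer. It is only a sketch: it assumes TOKENIZE yields a bag with one single-field tuple per token (as the Pig documentation describes it) and splits on whitespace only. The class TokenizeModel and its methods are hypothetical and are not Pig's implementation.

    ```java
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Hypothetical model of a bag-returning tokenizer; not Pig's actual code.
    final class TokenizeModel {

        // Returns a "bag": one single-field "tuple" per whitespace-separated token.
        static List<List<String>> tokenize(String field) {
            List<List<String>> bag = new ArrayList<>();
            for (String token : field.trim().split("\\s+")) {
                bag.add(Collections.singletonList(token));
            }
            return bag;
        }

        // Flattening a bag next to another field emits one row per tuple in the bag.
        static List<List<String>> flattenWith(List<List<String>> bag, String other) {
            List<List<String>> rows = new ArrayList<>();
            for (List<String> tuple : bag) {
                List<String> row = new ArrayList<>(tuple);
                row.add(other);
                rows.add(row);
            }
            return rows;
        }

        public static void main(String[] args) {
            List<List<String>> bag = tokenize("20090201 23:59:56");
            System.out.println(bag);                   // the modelled bag of tokens
            System.out.println(flattenWith(bag, "8")); // one row per token
        }
    }
    ```

    Under this model the date and the time come back as two separate one-field tuples rather than as a single (date, time) pair, which is presumably why the "as (date,time)" schema in the scripts in this thread does not fit.
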
    -----Original Message-----
    From: Avram Aelony
    Sent: Thursday, February 19, 2009 11:08 AM
    To: pig-user@hadoop.apache.org
    Subject: RE: date treatment & date level aggregations

    Thanks for identifying that the TOKENIZE builtin needs a fix
    and filing the bug report.
    I should have noted in my original email that I had tried
    uppercase and that uppercase had also failed.

    Thanks for everyone's help & I look forward to the fix.

    Regards,
    -Avram


    -----Original Message-----
    From: Santhosh Srinivasan
    Sent: Thursday, February 19, 2009 11:01 AM
    To: pig-user@hadoop.apache.org
    Subject: RE: date treatment & date level aggregations

    Hi Avram,

    A few things to note:

    1. The builtin functions in Pig are Java UDFs, making them
    case sensitive. You should use TOKENIZE instead of tokenize.
    2. It looks like the builtin TOKENIZE has to be fixed to
    support your current usage. I have filed a bug report to
    track this: PIG-683
    (https://issues.apache.org/jira/browse/PIG-683)

    When PIG-683 is fixed, you should then be able to do the following:


    A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
    B = foreach A generate flatten(TOKENIZE(v2)) as (date,time), v3;
    C = foreach B generate date, v3;
    D = group C by date;
    E = foreach D generate group, SUM(C.v3);
    dump E;

    Thanks,
    Santhosh
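
    The "Could not resolve tokenize using imports: [...]" message in the stack traces suggests that a function name is looked up as a Java class name under a list of package prefixes. A minimal Java sketch of that kind of lookup follows, using JDK packages as stand-in prefixes. The class NameResolver is hypothetical and is not Pig's PigContext; it only illustrates why the case of the name matters.

    ```java
    import java.util.Arrays;
    import java.util.List;

    // Hypothetical stand-in for a prefix-based class lookup; not Pig's PigContext.
    final class NameResolver {

        // Try each prefix in order; Java class names are matched case-sensitively.
        static Class<?> resolve(String name, List<String> prefixes) throws ClassNotFoundException {
            for (String prefix : prefixes) {
                try {
                    return Class.forName(prefix + name);
                } catch (ClassNotFoundException | LinkageError e) {
                    // Not found under this prefix; try the next one.
                }
            }
            throw new ClassNotFoundException(
                "Could not resolve " + name + " using imports: " + prefixes);
        }

        public static void main(String[] args) throws Exception {
            List<String> prefixes = Arrays.asList("", "java.lang.", "java.util.");
            // Correct case: found under the java.util. prefix.
            System.out.println(resolve("StringTokenizer", prefixes).getName());
            // Wrong case: no prefix matches, so the lookup fails.
            try {
                resolve("stringtokenizer", prefixes);
            } catch (ClassNotFoundException e) {
                System.out.println(e.getMessage());
            }
        }
    }
    ```
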

    -----Original Message-----
    From: Avram Aelony
    Sent: Thursday, February 19, 2009 10:59 AM
    To: pig-user@hadoop.apache.org
    Subject: RE: date treatment & date level aggregations

    I tried the capitalized version; that still leads to an
    error. Now it appears to be a problem with the alias.



    grunt> B = foreach A generate TOKENIZE(A.v2) as (date,time), v3;
    2009-02-19 10:56:05,075 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: A in A: (v1, v2, v3, v4 )
    at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
    at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
    at org.apache.pig.Main.main(Main.java:270)
    Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: A in A: (v1, v2, v3, v4 )
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:3301)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:3225)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2236)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalArgsItem(QueryParser.java:2456)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalArgs(QueryParser.java:2397)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FuncEvalSpec(QueryParser.java:2356)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2230)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
    at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
    ... 5 more


    -----Original Message-----
    From: Olga Natkovich
    Sent: Thursday, February 19, 2009 10:54 AM
    To: pig-user@hadoop.apache.org
    Subject: RE: date treatment & date level aggregations

    Functions in pig are case sensitive. The function name is TOKENIZE.
    Please refer to the PigLatin Manual for details:
    http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm.

    Olga
  • Avram Aelony at Feb 24, 2009 at 5:10 pm
    Hi Olga,

    Thanks for your message. I will have fairly particular needs, so I will take a leap into learning what it takes to develop needed UDFs. If it works out well and they are generic enough that they might be useful to others, I will see if I can get authorization to contribute back to piggybank.

    -Avram
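
    As a possible starting point for such a UDF, the core logic might look something like the Java sketch below. It assumes the 'yyyymmdd hh:mm:ss' timestamp format and the comma-delimited rows from the original question, and it models only the string handling and the day-level sum in plain Java. The class DayTotals and its methods are hypothetical, and the wrapper that a real Pig UDF needs is left out.

    ```java
    import java.util.Arrays;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    // Hypothetical helper showing the logic only; a real Pig UDF would wrap datePart().
    final class DayTotals {

        // Returns the date portion of a 'yyyymmdd hh:mm:ss' timestamp, or null if absent.
        static String datePart(String timestamp) {
            if (timestamp == null) {
                return null;
            }
            String trimmed = timestamp.trim();
            if (trimmed.isEmpty()) {
                return null;
            }
            int space = trimmed.indexOf(' ');
            return space < 0 ? trimmed : trimmed.substring(0, space);
        }

        // Sums the third column per day for rows like "4,20090201 23:59:56,8,1".
        static Map<String, Long> sumByDay(Iterable<String> rows) {
            Map<String, Long> totals = new TreeMap<>();
            for (String row : rows) {
                String[] fields = row.split(",");
                if (fields.length < 3) {
                    continue; // skip malformed rows
                }
                String day = datePart(fields[1]);
                if (day == null) {
                    continue; // skip rows without a timestamp
                }
                totals.merge(day, Long.parseLong(fields[2].trim()), Long::sum);
            }
            return totals;
        }

        public static void main(String[] args) {
            List<String> rows = Arrays.asList(
                "4,20090201 23:59:56,8,1",
                "3,20090202 23:59:56,101,1",
                "4,20090201 23:59:56,114,1",
                "5,20090202 23:59:56,29,1");
            System.out.println(sumByDay(rows));
        }
    }
    ```

    With the four sample rows from the original question, the totals come out as 122 for 20090201 and 130 for 20090202, matching the desired output.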


    -----Original Message-----
    From: Olga Natkovich
    Sent: Thursday, February 19, 2009 11:28 AM
    To: pig-user@hadoop.apache.org
    Subject: RE: date treatment & date level aggregations

    TOKENIZE is not broken. It has particular semantics that might not work
    for this query but are used in other contexts.

    If a function with different semantics is needed, it can be written and
    contributed to piggybank.

    Olga
    -----Original Message-----
    From: Avram Aelony
    Sent: Thursday, February 19, 2009 11:08 AM
    To: pig-user@hadoop.apache.org
    Subject: RE: date treatment & date level aggregations

    Thanks for identifying that the TOKENIZE builtin needs a fix
    and filing the bug report.
    I should have noted in my original email that I had tried
    uppercase and that uppercase had also failed.

    Thanks for everyone's help & I look forward to the fix.

    Regards,
    -Avram


    -----Original Message-----
    From: Santhosh Srinivasan
    Sent: Thursday, February 19, 2009 11:01 AM
    To: pig-user@hadoop.apache.org
    Subject: RE: date treatment & date level aggregations

    Hi Avram,

    A few things to note:

    1. The builtin functions in Pig are Java UDFs, making them
    case sensitive. You should use TOKENIZE instead of tokenize
    2. It looks like the builtin TOKENIZE has to be fixed to
    support your current usage. I have a filed a bug report to
    track this : PIG-683
    (https://issues.apache.org/jira/browse/PIG-683)

    When PIG-683 is fixed, you should then be able to do the following:


    A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
    B = foreach A generate flatten(TOKENIZE(v2)) as (date,time),
    v3; C = foreach B generate date, v3; D = group C by date; E =
    foreach D generate group, SUM(C.v3); dump E;

    Thanks,
    Santhosh

    -----Original Message-----
    From: Avram Aelony
    Sent: Thursday, February 19, 2009 10:59 AM
    To: pig-user@hadoop.apache.org
    Subject: RE: date treatment & date level aggregations

    I tried the capitalized version, that still leads to an
    error. Now it appears to be a problem with the alias.



    grunt> B = foreach A generate TOKENIZE(A.v2) as (date,time), v3;
    2009-02-19 10:56:05,075 [main] ERROR
    org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid
    alias: A in A: (v1, v2, v3, v4 )
    at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
    at
    org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.
    java:475)
    at
    org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(Pi
    gScriptPar
    ser.java:233)
    at
    org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntP
    arser.java
    :91)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
    at org.apache.pig.Main.main(Main.java:270)
    Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException:
    Invalid alias: A in A: (v1, v2, v3, v4 )
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasField
    OrSpec(Que
    ryParser.java:3301)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(
    QueryParse
    r.java:3225)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSp
    ec(QueryPa
    rser.java:2236)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(
    QueryParse
    r.java:2175)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.Multiplica
    tiveExpr(Q
    ueryParser.java:2106)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveEx
    pr(QueryPa
    rser.java:2038)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(
    QueryParse
    r.java:2006)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalArgsIt
    em(QueryPa
    rser.java:2456)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalArgs(Q
    ueryParser
    .java:2397)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.FuncEvalSp
    ec(QueryPa
    rser.java:2356)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSp
    ec(QueryPa
    rser.java:2230)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(
    QueryParse
    r.java:2175)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.Multiplica
    tiveExpr(Q
    ueryParser.java:2106)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveEx
    pr(QueryPa
    rser.java:2038)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(
    QueryParse
    r.java:2006)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedG
    enerateIte
    m(QueryParser.java:1955)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedG
    enerateIte
    mList(QueryParser.java:1894)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateSt
    atement(Qu
    eryParser.java:1862)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBloc
    k(QueryPar
    ser.java:1604)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachCla
    use(QueryP
    arser.java:1569)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(Q
    ueryParser
    .java:711)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(Query
    Parser.jav
    a:512)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(Quer
    yParser.ja
    va:362)
    at
    org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(Logi
    calPlanBui
    lder.java:47)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
    ... 5 more


    -----Original Message-----
    From: Olga Natkovich
    Sent: Thursday, February 19, 2009 10:54 AM
    To: pig-user@hadoop.apache.org
    Subject: RE: date treatment & date level aggregations

    Functions in pig are case sensitive. The function name is TOKENIZE.
    Please, refer to PigLatin Manula for details:
    http://wiki.apache.org/pig-data/attachments/FrontPage/attachme
    nts/plrm.h
    tm.

    Olga
    -----Original Message-----
    From: Avram Aelony
    Sent: Thursday, February 19, 2009 10:51 AM
    To: pig-user@hadoop.apache.org
    Subject: RE: date treatment & date level aggregations

    Unfortunately, step B of the solution you proposed fails
    for me. Any
    thoughts on how to remedy?


    grunt> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
    grunt> describe A;
    A: (v1, v2, v3, v4 )
    grunt> B = foreach A generate tokenize(A.v2) as (date,time), v3;
    2009-02-19 10:47:11,142 [main] ERROR
    org.apache.pig.tools.grunt.GruntParser - java.io.IOException:
    Cannot instantiate:tokenize
    at
    org.apache.pig.PigServer.registerQuery(PigServer.java:278)
    at
    org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.
    java:475)
    at
    org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(Pi
    gScriptParser.java:233)
    at
    org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntP
    arser.java:91)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
    at org.apache.pig.Main.main(Main.java:270)
    Caused by:
    org.apache.pig.impl.logicalLayer.parser.ParseException:
    Cannot instantiate:tokenize
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFuncti
    on(QueryParser.java:2818)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.FuncEvalSp
    ec(QueryParser.java:2354)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSp
    ec(QueryParser.java:2230)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(
    QueryParser.java:2175)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.Multiplica
    tiveExpr(QueryParser.java:2106)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveEx
    pr(QueryParser.java:2038)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(
    QueryParser.java:2006)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedG
    enerateItem(QueryParser.java:1955)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedG
    enerateItemList(QueryParser.java:1894)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateSt
    atement(QueryParser.java:1862)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBloc
    k(QueryParser.java:1604)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachCla
    use(QueryParser.java:1569)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(Q
    ueryParser.java:711)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(Query
    Parser.java:512)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(Quer
    yParser.java:362)
    at
    org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(Logi
    calPlanBuilder.java:47)
    at
    org.apache.pig.PigServer.registerQuery(PigServer.java:275)
    ... 5 more
    Caused by: java.lang.RuntimeException: Cannot instantiate:tokenize
    at
    org.apache.pig.impl.PigContext.instantiateFunc(PigContext.java:456)
    at
    org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigCont
    ext.java:506)
    at
    org.apache.pig.impl.PigContext.instantiateFuncFromAlias(PigCon
    text.java:528)
    at
    org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFuncti
    on(QueryParser.java:2815)
    ... 21 more
    Caused by: java.io.IOException: Could not resolve tokenize using
    imports: [, org.apache.pig.builtin., com.yahoo.pig.yst.sds.ULT.,
    org.apache.pig.impl.builtin.]
    at
    org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOExce
    ption.java:34)
    at
    org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:421)
    at
    org.apache.pig.impl.PigContext.instantiateFunc(PigContext.java:453)
    ... 24 more
    Caused by: java.lang.ClassNotFoundException: Could not resolve
    tokenize using imports: [, org.apache.pig.builtin.,
    com.yahoo.pig.yst.sds.ULT., org.apache.pig.impl.builtin.]
    at
    org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:420)
    ... 25 more

    2009-02-19 10:47:11,143 [main] ERROR
    org.apache.pig.tools.grunt.GruntParser - java.io.IOException:
    Cannot instantiate:tokenize
    grunt>


    thanks,
    Avram


    -----Original Message-----
    From: Alan Gates
    Sent: Thursday, February 19, 2009 9:49 AM
    To: pig-user@hadoop.apache.org
    Subject: Re: date treatment & date level aggregations

    Date is not a separate type in pig.

    If you want to group on date, I think what you want is this:

    A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4); B =
    foreach A generate tokenize(A.v2) as (date,time), v3; C = foreach B
    generate date, v3; D = group C by date; E = foreach D
    generate group,
    SUM(C.v3); dump E;

    This script will first tokenize the datestamp into date and
    time, then
    project just the date and data you're going to sum, and then do the
    grouping.

    Alan.
    On Feb 18, 2009, at 3:19 PM, Avram Aelony wrote:

    Hello,

    I have a question regarding treatment of dates with PIG.

    My input files contain a timestamp field in 'yyyymmdd hh:mm:ss'
    format (e.g. 20090201 14:42:00 ) within a comma delimited
    file. I
    want to aggregate to day-level relying on extracting the
    date portion
    (e.g. yyyymmdd, so the 20090201 ) of the timestamp only. I have been
    experimenting with the tokenize function but I am unclear how to
    accomplish an aggregation by date.

    What am I doing wrong? How can I get a date-level aggregation?
    Is there a 'Date' data type?


    Here are the details:


    Input Data:

    4,20090201 23:59:56,8,1
    3,20090202 23:59:56,101,1
    4,20090201 23:59:56,114,1
    5,20090202 23:59:56,29,1

    Desired Output:
    20090201, 122
    20090202, 130

    --My attempt in Pig:
    A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
    describe A;
    B = foreach A generate group, tokenize(A.v2) as (date,time); --fails here.
    describe B;
    C = group B by B.date;
    describe C;
    D = foreach C generate B.date, SUM(A.v3);
    dump D;


    grunt> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
    grunt> describe A;
    A: (v1, v2, v3, v4 )
    grunt> B = foreach A generate group, tokenize(A.v2) as (date,time);
    2009-02-18 15:11:44,278 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: group in A: (v1, v2, v3, v4 )
    at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
    at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
    at org.apache.pig.Main.main(Main.java:270)
    Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: group in A: (v1, v2, v3, v4 )
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:3301)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:3225)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2236)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
    at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
    ... 5 more

    2009-02-18 15:11:44,279 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: group in A: (v1, v2, v3, v4 )
    grunt>


    Thanks in advance,
    Avram
  • Pradeep Kamath at Feb 19, 2009 at 7:08 pm
    Use TOKENIZE instead of tokenize (the name is case sensitive).
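
    For the date-level aggregation itself, a minimal sketch follows. It
    assumes a Pig release that ships the SUBSTRING builtin (it may not
    exist in the version used in this thread) and that the timestamp
    always starts with an 8-character yyyymmdd date. TOKENIZE returns a
    bag of tokens rather than two named fields, so the sketch takes the
    date prefix directly instead. The typed schema and the aliases day
    and total are illustrative choices, not part of the original script.

    ```pig
    -- Assumption: SUBSTRING is available as a builtin in the Pig release in use.
    A = LOAD 'atest.csv' USING PigStorage(',')
        AS (v1:int, v2:chararray, v3:int, v4:int);
    -- Keep the first 8 characters (yyyymmdd) of the timestamp and the value to sum.
    B = FOREACH A GENERATE SUBSTRING(v2, 0, 8) AS day, v3;
    -- Group on the extracted day, then sum v3 within each group.
    C = GROUP B BY day;
    D = FOREACH C GENERATE group AS day, SUM(B.v3) AS total;
    DUMP D;
    ```

    On the four sample rows this should give 122 for 20090201 and 130
    for 20090202, matching the desired output.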



Discussion Overview
group: user @
categories: pig, hadoop
posted: Feb 18, '09 at 11:20p
active: Feb 24, '09 at 5:10p
posts: 10
users: 5
website: pig.apache.org
