I tried the capitalized version, but that still leads to an error. Now it appears to be a problem with the alias.



grunt> B = foreach A generate TOKENIZE(A.v2) as (date,time), v3;
2009-02-19 10:56:05,075 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: A in A: (v1, v2, v3, v4 )
at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
at org.apache.pig.Main.main(Main.java:270)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: A in A: (v1, v2, v3, v4 )
at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:3301)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:3225)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2236)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalArgsItem(QueryParser.java:2456)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalArgs(QueryParser.java:2397)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.FuncEvalSpec(QueryParser.java:2356)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2230)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
... 5 more


-----Original Message-----
From: Olga Natkovich
Sent: Thursday, February 19, 2009 10:54 AM
To: pig-user@hadoop.apache.org
Subject: RE: date treatment & date level aggregations

Function names in Pig are case sensitive. The function name is TOKENIZE.
Please refer to the Pig Latin Manual for details:
http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm.

Olga

-----Original Message-----
From: Avram Aelony
Sent: Thursday, February 19, 2009 10:51 AM
To: pig-user@hadoop.apache.org
Subject: RE: date treatment & date level aggregations

Unfortunately, step B of the solution you proposed fails for me.
Any thoughts on how to remedy it?


grunt> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
grunt> describe A;
A: (v1, v2, v3, v4 )
grunt> B = foreach A generate tokenize(A.v2) as (date,time), v3;
2009-02-19 10:47:11,142 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Cannot instantiate:tokenize
at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
at org.apache.pig.Main.main(Main.java:270)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Cannot instantiate:tokenize
at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFunction(QueryParser.java:2818)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.FuncEvalSpec(QueryParser.java:2354)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2230)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
... 5 more
Caused by: java.lang.RuntimeException: Cannot instantiate:tokenize
at org.apache.pig.impl.PigContext.instantiateFunc(PigContext.java:456)
at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:506)
at org.apache.pig.impl.PigContext.instantiateFuncFromAlias(PigContext.java:528)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFunction(QueryParser.java:2815)
... 21 more
Caused by: java.io.IOException: Could not resolve tokenize using imports: [, org.apache.pig.builtin., com.yahoo.pig.yst.sds.ULT., org.apache.pig.impl.builtin.]
at org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOException.java:34)
at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:421)
at org.apache.pig.impl.PigContext.instantiateFunc(PigContext.java:453)
... 24 more
Caused by: java.lang.ClassNotFoundException: Could not resolve tokenize using imports: [, org.apache.pig.builtin., com.yahoo.pig.yst.sds.ULT., org.apache.pig.impl.builtin.]
at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:420)
... 25 more

2009-02-19 10:47:11,143 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Cannot instantiate:tokenize
grunt>


thanks,
Avram


-----Original Message-----
From: Alan Gates
Sent: Thursday, February 19, 2009 9:49 AM
To: pig-user@hadoop.apache.org
Subject: Re: date treatment & date level aggregations

Date is not a separate type in pig.

If you want to group on date, I think what you want is this:

A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
B = foreach A generate tokenize(A.v2) as (date,time), v3;
C = foreach B generate date, v3;
D = group C by date;
E = foreach D generate group, SUM(C.v3);
dump E;

This script will first tokenize the datestamp into date and
time, then project just the date and data you're going to
sum, and then do the grouping.
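
For reference, the three steps described above (split the timestamp, project the date and the value, then group and sum) can be sketched outside Pig. The Python below is only an illustration of the intended logic; it is not Pig code and not part of the original reply. The names `ROWS` and `sum_by_date` are made up for this example, and the rows are copied from the sample input in the original question.

```python
from collections import defaultdict

# Sample rows (v1, v2, v3, v4) copied from the input data quoted in this thread.
ROWS = [
    "4,20090201 23:59:56,8,1",
    "3,20090202 23:59:56,101,1",
    "4,20090201 23:59:56,114,1",
    "5,20090202 23:59:56,29,1",
]


def sum_by_date(lines):
    """Split v2 into date and time, keep (date, v3), and sum v3 per date."""
    totals = defaultdict(int)
    for line in lines:
        v1, v2, v3, v4 = line.split(",")
        date, _time = v2.split(" ", 1)  # 'yyyymmdd hh:mm:ss' -> date, time
        totals[date] += int(v3)
    return dict(totals)


for date, total in sorted(sum_by_date(ROWS).items()):
    print(f"{date}, {total}")
```

On the four sample rows this prints `20090201, 122` and `20090202, 130`, which matches the desired output given in the question.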

Alan.

On Feb 18, 2009, at 3:19 PM, Avram Aelony wrote:

Hello,

I have a question regarding treatment of dates with PIG.

My input files contain a timestamp field in 'yyyymmdd hh:mm:ss'
format (e.g. 20090201 14:42:00) within a comma-delimited file. I
want to aggregate to the day level by extracting only the date
portion (yyyymmdd, e.g. 20090201) of the timestamp. I have been
experimenting with the tokenize function, but I am unclear how to
accomplish an aggregation by date.

What am I doing wrong? How can I get a date-level aggregation?
Is there a 'Date' data type?


Here are the details:


Input Data:

4,20090201 23:59:56,8,1
3,20090202 23:59:56,101,1
4,20090201 23:59:56,114,1
5,20090202 23:59:56,29,1

Desired Output:
20090201, 122
20090202, 130

--My attempt in Pig:
A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
describe A;
B = foreach A generate group, tokenize(A.v2) as (date,time); --fails here.
describe B;
C = group B by B.date;
describe C;
D = foreach C generate B.date, SUM(A.v3);
dump D;


grunt> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
grunt> describe A;
A: (v1, v2, v3, v4 )
grunt> B = foreach A generate group, tokenize(A.v2) as (date,time);
2009-02-18 15:11:44,278 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: group in A: (v1, v2, v3, v4 )
at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
at org.apache.pig.Main.main(Main.java:270)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: group in A: (v1, v2, v3, v4 )
at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:3301)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:3225)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2236)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
... 5 more

2009-02-18 15:11:44,279 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: group in A: (v1, v2, v3, v4 )
grunt>


Thanks in advance,
Avram
