Hello,

I have a question regarding the treatment of dates in Pig.

My input files are comma-delimited and contain a timestamp field in 'yyyymmdd hh:mm:ss' format (e.g. 20090201 14:42:00). I want to aggregate to the day level using only the date portion of the timestamp (yyyymmdd, e.g. 20090201). I have been experimenting with the tokenize function, but I am unclear on how to accomplish an aggregation by date.

What am I doing wrong? How can I get a date-level aggregation?
Is there a 'Date' data type?


Here are the details:


Input Data:

4,20090201 23:59:56,8,1
3,20090202 23:59:56,101,1
4,20090201 23:59:56,114,1
5,20090202 23:59:56,29,1

Desired Output:
20090201, 122
20090202, 130

--My attempt in Pig:
A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
describe A;
B = foreach A generate group, tokenize(A.v2) as (date,time); --fails here.
describe B;
C = group B by B.date;
describe C;
D = foreach C generate B.date, SUM(A.v3);
dump D;


grunt> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
grunt> describe A;
A: (v1, v2, v3, v4 )
grunt> B = foreach A generate group, tokenize(A.v2) as (date,time);
2009-02-18 15:11:44,278 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: group in A: (v1, v2, v3, v4 )
at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
at org.apache.pig.Main.main(Main.java:270)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: group in A: (v1, v2, v3, v4 )
at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:3301)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:3225)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2236)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
... 5 more

2009-02-18 15:11:44,279 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: group in A: (v1, v2, v3, v4 )
grunt>


Thanks in advance,
Avram
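
For reference, the error above arises because `group` is only a valid field name in the output of a GROUP statement, so it cannot appear in a FOREACH over A. Inside a FOREACH, fields are also referenced directly (v2), not through the relation name (A.v2). The following is a minimal sketch of one way the day-level aggregation might be written, not a definitive answer. It assumes a Pig release that ships the built-in SUBSTRING function (later releases do; older ones may need a UDF instead), and the type declarations and the names `day` and `total` are illustrative choices that are not in the original post.

```pig
-- Assumption: v2 is loaded as a chararray and v3 as an int so it can be summed.
A = LOAD 'atest.csv' USING PigStorage(',') AS (v1:int, v2:chararray, v3:int, v4:int);

-- Keep only the first 8 characters of the timestamp, i.e. the yyyymmdd part.
-- Assumption: SUBSTRING(string, start, stop) is available as a built-in.
B = FOREACH A GENERATE SUBSTRING(v2, 0, 8) AS day, v3;

-- Group the rows by day; 'group' becomes a valid field only after this step.
C = GROUP B BY day;

-- Emit one row per day with the sum of v3 over that day's rows.
D = FOREACH C GENERATE group AS day, SUM(B.v3) AS total;
DUMP D;
```

With the four input rows shown above, this should produce the two totals listed under Desired Output (8 + 114 for 20090201 and 101 + 29 for 20090202), though the order of the rows is not guaranteed. On the 'Date' question: later Pig releases added a datetime type, but slicing the string as above avoids depending on it.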

Posted: Feb 18, '09 at 11:20p
Active: Feb 24, '09 at 5:10p