Hi Olga,

Thanks for your message. My needs are fairly particular, so I will take the leap and learn what it takes to develop the UDFs I need. If it works out well and they are generic enough to be useful to others, I will see if I can get authorization to contribute them back to piggybank.

-Avram


-----Original Message-----
From: Olga Natkovich
Sent: Thursday, February 19, 2009 11:28 AM
To: pig-user@hadoop.apache.org
Subject: RE: date treatment & date level aggregations

TOKENIZE is not broken. It has particular semantics that might not work
for this query but are used in other contexts.

If a function with different semantics is needed, it can be written and
contributed to piggybank.
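
To illustrate the difference in semantics, here is a minimal sketch in Python (not Pig). It assumes that TOKENIZE returns a bag with one single-field tuple per token, so flattening it yields one row per token rather than one (date, time) pair per input row. The helper names are hypothetical and this only models the behaviour; it is not Pig's implementation.

```python
# Hypothetical model of the two semantics; not Pig's actual implementation.

def tokenize_as_bag(text):
    """Model a bag-returning tokenizer: one single-field tuple per token."""
    return [(token,) for token in text.split()]


def flatten_bag(row_prefix, bag):
    """Model FLATTEN on a bag: emit one output row per tuple in the bag."""
    return [row_prefix + item for item in bag]


def split_as_tuple(text):
    """Model what the query wants: a single (date, time) tuple per input."""
    date, time = text.split(" ", 1)
    return (date, time)


timestamp = "20090201 23:59:56"

# Bag semantics: the value 8 is repeated on two rows, one per token.
bag_rows = flatten_bag((8,), tokenize_as_bag(timestamp))

# Tuple semantics: one row carrying both date and time as separate fields.
tuple_row = (8,) + split_as_tuple(timestamp)

print(bag_rows)
print(tuple_row)
```

Under bag semantics there is no single row that holds both a date and a time field, which is why naming the result (date,time) does not fit this query.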

Olga
-----Original Message-----
From: Avram Aelony
Sent: Thursday, February 19, 2009 11:08 AM
To: pig-user@hadoop.apache.org
Subject: RE: date treatment & date level aggregations

Thanks for identifying that the TOKENIZE builtin needs a fix
and filing the bug report.
I should have noted in my original email that I had tried
uppercase and that uppercase had also failed.

Thanks for everyone's help & I look forward to the fix.

Regards,
-Avram


-----Original Message-----
From: Santhosh Srinivasan
Sent: Thursday, February 19, 2009 11:01 AM
To: pig-user@hadoop.apache.org
Subject: RE: date treatment & date level aggregations

Hi Avram,

A few things to note:

1. The builtin functions in Pig are Java UDFs, which makes their names
case sensitive. You should use TOKENIZE instead of tokenize.
2. It looks like the builtin TOKENIZE has to be fixed to
support your current usage. I have filed a bug report to
track this: PIG-683
(https://issues.apache.org/jira/browse/PIG-683)
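
As a rough illustration of point 1, the sketch below (Python, not Pig) models a case-sensitive lookup that tries each import prefix in turn, in the spirit of the "Could not resolve tokenize using imports" message quoted later in this thread. The prefix list is taken from that message (minus the site-specific entry); the KNOWN_CLASSES set and the resolve function are hypothetical stand-ins for the Java classpath and are not Pig's actual code.

```python
# Hypothetical model of case-sensitive function-name resolution.
IMPORT_PREFIXES = ["", "org.apache.pig.builtin.", "org.apache.pig.impl.builtin."]

# Illustrative stand-in for the classes available on the classpath.
KNOWN_CLASSES = {
    "org.apache.pig.builtin.TOKENIZE",
    "org.apache.pig.builtin.SUM",
}


def resolve(name):
    """Return the first prefix + name that matches a known class exactly."""
    for prefix in IMPORT_PREFIXES:
        candidate = prefix + name
        if candidate in KNOWN_CLASSES:  # exact, case-sensitive comparison
            return candidate
    raise LookupError(
        f"Could not resolve {name} using imports: {IMPORT_PREFIXES}"
    )


print(resolve("TOKENIZE"))
try:
    resolve("tokenize")
except LookupError as err:
    print(err)
```

Because the comparison is exact, the lowercase spelling never matches any candidate class name, whichever prefix is tried.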

When PIG-683 is fixed, you should then be able to do the following:


A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
B = foreach A generate flatten(TOKENIZE(v2)) as (date,time), v3;
C = foreach B generate date, v3;
D = group C by date;
E = foreach D generate group, SUM(C.v3);
dump E;

Thanks,
Santhosh

-----Original Message-----
From: Avram Aelony
Sent: Thursday, February 19, 2009 10:59 AM
To: pig-user@hadoop.apache.org
Subject: RE: date treatment & date level aggregations

I tried the capitalized version, that still leads to an
error. Now it appears to be a problem with the alias.



grunt> B = foreach A generate TOKENIZE(A.v2) as (date,time), v3;
2009-02-19 10:56:05,075 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: A in A: (v1, v2, v3, v4 )
    at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
    at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
    at org.apache.pig.Main.main(Main.java:270)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: A in A: (v1, v2, v3, v4 )
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:3301)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:3225)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2236)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalArgsItem(QueryParser.java:2456)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalArgs(QueryParser.java:2397)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FuncEvalSpec(QueryParser.java:2356)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2230)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
    at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
    ... 5 more


-----Original Message-----
From: Olga Natkovich
Sent: Thursday, February 19, 2009 10:54 AM
To: pig-user@hadoop.apache.org
Subject: RE: date treatment & date level aggregations

Functions in Pig are case sensitive. The function name is TOKENIZE.
Please refer to the Pig Latin Manual for details:
http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm

Olga
-----Original Message-----
From: Avram Aelony
Sent: Thursday, February 19, 2009 10:51 AM
To: pig-user@hadoop.apache.org
Subject: RE: date treatment & date level aggregations

Unfortunately, step B of the solution you proposed fails for me. Any
thoughts on how to remedy it?


grunt> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
grunt> describe A;
A: (v1, v2, v3, v4 )
grunt> B = foreach A generate tokenize(A.v2) as (date,time), v3;
2009-02-19 10:47:11,142 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Cannot instantiate:tokenize
    at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
    at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
    at org.apache.pig.Main.main(Main.java:270)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Cannot instantiate:tokenize
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFunction(QueryParser.java:2818)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FuncEvalSpec(QueryParser.java:2354)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2230)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
    at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
    ... 5 more
Caused by: java.lang.RuntimeException: Cannot instantiate:tokenize
    at org.apache.pig.impl.PigContext.instantiateFunc(PigContext.java:456)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:506)
    at org.apache.pig.impl.PigContext.instantiateFuncFromAlias(PigContext.java:528)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFunction(QueryParser.java:2815)
    ... 21 more
Caused by: java.io.IOException: Could not resolve tokenize using imports: [, org.apache.pig.builtin., com.yahoo.pig.yst.sds.ULT., org.apache.pig.impl.builtin.]
    at org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOException.java:34)
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:421)
    at org.apache.pig.impl.PigContext.instantiateFunc(PigContext.java:453)
    ... 24 more
Caused by: java.lang.ClassNotFoundException: Could not resolve tokenize using imports: [, org.apache.pig.builtin., com.yahoo.pig.yst.sds.ULT., org.apache.pig.impl.builtin.]
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:420)
    ... 25 more

2009-02-19 10:47:11,143 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Cannot instantiate:tokenize
grunt>


thanks,
Avram


-----Original Message-----
From: Alan Gates
Sent: Thursday, February 19, 2009 9:49 AM
To: pig-user@hadoop.apache.org
Subject: Re: date treatment & date level aggregations

Date is not a separate type in pig.

If you want to group on date, I think what you want is this:

A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
B = foreach A generate tokenize(A.v2) as (date,time), v3;
C = foreach B generate date, v3;
D = group C by date;
E = foreach D generate group, SUM(C.v3);
dump E;

This script will first tokenize the datestamp into date and time, then
project just the date and data you're going to sum, and then do the
grouping.
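
For reference, the same three steps (split the timestamp, keep only the date and the value to sum, then group and sum) can be sketched outside Pig. The following is a minimal Python sketch, not a Pig script; it uses the four sample rows from the original question, and the function name sum_by_date is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Sample rows from the original question (fields: v1, v2, v3, v4).
SAMPLE = """4,20090201 23:59:56,8,1
3,20090202 23:59:56,101,1
4,20090201 23:59:56,114,1
5,20090202 23:59:56,29,1
"""


def sum_by_date(lines):
    """Sum v3 per calendar date taken from the 'yyyymmdd hh:mm:ss' field v2."""
    totals = defaultdict(int)
    for v1, v2, v3, v4 in csv.reader(lines):
        date = v2.split(" ", 1)[0]  # keep the date part, drop the time
        totals[date] += int(v3)     # group by date and accumulate the sum
    return dict(totals)


result = sum_by_date(io.StringIO(SAMPLE))
for date in sorted(result):
    print(f"{date}, {result[date]}")
```

With the sample rows this yields 122 for 20090201 and 130 for 20090202, matching the desired output stated in the question.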

Alan.
On Feb 18, 2009, at 3:19 PM, Avram Aelony wrote:

Hello,

I have a question regarding treatment of dates with PIG.

My input files contain a timestamp field in 'yyyymmdd hh:mm:ss'
format (e.g. 20090201 14:42:00) within a comma-delimited file. I
want to aggregate to day level by extracting only the date portion
of the timestamp (yyyymmdd, e.g. 20090201). I have been
experimenting with the tokenize function, but I am unclear how to
accomplish an aggregation by date.

What am I doing wrong? How can I get a date-level aggregation?
Is there a 'Date' data type?


Here are the details:


Input Data:

4,20090201 23:59:56,8,1
3,20090202 23:59:56,101,1
4,20090201 23:59:56,114,1
5,20090202 23:59:56,29,1

Desired Output:
20090201, 122
20090202, 130

--My attempt in Pig:
A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
describe A;
B = foreach A generate group, tokenize(A.v2) as (date,time); --fails here.
describe B;
C = group B by B.date;
describe C;
D = foreach C generate B.date, SUM(A.v3);
dump D;


grunt> A = load 'atest.csv' using PigStorage(',') as (v1,v2,v3,v4);
grunt> describe A;
A: (v1, v2, v3, v4 )
grunt> B = foreach A generate group, tokenize(A.v2) as (date,time);
2009-02-18 15:11:44,278 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: group in A: (v1, v2, v3, v4 )
    at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:475)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
    at org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:91)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
    at org.apache.pig.Main.main(Main.java:270)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: group in A: (v1, v2, v3, v4 )
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:3301)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:3225)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:2236)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:2175)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:2106)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:2038)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:2006)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:1955)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:1894)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:1862)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:1604)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:1569)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:711)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:512)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:362)
    at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:47)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:275)
    ... 5 more

2009-02-18 15:11:44,279 [main] ERROR org.apache.pig.tools.grunt.GruntParser - java.io.IOException: Invalid alias: group in A: (v1, v2, v3, v4 )
grunt>


Thanks in advance,
Avram

Discussion Overview
group: user @
categories: pig, hadoop
posted: Feb 18, '09 at 11:20p
active: Feb 24, '09 at 5:10p
posts: 10
users: 5
website: pig.apache.org