Right off the bat, I fixed the regex patterns in my SPLIT, but I kept
getting an error from the multiquery optimizer. Specifically, the
following:

ERROR 2146: Internal Error. Inconsistency in key index found during
optimization. + stacktrace

As a temporary fix, I re-ran the script without multiquery optimization. As a
result, it runs much slower. My question, then, is what exactly is causing
this issue? How can I fix my script so that I can run my queries and still
take advantage of the optimizer?
On Thu, Sep 3, 2009 at 4:03 PM, zaki rahaman wrote:

Hi all,

I'm becoming a bit more comfortable writing scripts, but I'm still not always
sure of the best way to structure my statements in order to optimize
performance. When it comes to SPLIT and FILTER, for example, one could
filter a raw data set multiple times or condense the filters into one SPLIT
statement, but it's not clear from the docs which is the best practice in
this case. Below is my script as it stands. Your input would be greatly
appreciated.
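
To make the comparison concrete, the multiple-FILTER form of the same partitioning might look like the sketch below. It assumes the dailyraw relation (fields userid and day) defined in the script that follows, and it assumes that matches expects a Java regular expression covering the whole string, which is why the patterns are written as '.*BROKEN.*'. It should yield the same three relations as a single SPLIT; whether one form is faster than the other is exactly the open question here.

```pig
-- Sketch only: FILTER-based equivalent of a three-way SPLIT.
-- Assumes dailyraw has the fields (userid, day).
broken = FILTER dailyraw BY (userid matches '.*BROKEN.*');
noperm = FILTER dailyraw BY (userid matches '.*NOPERM.*');
daily  = FILTER dailyraw BY (NOT ((userid matches '.*BROKEN.*') OR (userid matches '.*NOPERM.*')));
```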

-- Queries for August by Day/Month/Week

REGISTER mypigudfs.jar;

raw = LOAD 'data' AS (timestamp:chararray, ip:chararray, userid:chararray);


dailyraw = FOREACH raw GENERATE userid, mypigudfs.ExtractDay(timestamp) AS day;
-- matches takes a Java regex that must match the whole string, hence '.*BROKEN.*'
SPLIT dailyraw INTO
    broken IF (userid matches '.*BROKEN.*'),
    noperm IF (userid matches '.*NOPERM.*'),
    daily IF (NOT ((userid matches '.*BROKEN.*') OR (userid matches '.*NOPERM.*')));


-- Daily Count(s)

daygrp = GROUP daily BY day PARALLEL 36;
daycnts = FOREACH daygrp GENERATE group, COUNT(daily);


-- NoPerm
npgrp = GROUP noperm BY day;
npcnts = FOREACH npgrp GENERATE group, COUNT(noperm);

--Broken
brkgrp = GROUP broken BY day;
brkcnts = FOREACH brkgrp GENERATE group, COUNT(broken);


-- Weekly Count(s)

weekly = FOREACH daily GENERATE userid, mypigudfs.ExtractWeek(day) AS week;
wkgrp = GROUP weekly BY week PARALLEL 36;
wkcnts = FOREACH wkgrp GENERATE group, COUNT(weekly);

--Broken
broken2 = FOREACH broken GENERATE userid, mypigudfs.ExtractWeek(day) AS week;
brkgrp2 = GROUP broken2 BY week;
brkcnts2 = FOREACH brkgrp2 GENERATE group, COUNT(broken2);


--NoPerm
noperm2 = FOREACH noperm GENERATE userid, mypigudfs.ExtractWeek(day) AS week;
npgrp2 = GROUP noperm2 BY week;
npcnts2 = FOREACH npgrp2 GENERATE group, COUNT(noperm2);


-- Monthly Count

month = GROUP weekly ALL;
mcnt = FOREACH month GENERATE COUNT(weekly);

npmonth = GROUP noperm2 ALL;
npmcnt = FOREACH npmonth GENERATE COUNT(noperm2);

brkmonth = GROUP broken2 ALL;
brkmcnt = FOREACH brkmonth GENERATE COUNT(broken2);

-- Store Output

--
Zaki Rahaman


Discussion Overview
group: user @
categories: pig, hadoop
posted: Sep 4, '09 at 6:35p
active: Sep 4, '09 at 6:35p
posts: 1
users: 1
website: pig.apache.org