Grokbase Groups Pig user June 2011
Hi all,

I'm getting the exception (at the end) from the following using Pig:

eLine = FOREACH logLine
    GENERATE
        FLATTEN(
            REGEX_EXTRACT_ALL(
                $0,
                '.*Output.Count\\s*\\-\\s*([A-Za-z\\.]+)\\s*(\\d+)'
            )
        ) AS (ename:CHARARRAY, ecount:DOUBLE);

nameGroup = GROUP eLine BY eventName;

lines = FOREACH nameGroup GENERATE group as name,
    MAX(com.example.BagToTupleUDF((tuple)eLine.ecount)) as maxCount;

My UDF is converting the values from a bag {(12),(4),(7),(190)} to a tuple
of doubles (12,4,7,190).

Can anybody help explain how I can use the Pig built-in functions MAX, MIN,
and AVG over this kind of data extracted from a regex?

Many thanks,
Jon.

---

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Invalid alias: MAX in {group: chararray,eLine: {ename: chararray,ecount: double}}
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1617)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1561)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:533)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:868)
    at org.apache.pig.pigunit.pig.GruntParser.processPig(GruntParser.java:61)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:388)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at org.apache.pig.pigunit.pig.PigServer.registerScript(PigServer.java:53)
    at org.apache.pig.pigunit.PigTest.registerScript(PigTest.java:160)
    at org.apache.pig.pigunit.PigTest.assertOutput(PigTest.java:244)
    at message_archiver.reporting.pig.functions.OEPigTest.singleRawTextFile(OEPigTest.java:78)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:44)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:180)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:41)
    at org.junit.runners.ParentRunner$1.evaluate(ParentRunner.java:173)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:49)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: MAX in {group: chararray,line: {name: chararray,count: double}}
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:7415)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:7226)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:5297)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:5187)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.java:5133)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:5042)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:4968)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:4934)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:4861)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:4760)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:4704)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:4030)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:3433)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1464)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:1013)
    at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:800)
    at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1611)
    ... 34 more


  • Dmitriy Ryaboy at Jun 24, 2011 at 10:46 pm
    Why are you casting eLine.ecount as a tuple? It's a bag (all ecounts with
    this eventName)

    D
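
The point about the bag can be sketched as follows. This is a minimal, untested sketch, not the poster's script: the input file name `event_counts.tsv` and its two-column layout are assumptions made for illustration, and the grouping key is written as `ename` on the assumption that the key must be a field that actually exists in `eLine` (the original script groups by `eventName`, which its schema does not declare).

```pig
-- Hypothetical input: tab-separated lines holding an event name and a count.
eLine = LOAD 'event_counts.tsv' AS (ename:chararray, ecount:double);

-- Group on a field that is present in eLine's schema.
nameGroup = GROUP eLine BY ename;

-- Prints the schema; the eLine field should show up as a bag of tuples.
DESCRIBE nameGroup;

-- eLine.ecount projects a bag of single-field tuples.
-- MAX accepts that bag directly, so no tuple cast or conversion UDF is involved.
lines = FOREACH nameGroup GENERATE group AS name, MAX(eLine.ecount) AS maxCount;

DUMP lines;
```

MIN and AVG take the same bag argument, so they can be added to the GENERATE list in the same way.
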
  • Jonathan Holloway at Jun 24, 2011 at 11:11 pm
    I ended up fixing this issue. I did change it to a bag afterwards, but the main problem was that REGEX_EXTRACT_ALL was returning everything as a string (via group), which meant that MAX, AVG, etc. were not matched as functions for a bag of tuples of doubles.

    I ended up writing a new UDF for extract-all that returns types based on whether \d or \w was used in the regexp. Flattening that to specific types didn't work.

    That solved the issue. I would appreciate feedback on the UDF and the approach; I will post it on pastebin early next week. If there's a better way, please let me know.

    This whole solution came about because I wanted to avoid creating a new UDF for each log line type I needed to parse.

    Many thanks,
    Jon
  • Dmitriy Ryaboy at Jun 25, 2011 at 12:21 am
    You can cast to longs and doubles from strings; that should have helped.
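
A minimal sketch of what the cast suggestion might look like, assuming the built-in REGEX_EXTRACT_ALL is kept and its captured groups are treated as chararrays. It has not been run against the poster's data or Pig version. The input path `events.log`, the use of TextLoader, and the aliases `extracted`, `ecountStr`, and `stats` are assumptions introduced for illustration; the regular expression is copied from the original question.

```pig
-- Hypothetical input path; TextLoader reads each line as a single chararray.
logLine = LOAD 'events.log' USING TextLoader() AS (line:chararray);

-- Keep both captured groups as chararrays at this stage.
extracted = FOREACH logLine GENERATE
    FLATTEN(
        REGEX_EXTRACT_ALL(
            line,
            '.*Output.Count\\s*\\-\\s*([A-Za-z\\.]+)\\s*(\\d+)'
        )
    ) AS (ename:chararray, ecountStr:chararray);

-- Explicit cast from chararray to double in a separate step.
eLine = FOREACH extracted GENERATE ename, (double)ecountStr AS ecount;

nameGroup = GROUP eLine BY ename;

-- The built-in aggregates operate on the bag of doubles for each group.
stats = FOREACH nameGroup GENERATE
    group AS name,
    MAX(eLine.ecount) AS maxCount,
    MIN(eLine.ecount) AS minCount,
    AVG(eLine.ecount) AS avgCount;

DUMP stats;
```

Doing the cast in its own FOREACH keeps the typing explicit, so the aggregates see doubles without a type-inferring UDF for each log line format.
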

Discussion Overview
group: user @
categories: pig, hadoop
posted: Jun 23, '11 at 4:42p
active: Jun 25, '11 at 12:21a
posts: 4
users: 2
website: pig.apache.org