Is there anything that lets someone run ad hoc Hadoop/Pig jobs from a web
interface and have the results emailed to the user's email account?

The app would ideally allow passing parameters to Pig jobs: something like
Azkaban, but with fields for a few dynamic pieces of the scripts.


  • Jeff Hammerbacher at Nov 18, 2010 at 5:07 am
    Hey Brian,

    The Job Designer application in Cloudera's Hue (
    http://archive.cloudera.com/cdh/3/hue for the release;
    https://github.com/cloudera/hue for the source) allows you to parameterize
    jobs and register for email notifications. Also, the Beeswax application
    allows for Hive queries to be submitted from Hue. If you have further
    comments or feature requests, you can submit them to the Hue JIRA at
    https://issues.cloudera.org/browse/HUE or the user mailing list at
    https://groups.google.com/a/cloudera.org/group/hue-user/topics.

    Regards,
    Jeff
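On the "passing parameters" part of the question: Pig has built-in parameter substitution, in which a script refers to a value as $name and the value is supplied at launch with -param name=value. A web front end of the kind asked about mostly has to turn form fields into those flags. The Python sketch below shows only that step; the script path and parameter names are hypothetical, and it neither runs Pig nor sends email.

```python
import shlex


def build_pig_command(script_path, params):
    """Return an argv list that runs a Pig script with parameter substitution.

    Each (name, value) pair becomes `-param name=value`; Pig substitutes the
    value for `$name` in the script before parsing it. Parameters are sorted
    only so that the command line is deterministic.
    """
    argv = ["pig"]
    for name, value in sorted(params.items()):
        argv += ["-param", f"{name}={value}"]
    argv.append(script_path)
    return argv


# Hypothetical form input. Inside the (hypothetical) script the values would be
# referenced as, e.g.:  logs = LOAD '/logs/$date' USING PigStorage('\t');
form = {"date": "2010-11-17", "country": "US"}
cmd = build_pig_command("/reports/daily_report.pig", form)
print(shlex.join(cmd))
# pig -param country=US -param date=2010-11-17 /reports/daily_report.pig
```

Handing the list to subprocess.run(cmd) instead of building a shell string keeps values that contain spaces or quotes from being reinterpreted by the shell.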
  • Brian Adams at Nov 18, 2010 at 3:14 pm
    Thanks a lot, Jeff. I will check those out.


    -----Original Message-----
    From: Jeff Hammerbacher
    Sent: Thursday, November 18, 2010 12:08 AM
    To: user@pig.apache.org
    Subject: Re: Ad-Hoc Reporting Interface/App

  • Mallya, Ashok at Nov 18, 2010 at 11:02 pm
    Hello,
    I have a dataset with more than 180 columns that I want to join to another dataset on two columns.

    I would rather not enumerate all 180 column names in a schema. What other options do I have?

    Here is my script:

    -- This has 180 columns which I do not want to explicitly declare
    wide_data = LOAD '/wide/' USING PigStorage('\t');
    DESCRIBE wide_data ;

    narrow_data =
    LOAD '/narrow/'
    USING PigStorage('\t')
    AS (
    a : chararray,
    b : chararray,
    c : long,
    d : double
    );

    narrow_data = FOREACH narrow_data GENERATE a, c, d ;
    DESCRIBE narrow_data;

    -- join based on two columns
    j = JOIN wide_data BY ((chararray)$20, (long)$172), narrow_data BY (a, c) PARALLEL 1800 ;
    DESCRIBE j;

    STORE j INTO '/output/';
    ====

    When I run it with pig -x, it fails because it does not know the schema of wide_data:

    Schema for wide_data unknown.
    narrow_data: {a: chararray,c: long,d: double}
    Schema for j unknown.
    2010-11-18 15:58:03,589 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2189: Expect schema
    Details at logfile: /hadoop/home/amallya/JARs/pig_1290121082836.log


    The log file says:

    more pig_1290121082836.log
    Pig Stack Trace
    ---------------
    ERROR 2189: Expect schema

    org.apache.pig.impl.plan.optimizer.OptimizerException: ERROR 2185: Unable to prune columns when processing node (Name: ForEach 1-72 Operator Key: 1-72)
    at org.apache.pig.impl.logicalLayer.optimizer.PruneColumns.processNode(PruneColumns.java:515)
    at org.apache.pig.impl.logicalLayer.optimizer.PruneColumns.transform(PruneColumns.java:150)
    at org.apache.pig.impl.logicalLayer.optimizer.LogicalOptimizer.optimize(LogicalOptimizer.java:232)
    at org.apache.pig.PigServer.compileLp(PigServer.java:857)
    at org.apache.pig.PigServer.compileLp(PigServer.java:793)
    at org.apache.pig.PigServer.execute(PigServer.java:762)
    at org.apache.pig.PigServer.access$100(PigServer.java:90)
    at org.apache.pig.PigServer$Graph.execute(PigServer.java:952)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:249)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:115)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:172)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
    at org.apache.pig.Main.main(Main.java:386)
    Caused by: org.apache.pig.impl.plan.optimizer.OptimizerException: ERROR 2185: Unable to prune columns when processing node (Name: Load 1-47 Operator Key: 1-47)
    at org.apache.pig.impl.logicalLayer.optimizer.PruneColumns.processNode(PruneColumns.java:515)
    at org.apache.pig.impl.logicalLayer.optimizer.PruneColumns.processNode(PruneColumns.java:510)
    ... 13 more
    Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 2188: Cannot prune columns for (Name: LOJoin 1-62 Operator Key: 1-62)
    at org.apache.pig.impl.logicalLayer.ColumnPruner.prune(ColumnPruner.java:226)
    at org.apache.pig.impl.logicalLayer.ColumnPruner.visit(ColumnPruner.java:251)
    at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:206)
    at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:45)
    at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69)
    at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
    at org.apache.pig.impl.logicalLayer.optimizer.PruneColumns.pruneLoader(PruneColumns.java:762)
    at org.apache.pig.impl.logicalLayer.optimizer.PruneColumns.processNode(PruneColumns.java:198)
    ... 14 more
    Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 2189: Expect schema
    at org.apache.pig.impl.logicalLayer.ColumnPruner.prune(ColumnPruner.java:78)
    ... 21 more

    Thanks
    Ashok.
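To make concrete what the script computes: the JOIN pairs each wide row's field $20 (as chararray) and field $172 (as long) with the narrow rows whose (a, c) match, and each result row carries all of the wide row's fields followed by the narrow row's (a, c, d). Because only positions are used, none of the 180 column names are needed. The Python sketch below mimics that inner join on in-memory tab-separated lines. It illustrates the semantics only, not how Pig executes the join, and it assumes that a key which fails its cast is null and therefore never matches.

```python
from collections import defaultdict


def cast(text, typ):
    """Rough stand-in for a Pig cast: return typ(text), or None (null) on failure."""
    try:
        return typ(text)
    except ValueError:
        return None


def join_wide_narrow(wide_lines, narrow_lines, str_col=20, long_col=172):
    """Inner-join schema-less, tab-separated wide rows to narrow (a, b, c, d) rows.

    Wide rows are addressed only by position, so their ~180 columns never need
    names. Narrow rows are assumed well formed and are projected to (a, c, d),
    as in the script. Each output row is the wide fields followed by (a, c, d).
    """
    # Build a lookup from the narrow side's join key (a, c) to its projected rows.
    index = defaultdict(list)
    for line in narrow_lines:
        a, _b, c, d = line.rstrip("\n").split("\t")
        c_val = cast(c, int)
        if c_val is not None:  # null keys never match in an inner join
            index[(a, c_val)].append((a, c_val, cast(d, float)))

    for line in wide_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(str_col, long_col):
            continue  # row too short to hold both key columns
        long_key = cast(fields[long_col], int)
        if long_key is None:
            continue  # failed (long) cast -> null key -> no match
        for narrow in index.get((fields[str_col], long_key), []):
            yield tuple(fields) + narrow


# Tiny example: one 180-column wide row whose $20 = "k1" and $172 = "42".
wide = ["\t".join(["x"] * 20 + ["k1"] + ["y"] * 151 + ["42"] + ["z"] * 7)]
narrow = ["k1\tunused\t42\t3.5", "k1\tunused\t7\t1.0"]
rows = list(join_wide_narrow(wide, narrow))
print(len(rows), rows[0][-3:])  # 1 ('k1', 42, 3.5)
```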
  • Daniel Dai at Nov 18, 2010 at 11:41 pm
    This is a bug, which is fixed in Pig 0.8 (soon to be released). With 0.7, you
    can work around it by running the script with the option "-t PruneColumns",
    which turns off the PruneColumns optimizer rule.

    Daniel

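In practice the workaround only changes how the unchanged script is launched, and only on versions before 0.8. A minimal Python sketch of that decision, assuming the join script is saved under the hypothetical name join_wide.pig and that the caller already knows the installed Pig version string:

```python
def pig_argv(script_path, pig_version):
    """Return the argv for running `script_path`, adding the 0.7 workaround.

    Per this thread, the "Expect schema" (ERROR 2189) failure is a
    PruneColumns bug fixed in Pig 0.8, and on 0.7 it can be avoided with
    `-t PruneColumns`. `pig_version` is a string such as "0.7.0"; how the
    caller obtains it is left open.
    """
    major, minor = (int(part) for part in pig_version.split(".")[:2])
    argv = ["pig"]
    if (major, minor) < (0, 8):
        argv += ["-t", "PruneColumns"]  # turn off the column-pruning rule
    argv.append(script_path)
    return argv


# Hypothetical script name for the join script in this thread.
print(" ".join(pig_argv("join_wide.pig", "0.7.0")))  # pig -t PruneColumns join_wide.pig
print(" ".join(pig_argv("join_wide.pig", "0.8.0")))  # pig join_wide.pig
```

Keying the flag on the version keeps the optimizer rule enabled on releases where the bug is reported fixed.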

Discussion Overview
group: user @
categories: pig, hadoop
posted: Nov 17, '10 at 6:00p
active: Nov 18, '10 at 11:41p
posts: 5
users: 4
website: pig.apache.org
