Grokbase Groups Pig user August 2011
I profiled Pig running in single-JVM local mode (using "-x local") and found
that 75% of the processing time is spent building JARs and optimizing the
execution plan:

  - 50% in org.apache.pig.impl.util.JarManager.createJar
  - 25% in org.apache.pig.newplan.optimizer.PlanOptimizer.optimize
The remaining 25% is spent in Hadoop running the MapReduce jobs. To improve
performance, I want to avoid generating the Hadoop code every time I run a
Pig script. I want to run different data through the same Pig script: the
data changes frequently, but the script itself never changes. Is there a way
to reuse the generated JAR files instead of regenerating them every time?

Regards,

John
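To make the question concrete, here is a minimal sketch of content-addressed jar reuse, assuming the jar's contents can be enumerated up front as entry-name/bytes pairs. `JarCache`, `getOrBuild`, and `builds` are hypothetical names, not Pig API, and this is not how `JarManager.createJar` works; it only shows one way reuse could be done. Each jar is named after a SHA-256 digest of its entries, so unchanged inputs hit the cached file and skip repackaging.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.SortedMap;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

// Hypothetical content-addressed jar cache (not Pig API): each jar is named
// after a SHA-256 digest of its entries, so identical inputs reuse the file.
final class JarCache {
    private final Path cacheDir;
    private int builds = 0; // how many times a jar was actually written

    JarCache(Path cacheDir) {
        this.cacheDir = cacheDir;
    }

    // Returns a jar holding exactly these entries, building it only on a miss.
    Path getOrBuild(SortedMap<String, byte[]> entries) throws IOException {
        Path jar = cacheDir.resolve(digest(entries) + ".jar");
        if (Files.exists(jar)) {
            return jar; // cache hit: no repackaging
        }
        Files.createDirectories(cacheDir);
        // Write to a temp file, then move it into place, so the final name
        // normally appears only after the jar has been fully written.
        Path tmp = Files.createTempFile(cacheDir, "build-", ".tmp");
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(tmp))) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                out.putNextEntry(new JarEntry(e.getKey()));
                out.write(e.getValue());
                out.closeEntry();
            }
        }
        Files.move(tmp, jar, StandardCopyOption.REPLACE_EXISTING);
        builds++;
        return jar;
    }

    int builds() {
        return builds;
    }

    // SHA-256 over length-prefixed names and contents, in sorted name order.
    private static String digest(SortedMap<String, byte[]> entries) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                byte[] name = e.getKey().getBytes(StandardCharsets.UTF_8);
                md.update(ByteBuffer.allocate(4).putInt(name.length).array());
                md.update(name);
                md.update(ByteBuffer.allocate(4).putInt(e.getValue().length).array());
                md.update(e.getValue());
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-256 must be available on every JVM", ex);
        }
    }
}
```

The digest has to cover every input that ends up in the jar (UDF classes, dependency jars); anything left out of the key can cause a stale jar to be reused.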


  • Dmitriy Ryaboy at Aug 16, 2011 at 8:53 pm
    For local mode, it is not necessary to generate the jars. This is fixed in
    trunk, and the trivial patch can be applied to Pig 0.8 or 0.9 if you like
    (https://issues.apache.org/jira/browse/PIG-2128).
    There's a lot more optimization we can do for MR mode, such as putting
    the jars into the distributed cache instead of unjarring them and
    repackaging everything every time. There are tickets for this (with
    patches, even).
    Storing the optimized plan and reusing it is a good idea; we should
    consider caching plans. Open a JIRA?

    D
    (In reply to John Amos, Tue, Aug 16, 2011 at 10:08 AM.)
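The plan-caching idea from the reply can be sketched in the same spirit, assuming the optimized plan can be held as an object and the input location is supplied when the plan is executed rather than written into the script text. `PlanCache`, `planFor`, and `run` are hypothetical names, and the type parameter `P` stands in for whatever plan type an optimizer would produce; none of this is Pig API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical plan cache (not Pig API). P is whatever an optimizer produces.
final class PlanCache<P> {
    private final Map<String, P> plans = new ConcurrentHashMap<>();
    private final Function<String, P> compile; // the expensive parse + optimize step

    PlanCache(Function<String, P> compile) {
        this.compile = compile;
    }

    // Returns the plan for this exact script text, compiling it on first use only.
    P planFor(String scriptText) {
        return plans.computeIfAbsent(sha256(scriptText), key -> compile.apply(scriptText));
    }

    // Runs the cached plan against new input; only this step depends on the data.
    <R> R run(String scriptText, String inputPath, BiFunction<P, String, R> execute) {
        return execute.apply(planFor(scriptText), inputPath);
    }

    private static String sha256(String text) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : d) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-256 must be available on every JVM", ex);
        }
    }
}
```

Keying on the script text means new data does not trigger recompilation, but only if the input path arrives as a run-time parameter (for example a `$input` placeholder) rather than being edited into the script. A real cache would probably also need the Pig version and relevant configuration in the key, since those can change the optimized plan.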

Discussion Overview
group: user@
categories: pig, hadoop
posted: Aug 16, '11 at 5:09p
active: Aug 16, '11 at 8:53p
posts: 2
users: 2
website: pig.apache.org

2 users in discussion

John Amos: 1 post
Dmitriy Ryaboy: 1 post
