I profiled Pig running in single-JVM local mode (using "-x local") and
found that 75% of the processing time is spent building JARs and
optimizing the execution plan:
50% in org.apache.pig.impl.util.JarManager.createJar
25% in org.apache.pig.newplan.optimizer.PlanOptimizer.optimize
The remaining 25% is spent in Hadoop running the actual map/reduce
jobs. To improve performance, I want to avoid regenerating this Hadoop
code every time I run a Pig script. My use case is running different
data through the same Pig script: the data changes frequently, but the
script itself never changes. Is there a way to reuse the generated JAR
files instead of rebuilding them on every run?
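To make the use case concrete, here is a sketch of the workflow I have in mind, using Pig's parameter substitution so the same unchanged script can be pointed at new data on each run (script and file names below are just placeholder examples):

```shell
# Same script, different data each day; only the -param values change.
# script.pig, data/*.csv, and out/* are hypothetical names.
pig -x local -param INPUT=data/day1.csv -param OUTPUT=out/day1 script.pig
pig -x local -param INPUT=data/day2.csv -param OUTPUT=out/day2 script.pig
```

Inside script.pig the parameters are referenced as '$INPUT' and '$OUTPUT' in the LOAD and STORE statements. Even with this setup, each invocation still pays the JarManager.createJar and PlanOptimizer.optimize cost, which is what I am hoping to avoid.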
Regards,
John