You are mostly correct. None of those jars needs to be in
pig-withouthadoop.jar. I see no reason why junit needs to be
there. Jackson and Joda are piggybank dependencies and as such should
be included in piggybank.jar, not in pig-withouthadoop.jar. I have no
idea where hamcrest and jshell are coming from; it looks like they
should be removed as well. I think even jline can be removed, since
it's only required on the client side, where users will either be using
pig.jar (which contains everything in any case) or setting up their own
classpath to use pig-withouthadoop.jar. So it seems all the jars you
pointed out can be removed from pig-withouthadoop.jar, which will lower
the cost of distributing it to all the tasktracker nodes.
Let's open a jira and continue the discussion over there. Scott, would
you mind opening one?
On Sun, Aug 8, 2010 at 12:41, Scott Carey wrote:
That ant target is still a problem.
It may have removed most hadoop jars, but it still has useless dependencies. Why is junit in there? Why is jackson in there? I don't see why I need to push junit out to the cluster with each submitted job. I don't see where Pig is using JSON from Jackson.
The latter makes it impossible to use Pig with Avro unless you order the classpath right or build a custom jar.
Are hamcrest and jshell used?
I get the jline and joda inclusions, but even then those should probably be external jars on the classpath from a lib directory.
Setting Pig up with a proper maven POM or ivy configuration would be a big plus to those consuming Pig.
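For anyone who wants to check which third-party packages a given jar actually bundles (junit, jackson, hamcrest, and so on), here is a minimal sketch in Python. It is not part of Pig's build; the function name `bundled_packages` and the grouping depth of two package segments are assumptions made for illustration. It relies only on the fact that a jar is a zip archive.

```python
import zipfile
from collections import Counter


def bundled_packages(jar, depth=2):
    """Count .class entries in a jar, grouped by leading package segments.

    `jar` is a path or a file-like object; `depth` is how many package
    segments to keep (hypothetical default of 2, e.g. "org.junit").
    """
    counts = Counter()
    with zipfile.ZipFile(jar) as zf:
        for name in zf.namelist():
            if not name.endswith(".class"):
                continue  # skip manifests, resources, directory entries
            parts = name.split("/")[:-1]  # drop the class file name itself
            prefix = ".".join(parts[:depth]) if parts else "(default package)"
            counts[prefix] += 1
    return counts


# Example use (the jar path is an assumption; point it at your own build):
# for prefix, n in sorted(bundled_packages("pig-withouthadoop.jar").items()):
#     print(n, prefix)
```

Any prefix in the output that is neither Pig's own package nor something a task needs at runtime is a candidate for removal from the jar.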
On Jul 31, 2010, at 1:21 PM, Ashutosh Chauhan wrote:
There is an ant target, pig-withouthadoop, which generates
pig-withouthadoop.jar. It contains the minimal classes needed to run
Pig and has none of the dependencies in it. It's 5.4M compared to 13M
for pig.jar. You may want to try that.
The default target builds pig.jar since we don't want our new users to
deal with classpath issues when they are just starting off, and thus
we build a self-contained jar for them.
On Sat, Jul 31, 2010 at 09:30, Scott Carey wrote:
It has about 10x the jars necessary in it, including hadoop and all its dependencies. I had to build a sanitized pig jar; the build.xml has some targets with reduced output.
----- Reply message -----
From: "Xavier Stevens" <firstname.lastname@example.org>
Date: Fri, Jul 30, 2010 9:30 am
Subject: Removing Jetty classes from Pig JAR
To: "email@example.com" <firstname.lastname@example.org>
It seems the Pig 0.7.0 JAR contains Jetty classes. It's causing some
classloader problems for a webapp of mine that happens to include the
Pig JAR. Is there some reason why it has to be this way? If not, they
should probably be removed.
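A rough way to confirm this kind of clash is to list the class files that appear in both the Pig JAR and another jar on the webapp's classpath. The sketch below is only an illustration: the helper names are made up, the jar paths in the usage comment are placeholders, and the `org/mortbay/` prefix is an assumption about where the Jetty classes live in that jar.

```python
import zipfile


def class_entries(jar):
    """Return the set of .class entry names in a jar (path or file-like)."""
    with zipfile.ZipFile(jar) as zf:
        return {n for n in zf.namelist() if n.endswith(".class")}


def overlapping_classes(jar_a, jar_b, prefix=""):
    """Sorted class entries present in both jars, optionally under a prefix."""
    shared = class_entries(jar_a) & class_entries(jar_b)
    return sorted(c for c in shared if c.startswith(prefix))


# Example use (both paths and the prefix are assumptions):
# for entry in overlapping_classes("pig.jar", "WEB-INF/lib/jetty.jar",
#                                  prefix="org/mortbay/"):
#     print(entry)
```

A non-empty result means the same class can be loaded from two places, so which copy wins depends on classloader and classpath order.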