by me) need other jars at runtime (around 150), some of which have
conflicting resource names. Hence, trying to unpack all of them and
repacking into a single jar doesn't work. My solution is to create a
single top-level jar that names all the dependencies in Class-Path in
the MANIFEST.MF. This is also simpler from a user's point of view. Of
course this requires the top-level jar and all the dependencies to be
created with a certain directory structure that I can control.
Currently, I have a structure where I have a root directory which
contains the top-level jar and a directory called lib, and all the
dependencies are in lib, and the top-level jar names the dependencies as
lib/x.jar lib/y.jar etc. I package all of this as a single zip file for
distribution.
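Concretely, the top-level jar's manifest looks roughly like this (x.jar
and y.jar are placeholder names; note that real manifest lines must stay
under 72 bytes, with continuation lines starting with a single space):

```
Manifest-Version: 1.0
Class-Path: lib/x.jar lib/y.jar
```

The Class-Path entries are resolved relative to the directory containing
top-level.jar, which is why the layout below matters.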
Just to be clear, this is the dir structure:

root/
|--- top-level.jar
|--- lib/
     |--- x.jar
     |--- y.jar
     ...

I can't register top-level.jar in my PIG script (this is the recommended
approach) because PIG then unpacks & repackages everything into a single
jar, instead of including the jar on the classpath. I can't use
distributed cache because if I specify top-level.jar and lib separately
in mapred.cache.files, then the relative directory locations aren't
preserved. If I use the mapred.cache.archives option and specify the zip
file, I can't add the top-level jar to the classpath (because the
entries in mapred.job.classpath.files must be something from
mapred.cache.files).
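To spell out the two distributed-cache variants I tried (all the HDFS
paths here are placeholders):

```
# Variant 1: individual files. The cached files land flat in the task's
# working directory, so the lib/ layout that Class-Path relies on is lost.
mapred.cache.files=hdfs:///apps/top-level.jar,hdfs:///apps/lib/x.jar
mapred.job.classpath.files=hdfs:///apps/top-level.jar

# Variant 2: one archive. The internal layout survives unpacking, but
# nothing inside the archive can be named in mapred.job.classpath.files.
mapred.cache.archives=hdfs:///apps/bundle.zip
```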
If mapred.child.java.opts also allowed java.class.path to be augmented
(similar to java.library.path, which I am using for native libs that I
store in another dir parallel to lib), it would have solved my problem.
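For reference, this is the kind of child-JVM setting I mean; it works
today for the library path, but there is no analogous hook for the class
path (the heap size and path shown are illustrative):

```xml
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m -Djava.library.path=./native</value>
</property>
```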
I could have specified the zip in mapred.cache.archives, and added the
jar to the classpath. Right now I can't see any solution, other than
using a shared file system and adding top-level.jar to HADOOP_CLASSPATH
- this works because I am using a small cluster with a shared file
system, but clearly that's not always feasible (and of course, it's
modifying Hadoop's environment).
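For completeness, the shared-filesystem workaround I'm using now is just
the following (the /shared/myapp path is illustrative):

```shell
# Assumes /shared/myapp sits on a filesystem mounted on every node.
# top-level.jar's manifest Class-Path then pulls in /shared/myapp/lib/*.jar.
export HADOOP_CLASSPATH="/shared/myapp/top-level.jar:${HADOOP_CLASSPATH}"
```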
Please suggest any alternatives you can think of.