Sometimes when running Hadoop jobs using the 'hadoop jar' command there
are issues with the classloader. I presume these are caused by classes
that are loaded BEFORE the command's main is invoked. For example, when
relying on MapWritable in the command, it is not possible to use a
class that is not in the default idToClassMap. MapWritable.class is
loaded before the user job is unpacked, and therefore its classloader
will not be able to find custom classes (at least not classes that are
only on the classpath of the classloader RunJar creates).
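
To illustrate (a made-up minimal example; CustomValue stands in for any
user class that ships only inside the job jar and is therefore not in
AbstractMapWritable's built-in idToClassMap):

  import java.io.DataInput;
  import java.io.DataOutput;
  import java.io.IOException;

  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.MapWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.io.Writable;

  public class MapWritableExample {

    /** A custom value type known only to the job jar. */
    public static class CustomValue implements Writable {
      private int value;

      @Override
      public void write(DataOutput out) throws IOException {
        out.writeInt(value);
      }

      @Override
      public void readFields(DataInput in) throws IOException {
        value = in.readInt();
      }
    }

    public static MapWritable buildMap() {
      MapWritable map = new MapWritable();
      map.put(new Text("count"), new IntWritable(1)); // built-in type, always resolvable
      map.put(new Text("custom"), new CustomValue()); // resolved by name when read back
      return map;
    }
  }

When such a map is read back, the custom class has to be resolved by
name, and that resolution happens against a classloader that cannot see
the extracted job classes.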

I could not find any issues or discussion about this, so I assume it is
somewhat of an obscure issue (please correct me if I'm wrong). Anyway, I
would like to hear what you think of this and perhaps discuss a possible
solution, such as spawning the command in a new JVM. MapWritable, or
rather AbstractMapWritable, uses a Class.forName(className) construction;
maybe this can be changed so that it uses the classloader of the current
thread instead of the one of its own class. (Will this work?)
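
Roughly, the change I have in mind would look something like this (just a
sketch, not an actual patch):

  public class ForNameSketch {
    static Class<?> resolve(String className) throws ClassNotFoundException {
      // current construction: resolves against the classloader that defined
      // this class, which was loaded before the job jar was unpacked
      //   Class.forName(className);

      // proposed construction: resolves against the thread context
      // classloader, which RunJar points at the unpacked job jar
      return Class.forName(
          className, true, Thread.currentThread().getContextClassLoader());
    }
  }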

A workaround for now is to explicitly put the jar itself on the
classpath, i.e. 'env HADOOP_CLASSPATH=myJar hadoop jar myJar'.

  • Ferdy Galema at Oct 5, 2011 at 8:09 am
    Bumping this thread because I'm now more aware of what is actually
    happening. If I understand correctly, when submitting jobs using RunJar,
    the classpath is extended using a new classloader. This classloader adds
    the unzipped contents of the jar to the current thread's classpath
    (the contextClassLoader). This brings two issues to mind:

    1) In RunJar, when constructing the new URLClassLoader, would it not be
    better to chain the *previous* contextClassLoader instead of using the
    system classloader? (The latter is used when the classloader argument is
    omitted in the URLClassLoader constructor, which is what RunJar does; a
    sketch of both constructions follows after point 2.) This is truly a
    minor issue, since most of the time RunJar is used as a result of
    invoking 'hadoop jar' from the command line, and therefore the previous
    thread contextClassLoader will actually be the system classloader. I
    bring this up at least in order to try to understand the process.

    2) To follow up on my previous findings on AbstractMapWritable: I think
    the problem of it being unable to find classes is that it is loaded by a
    parent classloader (the system classloader) instead of the new child
    classloader set up by RunJar. The classloader of AbstractMapWritable is
    not this child classloader because the class is already loaded
    (indirectly, via Configuration) before the thread contextClassLoader is
    replaced in RunJar, and therefore it is unable to find certain extracted
    classes (the small probe sketched below makes this visible). So why does
    AbstractMapWritable use the classloader of its own class
    [Class.forName(className)] instead of the current thread's
    [Class.forName(className, true,
    Thread.currentThread().getContextClassLoader())]? Is it not wiser to
    always use the latter construction in general classloading code?
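
    For point 1, a rough sketch of what I mean (variable and class names are
    made up, this is not the actual RunJar source):

      import java.net.URL;
      import java.net.URLClassLoader;

      public class RunJarSketch {
        // classPathUrls stands in for the URLs of the unpacked job jar contents
        static ClassLoader makeLoader(URL[] classPathUrls) {
          // what RunJar does today: the parent argument is omitted, so the
          // system classloader becomes the parent
          //   new URLClassLoader(classPathUrls);

          // the alternative: chain whatever was the thread's contextClassLoader
          // before RunJar replaces it
          return new URLClassLoader(
              classPathUrls, Thread.currentThread().getContextClassLoader());
        }
      }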
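
    For point 2, a small probe (made-up class name) that can be run as the
    main class of a job jar to make the mismatch visible:

      import org.apache.hadoop.io.MapWritable;

      public class LoaderProbe {
        public static void main(String[] args) {
          // loader that defined MapWritable/AbstractMapWritable: already loaded
          // before RunJar installs its child loader, so typically the system
          // classloader
          System.out.println("MapWritable loader: "
              + MapWritable.class.getClassLoader());

          // loader RunJar installs for the unpacked job jar contents
          System.out.println("context loader:     "
              + Thread.currentThread().getContextClassLoader());
        }
      }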

    Ferdy.
