Hadoop Shell Script causes ClassNotFoundException for Nutch processes
---------------------------------------------------------------------

Key: HADOOP-964
URL: https://issues.apache.org/jira/browse/HADOOP-964
Project: Hadoop
Issue Type: Bug
Components: scripts
Environment: Windows XP and Fedora Core 6 Linux, Java 1.5.10; should affect all systems
Reporter: Dennis Kubes
Priority: Critical
Fix For: 0.11.0


In the ReduceTaskRunner constructor, line 339, a sorter is created that attempts to get the map output key and value classes from the configuration object. This happens before the TaskTracker$Child process is spawned off into its own separate JVM, so at this point the classpath for the configuration is the classpath that started the TaskTracker. The current hadoop script includes the hadoop jars, meaning that any Hadoop writable type will be found, but it doesn't include the Nutch jars, so any Nutch writable type (or any other user-supplied writable type) will not be found and will throw a ClassNotFoundException.

I don't think it is a good idea to have a dependency on specific Nutch jars in the Hadoop script, but it is a good idea to allow jars to be included if they are in specific locations, such as HADOOP_HOME, where the nutch jar resides. I have attached a patch that adds any jars in the HADOOP_HOME directory to the hadoop classpath. This fixes the ClassNotFoundExceptions inside of Nutch processes.
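The failure mode can be illustrated with a minimal, standalone sketch (the missing class name below is invented; it stands in for a Nutch writable type that is not on the TaskTracker's classpath):

```java
public class CnfeDemo {
    public static void main(String[] args) {
        // A class on the daemon's classpath resolves fine
        // (java.lang.String stands in for a Hadoop writable type here).
        try {
            Class.forName("java.lang.String");
            System.out.println("bundled class: found");
        } catch (ClassNotFoundException e) {
            System.out.println("bundled class: missing");
        }
        // A user-supplied class (e.g. a Nutch writable) is not on the
        // daemon's classpath, so the same lookup throws.
        try {
            Class.forName("org.example.MissingWritable"); // hypothetical name
            System.out.println("user class: found");
        } catch (ClassNotFoundException e) {
            System.out.println("user class: missing");
        }
    }
}
```

The configuration object's class lookups work the same way: they resolve names against whatever classpath the looking-up JVM was started with, which is why the lookup succeeds in the child but fails in the TaskTracker.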

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Dennis Kubes (JIRA) at Jan 31, 2007 at 11:51 pm
    [ https://issues.apache.org/jira/browse/HADOOP-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Dennis Kubes updated HADOOP-964:
    --------------------------------

    Attachment: classpath.patch

    Patch added
  • Dennis Kubes (JIRA) at Jan 31, 2007 at 11:51 pm
    [ https://issues.apache.org/jira/browse/HADOOP-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Dennis Kubes updated HADOOP-964:
    --------------------------------

    Status: Patch Available (was: Open)

    This patch adds any jars in the HADOOP_HOME directory to the hadoop classpath that starts the TaskTracker.
  • Nigel Daley at Feb 1, 2007 at 12:09 am
    -1

    It sounds like this undoes
    http://issues.apache.org/jira/browse/HADOOP-700
  • Dennis Kubes at Feb 1, 2007 at 12:28 am
    Yes it does. We could get around this by changing it to look for
    nutch-*.jar instead of *.jar, but then there are Nutch references in the
    hadoop scripts. Is there a better way to do this?

    Dennis Kubes

  • Hadoop QA (JIRA) at Feb 1, 2007 at 12:11 am
    [ https://issues.apache.org/jira/browse/HADOOP-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469275 ]

    Hadoop QA commented on HADOOP-964:
    ----------------------------------

    +1, because http://issues.apache.org/jira/secure/attachment/12350088/classpath.patch applied and successfully tested against trunk revision r502030.
  • Dennis Kubes (JIRA) at Feb 1, 2007 at 3:47 am
    [ https://issues.apache.org/jira/browse/HADOOP-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Dennis Kubes updated HADOOP-964:
    --------------------------------

    Attachment: classpath2.path

    Fixes the classpath in the ReduceTaskRunner instead of the shell scripts
  • Dennis Kubes (JIRA) at Feb 1, 2007 at 3:59 am
    [ https://issues.apache.org/jira/browse/HADOOP-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469327 ]

    Dennis Kubes commented on HADOOP-964:
    -------------------------------------

    The second patch, classpath2.path (sorry, should be .patch), attacks the problem from the ReduceTaskRunner instead of the hadoop shell script. The problem is that Writable classes are not being found by the ReduceTaskRunner upon initialization. It needs these Writable classes to perform sorting, etc., in the prepare stage. The first solution was to change the hadoop script to load any jars in HADOOP_HOME. The hadoop script sets the classpath for the TaskTracker, which is then passed to the ReduceTaskRunner, and therefore by loading any jars in the home directory the necessary jars would be in the classpath and accessible. There are a few issues with that fix. First, it reverses HADOOP-700, which we don't want to do. Second, if we went down this path of setting the classpath through the script for Writable classes, then any time new classes were added we would have to restart the TaskTracker nodes. Again, not a good solution.

    So instead, what I did with this patch is change the ReduceTaskRunner to dynamically configure its classpath from the local unjarred work directory. It does this by creating a new URLClassLoader, adding the same elements that are added to the classpath of the TaskTracker$Child spawns, while keeping the old context class loader as its parent. The new URLClassLoader is then set into the current JobConf as its classloader and is used for the sorting, etc. This allows us, first, to avoid changing the hadoop script and, second, to dynamically add new writable classes to the system without restarting TaskTracker nodes.
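    A Hadoop-free sketch of that classloader construction (this is not the patch itself; the work-directory layout and all names here are assumptions):

    ```java
    import java.io.File;
    import java.net.URL;
    import java.net.URLClassLoader;
    import java.util.ArrayList;
    import java.util.List;

    public class WorkDirClassLoader {
        // Build a URLClassLoader over the unjarred job work directory,
        // mirroring the entries a TaskTracker$Child JVM would see, with the
        // current context classloader as parent so daemon classes still resolve.
        static ClassLoader forWorkDir(File workDir) throws Exception {
            List<URL> urls = new ArrayList<URL>();
            urls.add(workDir.toURI().toURL()); // unpacked job classes
            File[] libs = new File(workDir, "lib").listFiles();
            if (libs != null) {
                for (File jar : libs) {
                    if (jar.getName().endsWith(".jar")) {
                        urls.add(jar.toURI().toURL()); // bundled lib jars
                    }
                }
            }
            ClassLoader parent = Thread.currentThread().getContextClassLoader();
            return new URLClassLoader(urls.toArray(new URL[0]), parent);
        }

        public static void main(String[] args) throws Exception {
            ClassLoader loader = forWorkDir(new File("."));
            // Parent delegation: classes the daemon already has still resolve.
            System.out.println(loader.loadClass("java.lang.String").getName());
        }
    }
    ```

    As the comment above describes, the patch then installs such a loader on the current JobConf so that subsequent class lookups in the prepare stage resolve the job's own writable types.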

    I have run this patch on a development system using the Nutch injector and also ran the TestMapRed unit tests. Both completed successfully.
  • Dennis Kubes (JIRA) at Feb 1, 2007 at 4:01 am
    [ https://issues.apache.org/jira/browse/HADOOP-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Dennis Kubes updated HADOOP-964:
    --------------------------------

    Summary: ClassNotFoundException in ReduceTaskRunner (was: Hadoop Shell Script causes ClassNotFoundException for Nutch processes)

    This is really more of a ReduceTaskRunner issue than a shell script issue, so I am changing the title.
  • Doug Cutting (JIRA) at Feb 1, 2007 at 5:49 pm
    [ https://issues.apache.org/jira/browse/HADOOP-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469523 ]

    Doug Cutting commented on HADOOP-964:
    -------------------------------------

    This is a serious bug, introduced by the new merge code. Previously, comparators were only used in the child process: ReduceTaskRunner.prepare() only copied binary data before, but now it does some sorting too. Hence, this logic should move into the child process. (Architecturally, the goal is to keep user code out of long-running daemon processes.)

    I think we should proceed as follows:

    1. Add a unit test where the comparator is in the jar file.
    2. Make a short-term fix that loads these classes into the TaskTracker.
    3. Add another bug to move all comparator access into the child process.


  • Doug Cutting (JIRA) at Feb 1, 2007 at 5:51 pm
    [ https://issues.apache.org/jira/browse/HADOOP-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Doug Cutting updated HADOOP-964:
    --------------------------------

    Component/s: mapred (was: scripts)
    Priority: Blocker (was: Critical)
  • Dennis Kubes (JIRA) at Feb 1, 2007 at 7:44 pm
    [ https://issues.apache.org/jira/browse/HADOOP-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469560 ]

    Dennis Kubes commented on HADOOP-964:
    -------------------------------------

    Are we going to be able to move all comparator access into the child process? This is occurring during the sorting and merging. Would that process have to be moved to the child?

    The short-term fix for loading the classes into the ReduceTaskRunner is classpath2.path; should I change this to occur in the TaskTracker? It would essentially just mean moving the method over to the TaskTracker and changing where it gets called.

    I will create a unit test for the comparator in the jar file now; the new bug has already been created as HADOOP-968.
  • Doug Cutting (JIRA) at Feb 1, 2007 at 8:10 pm
    [ https://issues.apache.org/jira/browse/HADOOP-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469568 ]

    Doug Cutting commented on HADOOP-964:
    -------------------------------------
    > Are we going to be able to move all comparator access into the child process?

    Yes, I think so. That's the subject of HADOOP-968.

    > Should I change this to occur in the TaskTracker?

    No. I was referring to the TaskTracker process, which is where ReduceTaskRunner runs. I have not yet looked closely at your patch, but it is certainly a candidate for (2), the short-term fix. HADOOP-968 is the long-term fix.

    > I will create a unit test for the comparator in the jar file now

    Great! That would be most welcome. There are already some unit tests that use a job jar file, so this can probably be bundled into one of those.

    I think Owen's planning to review your patch more closely.
  • Dennis Kubes (JIRA) at Feb 2, 2007 at 4:20 am
    [ https://issues.apache.org/jira/browse/HADOOP-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Dennis Kubes updated HADOOP-964:
    --------------------------------

    Attachment: classpath2_unittest.patch

    The unit test that verifies classpath2.path by dynamically loading an external writable from the job.jar.
  • Owen O'Malley (JIRA) at Feb 2, 2007 at 4:52 pm
    [ https://issues.apache.org/jira/browse/HADOOP-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469798 ]

    Owen O'Malley commented on HADOOP-964:
    --------------------------------------

    I think the classpath2.patch looks like the right approach for the short-term fix. Thanks for the unit test.

    +1
    ClassNotFoundException in ReduceTaskRunner
    ------------------------------------------

  • Owen O'Malley (JIRA) at Feb 2, 2007 at 4:54 pm
    [ https://issues.apache.org/jira/browse/HADOOP-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469800 ]

    Owen O'Malley commented on HADOOP-964:
    --------------------------------------

    (Although the preferred style is to make a new patch that includes all of the proposed changes in a single patch.)
  • Dennis Kubes (JIRA) at Feb 2, 2007 at 6:40 pm
    [ https://issues.apache.org/jira/browse/HADOOP-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Dennis Kubes updated HADOOP-964:
    --------------------------------

    Attachment: classpath3.patch

    This patch contains both the classloader (classpath2.path) and the unit test (classpath2_unittest.patch) in a single file. Thanks for the heads up Owen!
  • Doug Cutting (JIRA) at Feb 2, 2007 at 7:24 pm
    [ https://issues.apache.org/jira/browse/HADOOP-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Doug Cutting updated HADOOP-964:
    --------------------------------

    Resolution: Fixed
    Status: Resolved (was: Patch Available)

    I just fixed this. Thanks, Dennis!

Discussion Overview
group: common-dev @ hadoop
posted: Jan 31, '07 at 11:49p
active: Feb 2, '07 at 7:24p
posts: 18
users: 3
website: hadoop.apache.org...
irc: #hadoop
