[ https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576388#action_12576388 ]

Mahadev konar commented on HADOOP-1622:
---------------------------------------

Dennis, any updates on this bug?
Hadoop should provide a way to allow the user to specify jar file(s) the user job depends on
--------------------------------------------------------------------------------------------

Key: HADOOP-1622
URL: https://issues.apache.org/jira/browse/HADOOP-1622
Project: Hadoop Core
Issue Type: Improvement
Components: mapred
Reporter: Runping Qi
Assignee: Dennis Kubes
Attachments: hadoop-1622-4-20071008.patch, HADOOP-1622-5.patch, HADOOP-1622-6.patch, HADOOP-1622-7.patch, HADOOP-1622-8.patch, HADOOP-1622-9.patch, multipleJobJars.patch, multipleJobResources.patch, multipleJobResources2.patch


More likely than not, a user's job may depend on multiple jars.
Right now, when submitting a job through bin/hadoop, there is no way for the user to specify that.
A workaround is to re-package all the dependent jars into a new jar, or to put the dependent jar files in the lib dir of the new jar.
This workaround causes unnecessary inconvenience for the user. Furthermore, if the user does not own the main function
(as is the case when the user uses Aggregate, datajoin, or streaming), the user has to re-package those system jar files too.
It is much desired that Hadoop provide a clean and simple way for the user to specify a list of dependent jar files at the time
of job submission. Something like:
bin/hadoop .... --depending_jars j1.jar:j2.jar
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Dennis Kubes (JIRA) at Mar 9, 2008 at 1:09 am
    [ https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576671#action_12576671 ]

    Dennis Kubes commented on HADOOP-1622:
    --------------------------------------

    No updates yet, but I should have time to start working on this again in the next couple of days, right after I finish some work on converting Hadoop RPC to NIO.
  • Mahadev konar (JIRA) at Mar 9, 2008 at 1:17 am
    [ https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576673#action_12576673 ]

    Mahadev konar commented on HADOOP-1622:
    ---------------------------------------

    Great.... We also need this feature to get into 0.17. Let me know if you need any help getting it into 0.17...

  • Milind A Bhandarkar at Mar 9, 2008 at 1:44 am
    Great! Looking forward to a patch soon.


  • Owen O'Malley (JIRA) at Mar 12, 2008 at 5:02 am
    [ https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577708#action_12577708 ]

    Owen O'Malley commented on HADOOP-1622:
    ---------------------------------------

    Dennis,
    Upon looking at this, I'm getting worried. This looks like a lot of special cases. What we really need is to support 3 kinds of files:

    * simple files
    * archives
    * jar files

    For each of these, we would like them to be able to come from a URI, with the most convenient default being a local file. So, I propose something like:

    {code}
    -file foo,bar,hdfs:baz
    {code}

    will upload foo and bar to an upload area and download foo, bar, and baz to the slave nodes as the tasks are run on them.

    {code}
    -archive foo.zip,hdfs:baz.zip
    {code}

    will download foo.zip and baz.zip and expand them.

    Finally, the -jar option would download them and put them on the class path. So,

    {code}
    -jar myjar.jar,hadoop-0.16.1-streaming.jar
    {code}

    would upload the files in the job client, download them to the slaves, and add them to the class path in the given order.

    I think I'd leave the rsync functionality out and just use hdfs:_upload/$jobid/... as transient storage, deleting it when the job is done. If users want to save the bandwidth, they can upload the files to HDFS themselves, in which case the files don't need to be uploaded again.
    Thoughts?
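To make the proposed defaulting rule concrete, here is a minimal sketch in plain Java (no Hadoop APIs; `ResourceList` and `parse` are made-up names, not anything in the patch): each comma-separated entry without an explicit scheme is treated as a local file.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: split a comma-separated resource list such as
// "foo,bar,hdfs:baz" and default scheme-less entries to the local file system.
public class ResourceList {
    /** Returns each entry normalized to "scheme:rest"; bare names become "file:name". */
    public static List<String> parse(String commaList) {
        List<String> out = new ArrayList<>();
        for (String entry : commaList.split(",")) {
            entry = entry.trim();
            // Anything without an explicit scheme is treated as a local file.
            out.add(entry.contains(":") ? entry : "file:" + entry);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("foo,bar,hdfs:baz"));
        // [file:foo, file:bar, hdfs:baz]
    }
}
```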

  • Mahadev konar (JIRA) at Mar 13, 2008 at 6:26 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578433#action_12578433 ]

    Mahadev konar commented on HADOOP-1622:
    ---------------------------------------

    I am starting work on this... Dennis, if you are already working on this, please let me know..
  • Dennis Kubes (JIRA) at Mar 13, 2008 at 7:18 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578454#action_12578454 ]

    Dennis Kubes commented on HADOOP-1622:
    --------------------------------------

    I have not resumed work on this yet; I am currently neck deep in reworking NIO for Hadoop RPC. I was planning to pick this up again as soon as I complete the NIO code, in the next 2-3 days. I would like to continue working on this if possible. When is 0.17 scheduled for release?

    Owen, the first pass at this didn't distinguish between jar and regular files on the command line; instead, there was detection code that identified each file's type. Also, the first pass supported directories as well as files (I don't know whether you are including those under -file). I think the ability to include directories for job input is extremely important. What were the special cases you were seeing?

    The idea behind this code is that, much like streaming, you could upload and cache files from any type of resource (file, directory, jar, etc.) from any file system. So, for instance, people could store common jars or file resources on S3 and pull them down into a job.


  • Mahadev konar (JIRA) at Mar 13, 2008 at 8:54 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578487#action_12578487 ]

    Mahadev konar commented on HADOOP-1622:
    ---------------------------------------

    Also, Owen, what would the command line look like with your suggestions?

    hadoop jar -file <files> -jar <jars> -archive <archives> ?

    Also, if that is the case, we could make it generic for streaming, which uses its own options for -file, -archives, and others... though we do not need to do that in this patch...
  • Mahadev konar (JIRA) at Mar 19, 2008 at 4:48 am
    [ https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580227#action_12580227 ]

    Mahadev konar commented on HADOOP-1622:
    ---------------------------------------

    I like Owen's idea. It's simple and gives users the flexibility they need.

    Here is how I am implementing this:

    The hadoop command line will have the following options:

    hadoop jar -file <comma separated files> -jar <comma separated jars> -archive <comma separated archives>

    All of these can be comma-separated URIs, defaulting to the local file system if no scheme is specified.

    The JobClient uploads the files/jars/archives onto HDFS (or whichever filesystem MapReduce uses) under the job directory.

    Given that these files/jars/archives might have the same name but different URIs,
    for example: hadoop jar -file file:///file1,hdfs://somehost:port/file1
    we would store these files as
    jobdir/file/file/file1
    jobdir/hdfs_somehost_port/file1

    Keeping these files in different directories, with the directory name derived from the URI, gives us the ability to use DistributedCache as it is.

    So we could say DistributedCache.addFiles(jobdir/file/file/file1, jobdir/hdfs_somehost_port/file1);
    something like this ...

    So the job directory would look like:

    jobdir/jars/urischeme/<jarfiles>
    jobdir/archives/urischeme/<archivefiles>
    jobdir/file/urischeme/<files>

    The ones in jars will be added to the classpath of all the tasks, in the order they were mentioned.
    The others will be copied once per job and symlinked from the current working directory of the task.

    comments?
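The URI-to-staging-directory mangling described above could be sketched roughly as follows. This is illustrative Java only, not the actual patch: `StagingPath` is a made-up name, the real code would go through Hadoop's Path/FileSystem classes, and the exact mangling scheme (here scheme_host_port) is an assumption based on the jobdir/hdfs_somehost_port/file1 example.

```java
import java.net.URI;

// Illustrative only: derive a per-URI staging directory so that same-named
// files from different filesystems land in distinct subdirectories
// of the job directory.
public class StagingPath {
    public static String stagingPath(String jobDir, String category, String rawUri) {
        URI uri = URI.create(rawUri);
        String scheme = uri.getScheme() == null ? "file" : uri.getScheme();
        // Mangle the authority into a directory-safe name, e.g.
        // hdfs://somehost:9000 -> hdfs_somehost_9000; a plain local file -> file.
        String host = uri.getHost();
        String dir = host == null ? scheme
                   : scheme + "_" + host + (uri.getPort() >= 0 ? "_" + uri.getPort() : "");
        String name = rawUri.substring(rawUri.lastIndexOf('/') + 1);
        return jobDir + "/" + category + "/" + dir + "/" + name;
    }

    public static void main(String[] args) {
        System.out.println(stagingPath("jobdir", "file", "file:///file1"));
        // jobdir/file/file/file1
        System.out.println(stagingPath("jobdir", "file", "hdfs://somehost:9000/file1"));
        // jobdir/file/hdfs_somehost_9000/file1
    }
}
```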
  • Runping Qi (JIRA) at Mar 19, 2008 at 1:36 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580366#action_12580366 ]

    Runping Qi commented on HADOOP-1622:
    ------------------------------------

    Sounds good.

    A couple of comments:

    It seems odd to have both jar and -jar as arguments/options
    in the command line "hadoop jar -file <comma separated files> -jar <comma separated jars>".
    Would it be better to use "-classpath" instead?

    When the job dir changes to

    jobdir/jars/urischeme/<jarfiles>
    jobdir/archives/urischeme/<archivefiles>
    jobdir/file/urischeme/<files>


    will that break current applications that assume their files are loaded into the jobdir via the -file and -archive options?

  • Mahadev konar (JIRA) at Mar 19, 2008 at 6:12 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580482#action_12580482 ]

    Mahadev konar commented on HADOOP-1622:
    ---------------------------------------

    I'll leave the jar option as-is to keep it backwards compatible; I don't want to break backwards compatibility for users.

    - As for the job directory changes: this is the directory structure in HDFS. The local job directory structure would not change.
  • Mahadev konar (JIRA) at Mar 19, 2008 at 6:14 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580483#action_12580483 ]

    Mahadev konar commented on HADOOP-1622:
    ---------------------------------------

    How about

    hadoop jar -file <..> -libjar <comma sep jars> -archives <comma separated> ?

  • Devaraj Das (JIRA) at Mar 23, 2008 at 12:53 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581376#action_12581376 ]

    Devaraj Das commented on HADOOP-1622:
    -------------------------------------

    Do you think the DNS resolution is going to be a big hit? I don't think so, with DNS caching in place, etc.
    Can Pipes make use of this feature? (This patch doesn't support Pipes.) I am OK with having a separate issue to address Pipes if required.
    Otherwise, the patch looks fine.
  • Hadoop QA (JIRA) at Mar 23, 2008 at 1:55 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581384#action_12581384 ]

    Hadoop QA commented on HADOOP-1622:
    -----------------------------------

    -1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12378428/HADOOP-1622_2.patch
    against trunk revision 619744.

    @author +1. The patch does not contain any @author tags.

    tests included +1. The patch appears to include 16 new or modified tests.

    javadoc +1. The javadoc tool did not generate any warning messages.

    javac +1. The applied patch does not generate any new javac compiler warnings.

    release audit +1. The applied patch does not generate any new release audit warnings.

    findbugs -1. The patch appears to introduce 3 new Findbugs warnings.

    core tests +1. The patch passed core unit tests.

    contrib tests +1. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2030/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2030/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2030/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2030/console

    This message is automatically generated.
  • Mahadev konar (JIRA) at Mar 24, 2008 at 9:33 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581677#action_12581677 ]

    Mahadev konar commented on HADOOP-1622:
    ---------------------------------------

    - I think it might not be a big overhead... I just wanted to avoid it, since it would be a common utility and should be filed as a separate JIRA (finding out whether two filesystems are the same seems like a nice thing to have). I wanted to keep this patch simple :) ..

    - I don't think Pipes can make use of it ... I'll create another JIRA for that as well.
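
    [Editor's note] The "are two filesystems the same" check mentioned above can be sketched as a comparison of the filesystems' URIs. This is only an illustration of the idea, not the utility Hadoop eventually shipped: the helper name and the URI-only comparison are assumptions here, and a real implementation would also need to resolve default ports.

    ```java
    import java.net.URI;

    public class FsCompare {
        // Hypothetical helper: treat two filesystem URIs as the same
        // filesystem when their scheme and authority match (case-insensitively).
        // Default-port resolution is deliberately omitted from this sketch.
        static boolean sameFileSystem(URI a, URI b) {
            if (a.getScheme() == null || b.getScheme() == null) return false;
            if (!a.getScheme().equalsIgnoreCase(b.getScheme())) return false;
            String authA = a.getAuthority() == null ? "" : a.getAuthority();
            String authB = b.getAuthority() == null ? "" : b.getAuthority();
            return authA.equalsIgnoreCase(authB);
        }

        public static void main(String[] args) {
            // Same scheme and authority, different paths -> same filesystem.
            System.out.println(sameFileSystem(
                URI.create("hdfs://namenode:9000/user/a"),
                URI.create("hdfs://namenode:9000/tmp/b")));   // true
            // Different schemes -> different filesystems.
            System.out.println(sameFileSystem(
                URI.create("hdfs://namenode:9000/user/a"),
                URI.create("file:///local/path")));           // false
        }
    }
    ```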

  • Hadoop QA (JIRA) at Mar 24, 2008 at 11:03 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581714#action_12581714 ]

    Hadoop QA commented on HADOOP-1622:
    -----------------------------------

    -1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12378510/HADOOP-1622_3.patch
    against trunk revision 619744.

    @author +1. The patch does not contain any @author tags.

    tests included +1. The patch appears to include 16 new or modified tests.

    javadoc +1. The javadoc tool did not generate any warning messages.

    javac +1. The applied patch does not generate any new javac compiler warnings.

    release audit +1. The applied patch does not generate any new release audit warnings.

    findbugs -1. The patch appears to introduce 1 new Findbugs warning.

    core tests +1. The patch passed core unit tests.

    contrib tests +1. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2040/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2040/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2040/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2040/console

    This message is automatically generated.
  • Hadoop QA (JIRA) at Mar 25, 2008 at 2:37 am
    [ https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12581764#action_12581764 ]

    Hadoop QA commented on HADOOP-1622:
    -----------------------------------

    +1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12378527/HADOOP-1622_4.patch
    against trunk revision 619744.

    @author +1. The patch does not contain any @author tags.

    tests included +1. The patch appears to include 16 new or modified tests.

    javadoc +1. The javadoc tool did not generate any warning messages.

    javac +1. The applied patch does not generate any new javac compiler warnings.

    release audit +1. The applied patch does not generate any new release audit warnings.

    findbugs +1. The patch does not introduce any new Findbugs warnings.

    core tests +1. The patch passed core unit tests.

    contrib tests +1. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2042/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2042/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2042/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2042/console

    This message is automatically generated.
  • Hadoop QA (JIRA) at Mar 25, 2008 at 10:49 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582076#action_12582076 ]

    Hadoop QA commented on HADOOP-1622:
    -----------------------------------

    +1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12378580/HADOOP-1622_5.patch
    against trunk revision 619744.

    @author +1. The patch does not contain any @author tags.

    tests included +1. The patch appears to include 16 new or modified tests.

    javadoc +1. The javadoc tool did not generate any warning messages.

    javac +1. The applied patch does not generate any new javac compiler warnings.

    release audit +1. The applied patch does not generate any new release audit warnings.

    findbugs +1. The patch does not introduce any new Findbugs warnings.

    core tests +1. The patch passed core unit tests.

    contrib tests +1. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2053/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2053/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2053/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2053/console

    This message is automatically generated.
  • Hadoop QA (JIRA) at Mar 26, 2008 at 9:51 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582441#action_12582441 ]

    Hadoop QA commented on HADOOP-1622:
    -----------------------------------

    +1 overall. Here are the results of testing the latest attachment
    http://issues.apache.org/jira/secure/attachment/12378649/HADOOP-1622_6.patch
    against trunk revision 619744.

    @author +1. The patch does not contain any @author tags.

    tests included +1. The patch appears to include 16 new or modified tests.

    javadoc +1. The javadoc tool did not generate any warning messages.

    javac +1. The applied patch does not generate any new javac compiler warnings.

    release audit +1. The applied patch does not generate any new release audit warnings.

    findbugs +1. The patch does not introduce any new Findbugs warnings.

    core tests +1. The patch passed core unit tests.

    contrib tests +1. The patch passed contrib unit tests.

    Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2067/testReport/
    Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2067/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
    Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2067/artifact/trunk/build/test/checkstyle-errors.html
    Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2067/console

    This message is automatically generated.
  • Hudson (JIRA) at Mar 27, 2008 at 12:21 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582613#action_12582613 ]

    Hudson commented on HADOOP-1622:
    --------------------------------

    Integrated in Hadoop-trunk #443 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/443/])
    Hadoop should provide a way to allow the user to specify jar file(s) the user job depends on
    --------------------------------------------------------------------------------------------

    Key: HADOOP-1622
    URL: https://issues.apache.org/jira/browse/HADOOP-1622
    Project: Hadoop Core
    Issue Type: Improvement
    Components: mapred
    Reporter: Runping Qi
    Assignee: Mahadev konar
    Fix For: 0.17.0

    Attachments: hadoop-1622-4-20071008.patch, HADOOP-1622-5.patch, HADOOP-1622-6.patch, HADOOP-1622-7.patch, HADOOP-1622-8.patch, HADOOP-1622-9.patch, HADOOP-1622_1.patch, HADOOP-1622_2.patch, HADOOP-1622_3.patch, HADOOP-1622_4.patch, HADOOP-1622_5.patch, HADOOP-1622_6.patch, multipleJobJars.patch, multipleJobResources.patch, multipleJobResources2.patch


    More likely than not, a user's job may depend on multiple jars.
    Right now, when submitting a job through bin/hadoop, there is no way for the user to specify that.
    A workaround is to re-package all the dependent jars into a new jar, or to put the dependent jar files in the lib dir of the new jar.
    This workaround causes unnecessary inconvenience to the user. Furthermore, if the user does not own the main function
    (as in the case when the user uses Aggregate, datajoin, or streaming), the user has to re-package those system jar files too.
    It is much desired that Hadoop provide a clean and simple way for the user to specify a list of dependent jar files at the time
    of job submission. Something like:
    bin/hadoop .... --depending_jars j1.jar:j2.jar
    --
    This message is automatically generated by JIRA.
    -
    You can reply to this email to add a comment to the issue online.
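
    [Editor's note] To make the proposed interface concrete, here is a minimal sketch (class and method names are hypothetical, not the committed implementation) of how a submission tool could parse a colon-separated `--depending_jars` value, splitting on the platform path separator:

    ```java
    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;

    public class DependentJars {
        // Hypothetical parser for a "--depending_jars j1.jar:j2.jar" argument:
        // split on the platform path separator (':' on Unix, ';' on Windows)
        // and drop empty entries.
        static List<String> parseJarList(String arg) {
            List<String> jars = new ArrayList<>();
            for (String entry : arg.split(File.pathSeparator)) {
                if (!entry.isEmpty()) {
                    jars.add(entry);
                }
            }
            return jars;
        }

        public static void main(String[] args) {
            System.out.println(parseJarList("j1.jar" + File.pathSeparator + "j2.jar"));
        }
    }
    ```

    The version of this feature that was eventually committed exposes the jar list through bin/hadoop's generic options rather than the exact flag proposed in this thread, but the parsing idea is the same.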

Discussion Overview
group: common-dev @
categories: hadoop
posted: Mar 7, '08 at 9:53p
active: Mar 27, '08 at 12:21p
posts: 20
users: 2
website: hadoop.apache.org...
irc: #hadoop