FAQ
Hi All

I have created a MapReduce job, and to run it on the cluster I have bundled all the jars (Hadoop, HBase, etc.) into a single jar, which greatly increases the overall file size. During development I need to copy this complete file again and again, which is very time-consuming. Is there any way to copy only the program jar, without copying the library files every time? I am using NetBeans to develop the program.

Kindly let me know how to solve this issue.

Thanks

--
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>


  • Mark Kerzner at Apr 4, 2011 at 3:17 pm
    Shuja,

    here is what I do in NB environment

    #!/bin/sh
    cd ../dist
    jar -xf Chapter1.jar
    jar -cmf META-INF/MANIFEST.MF ../Chapter3-for-Hadoop.jar *
    cd ../bin
    echo "Repackaged for Hadoop"

    and it does the job. I run it only when I want to build this jar.

    Mark
  • Mark Kerzner at Apr 4, 2011 at 3:19 pm
That was for my book (chapter 1 attached; you may find other things useful), but you would substitute your own project name.

    Mark
  • Miguel Costa at Apr 4, 2011 at 3:44 pm
Hi,

I need some help with a schema design on HBase.

I have 5 dimensions (Time, Site, Referrer, Keyword, Country).

My row key is Site+Time.

Now I want to answer questions like: what is the top Referrer by Keyword for a site over a period of time?

Basically I want to cross all the dimensions that I have. And what if I have 30 dimensions?

What is the best schema design?

Please let me know if this isn't the right mailing list.

Thank you for your time.

Miguel
  • Ted Dunning at Apr 4, 2011 at 4:00 pm
The HBase list would be more appropriate; see http://hbase.apache.org/mail-lists.html

There is an active IRC channel, but your question fits the mailing list better, so pop on over and I will give you some comments.

In the meantime, take a look at OpenTSDB, which is doing something very much like what you want to do.
  • Mark Kerzner at Apr 4, 2011 at 4:40 pm
    Then it seems you want to do the opposite of what I have done in this
    script. I AM combining all the jars in one jar, and you already have that.

    Rather, you want to distribute only your app jar, and put the other ones in
    the lib folder on the server.

    I know that when you run a standard MR job, you only need to mention your
    jar, and the other Hadoop jars already come from the lib. In other words,
    you should be able to run it like this:

    hadoop jar your-jar parameters

    Since you are using Cloudera distro, this runs the following

    /usr/bin/hadoop-0.20

    which in turn runs this script

    #!/bin/sh
    export HADOOP_HOME=/usr/lib/hadoop-0.20
    exec /usr/lib/hadoop-0.20/bin/hadoop "$@"

    Since HADOOP_HOME is set, it knows that the libraries are in here

    /usr/lib/hadoop-0.20/lib/

    therefore, I think that if you put your additional libraries in the same
    folder, it should just pick them up.

    Sincerely,
    Mark

    On Mon, Apr 4, 2011 at 11:31 AM, Shuja Rehman wrote:

Hi,
I do not understand it. Can you explain it with my example?

I have the following jars in the lib folder of the dist directory created by NetBeans (dist/lib/):

commons-logging-1.1.1.jar
guava-r07.jar
hadoop-0.20.2+737-core.jar
hbase.jar
hbase-0.89.20100924+28.jar
log4j-1.2.15.jar
mysql-connector-java-5.1.7-bin.jar
UIDataTransporter.jar
zookeeper.jar

and the dist folder contains only

MyProgram.jar

At the moment I am combining all the jar files to produce a single file, but now I want to put the dist/lib/*.jar files on the server once, and only copy MyProgram.jar every time I change the code.

Can you adapt your code to my example?
Thanks




    --
    Regards
    Shuja-ur-Rehman Baig
    <http://pk.linkedin.com/in/shujamughal>
  • Allen Wittenauer at Apr 4, 2011 at 5:04 pm

    This was in the FAQ, but in a non-obvious place. I've updated it to be more visible (hopefully):

    http://wiki.apache.org/hadoop/FAQ#How_do_I_submit_extra_content_.28jars.2C_static_files.2C_etc.29_for_my_job_to_use_during_runtime.3F
  • Marco Didonna at Apr 4, 2011 at 5:17 pm

Does the same apply to a jar containing libraries? Suppose I need lucene-core.jar to run my project. Can I put this jar into my job jar and have Hadoop "see" Lucene's classes? Or should I use the distributed cache?

    MD
  • Mark Kerzner at Apr 4, 2011 at 5:20 pm
I think you can put them either in your jar or in the distributed cache.

As Allen pointed out, my idea of putting them into the Hadoop lib folder was wrong.

    Mark
  • Shuja Rehman at Apr 4, 2011 at 6:32 pm
Well, I think putting them in the distributed cache is a good idea. Do you have a working example of how to put extra jars in the distributed cache and make them available to the job?
Thanks


    --
    Regards
    Shuja-ur-Rehman Baig
    <http://pk.linkedin.com/in/shujamughal>
  • James Seigel at Apr 4, 2011 at 6:40 pm
    James’ quick and dirty, get your job running guideline:

-libjars <-- for jars you want accessible to the mappers and reducers
classpath, or bundled in the main jar <-- for jars you want accessible to the runner

    Cheers
    James.
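To make the first line of that guideline concrete (a sketch only; the driver class, jar names, and paths below are hypothetical): -libjars is handled by Hadoop's GenericOptionsParser, so it takes effect only when the driver runs through ToolRunner, and it must appear after the main class (or after the jar, if the manifest names a Main-Class) and before the job's own arguments, with the jars comma-separated:

```shell
# Hypothetical invocation: ship two library jars to the task classpath.
# Requires the driver (com.example.MyDriver here) to use ToolRunner /
# GenericOptionsParser so that -libjars is actually parsed.
hadoop jar MyProgram.jar com.example.MyDriver \
  -libjars /home/me/lib/hbase.jar,/home/me/lib/zookeeper.jar \
  /input/path /output/path
```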


  • Bill Graham at Apr 4, 2011 at 9:00 pm
    Shuja, I haven't tried this, but from what I've read it seems you
    could just add all your jars required by the Mapper and Reducer to
    HDFS and then add them to the classpath in your run() method like
    this:

    DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job);

    I think that's all there is to it, but like I said, I haven't tried
    it. Just be sure your run() method isn't in the same class as your
    mapper/reducer if they import packages from any of the distributed
    cache jars.
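A possible staging step to go with that call (an untested sketch; the paths are hypothetical): the jar has to be in HDFS before addFileToClassPath can point at it.

```shell
# Copy the library jar into HDFS once; afterwards only the program jar
# needs re-copying when the code changes. The job's run() method can then
# call DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), conf).
hadoop fs -mkdir -p /myapp
hadoop fs -put mylib.jar /myapp/mylib.jar
```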

  • Shuja Rehman at Apr 6, 2011 at 8:44 am
-libjars is not working, nor is the distributed cache. Is there any other solution?

    --
    Regards
    Shuja-ur-Rehman Baig
    <http://pk.linkedin.com/in/shujamughal>
  • Bill Graham at Apr 6, 2011 at 3:30 pm
    If you could share more specifics regarding just how it's not working
    (i.e., job specifics, stack traces, how you're invoking it, etc), you
    might get more assistance in troubleshooting.

  • Shuja Rehman at Apr 6, 2011 at 6:32 pm
I am using the following command:

hadoop jar myjar.jar -libjars /home/shuja/lib/mylib.jar param1 param2 param3

but the program still gives an error and does not find mylib.jar. Can you confirm the syntax of the command?
Thanks




    --
    Regards
    Shuja-ur-Rehman Baig
    <http://pk.linkedin.com/in/shujamughal>
  • Bill Graham at Apr 6, 2011 at 8:18 pm
    You need to pass the mainClass after the jar:

    http://hadoop.apache.org/common/docs/r0.21.0/commands_manual.html#jar

  • Guy Doulberg at Apr 7, 2011 at 7:27 am
Or set the main class in the manifest of the jar.
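A sketch of that approach (class and file names hypothetical): bake a Main-Class attribute into the jar's manifest so the class name can be dropped from the command line.

```shell
# The manifest text must end with a newline, or the jar tool ignores the attribute.
printf 'Main-Class: com.example.MyDriver\n' > manifest.txt
jar -cfm MyProgram.jar manifest.txt -C classes .
# With the manifest in place the invocation shortens to:
#   hadoop jar MyProgram.jar -libjars /home/me/lib/mylib.jar /input /output
```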



    -----Original Message-----
    From: Bill Graham
    Sent: Wednesday, April 06, 2011 11:17 PM
    To: Shuja Rehman
    Cc: common-user@hadoop.apache.org
    Subject: Re: Including Additional Jars


Discussion Overview
group: common-user @ hadoop
posted: Apr 4, '11 at 3:06p
active: Apr 7, '11 at 7:27a
posts: 17
users: 9
website: hadoop.apache.org...
irc: #hadoop
