I cannot for the life of me figure out how to get the Pentaho libs to
load in a map/reduce job. When I run a job, it barfs with
class-not-found errors for various Kettle libs. I have added the
following export to

/etc/hadoop-0.20/conf.my_cluster/hadoop-env.sh:

export HADOOP_TASKTRACKER_OPTS="-classpath /opt/pentaho/pentaho-mapreduce/lib/*"

Added the following to /etc/hadoop-0.20/conf.my_cluster/mapred-site.xml:

<property>
  <name>pentaho.kettle.home</name>
  <value>/opt/pentaho/pentaho-mapreduce</value>
</property>

<property>
  <name>pentaho.kettle.plugins.dir</name>
  <value>/opt/pentaho/pentaho-mapreduce/plugins</value>
</property>

I have made sure the directory is readable and writable by the mapred
user and that the other Kettle prerequisites are in place.

Thanks,
Chris


  • Harsh J at Mar 31, 2012 at 10:09 am
    Hey Chris,

    The ideal way would be to get Pentaho (or its community) to provide
    CDH-friendly symlinking RPM packages which install to their own
    location and also symlink the libs into $HADOOP_HOME/lib/ when
    installed (like Todd's hadoop-lzo-packager at
    github.com/toddlipcon/hadoop-lzo-packager).

    But otherwise, set this in /etc/hadoop-0.20/conf.my_cluster/hadoop-env.sh:

    export HADOOP_CLASSPATH=/path/to/your/addon/libs/:$HADOOP_CLASSPATH
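
    For the Pentaho layout described above, a concrete instance might be (a
    hedged sketch: the /lib/* wildcard is what lets the JVM pick up every
    jar in that directory, and the daemons need a restart afterwards):

    # in hadoop-env.sh on each node; the path comes from Chris's post
    export HADOOP_CLASSPATH=/opt/pentaho/pentaho-mapreduce/lib/*:$HADOOP_CLASSPATH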

    However, if your goal is to provide these libs for just a few jobs,
    simply follow this instead:
    http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
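
    That post's per-job route boils down to shipping the jars with the job
    instead of installing them on every node. A minimal sketch, where the
    driver class and jar names are illustrative rather than taken from this
    thread:

    # (a) ship the libs with a single job via the distributed cache
    hadoop jar my-job.jar com.example.MyDriver \
        -libjars /opt/pentaho/pentaho-mapreduce/lib/kettle-core.jar \
        input output

    # (b) or bundle them in a lib/ directory inside the job jar itself,
    #     which Hadoop unpacks onto the task classpath
    jar uf my-job.jar lib/kettle-core.jar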


    --
    Harsh J
  • ChrisK at Mar 31, 2012 at 5:31 pm
    Harsh,

    Thanks for the reply. What you suggested is exactly what I did, but
    the Kettle jobs still barf. I should have mentioned in the original
    post that I am using Cloudera Manager Free Edition to manage the
    nodes. I am not sure how add-on libs are handled by Cloudera Manager,
    since it seems to create the configs in
    /var/run/cloudera-scm-agent/process/XXX-mapreduce-TASKTRACKER and
    executes the TaskTracker with that as its config directory. It seems
    to ignore any local settings on the nodes, and I can't find anything
    in the Cloudera Manager GUI to set these options.
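
    One way to verify what the TaskTracker actually receives is to inspect
    the most recent generated directory (a hedged sketch; the XXX is
    whatever process number Cloudera Manager assigned):

    ls -t /var/run/cloudera-scm-agent/process/ | head
    cat /var/run/cloudera-scm-agent/process/XXX-mapreduce-TASKTRACKER/mapred-site.xml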

    Thanks Again,
    Chris
  • Jasper at Apr 1, 2012 at 10:13 am
    Hi Chris,

    Maybe it is not exactly what you want, but if you add all the Pentaho
    jars to /usr/lib/hadoop/lib on all your nodes, the Pentaho MR jobs will
    run just fine.
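
    A hedged sketch of that approach (hostnames are placeholders; the
    TaskTrackers have to be restarted afterwards, via Cloudera Manager if
    it manages the daemons, so they pick up the new jars):

    # copy every Pentaho jar into Hadoop's lib directory on each node
    for node in node1 node2 node3; do
        scp /opt/pentaho/pentaho-mapreduce/lib/*.jar $node:/usr/lib/hadoop/lib/
    done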

    I recognize your problems, though, from trying the other suggestions
    in the Cloudera blog post. There seem to be issues with the Tool and
    GenericOptionsParser (-libjars argument) options, because whatever I
    try I can't get them to work.
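
    For what it's worth, a common reason -libjars appears to be ignored
    (hedged: this is a general property of GenericOptionsParser, not a
    diagnosis of these particular jobs) is argument order and the driver
    not going through ToolRunner:

    # ignored: generic options placed after the application's own
    # arguments are never parsed
    hadoop jar my-job.jar com.example.MyDriver input output -libjars kettle-core.jar

    # parsed: generic options come first, and the driver must implement
    # Tool and be launched via ToolRunner.run()
    hadoop jar my-job.jar com.example.MyDriver -libjars kettle-core.jar input output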


    Jasper

