FAQ

[Hadoop-common-user] problems with start-all.sh

Keith Thompson
May 10, 2011 at 3:20 pm
I have installed hadoop-0.20.2 (using quick start guide) and mahout. I am
running OpenSuse Linux 11.1 (but am a newbie to Linux).
My JAVA_HOME is set to usr/java/jdk1.6.0_21. When I run bin/hadoop
start-all.sh I get the following error message:

Exception in thread "main" java.lang.NoClassDefFoundError: start-all/sh
Caused by: java.lang.ClassNotFoundException: start-all.sh
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
Could not find the main class: start-all.sh. Program will exit.

15 responses

  • Dieter Plaetinck at May 10, 2011 at 3:24 pm

    On Tue, 10 May 2011 11:20:15 -0400 Keith Thompson wrote:

    I have installed hadoop-0.20.2 (using quick start guide) and mahout.
    I am running OpenSuse Linux 11.1 (but am a newbie to Linux).
    My JAVA_HOME is set to usr/java/jdk1.6.0_21. When I run bin/hadoop
    start-all.sh I get the following error message:

    Exception in thread "main" java.lang.NoClassDefFoundError: start-all/sh
    Caused by: java.lang.ClassNotFoundException: start-all.sh
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    Could not find the main class: start-all.sh. Program will exit.
    start-all.sh is a shell script, meant to be executed directly in your
    terminal, not as a Java program through bin/hadoop.
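    A minimal sketch, assuming Hadoop was extracted to /usr/local/hadoop-0.20.2:

        cd /usr/local/hadoop-0.20.2
        bin/start-all.sh    # run the script itself, not "bin/hadoop start-all.sh"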

    Dieter
  • Luca Pireddu at May 10, 2011 at 3:25 pm
    Hi Keith,
    On May 10, 2011 17:20:15 Keith Thompson wrote:
    I have installed hadoop-0.20.2 (using quick start guide) and mahout. I am
    running OpenSuse Linux 11.1 (but am a newbie to Linux).
    My JAVA_HOME is set to usr/java/jdk1.6.0_21. When I run bin/hadoop
    start-all.sh I get the following error message:

    Exception in thread "main" java.lang.NoClassDefFoundError: start-all/sh
    Caused by: java.lang.ClassNotFoundException: start-all.sh
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    Could not find the main class: start-all.sh. Program will exit.
    Try running: bin/start-all.sh


    --
    Luca Pireddu
    CRS4 - Distributed Computing Group
    Loc. Pixina Manna Edificio 1
    Pula 09010 (CA), Italy
    Tel: +39 0709250452
  • Keith Thompson at May 10, 2011 at 3:39 pm
    Hi Luca,

    Thank you. That worked ... at least I didn't get the same error. Now I
    get:

    k_thomp@linux-8awa:/usr/local/hadoop-0.20.2> sudo bin/start-all.sh
    starting namenode, logging to
    /usr/local/hadoop-0.20.2/bin/../logs/hadoop-root-namenode-linux-8awa.out
    cat: /usr/local/hadoop-0,20.2/conf/slaves: No such file or directory
    Password:
    localhost: starting secondarynamenode, logging to
    /usr/local/hadoop-0.20.2/bin/../logs/hadoop-root-secondarynamenode-linux-8awa.out
    localhost: Exception in thread "main" java.lang.NullPointerException
    localhost: at
    org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:134)
    localhost: at
    org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:156)
    localhost: at
    org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:160)
    localhost: at
    org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:131)
    localhost: at
    org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:115)
    localhost: at
    org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:469)
    starting jobtracker, logging to
    /usr/local/hadoop-0.20.2/bin/../logs/hadoop-root-jobtracker-linux-8awa.out
    cat: /usr/local/hadoop-0,20.2/conf/slaves: No such file or directory
    On Tue, May 10, 2011 at 11:25 AM, Luca Pireddu wrote:

    Hi Keith,
    Try running: bin/start-all.sh


  • Luca Pireddu at May 10, 2011 at 3:47 pm

    On May 10, 2011 17:39:12 Keith Thompson wrote:
    Hi Luca,

    Thank you. That worked ... at least I didn't get the same error. Now I
    get:

    k_thomp@linux-8awa:/usr/local/hadoop-0.20.2> sudo bin/start-all.sh
    starting namenode, logging to
    /usr/local/hadoop-0.20.2/bin/../logs/hadoop-root-namenode-linux-8awa.out
    cat: /usr/local/hadoop-0,20.2/conf/slaves: No such file or directory
    Password:
    localhost: starting secondarynamenode, logging to
    /usr/local/hadoop-0.20.2/bin/../logs/hadoop-root-secondarynamenode-linux-8awa.out
    localhost: Exception in thread "main" java.lang.NullPointerException
    localhost: at
    org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:134)
    localhost: at
    org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:156)
    localhost: at
    org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:160)
    localhost: at
    org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:131)
    localhost: at
    org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:115)
    localhost: at
    org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:469)
    starting jobtracker, logging to
    /usr/local/hadoop-0.20.2/bin/../logs/hadoop-root-jobtracker-linux-8awa.out
    cat: /usr/local/hadoop-0,20.2/conf/slaves: No such file or directory
    Don't try to run it as root with "sudo". Just run it as your regular user.
    If you try to run it as a different user then you'll have to set up the ssh
    keys for that user (notice the "Password" prompt because ssh was unable to
    perform a password-less login into localhost).

    Also, make sure you've correctly set HADOOP_HOME to the path where you
    extracted the Hadoop archive. I'm seeing a comma in the path shown in the
    error ("/usr/local/hadoop-0,20.2/conf/slaves") that probably shouldn't be
    there :-)
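
    For instance, a rough sketch of the session (paths taken from your output above):

        export HADOOP_HOME=/usr/local/hadoop-0.20.2
        cd "$HADOOP_HOME"
        bin/start-all.sh    # as your regular user, without sudo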


    --
    Luca Pireddu
    CRS4 - Distributed Computing Group
    Loc. Pixina Manna Edificio 1
    Pula 09010 (CA), Italy
    Tel: +39 0709250452
  • Keith Thompson at May 10, 2011 at 3:54 pm
    Thanks for catching that comma. It was actually my HADOOP_CONF_DIR rather
    than HADOOP_HOME that was the culprit. :)
    As for sudo ... I am not sure how to run it as a regular user. I set up ssh
    for a passwordless login (and am able to ssh localhost without password) but
    I installed hadoop to /usr/local so every time I try to run it, it says
    permission denied. So, I have to run hadoop using sudo (and it prompts for
    password as super user). I should have installed hadoop to my home
    directory instead I guess ... :/
    On Tue, May 10, 2011 at 11:47 AM, Luca Pireddu wrote:
    Don't try to run it as root with "sudo". Just run it as your regular user.
    If you try to run it as a different user then you'll have to set up the ssh
    keys for that user (notice the "Password" prompt because ssh was unable to
    perform a password-less login into localhost).

    Also, make sure you've correctly set HADOOP_HOME to the path where you
    extracted the Hadoop archive. I'm seeing a comma in the path shown in the
    error ("/usr/local/hadoop-0,20.2/conf/slaves") that probably shouldn't be
    there :-)


  • GOEKE, MATTHEW [AG/1000] at May 10, 2011 at 4:03 pm
    Keith, if you have a chance you might want to look at Hadoop: The
    Definitive Guide or the various FAQs around for rolling a cluster from a
    tarball. One thing most of them recommend is to set up a hadoop user and
    then chown all of the files / directories it needs over to it. Right
    now what you are running into is that you have not chown'ed the folder
    to your user or correctly chmod'ed the directories.

    User / group permissions will become increasingly important when you
    move into the DFS setup, so it is important to get the core setup right.
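
    For example, a sketch of that approach (the user and group names here are
    just an assumption):

        sudo groupadd hadoop
        sudo useradd -m -g hadoop hadoop
        sudo chown -R hadoop:hadoop /usr/local/hadoop-0.20.2
        sudo chmod -R u+rwX,g+rX /usr/local/hadoop-0.20.2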

    Matt

    -----Original Message-----
    From: Keith Thompson
    Sent: Tuesday, May 10, 2011 10:54 AM
    To: common-user@hadoop.apache.org
    Subject: Re: problems with start-all.sh

    Thanks for catching that comma. It was actually my HADOOP_CONF_DIR rather
    than HADOOP_HOME that was the culprit. :)
    As for sudo ... I am not sure how to run it as a regular user. I set up ssh
    for a passwordless login (and am able to ssh localhost without password) but
    I installed hadoop to /usr/local so every time I try to run it, it says
    permission denied. So, I have to run hadoop using sudo (and it prompts for
    password as super user). I should have installed hadoop to my home
    directory instead I guess ... :/
  • Keith Thompson at May 10, 2011 at 4:11 pm
    Thanks Matt. That makes sense. I will read up on those topics.
    On Tue, May 10, 2011 at 12:02 PM, GOEKE, MATTHEW [AG/1000] wrote:

    Keith, if you have a chance you might want to look at Hadoop: The
    Definitive Guide or the various FAQs around for rolling a cluster from a
    tarball. One thing most of them recommend is to set up a hadoop user and
    then chown all of the files / directories it needs over to it. Right
    now what you are running into is that you have not chown'ed the folder
    to your user or correctly chmod'ed the directories.

    User / group permissions will become increasingly important when you
    move into the DFS setup, so it is important to get the core setup right.

  • Luca Pireddu at May 10, 2011 at 4:08 pm

    On May 10, 2011 17:54:27 Keith Thompson wrote:
    Thanks for catching that comma. It was actually my HADOOP_CONF_DIR rather
    than HADOOP_HOME that was the culprit. :)
    As for sudo ... I am not sure how to run it as a regular user. I set up
    ssh for a passwordless login (and am able to ssh localhost without
    password) but I installed hadoop to /usr/local so every time I try to run
    it, it says permission denied. So, I have to run hadoop using sudo (and it
    prompts for password as super user). I should have installed hadoop to my
    home directory instead I guess ... :/
    I'd say, for running tests with a pseudo-cluster on a single machine it would
    be easiest for you to extract the archive somewhere in your home directory, as
    your regular user.


    If you extracted the archive as root into /usr/local, then your regular user
    is probably missing read and execute permissions on various files and
    directories. With the default configuration, the hadoop user also needs to
    have write permission to the logs directory.

    I'd say it seems reasonable to delay those concerns for when you'll run on
    more than one machine.
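
    A sketch of that simpler setup (the tarball path is assumed):

        cd ~
        tar xzf /path/to/hadoop-0.20.2.tar.gz
        cd hadoop-0.20.2
        bin/start-all.sh    # everything, including logs/, stays owned by your user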

    --
    Luca Pireddu
    CRS4 - Distributed Computing Group
    Loc. Pixina Manna Edificio 1
    Pula 09010 (CA), Italy
    Tel: +39 0709250452
  • Keith Thompson at May 10, 2011 at 4:17 pm
    Yes, that does seem easier. Perhaps I will go back and extract to my home
    directory. Is there a simple way to uninstall the version in my root
    directory? Note: I also installed Maven and Mahout there as well.
    /usr/local/hadoop-0.20.2
    /usr/local/apache-maven-2.2.1
    /usr/local/trunk/bin/mahout.sh ... might not have installed this one
    correctly
    On Tue, May 10, 2011 at 12:08 PM, Luca Pireddu wrote:
    I'd say, for running tests with a pseudo-cluster on a single machine it would
    be easiest for you to extract the archive somewhere in your home directory, as
    your regular user.

    If you extracted the archive as root into /usr/local, then your regular user
    is probably missing read and execute permissions on various files and
    directories. With the default configuration, the hadoop user also needs to
    have write permission to the logs directory.

    I'd say it seems reasonable to delay those concerns for when you'll run on
    more than one machine.

  • Gang Luo at May 10, 2011 at 4:57 pm
    Hi,

    I was confused by the configuration and file system in Hadoop. When we create a
    FileSystem object and read/write something through it, are we writing to or
    reading from HDFS? Could it be the local file system? If yes, what determines
    which file system it is? Is it the Configuration object we used to create the
    FileSystem object?
    When I write a MapReduce program which "extends Configured implements Tool", I can
    get the right configuration by calling getConf() and use the FileSystem object
    to communicate with HDFS. What if I want to read/write HDFS in a separate
    utility class? Where does the configuration come from?

    Thanks.


    -Gang
  • Allen Wittenauer at May 10, 2011 at 5:46 pm

    On May 10, 2011, at 9:57 AM, Gang Luo wrote:
    I was confused by the configuration and file system in Hadoop. When we create a
    FileSystem object and read/write something through it, are we writing to or
    reading from HDFS?
    Typically, yes.
    Could it be the local file system?
    Yes.

    If yes, what determines which file system it is? Is it the Configuration
    object we used to create the FileSystem object?
    Yes.
    When I write a MapReduce program which "extends Configured implements Tool", I can
    get the right configuration by calling getConf() and use the FileSystem object
    to communicate with HDFS. What if I want to read/write HDFS in a separate
    utility class? Where does the configuration come from?
    You need to supply it.
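
    For example, a minimal sketch of a standalone utility that supplies its own
    Configuration (the class name is made up):

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class FsCheck {
            public static void main(String[] args) throws Exception {
                // Loads core-default.xml and core-site.xml from the classpath;
                // fs.default.name in that config decides HDFS vs. local file system.
                Configuration conf = new Configuration();
                FileSystem fs = FileSystem.get(conf);            // the configured default FS
                FileSystem local = FileSystem.getLocal(conf);    // explicitly the local FS
                System.out.println("default fs: " + fs.getUri());
                System.out.println("local fs:   " + local.getUri());
                System.out.println(fs.exists(new Path("/user")));
            }
        }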
  • Chris Stier at May 10, 2011 at 5:49 pm
    Hadoop newbie here,
    I have a few of the same questions that Gang has. I have the single-node
    configuration installed, but every time I restart my computer I
    lose my namenode because its location defaults to /tmp. I'm
    using 0.21.0. Is there a write-up somewhere that shows which config
    files to change so that it's not using /tmp?
    Thanks!
    JC
  • Hadoopman at May 10, 2011 at 5:56 pm
    When we load data into hive sometimes we've run into situations where
    the load fails and the logs show a heap out of memory error. If I load
    just a few days (or months) of data then no problem. But then if I try
    to load two years (for example) of data then I've seen it fail. Not
    with every feed but certain ones.

    Sometimes I've been able to split the data and get it to load. An
    example of one type of feed I'm working on is the apache web server
    access logs. Generally it works. But there are times when I need to
    load more than a few months of data and get the memory heap errors in
    the task logs.

    Generally how do people load their data into Hive? We have a process
    where we first copy it to hdfs then from there we run a staging process
    to get it into hive. Once that completes we perform a union all then
    overwrite table partition. Usually it's during the union all stage that
    we see these errors appear.

    Also, is there a log which tells you which file it fails on? I can see
    which task/job failed, but I'm not finding which file it's complaining
    about. I figure that might help a bit.

    Thanks!
  • Amit jaiswal at May 11, 2011 at 6:07 am
    Hi,

    What is the meaning of 'union' over here? Is there a Hadoop job with 1 (or a few) reducers that combines all the data together? Have you tried external (dynamic) partitions for combining the data?

    -amit


    ----- Original Message -----
    From: hadoopman <hadoopman@gmail.com>
    To: common-user@hadoop.apache.org
    Cc:
    Sent: Tuesday, 10 May 2011 11:26 PM
    Subject: hadoop/hive data loading

    When we load data into hive sometimes we've run into situations where the load fails and the logs show a heap out of memory error.  If I load just a few days (or months) of data then no problem.  But then if I try to load two years (for example) of data then I've seen it fail.  Not with every feed but certain ones.

    Sometimes I've been able to split the data and get it to load.  An example of one type of feed I'm working on is the apache web server access logs.  Generally it works.  But there are times when I need to load more than a few months of data and get the memory heap errors in the task logs.

    Generally how do people load their data into Hive?  We have a process where we first copy it to hdfs then from there we run a staging process to get it into hive.  Once that completes we perform a union all then overwrite table partition.  Usually it's during the union all stage that we see these errors appear.

    Also, is there a log which tells you which file it fails on?  I can see which task/job failed, but I'm not finding which file it's complaining about.  I figure that might help a bit.

    Thanks!
  • Fei Pan at May 12, 2011 at 9:12 am
    Hi hadoopman,

    You can put the large data into HDFS using "hadoop fs -put src dest",
    and then use "alter table xxx add partition(xxxxx) location 'desc'".
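
    A sketch of that flow (the table name, partition column, and paths below are
    only placeholders):

        hadoop fs -mkdir /data/weblogs/dt=2011-05-10
        hadoop fs -put access_log.2011-05-10 /data/weblogs/dt=2011-05-10/
        hive -e "ALTER TABLE weblogs ADD PARTITION (dt='2011-05-10') LOCATION '/data/weblogs/dt=2011-05-10'"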



    2011/5/11 amit jaiswal <amit_jus@yahoo.com>
    Hi,

    What is the meaning of 'union' over here? Is there a Hadoop job with 1
    (or a few) reducers that combines all the data together? Have you tried external
    (dynamic) partitions for combining the data?

    -amit



    --
    Stay Hungry. Stay Foolish.
