Hadoop Write Performance
Does anyone have an expected or experienced write speed to HDFS outside
of Map/Reduce? Any recommendations on properties to tweak in
hadoop-site.xml?
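For context, two hadoop-site.xml properties that often come up when tuning write-heavy HDFS clients of that era are the datanode transceiver limit and the namenode RPC handler count; the "Could not get block locations" error below is often associated with exhausting the former. The values here are illustrative, not recommendations:

```xml
<!-- hadoop-site.xml: illustrative values for a write-heavy workload -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <!-- Upper bound on concurrent streams a datanode will serve;
       the small default is easily exhausted by many parallel writers.
       (Note: the property name really is misspelled this way.) -->
  <value>2048</value>
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <!-- Number of RPC handler threads on the namenode. -->
  <value>20</value>
</property>
```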

Currently I have a multi-threaded writer where each thread is writing to
a different file. But after a while I get this:

java.io.IOException: Could not get block locations. Aborting...
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2081)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1300(DFSClient.java:1702)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1818)

Does this perhaps indicate that the namenode is overwhelmed?
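The writer pattern described above can be sketched as follows. This minimal version uses plain java.io against the local filesystem so it is self-contained; against HDFS you would obtain the output streams from FileSystem.create instead. All names here are illustrative:

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class MultiThreadedWriter {
    // Each thread writes its records to its own file, mirroring the
    // one-file-per-thread layout described in the question.
    public static void writeAll(Path dir, int threads, int recordsPerThread)
            throws InterruptedException {
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread w = new Thread(() -> {
                Path file = dir.resolve("part-" + id + ".txt");
                try (BufferedWriter out =
                        new BufferedWriter(new FileWriter(file.toFile()))) {
                    for (int i = 0; i < recordsPerThread; i++) {
                        out.write("thread-" + id + " record-" + i);
                        out.newLine();
                    }
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            w.join(); // wait for every writer before declaring success
        }
    }
}
```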


Thanks,

-Xavier


  • Philipp Dobrigkeit at Feb 18, 2009 at 7:30 pm
    I am currently trying Map/Reduce in Eclipse. The input comes from an HBase table. The performance of my jobs is terrible: even when run on only a single row, it takes around 10 seconds to complete the job. My current guess is that the reporting done to the Eclipse console might play a role here.

    I am looking for a way to disable the printing of status to the console.

    Or of course any other ideas what is going wrong here.

    This is a single-node cluster on pretty common desktop hardware, and writing to HBase is a breeze.

    Thanks
    Philipp
  • Jason hadoop at Feb 18, 2009 at 7:49 pm
    There is a moderate amount of setup and teardown in any Hadoop job. It may
    be that your 10 seconds are primarily that.
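    On the console-output question: Hadoop's status lines are emitted through log4j, so one way to quiet them is to raise the logger level in log4j.properties. The logger names below are a guess at the relevant ones; adjust them to whatever actually appears in your console:

    ```properties
    # log4j.properties fragment: raise Hadoop's chatty loggers above INFO
    log4j.logger.org.apache.hadoop.mapred=WARN
    log4j.logger.org.apache.hadoop.hbase=WARN
    ```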

  • Raghu Angadi at Feb 18, 2009 at 7:50 pm
    What is the Hadoop version?

    You could check the log on a datanode around that time and post any
    suspicious errors. For example, you can trace a particular block in the
    client and datanode logs.

    Most likely it is not a NameNode issue, but you can check the NameNode log as well.

    Raghu.

  • Xavier Stevens at Feb 18, 2009 at 8:12 pm
    Raghu,

    I was using 0.17.2.1, but I installed 0.18.3 a couple of days ago. I
    also separated out my secondarynamenode and jobtracker onto another
    machine. In addition, my network operations people had misconfigured
    some switches, which ended up being my bottleneck.

    After all of that, my writer and Hadoop are working great.


    -Xavier


    -----Original Message-----
    From: Raghu Angadi
    Sent: Wednesday, February 18, 2009 11:49 AM
    To: core-user@hadoop.apache.org
    Subject: Re: Hadoop Write Performance



Discussion Overview
group: common-user @ hadoop.apache.org
categories: hadoop
posted: Feb 13, '09 at 4:39p
active: Feb 18, '09 at 8:12p
posts: 5
users: 4
website: hadoop.apache.org...
irc: #hadoop