Hi,

Could you please sanity check this:

In hadoop-site.xml I add:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1G</value>
  <description>Increasing the size of the heap to allow for a large
  in-memory index of polygons</description>
</property>

Is this all that is required to increase -Xmx for the processes running map tasks?

(During Mapper.configure() I build a large hashtree lookup object...)
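
For reference, a minimal sketch of that setup (the class name and the index-building step are hypothetical, using the old org.apache.hadoop.mapred API) that logs the JVM's maximum heap from Mapper.configure(), so the task logs show whether the -Xmx setting took effect:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class PolygonLookupMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  public void configure(JobConf conf) {
    // maxMemory() reflects -Xmx; with the property above this should
    // report roughly 1024 MB in the task's stderr log.
    long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
    System.err.println("Max heap available to this task: " + maxMb + " MB");
    // ... build the large in-memory polygon index here ...
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // ... look up each record against the in-memory index ...
  }
}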

Many thanks

Tim


  • Dennis Kubes at Nov 26, 2008 at 12:21 pm
    I have always seen -Xmx set in megabytes rather than gigabytes. It does
    work for me on Ubuntu as G, but it may depend on the JVM and OS:
    -Xmx1024M versus -Xmx1G. Other than that I think it looks good.
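
    For example, the megabyte spelling of the same property would look like
    this (a sketch mirroring Tim's block above; the description text is
    illustrative):

    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx1024M</value>
      <description>Child task heap set in megabytes; equivalent to
      -Xmx1G</description>
    </property>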

    Dennis

  • Tim robertson at Nov 26, 2008 at 12:46 pm
    Thanks!

    Just making sure that this was the only parameter that needed setting.

    Cheers

    Tim

  • Ricky Ho at Nov 26, 2008 at 4:11 pm
    Does Hadoop support an environment where nodes join and leave without a preconfigured file like "hadoop-site.xml"? The characteristic is that none of the IP addresses or node names of the machines are stable: they change when a machine reboots after a crash.

    Until now I have used the simple approach of just configuring my hadoop-site.xml and using the startup scripts, which take care of everything. But for the dynamic-IP-address scenario that doesn't seem to work. Can someone suggest how to deal with this scenario?

    Here are the considerations ...

    Startup Discovery Scenario
    ===========================
    How does a NameNode know about a newly joined DataNode?
    How does a new DataNode know about the existing NameNode?
    How does a JobTracker know about a newly joined TaskTracker?
    How does a new TaskTracker know about the existing JobTracker?

    Fail Recovery Scenario
    =======================
    Let's say a NameNode crashes, and then another NameNode (at a different address) starts up. How does the new NameNode learn about the other DataNodes?
    How do the other DataNodes learn about this new NameNode?

    Let's say a JobTracker crashes, and then another JobTracker (at a different address) starts up. How does the new JobTracker learn about the other TaskTrackers?
    How do the other TaskTrackers learn about this new JobTracker?

    Let's say a DataNode crashes, and then another DataNode (at a different address) starts up. How does the new DataNode learn about the existing NameNode?
    How does the existing NameNode learn about this new DataNode?

    Let's say a TaskTracker crashes, and then another TaskTracker (at a different address) starts up. How does the new TaskTracker learn about the existing JobTracker?
    How does the existing JobTracker learn about this new TaskTracker?


    Rgds,
    Ricky
  • Steve Loughran at Nov 26, 2008 at 4:23 pm

    You need something to do the discovery, so that the nodes can find their settings.

    - We use Anubis - http://wiki.smartfrog.org/wiki/display/sf/Anubis - it
    works in places where multicast works.
    - ZooKeeper may work here too; you should look at that (see the sketch
    after this list).
    - I caught an interesting talk recently where someone on EC2 used
    SimpleDB as the node registration API. EC2 doesn't support multicast IP,
    so instead every node talks to a SimpleDB table, registers there, and
    looks up its peers. You could push out a site.xml equivalent there too.
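
    For illustration, a minimal sketch of the ZooKeeper approach (the
    ensemble address "zk-host:2181" and the "/cluster/nodes" path are
    assumptions, and the parent path must already exist): each node
    registers an ephemeral znode under a well-known path, and peers list
    the children to discover the live members.

    import java.net.InetAddress;
    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class NodeRegistry {
      public static void main(String[] args) throws Exception {
        // The ensemble address is the one piece of stable configuration.
        ZooKeeper zk = new ZooKeeper("zk-host:2181", 30000, null);

        // Register this node. The ephemeral znode disappears automatically
        // when the session dies, so crashed nodes drop out of the registry.
        String myAddress = InetAddress.getLocalHost().getHostAddress();
        zk.create("/cluster/nodes/" + myAddress, new byte[0],
            ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // Discovery: e.g. a new DataNode listing the currently live nodes.
        List<String> liveNodes = zk.getChildren("/cluster/nodes", false);
        System.out.println("Live nodes: " + liveNodes);
      }
    }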

    If you can use dynamic DNS, your life is fairly simple. Your NameNode and
    JobTracker need to register with the DNS servers; everything else needs
    to pick them up. You will need to run Hadoop with limited caching of
    valid hostnames, though, so that after a server restart the changed
    addresses are picked up.
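
    For example, a sketch of capping the JVM's DNS caches (these are
    standard java.security properties, not Hadoop settings, and they must
    be set before the first hostname lookup):

    // Cap positive DNS caching at 30 seconds so re-registered hostnames
    // are re-resolved promptly after a node comes back at a new address.
    java.security.Security.setProperty("networkaddress.cache.ttl", "30");
    // Do not cache failed lookups at all.
    java.security.Security.setProperty("networkaddress.cache.negative.ttl", "0");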

    -steve

Discussion Overview
group: common-user
categories: hadoop
posted: Nov 26, '08 at 11:54a
active: Nov 26, '08 at 4:23p
posts: 5
users: 4
website: hadoop.apache.org...
irc: #hadoop
