heartbeat monitor thread goea away
----------------------------------

Key: HADOOP-1312
URL: https://issues.apache.org/jira/browse/HADOOP-1312
Project: Hadoop
Issue Type: Bug
Components: dfs
Reporter: dhruba borthakur


The heartbeat monitor thread encounters a ConcurrentModificationException while iterating over the "heartbeats" data structure. This occurred while the namenode was being restarted. There are actually two bugs here:

1. The Heartbeat Monitor thread needs to catch Exceptions and continue, instead of exiting.
2. The heartbeats data structure is protected by the heartbeats lock. The registerDatanode() method invokes removeDatanode() without acquiring the heartbeats monitor lock. This causes the ConcurrentModificationException.
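
The two fixes listed above can be sketched roughly as follows. This is a minimal illustration, not the actual FSNamesystem code: the class `HeartbeatSketch`, the `String` node type, and the `runMonitor` helper are hypothetical names chosen for the example. It assumes a single list guarded by its own monitor, with every reader and writer taking that lock, and a monitor loop that reports a failed pass and keeps going.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the namenode's heartbeat bookkeeping. */
class HeartbeatSketch {
    // Every read and write of this list happens while holding its monitor.
    private final List<String> heartbeats = new ArrayList<>();

    void registerDatanode(String node) {
        synchronized (heartbeats) {
            // Re-registration drops any stale entry first, under the same lock.
            heartbeats.remove(node);
            heartbeats.add(node);
        }
    }

    void removeDatanode(String node) {
        synchronized (heartbeats) {
            heartbeats.remove(node);
        }
    }

    /** One pass over the list; returns how many nodes were seen. */
    int heartbeatCheck() {
        synchronized (heartbeats) {
            int seen = 0;
            for (String node : heartbeats) {
                if (node != null) {
                    seen++;
                }
            }
            return seen;
        }
    }

    /** Monitor loop: a failed pass is reported and the loop carries on. */
    int runMonitor(int passes) {
        int completed = 0;
        for (int i = 0; i < passes; i++) {
            try {
                heartbeatCheck();
                completed++;
            } catch (Exception e) {
                System.err.println("heartbeatCheck failed, continuing: " + e);
            }
        }
        return completed;
    }
}
```

Because iteration and removal synchronize on the same object, a removal cannot interleave with a pass over the list, and the try-catch keeps the loop alive if some other exception slips through.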



--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • dhruba borthakur (JIRA) at May 1, 2007 at 11:14 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    dhruba borthakur updated HADOOP-1312:
    -------------------------------------

    Summary: heartbeat monitor thread goes away (was: heartbeat monitor thread goea away)
  • dhruba borthakur (JIRA) at May 1, 2007 at 11:22 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492980 ]

    dhruba borthakur commented on HADOOP-1312:
    ------------------------------------------

    From the namenode .out file:
    Exception in thread "org.apache.hadoop.dfs.FSNamesystem$HeartbeatMonitor@5b9d2de4" java.util.ConcurrentModificationException
        at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
        at java.util.AbstractList$Itr.next(AbstractList.java:343)
        at org.apache.hadoop.dfs.FSNamesystem.heartbeatCheck(FSNamesystem.java:1933)
        at org.apache.hadoop.dfs.FSNamesystem$HeartbeatMonitor.run(FSNamesystem.java:1697)
        at java.lang.Thread.run(Thread.java:619)
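
The top frames of this trace are the fail-fast check in the list iterator. A small sketch of how that check fires is shown below. It is an illustration only: the class `CmeDemo` and the node names are made up, and it modifies the list from the iterating thread itself so the outcome is deterministic, whereas in the namenode the modification came from another thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

class CmeDemo {
    /** Returns true if removing an element mid-iteration triggers the fail-fast check. */
    static boolean triggersCme() {
        List<String> heartbeats = new ArrayList<>(Arrays.asList("dn1", "dn2", "dn3"));
        try {
            for (String node : heartbeats) {
                if (node.equals("dn1")) {
                    // Structural change that the running iterator does not know about.
                    heartbeats.remove(node);
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            // Thrown by the iterator's next() on the following step.
            return true;
        }
    }
}
```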
  • Koji Noguchi (JIRA) at May 1, 2007 at 11:30 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Koji Noguchi updated HADOOP-1312:
    ---------------------------------

    Priority: Blocker (was: Major)

    The namenode just prints the exception to stderr (the .out file) and keeps running without the HeartbeatMonitor thread.
    As a result, the namenode tries to assign blocks to dead datanodes.
  • dhruba borthakur (JIRA) at May 1, 2007 at 11:36 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    dhruba borthakur updated HADOOP-1312:
    -------------------------------------

    Attachment: heartbeatmonitor.patch

    Use a try-catch to ensure that the heartbeat monitor continues to run. Protect removeDataNodes by using the heartbeats monitor lock.
  • dhruba borthakur (JIRA) at May 2, 2007 at 7:14 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    dhruba borthakur updated HADOOP-1312:
    -------------------------------------

    Attachment: heartbeatmonitor2.patch

    Incorporated Raghu's comment about protecting the node.isAlive field with the heartbeats monitor lock.
  • dhruba borthakur (JIRA) at May 2, 2007 at 7:14 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    dhruba borthakur updated HADOOP-1312:
    -------------------------------------

    Assignee: dhruba borthakur
    Status: Patch Available (was: Open)
  • Raghu Angadi (JIRA) at May 2, 2007 at 7:37 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493184 ]

    Raghu Angadi commented on HADOOP-1312:
    --------------------------------------

    Another minor change:

    Since this patch catches all exceptions inside a couple of threads (just like other threads do), could we log the exceptions at error level instead of info? That way we can distinguish these unexpected exceptions from the expected ones when grepping the logs.
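
As a rough sketch of the suggestion, a catch-all in a monitor step could log at the highest severity so the entry stands out from routine info messages. The example uses `java.util.logging`, with `Level.SEVERE` standing in for "error"; that choice, the class `MonitorLogging`, and its methods are assumptions made for a self-contained illustration and do not reflect the logging API used in the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class MonitorLogging {
    static final Logger LOG = Logger.getLogger("MonitorLogging");

    /** Runs one monitor step; an unexpected exception is logged at SEVERE, not INFO. */
    static void runStep(Runnable step) {
        try {
            step.run();
        } catch (Exception e) {
            LOG.log(Level.SEVERE, "Unexpected exception in monitor thread", e);
        }
    }

    /** Collects the levels of the records logged while the step runs. */
    static List<Level> levelsLogged(Runnable step) {
        List<Level> levels = new ArrayList<>();
        Handler capture = new Handler() {
            @Override public void publish(LogRecord r) { levels.add(r.getLevel()); }
            @Override public void flush() {}
            @Override public void close() {}
        };
        LOG.addHandler(capture);
        try {
            runStep(step);
        } finally {
            LOG.removeHandler(capture);
        }
        return levels;
    }
}
```

With a distinct level, a search of the logs for that level surfaces only the unexpected failures.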

  • Hadoop QA (JIRA) at May 2, 2007 at 7:39 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493186 ]

    Hadoop QA commented on HADOOP-1312:
    -----------------------------------

    +1

    http://issues.apache.org/jira/secure/attachment/12356658/heartbeatmonitor2.patch applied and successfully tested against trunk revision r534234.

    Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/104/testReport/
    Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/104/console
  • dhruba borthakur (JIRA) at May 2, 2007 at 7:46 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    dhruba borthakur updated HADOOP-1312:
    -------------------------------------

    Attachment: heartbeatmonitor3.patch

    Incorporated Raghu's comments about logging levels.
  • dhruba borthakur (JIRA) at May 2, 2007 at 7:46 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    dhruba borthakur updated HADOOP-1312:
    -------------------------------------

    Attachment: (was: heartbeatmonitor.patch)
  • dhruba borthakur (JIRA) at May 2, 2007 at 7:46 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    dhruba borthakur updated HADOOP-1312:
    -------------------------------------

    Attachment: (was: heartbeatmonitor2.patch)
  • Doug Cutting (JIRA) at May 2, 2007 at 9:38 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Doug Cutting updated HADOOP-1312:
    ---------------------------------

    Resolution: Fixed
    Fix Version/s: 0.13.0
    Status: Resolved (was: Patch Available)

    I just committed this. Thanks, Dhruba!
  • Hadoop QA (JIRA) at May 3, 2007 at 11:26 am
    [ https://issues.apache.org/jira/browse/HADOOP-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493345 ]

    Hadoop QA commented on HADOOP-1312:
    -----------------------------------

    Integrated in Hadoop-Nightly #77 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/77/)
  • dhruba borthakur (JIRA) at May 4, 2007 at 10:43 pm
    [ https://issues.apache.org/jira/browse/HADOOP-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    dhruba borthakur updated HADOOP-1312:
    -------------------------------------

    Attachment: heartbeatmonitor-0.12.3.patch

    Patch for 0.12.3 release.

Discussion Overview
group: common-dev
categories: hadoop
posted: May 1, '07 at 11:14p
active: May 4, '07 at 10:43p
posts: 15
users: 1
website: hadoop.apache.org...
irc: #hadoop
