Hi.

I want to keep checkpoint data on several separate machines for backup,
and I'm deliberating between exporting these machines' disks via NFS or
actually running Secondary NameNodes there.

Can anyone advise which would be better in my case?

Regards.


  • Stas Oskin at Oct 21, 2009 at 2:46 pm
    To clarify, the choice is either to let a single SecNameNode write to
    multiple NFS exports, or to actually run multiple SecNameNodes.

    Thanks again.
  • Patrick Angeles at Oct 22, 2009 at 8:04 pm
    From what I understand, it's rather tricky to set up multiple secondary
    namenodes. In any case, running multiple secondary NNs doesn't get you much.
    See this thread:
    http://www.mail-archive.com/core-user@hadoop.apache.org/msg06280.html
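
    For what it's worth, you don't need multiple secondaries to get redundant
    checkpoints: on the 0.18-0.20 era releases, fs.checkpoint.dir accepts a
    comma-delimited list, and the secondary replicates its checkpoint image
    into every listed directory. A minimal sketch (the NFS mount points are
    hypothetical):

        <!-- hadoop-site.xml (hdfs-site.xml on later releases) -->
        <property>
          <name>fs.checkpoint.dir</name>
          <!-- the checkpoint image is replicated into each directory,
               local or NFS-mounted, for redundancy -->
          <value>/hadoop/namesecondary,/mnt/backup1/namesecondary,/mnt/backup2/namesecondary</value>
        </property>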
  • Stas Oskin at Oct 26, 2009 at 3:15 pm
    Hi.

    Thanks for the advice; it seems that the initial approach of having a
    single SecNameNode writing to exports is the way to go.

    By the way, I asked this already, but wanted to clarify:

    * Is it possible to set how often the SecNameNode checkpoints the data
    (and what is the setting, by the way)?

    * Is it possible to let the NameNode write to exports as well, together
    with the local disk? That would ensure the latest possible metadata in
    case of a disk crash (compared to periodic checkpointing), but it's going
    to slow down operations due to network reads/writes.

    Thanks again.

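
    Both appear to be supported on releases of this era, assuming the stock
    property names of the time: fs.checkpoint.period sets the number of
    seconds between checkpoints, and dfs.name.dir accepts a comma-delimited
    list, so the NameNode writes its image and edit log to every listed
    directory, local or NFS-mounted. A hedged sketch, with hypothetical paths:

        <!-- hadoop-site.xml -->
        <property>
          <!-- seconds between secondary checkpoints; 3600 is the default -->
          <name>fs.checkpoint.period</name>
          <value>3600</value>
        </property>
        <property>
          <!-- the NameNode writes metadata synchronously to every directory
               in this list; an NFS mount gives an off-machine copy at the
               cost of network latency on every edit -->
          <name>dfs.name.dir</name>
          <value>/hadoop/name,/mnt/nfs-backup/name</value>
        </property>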
  • Jason Venner at Oct 27, 2009 at 2:49 pm
    We have been having some trouble with the secondary on a cluster that has
    one edit log partition on an NFS server, with the namenode rejecting the
    merged images due to timestamp mismatches.

    --
    Pro Hadoop, a book to guide you from beginner to hadoop mastery,
    http://www.amazon.com/dp/1430219424?tag=jewlerymall
    www.prohadoopbook.com a community for Hadoop Professionals
  • Stas Oskin at Oct 27, 2009 at 3:52 pm
    Hi.

    You mean, you couldn't recover the NameNode from checkpoints because of
    timestamps?

    Regards.
  • Jason Venner at Dec 5, 2009 at 5:44 am
    I have dug into this more; it turns out the problem is unrelated to NFS
    or Solaris.
    The issue is that if there is a metadata change while the secondary is
    rebuilding the fsimage, the rebuilt image is rejected.
    On our production cluster there is almost never a moment when a file is
    not being created or altered, and as such the secondary is never able to
    make a fresh fsimage for the cluster.

    I have checked this with several Hadoop variants, and with vanilla
    distributions with the namenode, secondary, and a datanode all running on
    the same machine.
    On Tue, Oct 27, 2009 at 8:03 PM, Jason Venner wrote:

    The namenode would never accept the rebuilt fsimage from the secondary,
    so the edit logs grew without bounds.

  • Stas Oskin at Dec 23, 2009 at 7:24 pm
    Hi.

    What was your solution to this then?

    Regards.
  • Jason Venner at Dec 24, 2009 at 1:07 am
    I have no current solution.
    When I can block out a few days, I am going to instrument the code a bit
    more to verify my understanding.

    I believe the issue is that the timestamp is being checked against the
    active edit log (the new one created when the checkpoint started) rather
    than the timestamp of the rolled (old) edit log.
    As long as no transactions have hit, the timestamps are the same.

  • Brian Bockelman at Dec 24, 2009 at 1:53 am
    Hey Jason,

    This analysis seems fairly unlikely - are you claiming that no edits can be merged if files are being created? Isn't this what edits.new is for?

    We roll the edits log successfully during periods of high transfer, when a new file is being created every second or so.

    We have had issues with unmergeable edits before - there might be some race conditions in this area.

    Brian
  • Jason Venner at Dec 24, 2009 at 2:38 am
    I agree, it seems very wrong; that is why I need a block of time to
    really verify the behavior.

    My test case is the following, and the same test fails on 18.3, 19.0,
    and 19.1:

    Set up a single-node cluster: 1 namenode, 1 datanode, 1 secondary, all on
    the same machine.
    Set the checkpoint interval to 2 minutes (120 sec).

    Make a few files, wait, and verify that a checkpoint can happen.

    Start recursively copying a deep tree into HDFS, and watch the checkpoint
    fail with a timestamp error.

    The code explicitly uses edits.new for the checkpoint verification
    timestamp.

    The window is the time from the taking of the edit log to the return of
    the fsimage.
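
    A rough sketch of that reproduction, assuming a stock single-node setup
    and the era's CLI (the paths, the doc tree used as the copy source, and
    the log file names are illustrative, not from the original report):

        # hadoop-site.xml: checkpoint every 120 seconds instead of the
        # default 3600
        #   <property>
        #     <name>fs.checkpoint.period</name>
        #     <value>120</value>
        #   </property>

        # quiet cluster: create a few files, wait past one interval, and
        # confirm in the secondary's log that a checkpoint went through
        bin/hadoop fs -mkdir /test
        bin/hadoop fs -touchz /test/a /test/b
        sleep 180
        grep -i checkpoint logs/hadoop-*-secondarynamenode-*.log | tail

        # busy cluster: keep metadata changing while the next checkpoint
        # runs, then look for the rejection in the namenode's log
        bin/hadoop fs -copyFromLocal /usr/share/doc /test/deep-tree &
        sleep 180
        grep -i timestamp logs/hadoop-*-namenode-*.log | tail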
  • Todd Lipcon at Dec 24, 2009 at 6:35 pm
    How long does the checkpoint take? It seems possible to me that if the
    2NN checkpoint takes longer than the interval, multiple checkpoints will
    overlap and might trigger this. (This is conjecture, so definitely worth
    testing.)

    -Todd
  • Jason Venner at Dec 24, 2009 at 7:16 pm
    In my test case, the checkpoints take a few seconds or less.
  • Raymond Jennings III at Dec 24, 2009 at 1:50 pm
    I have recently been seeing a problem where jobs that previously worked fine (with no code changes) stop at map 0%. Restarting Hadoop on the cluster solves the problem, but there is nothing in the log files to indicate what went wrong. Has anyone seen something similar?
