FAQ
A few weeks ago, I had sent an email about the progress of HDFS federation development in HDFS-1052 branch. I am happy to announce that all the tasks related to this feature development is complete and it is ready to be integrated into trunk.

I have a merge patch attached to HDFS-1052 jira. All Hudson tests pass except for two test failures. We will fix these unit test failures in trunk, post merge. I plan on completing merge to trunk early next week. I would like to do this ASAP to avoid having to keep the patch up to date (which has been time consuming). This also avoids need for re-merging, due to SVN changes proposed by Nigel, scheduled late next week. Comments are welcome.

Regards,
Suresh

Search Discussions

  • Dhruba Borthakur at Apr 23, 2011 at 8:08 am
    Given that we will be re-organizing the svn tree very soon and the fact that
    the design and most of the implementation is complete, let's merge it into
    trunk!

    -dhruba
    On Fri, Apr 22, 2011 at 9:48 AM, Suresh Srinivas wrote:

    A few weeks ago, I had sent an email about the progress of HDFS federation
    development in HDFS-1052 branch. I am happy to announce that all the tasks
    related to this feature development is complete and it is ready to be
    integrated into trunk.

    I have a merge patch attached to HDFS-1052 jira. All Hudson tests pass
    except for two test failures. We will fix these unit test failures in trunk,
    post merge. I plan on completing merge to trunk early next week. I would
    like to do this ASAP to avoid having to keep the patch up to date (which has
    been time consuming). This also avoids need for re-merging, due to SVN
    changes proposed by Nigel, scheduled late next week. Comments are welcome.

    Regards,
    Suresh


    --
    Connect to me at http://www.facebook.com/dhruba
  • Doug Cutting at Apr 25, 2011 at 9:37 pm

    On 04/22/2011 09:48 AM, Suresh Srinivas wrote:
    A few weeks ago, I had sent an email about the progress of HDFS
    federation development in HDFS-1052 branch. I am happy to announce
    that all the tasks related to this feature development is complete
    and it is ready to be integrated into trunk.
    A couple of questions:

    1. Can you please describe the significant advantages this approach has
    over a symlink-based approach?

    It seems to me that one could run multiple namenodes on separate boxes
    and run multile datanode processes per storage box configured with
    something like:

    first datanode process configuraton
    fs.default.name = hdfs://nn1/
    dfs.data.dir = /drive1/nn1/,drive2/nn1/...

    second datanode process configuraton
    fs.default.name = hdfs://nn2/
    dfs.data.dir = /drive1/nn2/,drive2/nn2/...

    ...

    Then symlinks could be used between nn1, nn2, etc to provide a
    reasonably unified namespace. From the benefits listed in the design
    document it is not clear to me what the clear, substantial benefits are
    over such a configuration.

    2. How much testing has been performed on this? The patch modifies much
    of the logic of Hadoop's central component, upon which the performance
    and reliability of most other components of the ecosystem depend. It
    seems to me that such an invasive change should be well tested before it
    is merged to trunk. Can you please tell me how this has been tested
    beyond unit tests?

    Thanks!

    Doug
  • Suresh srinivas at Apr 26, 2011 at 5:29 pm
    Doug,

    1. Can you please describe the significant advantages this approach has
    over a symlink-based approach?
    Federation is complementary with symlink approach. You could choose to
    provide integrated namespace using symlinks. However, client side mount
    tables seems a better approach for many reasons:
    # Unlike symbolic links, client side mount tables can choose to go to right
    namenode based on configuration. This avoids unnecessary RPCs to the
    namenodes to discover the targer of symlink.
    # The unavailability of a namenode where a symbolic link is configured does
    not affect reaching the symlink target.
    # Symbolic links need not be configured on every namenode in the cluster and
    future changes to symlinks need not be propagated to multiple namenodes. In
    client side mount tables, this information is in a central configuration.

    If a deployment still wants to use symbolic link, federation does not
    preclude it.
    It seems to me that one could run multiple namenodes on separate boxes
    and run multile datanode processes per storage box

    There are several advantages to using a single datanode:
    # When you have large number of namenodes (say 20), the cost of running
    separate datanodes in terms of process resources such as memory is huge.
    # The disk i/o management and storage utilization using a single datanode is
    much better, as it has complete view the storage.
    # In the approach you are proposing, you have several clusters to manage.
    However with federation, all datanodes are in a single cluster; with single
    configuration and operationally easier to manage.
    The patch modifies much of the logic of Hadoop's central component, upon
    which the performance and reliability of most other components of the
    ecosystem depend.
    That is not true.

    # Namenode is mostly unchanged in this feature.
    # Read/write pipelines are unchanged.
    # The changes are mainly in datanode:
    #* the storage, FSDataset, Directory and Disk scanners now have another
    level to incorporate block pool ID into the hierarchy. This is not a
    significant change that should cause performance or stability concerns.
    #* datanodes use a separate thread per NN, just like the existing thread
    that communicates with NN.
    Can you please tell me how this has been tested beyond unit tests?
    As regards to testing, we have passed 600+ tests. In hadoop, these tests
    are mostly integration tests and not pure unit tests.

    While these tests have been extensive, we have also been testing this branch
    for last 4 months, with QA validation that reflects our production
    environment. We have found the system to be stable, performing well and have
    not found any blockers with the branch so far.

    HDFS-1052 has been open more than a year now. I had also sent an email about
    this merge around 2 months ago. There are 90 subtasks that have been worked
    on last couple of months under HDFS-1052. Given that there was enough time
    to ask these questions, your email a day before I am planning to merge the
    branch into trunk seems late!

    --
    Regards,
    Suresh
  • Suresh srinivas at Apr 26, 2011 at 11:06 pm
    Doug, please reply back. I am planning to commit this by tonight, as I would
    like to avoid unnecessary merge work and also avoid having to redo the merge
    if SVN is re-organized.
    On Tue, Apr 26, 2011 at 10:29 AM, suresh srinivas wrote:

    Doug,

    1. Can you please describe the significant advantages this approach has
    over a symlink-based approach?
    Federation is complementary with symlink approach. You could choose to
    provide integrated namespace using symlinks. However, client side mount
    tables seems a better approach for many reasons:
    # Unlike symbolic links, client side mount tables can choose to go to right
    namenode based on configuration. This avoids unnecessary RPCs to the
    namenodes to discover the targer of symlink.
    # The unavailability of a namenode where a symbolic link is configured does
    not affect reaching the symlink target.
    # Symbolic links need not be configured on every namenode in the cluster
    and future changes to symlinks need not be propagated to multiple namenodes.
    In client side mount tables, this information is in a central configuration.

    If a deployment still wants to use symbolic link, federation does not
    preclude it.

    It seems to me that one could run multiple namenodes on separate boxes
    and run multile datanode processes per storage box

    There are several advantages to using a single datanode:
    # When you have large number of namenodes (say 20), the cost of running
    separate datanodes in terms of process resources such as memory is huge.
    # The disk i/o management and storage utilization using a single datanode
    is much better, as it has complete view the storage.
    # In the approach you are proposing, you have several clusters to manage.
    However with federation, all datanodes are in a single cluster; with single
    configuration and operationally easier to manage.
    The patch modifies much of the logic of Hadoop's central component, upon
    which the performance and reliability of most other components of the
    ecosystem depend.
    That is not true.

    # Namenode is mostly unchanged in this feature.
    # Read/write pipelines are unchanged.
    # The changes are mainly in datanode:
    #* the storage, FSDataset, Directory and Disk scanners now have another
    level to incorporate block pool ID into the hierarchy. This is not a
    significant change that should cause performance or stability concerns.
    #* datanodes use a separate thread per NN, just like the existing thread
    that communicates with NN.
    Can you please tell me how this has been tested beyond unit tests?
    As regards to testing, we have passed 600+ tests. In hadoop, these tests
    are mostly integration tests and not pure unit tests.

    While these tests have been extensive, we have also been testing this
    branch for last 4 months, with QA validation that reflects our production
    environment. We have found the system to be stable, performing well and have
    not found any blockers with the branch so far.

    HDFS-1052 has been open more than a year now. I had also sent an email
    about this merge around 2 months ago. There are 90 subtasks that have been
    worked on last couple of months under HDFS-1052. Given that there was enough
    time to ask these questions, your email a day before I am planning to merge
    the branch into trunk seems late!

    --
    Regards,
    Suresh

    --
    Regards,
    Suresh
  • Doug Cutting at Apr 27, 2011 at 4:43 am
    Suresh, Sanjay,

    Thank you very much for addressing my questions.

    Cheers,

    Doug
    On 04/26/2011 10:29 AM, suresh srinivas wrote:
    Doug,

    1. Can you please describe the significant advantages this approach has
    over a symlink-based approach?
    Federation is complementary with symlink approach. You could choose to
    provide integrated namespace using symlinks. However, client side mount
    tables seems a better approach for many reasons:
    # Unlike symbolic links, client side mount tables can choose to go to right
    namenode based on configuration. This avoids unnecessary RPCs to the
    namenodes to discover the targer of symlink.
    # The unavailability of a namenode where a symbolic link is configured does
    not affect reaching the symlink target.
    # Symbolic links need not be configured on every namenode in the cluster and
    future changes to symlinks need not be propagated to multiple namenodes. In
    client side mount tables, this information is in a central configuration.

    If a deployment still wants to use symbolic link, federation does not
    preclude it.
    It seems to me that one could run multiple namenodes on separate boxes
    and run multile datanode processes per storage box

    There are several advantages to using a single datanode:
    # When you have large number of namenodes (say 20), the cost of running
    separate datanodes in terms of process resources such as memory is huge.
    # The disk i/o management and storage utilization using a single datanode is
    much better, as it has complete view the storage.
    # In the approach you are proposing, you have several clusters to manage.
    However with federation, all datanodes are in a single cluster; with single
    configuration and operationally easier to manage.
    The patch modifies much of the logic of Hadoop's central component, upon
    which the performance and reliability of most other components of the
    ecosystem depend.
    That is not true.

    # Namenode is mostly unchanged in this feature.
    # Read/write pipelines are unchanged.
    # The changes are mainly in datanode:
    #* the storage, FSDataset, Directory and Disk scanners now have another
    level to incorporate block pool ID into the hierarchy. This is not a
    significant change that should cause performance or stability concerns.
    #* datanodes use a separate thread per NN, just like the existing thread
    that communicates with NN.
    Can you please tell me how this has been tested beyond unit tests?
    As regards to testing, we have passed 600+ tests. In hadoop, these tests
    are mostly integration tests and not pure unit tests.

    While these tests have been extensive, we have also been testing this branch
    for last 4 months, with QA validation that reflects our production
    environment. We have found the system to be stable, performing well and have
    not found any blockers with the branch so far.

    HDFS-1052 has been open more than a year now. I had also sent an email about
    this merge around 2 months ago. There are 90 subtasks that have been worked
    on last couple of months under HDFS-1052. Given that there was enough time
    to ask these questions, your email a day before I am planning to merge the
    branch into trunk seems late!
  • Konstantin Shvachko at Apr 27, 2011 at 5:27 am
    Suresh, Sanjay.

    1. I asked for benchmarks many times over the course of different
    discussions on the topic.
    I don't see any numbers attached to jira, and I was getting the same
    response,
    Doug just got from you, guys: which is "why would the performance be worse".
    And this is not an argument for me.

    2. I assume that merging requires a vote. I am sure people who know bylaws
    better than I do will correct me if it is not true.
    Did I miss the vote?

    It feels like you are rushing this and are not doing what you would expect
    others to
    do in the same position, and what has been done in the past for such large
    projects.

    Thanks,
    --Konstantin

    On Tue, Apr 26, 2011 at 9:43 PM, Doug Cutting wrote:

    Suresh, Sanjay,

    Thank you very much for addressing my questions.

    Cheers,

    Doug
    On 04/26/2011 10:29 AM, suresh srinivas wrote:
    Doug,

    1. Can you please describe the significant advantages this approach has
    over a symlink-based approach?
    Federation is complementary with symlink approach. You could choose to
    provide integrated namespace using symlinks. However, client side mount
    tables seems a better approach for many reasons:
    # Unlike symbolic links, client side mount tables can choose to go to right
    namenode based on configuration. This avoids unnecessary RPCs to the
    namenodes to discover the targer of symlink.
    # The unavailability of a namenode where a symbolic link is configured does
    not affect reaching the symlink target.
    # Symbolic links need not be configured on every namenode in the cluster and
    future changes to symlinks need not be propagated to multiple namenodes. In
    client side mount tables, this information is in a central configuration.

    If a deployment still wants to use symbolic link, federation does not
    preclude it.
    It seems to me that one could run multiple namenodes on separate boxes
    and run multile datanode processes per storage box

    There are several advantages to using a single datanode:
    # When you have large number of namenodes (say 20), the cost of running
    separate datanodes in terms of process resources such as memory is huge.
    # The disk i/o management and storage utilization using a single datanode is
    much better, as it has complete view the storage.
    # In the approach you are proposing, you have several clusters to manage.
    However with federation, all datanodes are in a single cluster; with single
    configuration and operationally easier to manage.
    The patch modifies much of the logic of Hadoop's central component, upon
    which the performance and reliability of most other components of the
    ecosystem depend.
    That is not true.

    # Namenode is mostly unchanged in this feature.
    # Read/write pipelines are unchanged.
    # The changes are mainly in datanode:
    #* the storage, FSDataset, Directory and Disk scanners now have another
    level to incorporate block pool ID into the hierarchy. This is not a
    significant change that should cause performance or stability concerns.
    #* datanodes use a separate thread per NN, just like the existing thread
    that communicates with NN.
    Can you please tell me how this has been tested beyond unit tests?
    As regards to testing, we have passed 600+ tests. In hadoop, these tests
    are mostly integration tests and not pure unit tests.

    While these tests have been extensive, we have also been testing this branch
    for last 4 months, with QA validation that reflects our production
    environment. We have found the system to be stable, performing well and have
    not found any blockers with the branch so far.

    HDFS-1052 has been open more than a year now. I had also sent an email about
    this merge around 2 months ago. There are 90 subtasks that have been worked
    on last couple of months under HDFS-1052. Given that there was enough time
    to ask these questions, your email a day before I am planning to merge the
    branch into trunk seems late!
  • Suresh srinivas at Apr 27, 2011 at 6:34 am
    Konstantin,

    On Tue, Apr 26, 2011 at 10:26 PM, Konstantin Shvachko
    wrote:
    Suresh, Sanjay.

    1. I asked for benchmarks many times over the course of different
    discussions on the topic.
    I don't see any numbers attached to jira, and I was getting the same
    response,
    Doug just got from you, guys: which is "why would the performance be
    worse".
    And this is not an argument for me.
    We had done testing earlier and had found that performance had not degraded.
    We are waiting for out performance team to publish the official numbers to
    post it to the jira. Unfortunately they are busy qualifying 2xx releases
    currently. I will get the perf numbers and post them.

    2. I assume that merging requires a vote. I am sure people who know bylaws
    better than I do will correct me if it is not true.
    Did I miss the vote?

    As regards to voting, since I was not sure about the procedure, I had
    consulted Owen about it. He had indicated that voting is not necessary. If
    the right procedure is to call for voting, I will do so. Owen any comments?

    It feels like you are rushing this and are not doing what you would expect
    others to
    do in the same position, and what has been done in the past for such large
    projects.
    I am not trying to rush here and not follow the procedure required. I am not
    sure about what the procedure is. Any pointers to it is appreciated.

    Thanks,
    --Konstantin

    On Tue, Apr 26, 2011 at 9:43 PM, Doug Cutting wrote:

    Suresh, Sanjay,

    Thank you very much for addressing my questions.

    Cheers,

    Doug
    On 04/26/2011 10:29 AM, suresh srinivas wrote:
    Doug,

    1. Can you please describe the significant advantages this approach
    has
    over a symlink-based approach?
    Federation is complementary with symlink approach. You could choose to
    provide integrated namespace using symlinks. However, client side mount
    tables seems a better approach for many reasons:
    # Unlike symbolic links, client side mount tables can choose to go to right
    namenode based on configuration. This avoids unnecessary RPCs to the
    namenodes to discover the targer of symlink.
    # The unavailability of a namenode where a symbolic link is configured does
    not affect reaching the symlink target.
    # Symbolic links need not be configured on every namenode in the
    cluster
    and
    future changes to symlinks need not be propagated to multiple
    namenodes.
    In
    client side mount tables, this information is in a central
    configuration.
    If a deployment still wants to use symbolic link, federation does not
    preclude it.
    It seems to me that one could run multiple namenodes on separate boxes
    and run multile datanode processes per storage box

    There are several advantages to using a single datanode:
    # When you have large number of namenodes (say 20), the cost of running
    separate datanodes in terms of process resources such as memory is
    huge.
    # The disk i/o management and storage utilization using a single
    datanode
    is
    much better, as it has complete view the storage.
    # In the approach you are proposing, you have several clusters to
    manage.
    However with federation, all datanodes are in a single cluster; with single
    configuration and operationally easier to manage.
    The patch modifies much of the logic of Hadoop's central component,
    upon
    which the performance and reliability of most other components of the
    ecosystem depend.
    That is not true.

    # Namenode is mostly unchanged in this feature.
    # Read/write pipelines are unchanged.
    # The changes are mainly in datanode:
    #* the storage, FSDataset, Directory and Disk scanners now have another
    level to incorporate block pool ID into the hierarchy. This is not a
    significant change that should cause performance or stability concerns.
    #* datanodes use a separate thread per NN, just like the existing
    thread
    that communicates with NN.
    Can you please tell me how this has been tested beyond unit tests?
    As regards to testing, we have passed 600+ tests. In hadoop, these
    tests
    are mostly integration tests and not pure unit tests.

    While these tests have been extensive, we have also been testing this branch
    for last 4 months, with QA validation that reflects our production
    environment. We have found the system to be stable, performing well and have
    not found any blockers with the branch so far.

    HDFS-1052 has been open more than a year now. I had also sent an email about
    this merge around 2 months ago. There are 90 subtasks that have been worked
    on last couple of months under HDFS-1052. Given that there was enough time
    to ask these questions, your email a day before I am planning to merge the
    branch into trunk seems late!


    --
    Regards,
    Suresh
  • Suresh srinivas at Apr 27, 2011 at 6:55 am
    Konstantin,

    Could you provide me link to how this was done on a big feature, like say
    append and how benchmark info was captured? I am planning to run dfsio
    tests, btw.

    Regards,
    Suresh
    On Tue, Apr 26, 2011 at 11:34 PM, suresh srinivas wrote:

    Konstantin,

    On Tue, Apr 26, 2011 at 10:26 PM, Konstantin Shvachko <
    shv.hadoop@gmail.com> wrote:
    Suresh, Sanjay.

    1. I asked for benchmarks many times over the course of different
    discussions on the topic.
    I don't see any numbers attached to jira, and I was getting the same
    response,
    Doug just got from you, guys: which is "why would the performance be
    worse".
    And this is not an argument for me.
    We had done testing earlier and had found that performance had not
    degraded. We are waiting for out performance team to publish the official
    numbers to post it to the jira. Unfortunately they are busy qualifying 2xx
    releases currently. I will get the perf numbers and post them.

    2. I assume that merging requires a vote. I am sure people who know bylaws
    better than I do will correct me if it is not true.
    Did I miss the vote?

    As regards to voting, since I was not sure about the procedure, I had
    consulted Owen about it. He had indicated that voting is not necessary. If
    the right procedure is to call for voting, I will do so. Owen any comments?

    It feels like you are rushing this and are not doing what you would expect
    others to
    do in the same position, and what has been done in the past for such large
    projects.
    I am not trying to rush here and not follow the procedure required. I am
    not sure about what the procedure is. Any pointers to it is appreciated.

    Thanks,
    --Konstantin

    On Tue, Apr 26, 2011 at 9:43 PM, Doug Cutting wrote:

    Suresh, Sanjay,

    Thank you very much for addressing my questions.

    Cheers,

    Doug
    On 04/26/2011 10:29 AM, suresh srinivas wrote:
    Doug,

    1. Can you please describe the significant advantages this approach
    has
    over a symlink-based approach?
    Federation is complementary with symlink approach. You could choose to
    provide integrated namespace using symlinks. However, client side
    mount
    tables seems a better approach for many reasons:
    # Unlike symbolic links, client side mount tables can choose to go to right
    namenode based on configuration. This avoids unnecessary RPCs to the
    namenodes to discover the targer of symlink.
    # The unavailability of a namenode where a symbolic link is configured does
    not affect reaching the symlink target.
    # Symbolic links need not be configured on every namenode in the
    cluster
    and
    future changes to symlinks need not be propagated to multiple
    namenodes.
    In
    client side mount tables, this information is in a central
    configuration.
    If a deployment still wants to use symbolic link, federation does not
    preclude it.
    It seems to me that one could run multiple namenodes on separate
    boxes
    and run multile datanode processes per storage box

    There are several advantages to using a single datanode:
    # When you have large number of namenodes (say 20), the cost of
    running
    separate datanodes in terms of process resources such as memory is
    huge.
    # The disk i/o management and storage utilization using a single
    datanode
    is
    much better, as it has complete view the storage.
    # In the approach you are proposing, you have several clusters to
    manage.
    However with federation, all datanodes are in a single cluster; with single
    configuration and operationally easier to manage.
    The patch modifies much of the logic of Hadoop's central component,
    upon
    which the performance and reliability of most other components of the
    ecosystem depend.
    That is not true.

    # Namenode is mostly unchanged in this feature.
    # Read/write pipelines are unchanged.
    # The changes are mainly in datanode:
    #* the storage, FSDataset, Directory and Disk scanners now have
    another
    level to incorporate block pool ID into the hierarchy. This is not a
    significant change that should cause performance or stability
    concerns.
    #* datanodes use a separate thread per NN, just like the existing
    thread
    that communicates with NN.
    Can you please tell me how this has been tested beyond unit tests?
    As regards to testing, we have passed 600+ tests. In hadoop, these
    tests
    are mostly integration tests and not pure unit tests.

    While these tests have been extensive, we have also been testing this branch
    for last 4 months, with QA validation that reflects our production
    environment. We have found the system to be stable, performing well
    and
    have
    not found any blockers with the branch so far.

    HDFS-1052 has been open more than a year now. I had also sent an email about
    this merge around 2 months ago. There are 90 subtasks that have been worked
    on last couple of months under HDFS-1052. Given that there was enough time
    to ask these questions, your email a day before I am planning to merge the
    branch into trunk seems late!


    --
    Regards,
    Suresh

    --
    Regards,
    Suresh
  • Suresh srinivas at Apr 27, 2011 at 5:03 pm
    I posted the TestDFSIO comparison with and without federation to HDFS-1052.
    Please let me know if it addresses your concern. I am also adding it here:

    TestDFSIO read tests
    *Without federation:*
    ----- TestDFSIO ----- : read
    Date & time: Wed Apr 27 02:04:24 PDT 2011
    Number of files: 1000
    Total MBytes processed: 30000.0
    Throughput mb/sec: 43.62329251162561
    Average IO rate mb/sec: 44.619869232177734
    IO rate std deviation: 5.060306158158443
    Test exec time sec: 959.943

    *With federation:*
    ----- TestDFSIO ----- : read
    Date & time: Wed Apr 27 02:43:10 PDT 2011
    Number of files: 1000
    Total MBytes processed: 30000.0
    Throughput mb/sec: 45.657513857055456
    Average IO rate mb/sec: 46.72107696533203
    IO rate std deviation: 5.455125923399539
    Test exec time sec: 924.922

    TestDFSIO write tests
    *Without federation:*
    ----- TestDFSIO ----- : write
    Date & time: Wed Apr 27 01:47:50 PDT 2011
    Number of files: 1000
    Total MBytes processed: 30000.0
    Throughput mb/sec: 35.940755259031015
    Average IO rate mb/sec: 38.236236572265625
    IO rate std deviation: 5.929484960036511
    Test exec time sec: 1266.624

    *With federation:*
    ----- TestDFSIO ----- : write
    Date & time: Wed Apr 27 02:27:12 PDT 2011
    Number of files: 1000
    Total MBytes processed: 30000.0
    Throughput mb/sec: 42.17884674597227
    Average IO rate mb/sec: 43.11423873901367
    IO rate std deviation: 5.357057259968647
    Test exec time sec: 1135.298
    {noformat}

    On Tue, Apr 26, 2011 at 11:55 PM, suresh srinivas wrote:

    Konstantin,

    Could you provide me link to how this was done on a big feature, like say
    append and how benchmark info was captured? I am planning to run dfsio
    tests, btw.

    Regards,
    Suresh

    On Tue, Apr 26, 2011 at 11:34 PM, suresh srinivas wrote:

    Konstantin,

    On Tue, Apr 26, 2011 at 10:26 PM, Konstantin Shvachko <
    shv.hadoop@gmail.com> wrote:
    Suresh, Sanjay.

    1. I asked for benchmarks many times over the course of different
    discussions on the topic.
    I don't see any numbers attached to jira, and I was getting the same
    response,
    Doug just got from you, guys: which is "why would the performance be
    worse".
    And this is not an argument for me.
    We had done testing earlier and had found that performance had not
    degraded. We are waiting for out performance team to publish the official
    numbers to post it to the jira. Unfortunately they are busy qualifying 2xx
    releases currently. I will get the perf numbers and post them.

    2. I assume that merging requires a vote. I am sure people who know
    bylaws
    better than I do will correct me if it is not true.
    Did I miss the vote?

    As regards to voting, since I was not sure about the procedure, I had
    consulted Owen about it. He had indicated that voting is not necessary. If
    the right procedure is to call for voting, I will do so. Owen any comments?

    It feels like you are rushing this and are not doing what you would
    expect
    others to
    do in the same position, and what has been done in the past for such
    large
    projects.
    I am not trying to rush here and not follow the procedure required. I am
    not sure about what the procedure is. Any pointers to it is appreciated.

    Thanks,
    --Konstantin


    On Tue, Apr 26, 2011 at 9:43 PM, Doug Cutting <cutting@apache.org>
    wrote:
    Suresh, Sanjay,

    Thank you very much for addressing my questions.

    Cheers,

    Doug
    On 04/26/2011 10:29 AM, suresh srinivas wrote:
    Doug,

    1. Can you please describe the significant advantages this approach
    has
    over a symlink-based approach?
    Federation is complementary with symlink approach. You could choose
    to
    provide integrated namespace using symlinks. However, client side
    mount
    tables seems a better approach for many reasons:
    # Unlike symbolic links, client side mount tables can choose to go to right
    namenode based on configuration. This avoids unnecessary RPCs to the
    namenodes to discover the targer of symlink.
    # The unavailability of a namenode where a symbolic link is
    configured
    does
    not affect reaching the symlink target.
    # Symbolic links need not be configured on every namenode in the
    cluster
    and
    future changes to symlinks need not be propagated to multiple
    namenodes.
    In
    client side mount tables, this information is in a central
    configuration.
    If a deployment still wants to use symbolic link, federation does not
    preclude it.
    It seems to me that one could run multiple namenodes on separate
    boxes
    and run multile datanode processes per storage box

    There are several advantages to using a single datanode:
    # When you have large number of namenodes (say 20), the cost of
    running
    separate datanodes in terms of process resources such as memory is
    huge.
    # The disk i/o management and storage utilization using a single
    datanode
    is
    much better, as it has complete view the storage.
    # In the approach you are proposing, you have several clusters to
    manage.
    However with federation, all datanodes are in a single cluster; with single
    configuration and operationally easier to manage.
    The patch modifies much of the logic of Hadoop's central component,
    upon
    which the performance and reliability of most other components of the
    ecosystem depend.
    That is not true.

    # Namenode is mostly unchanged in this feature.
    # Read/write pipelines are unchanged.
    # The changes are mainly in datanode:
    #* the storage, FSDataset, Directory and Disk scanners now have
    another
    level to incorporate block pool ID into the hierarchy. This is not a
    significant change that should cause performance or stability
    concerns.
    #* datanodes use a separate thread per NN, just like the existing
    thread
    that communicates with NN.
    Can you please tell me how this has been tested beyond unit tests?
    As regards to testing, we have passed 600+ tests. In hadoop, these
    tests
    are mostly integration tests and not pure unit tests.

    While these tests have been extensive, we have also been testing this branch
    for last 4 months, with QA validation that reflects our production
    environment. We have found the system to be stable, performing well
    and
    have
    not found any blockers with the branch so far.

    HDFS-1052 has been open more than a year now. I had also sent an
    email
    about
    this merge around 2 months ago. There are 90 subtasks that have been worked
    on last couple of months under HDFS-1052. Given that there was enough time
    to ask these questions, your email a day before I am planning to
    merge
    the
    branch into trunk seems late!


    --
    Regards,
    Suresh

    --
    Regards,
    Suresh

    --
    Regards,
    Suresh
  • Tsz Wo \(Nicholas\), Sze at Apr 27, 2011 at 5:09 pm
    It is not a surprise that the performance of Federation is better than trunk
    since, as Suresh mentioned previously, we improved some components of HDFS when
    we were developing Federation.

    Regards,
    Nicholas





    ________________________________
    From: suresh srinivas <srini30005@gmail.com>
    To: hdfs-dev@hadoop.apache.org
    Sent: Wed, April 27, 2011 10:02:32 AM
    Subject: Re: [Discuss] Merge federation branch HDFS-1052 into trunk

    I posted the TestDFSIO comparison with and without federation to HDFS-1052.
    Please let me know if it addresses your concern. I am also adding it here:

    TestDFSIO read tests
    *Without federation:*
    ----- TestDFSIO ----- : read
    Date & time: Wed Apr 27 02:04:24 PDT 2011
    Number of files: 1000
    Total MBytes processed: 30000.0
    Throughput mb/sec: 43.62329251162561
    Average IO rate mb/sec: 44.619869232177734
    IO rate std deviation: 5.060306158158443
    Test exec time sec: 959.943

    *With federation:*
    ----- TestDFSIO ----- : read
    Date & time: Wed Apr 27 02:43:10 PDT 2011
    Number of files: 1000
    Total MBytes processed: 30000.0
    Throughput mb/sec: 45.657513857055456
    Average IO rate mb/sec: 46.72107696533203
    IO rate std deviation: 5.455125923399539
    Test exec time sec: 924.922

    TestDFSIO write tests
    *Without federation:*
    ----- TestDFSIO ----- : write
    Date & time: Wed Apr 27 01:47:50 PDT 2011
    Number of files: 1000
    Total MBytes processed: 30000.0
    Throughput mb/sec: 35.940755259031015
    Average IO rate mb/sec: 38.236236572265625
    IO rate std deviation: 5.929484960036511
    Test exec time sec: 1266.624

    *With federation:*
    ----- TestDFSIO ----- : write
    Date & time: Wed Apr 27 02:27:12 PDT 2011
    Number of files: 1000
    Total MBytes processed: 30000.0
    Throughput mb/sec: 42.17884674597227
    Average IO rate mb/sec: 43.11423873901367
    IO rate std deviation: 5.357057259968647
    Test exec time sec: 1135.298
    {noformat}

    On Tue, Apr 26, 2011 at 11:55 PM, suresh srinivas wrote:

    Konstantin,

    Could you provide me link to how this was done on a big feature, like say
    append and how benchmark info was captured? I am planning to run dfsio
    tests, btw.

    Regards,
    Suresh

    On Tue, Apr 26, 2011 at 11:34 PM, suresh srinivas wrote:

    Konstantin,

    On Tue, Apr 26, 2011 at 10:26 PM, Konstantin Shvachko <
    shv.hadoop@gmail.com> wrote:
    Suresh, Sanjay.

    1. I asked for benchmarks many times over the course of different
    discussions on the topic.
    I don't see any numbers attached to jira, and I was getting the same
    response,
    Doug just got from you, guys: which is "why would the performance be
    worse".
    And this is not an argument for me.
    We had done testing earlier and had found that performance had not
    degraded. We are waiting for out performance team to publish the official
    numbers to post it to the jira. Unfortunately they are busy qualifying 2xx
    releases currently. I will get the perf numbers and post them.

    2. I assume that merging requires a vote. I am sure people who know
    bylaws
    better than I do will correct me if it is not true.
    Did I miss the vote?

    As regards to voting, since I was not sure about the procedure, I had
    consulted Owen about it. He had indicated that voting is not necessary. If
    the right procedure is to call for voting, I will do so. Owen any comments?

    It feels like you are rushing this and are not doing what you would
    expect
    others to
    do in the same position, and what has been done in the past for such
    large
    projects.
    I am not trying to rush here and not follow the procedure required. I am
    not sure about what the procedure is. Any pointers to it is appreciated.

    Thanks,
    --Konstantin


    On Tue, Apr 26, 2011 at 9:43 PM, Doug Cutting <cutting@apache.org>
    wrote:
    Suresh, Sanjay,

    Thank you very much for addressing my questions.

    Cheers,

    Doug
    On 04/26/2011 10:29 AM, suresh srinivas wrote:
    Doug,

    1. Can you please describe the significant advantages this approach
    has
    over a symlink-based approach?
    Federation is complementary with symlink approach. You could choose
    to
    provide integrated namespace using symlinks. However, client side
    mount
    tables seems a better approach for many reasons:
    # Unlike symbolic links, client side mount tables can choose to go to right
    namenode based on configuration. This avoids unnecessary RPCs to the
    namenodes to discover the targer of symlink.
    # The unavailability of a namenode where a symbolic link is
    configured
    does
    not affect reaching the symlink target.
    # Symbolic links need not be configured on every namenode in the
    cluster
    and
    future changes to symlinks need not be propagated to multiple
    namenodes.
    In
    client side mount tables, this information is in a central
    configuration.
    If a deployment still wants to use symbolic link, federation does not
    preclude it.
    It seems to me that one could run multiple namenodes on separate
    boxes
    and run multile datanode processes per storage box

    There are several advantages to using a single datanode:
    # When you have large number of namenodes (say 20), the cost of
    running
    separate datanodes in terms of process resources such as memory is
    huge.
    # The disk i/o management and storage utilization using a single
    datanode
    is
    much better, as it has complete view the storage.
    # In the approach you are proposing, you have several clusters to
    manage.
    However with federation, all datanodes are in a single cluster; with single
    configuration and operationally easier to manage.
    The patch modifies much of the logic of Hadoop's central component,
    upon
    which the performance and reliability of most other components of the
    ecosystem depend.
    That is not true.

    # Namenode is mostly unchanged in this feature.
    # Read/write pipelines are unchanged.
    # The changes are mainly in datanode:
    #* the storage, FSDataset, Directory and Disk scanners now have
    another
    level to incorporate block pool ID into the hierarchy. This is not a
    significant change that should cause performance or stability
    concerns.
    #* datanodes use a separate thread per NN, just like the existing
    thread
    that communicates with NN.
    Can you please tell me how this has been tested beyond unit tests?
    As regards to testing, we have passed 600+ tests. In hadoop, these
    tests
    are mostly integration tests and not pure unit tests.

    While these tests have been extensive, we have also been testing this branch
    for last 4 months, with QA validation that reflects our production
    environment. We have found the system to be stable, performing well
    and
    have
    not found any blockers with the branch so far.

    HDFS-1052 has been open more than a year now. I had also sent an
    email
    about
    this merge around 2 months ago. There are 90 subtasks that have been worked
    on last couple of months under HDFS-1052. Given that there was enough time
    to ask these questions, your email a day before I am planning to
    merge
    the
    branch into trunk seems late!


    --
    Regards,
    Suresh

    --
    Regards,
    Suresh

    --
    Regards,
    Suresh
  • Devaraj Das at Apr 27, 2011 at 5:09 pm
    Good to see the performance improvements with federation. Curious to know whether it is because of the associated refactoring?


    On 4/27/11 10:02 AM, "suresh srinivas" wrote:

    I posted the TestDFSIO comparison with and without federation to HDFS-1052.
    Please let me know if it addresses your concern. I am also adding it here:

    TestDFSIO read tests
    *Without federation:*
    ----- TestDFSIO ----- : read
    Date & time: Wed Apr 27 02:04:24 PDT 2011
    Number of files: 1000
    Total MBytes processed: 30000.0
    Throughput mb/sec: 43.62329251162561
    Average IO rate mb/sec: 44.619869232177734
    IO rate std deviation: 5.060306158158443
    Test exec time sec: 959.943

    *With federation:*
    ----- TestDFSIO ----- : read
    Date & time: Wed Apr 27 02:43:10 PDT 2011
    Number of files: 1000
    Total MBytes processed: 30000.0
    Throughput mb/sec: 45.657513857055456
    Average IO rate mb/sec: 46.72107696533203
    IO rate std deviation: 5.455125923399539
    Test exec time sec: 924.922

    TestDFSIO write tests
    *Without federation:*
    ----- TestDFSIO ----- : write
    Date & time: Wed Apr 27 01:47:50 PDT 2011
    Number of files: 1000
    Total MBytes processed: 30000.0
    Throughput mb/sec: 35.940755259031015
    Average IO rate mb/sec: 38.236236572265625
    IO rate std deviation: 5.929484960036511
    Test exec time sec: 1266.624

    *With federation:*
    ----- TestDFSIO ----- : write
    Date & time: Wed Apr 27 02:27:12 PDT 2011
    Number of files: 1000
    Total MBytes processed: 30000.0
    Throughput mb/sec: 42.17884674597227
    Average IO rate mb/sec: 43.11423873901367
    IO rate std deviation: 5.357057259968647
    Test exec time sec: 1135.298
    {noformat}

    On Tue, Apr 26, 2011 at 11:55 PM, suresh srinivas wrote:

    Konstantin,

    Could you provide me link to how this was done on a big feature, like say
    append and how benchmark info was captured? I am planning to run dfsio
    tests, btw.

    Regards,
    Suresh

    On Tue, Apr 26, 2011 at 11:34 PM, suresh srinivas wrote:

    Konstantin,

    On Tue, Apr 26, 2011 at 10:26 PM, Konstantin Shvachko <
    shv.hadoop@gmail.com> wrote:
    Suresh, Sanjay.

    1. I asked for benchmarks many times over the course of different
    discussions on the topic.
    I don't see any numbers attached to jira, and I was getting the same
    response,
    Doug just got from you, guys: which is "why would the performance be
    worse".
    And this is not an argument for me.
    We had done testing earlier and had found that performance had not
    degraded. We are waiting for out performance team to publish the official
    numbers to post it to the jira. Unfortunately they are busy qualifying 2xx
    releases currently. I will get the perf numbers and post them.

    2. I assume that merging requires a vote. I am sure people who know
    bylaws
    better than I do will correct me if it is not true.
    Did I miss the vote?

    As regards to voting, since I was not sure about the procedure, I had
    consulted Owen about it. He had indicated that voting is not necessary. If
    the right procedure is to call for voting, I will do so. Owen any comments?

    It feels like you are rushing this and are not doing what you would
    expect
    others to
    do in the same position, and what has been done in the past for such
    large
    projects.
    I am not trying to rush here and not follow the procedure required. I am
    not sure about what the procedure is. Any pointers to it is appreciated.

    Thanks,
    --Konstantin


    On Tue, Apr 26, 2011 at 9:43 PM, Doug Cutting <cutting@apache.org>
    wrote:
    Suresh, Sanjay,

    Thank you very much for addressing my questions.

    Cheers,

    Doug
    On 04/26/2011 10:29 AM, suresh srinivas wrote:
    Doug,

    1. Can you please describe the significant advantages this approach
    has
    over a symlink-based approach?
    Federation is complementary with symlink approach. You could choose
    to
    provide integrated namespace using symlinks. However, client side
    mount
    tables seems a better approach for many reasons:
    # Unlike symbolic links, client side mount tables can choose to go to right
    namenode based on configuration. This avoids unnecessary RPCs to the
    namenodes to discover the targer of symlink.
    # The unavailability of a namenode where a symbolic link is
    configured
    does
    not affect reaching the symlink target.
    # Symbolic links need not be configured on every namenode in the
    cluster
    and
    future changes to symlinks need not be propagated to multiple
    namenodes.
    In
    client side mount tables, this information is in a central
    configuration.
    If a deployment still wants to use symbolic link, federation does not
    preclude it.
    It seems to me that one could run multiple namenodes on separate
    boxes
    and run multile datanode processes per storage box

    There are several advantages to using a single datanode:
    # When you have large number of namenodes (say 20), the cost of
    running
    separate datanodes in terms of process resources such as memory is
    huge.
    # The disk i/o management and storage utilization using a single
    datanode
    is
    much better, as it has complete view the storage.
    # In the approach you are proposing, you have several clusters to
    manage.
    However with federation, all datanodes are in a single cluster; with single
    configuration and operationally easier to manage.
    The patch modifies much of the logic of Hadoop's central component,
    upon
    which the performance and reliability of most other components of the
    ecosystem depend.
    That is not true.

    # Namenode is mostly unchanged in this feature.
    # Read/write pipelines are unchanged.
    # The changes are mainly in datanode:
    #* the storage, FSDataset, Directory and Disk scanners now have
    another
    level to incorporate block pool ID into the hierarchy. This is not a
    significant change that should cause performance or stability
    concerns.
    #* datanodes use a separate thread per NN, just like the existing
    thread
    that communicates with NN.
    Can you please tell me how this has been tested beyond unit tests?
    As regards to testing, we have passed 600+ tests. In hadoop, these
    tests
    are mostly integration tests and not pure unit tests.

    While these tests have been extensive, we have also been testing this branch
    for last 4 months, with QA validation that reflects our production
    environment. We have found the system to be stable, performing well
    and
    have
    not found any blockers with the branch so far.

    HDFS-1052 has been open more than a year now. I had also sent an
    email
    about
    this merge around 2 months ago. There are 90 subtasks that have been worked
    on last couple of months under HDFS-1052. Given that there was enough time
    to ask these questions, your email a day before I am planning to
    merge
    the
    branch into trunk seems late!


    --
    Regards,
    Suresh

    --
    Regards,
    Suresh

    --
    Regards,
    Suresh
  • Konstantin Boudnik at Apr 27, 2011 at 5:42 pm
    Interesting... while the read performance has only marginally improved
    <4% (still a good thing) the write performance shows significantly
    better improvements >10%. Very interesting asymmetry, indeed.

    Suresh, what was the size of the cluster in the testing?
    Cos
    On Wed, Apr 27, 2011 at 10:02, suresh srinivas wrote:
    I posted the TestDFSIO comparison with and without federation to HDFS-1052.
    Please let me know if it addresses your concern. I am also adding it here:

    TestDFSIO read tests
    *Without federation:*
    ----- TestDFSIO ----- : read
    Date & time: Wed Apr 27 02:04:24 PDT 2011
    Number of files: 1000
    Total MBytes processed: 30000.0
    Throughput mb/sec: 43.62329251162561
    Average IO rate mb/sec: 44.619869232177734
    IO rate std deviation: 5.060306158158443
    Test exec time sec: 959.943

    *With federation:*
    ----- TestDFSIO ----- : read
    Date & time: Wed Apr 27 02:43:10 PDT 2011
    Number of files: 1000
    Total MBytes processed: 30000.0
    Throughput mb/sec: 45.657513857055456
    Average IO rate mb/sec: 46.72107696533203
    IO rate std deviation: 5.455125923399539
    Test exec time sec: 924.922

    TestDFSIO write tests
    *Without federation:*
    ----- TestDFSIO ----- : write
    Date & time: Wed Apr 27 01:47:50 PDT 2011
    Number of files: 1000
    Total MBytes processed: 30000.0
    Throughput mb/sec: 35.940755259031015
    Average IO rate mb/sec: 38.236236572265625
    IO rate std deviation: 5.929484960036511
    Test exec time sec: 1266.624

    *With federation:*
    ----- TestDFSIO ----- : write
    Date & time: Wed Apr 27 02:27:12 PDT 2011
    Number of files: 1000
    Total MBytes processed: 30000.0
    Throughput mb/sec: 42.17884674597227
    Average IO rate mb/sec: 43.11423873901367
    IO rate std deviation: 5.357057259968647
    Test exec time sec: 1135.298
    {noformat}

    On Tue, Apr 26, 2011 at 11:55 PM, suresh srinivas wrote:

    Konstantin,

    Could you provide me link to how this was done on a big feature, like say
    append and how benchmark info was captured? I am planning to run dfsio
    tests, btw.

    Regards,
    Suresh

    On Tue, Apr 26, 2011 at 11:34 PM, suresh srinivas wrote:

    Konstantin,

    On Tue, Apr 26, 2011 at 10:26 PM, Konstantin Shvachko <
    shv.hadoop@gmail.com> wrote:
    Suresh, Sanjay.

    1. I asked for benchmarks many times over the course of different
    discussions on the topic.
    I don't see any numbers attached to jira, and I was getting the same
    response,
    Doug just got from you, guys: which is "why would the performance be
    worse".
    And this is not an argument for me.
    We had done testing earlier and had found that performance had not
    degraded. We are waiting for out performance team to publish the official
    numbers to post it to the jira. Unfortunately they are busy qualifying 2xx
    releases currently. I will get the perf numbers and post them.

    2. I assume that merging requires a vote. I am sure people who know
    bylaws
    better than I do will correct me if it is not true.
    Did I miss the vote?

    As regards to voting, since I was not sure about the procedure, I had
    consulted Owen about it. He had indicated that voting is not necessary. If
    the right procedure is to call for voting, I will do so. Owen any comments?

    It feels like you are rushing this and are not doing what you would
    expect
    others to
    do in the same position, and what has been done in the past for such
    large
    projects.
    I am not trying to rush here and not follow the procedure required. I am
    not sure about what the procedure is. Any pointers to it is appreciated.

    Thanks,
    --Konstantin


    On Tue, Apr 26, 2011 at 9:43 PM, Doug Cutting <cutting@apache.org>
    wrote:
    Suresh, Sanjay,

    Thank you very much for addressing my questions.

    Cheers,

    Doug
    On 04/26/2011 10:29 AM, suresh srinivas wrote:
    Doug,

    1. Can you please describe the significant advantages this approach
    has
    over a symlink-based approach?
    Federation is complementary with symlink approach. You could choose
    to
    provide integrated namespace using symlinks. However, client side
    mount
    tables seems a better approach for many reasons:
    # Unlike symbolic links, client side mount tables can choose to go to right
    namenode based on configuration. This avoids unnecessary RPCs to the
    namenodes to discover the targer of symlink.
    # The unavailability of a namenode where a symbolic link is
    configured
    does
    not affect reaching the symlink target.
    # Symbolic links need not be configured on every namenode in the
    cluster
    and
    future changes to symlinks need not be propagated to multiple
    namenodes.
    In
    client side mount tables, this information is in a central
    configuration.
    If a deployment still wants to use symbolic link, federation does not
    preclude it.
    It seems to me that one could run multiple namenodes on separate
    boxes
    and run multile datanode processes per storage box

    There are several advantages to using a single datanode:
    # When you have large number of namenodes (say 20), the cost of
    running
    separate datanodes in terms of process resources such as memory is
    huge.
    # The disk i/o management and storage utilization using a single
    datanode
    is
    much better, as it has complete view the storage.
    # In the approach you are proposing, you have several clusters to
    manage.
    However with federation, all datanodes are in a single cluster; with single
    configuration and operationally easier to manage.
    The patch modifies much of the logic of Hadoop's central component,
    upon
    which the performance and reliability of most other components of the
    ecosystem depend.
    That is not true.

    # Namenode is mostly unchanged in this feature.
    # Read/write pipelines are unchanged.
    # The changes are mainly in datanode:
    #* the storage, FSDataset, Directory and Disk scanners now have
    another
    level to incorporate block pool ID into the hierarchy. This is not a
    significant change that should cause performance or stability
    concerns.
    #* datanodes use a separate thread per NN, just like the existing
    thread
    that communicates with NN.
Can you please tell me how this has been tested beyond unit tests?
As regards testing, we have passed 600+ tests. In Hadoop, these tests
are mostly integration tests and not pure unit tests.

While these tests have been extensive, we have also been testing this
branch for the last 4 months, with QA validation that reflects our
production environment. We have found the system to be stable and
performing well, and have not found any blockers with the branch so far.

HDFS-1052 has been open for more than a year now. I had also sent an email
about this merge around 2 months ago. There are 90 subtasks under
HDFS-1052 that have been worked on over the last couple of months. Given
that there was enough time to ask these questions, your email a day before
I am planning to merge the branch into trunk seems late!


    --
    Regards,
    Suresh

  • Suresh srinivas at Apr 27, 2011 at 9:36 pm
I ran these tests on my laptop. I would like to use this data to emphasize
that there is no regression in performance. I am not sure that, with just
the tests I ran, we could conclude there is a huge gain in performance with
federation. When our performance test team runs tests at scale, we will
get a clearer picture.


    On Wed, Apr 27, 2011 at 10:41 AM, Konstantin Boudnik wrote:

Interesting... while the read performance has improved only marginally,
<4% (still a good thing), the write performance shows a significantly
bigger improvement, >10%. Very interesting asymmetry, indeed.

    Suresh, what was the size of the cluster in the testing?
    Cos
    On Wed, Apr 27, 2011 at 10:02, suresh srinivas wrote:
I posted the TestDFSIO comparison with and without federation to
HDFS-1052. Please let me know if it addresses your concern. I am also
adding it here:
    TestDFSIO read tests
    *Without federation:*
    ----- TestDFSIO ----- : read
    Date & time: Wed Apr 27 02:04:24 PDT 2011
    Number of files: 1000
    Total MBytes processed: 30000.0
    Throughput mb/sec: 43.62329251162561
    Average IO rate mb/sec: 44.619869232177734
    IO rate std deviation: 5.060306158158443
    Test exec time sec: 959.943

    *With federation:*
    ----- TestDFSIO ----- : read
    Date & time: Wed Apr 27 02:43:10 PDT 2011
    Number of files: 1000
    Total MBytes processed: 30000.0
    Throughput mb/sec: 45.657513857055456
    Average IO rate mb/sec: 46.72107696533203
    IO rate std deviation: 5.455125923399539
    Test exec time sec: 924.922

    TestDFSIO write tests
    *Without federation:*
    ----- TestDFSIO ----- : write
    Date & time: Wed Apr 27 01:47:50 PDT 2011
    Number of files: 1000
    Total MBytes processed: 30000.0
    Throughput mb/sec: 35.940755259031015
    Average IO rate mb/sec: 38.236236572265625
    IO rate std deviation: 5.929484960036511
    Test exec time sec: 1266.624

    *With federation:*
    ----- TestDFSIO ----- : write
    Date & time: Wed Apr 27 02:27:12 PDT 2011
    Number of files: 1000
    Total MBytes processed: 30000.0
    Throughput mb/sec: 42.17884674597227
    Average IO rate mb/sec: 43.11423873901367
    IO rate std deviation: 5.357057259968647
    Test exec time sec: 1135.298
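
(These are standard TestDFSIO runs; judging from "Number of files: 1000"
and "Total MBytes processed: 30000.0", the invocations were presumably
something like the following, with the exact test jar name depending on
the build:

    hadoop jar hadoop-*-test.jar TestDFSIO -write -nrFiles 1000 -fileSize 30
    hadoop jar hadoop-*-test.jar TestDFSIO -read -nrFiles 1000 -fileSize 30

Working out the deltas: read throughput went from 43.6 to 45.7 MB/s and
elapsed time from 960 s to 925 s, about 3.6% faster; write throughput went
from 35.9 to 42.2 MB/s and elapsed time from 1267 s to 1135 s, about 10.4%
faster - the <4% / >10% asymmetry Cos notes above.)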


On Tue, Apr 26, 2011 at 11:55 PM, suresh srinivas <srini30005@gmail.com>
wrote:
Konstantin,

Could you provide me a link to how this was done for a big feature, say
append, and how the benchmark info was captured? I am planning to run
dfsio tests, btw.

Regards,
Suresh


On Tue, Apr 26, 2011 at 11:34 PM, suresh srinivas <srini30005@gmail.com>
wrote:
Konstantin,

On Tue, Apr 26, 2011 at 10:26 PM, Konstantin Shvachko <
shv.hadoop@gmail.com> wrote:
Suresh, Sanjay.

1. I asked for benchmarks many times over the course of different
discussions on the topic.
I don't see any numbers attached to the jira, and I was getting the same
response Doug just got from you guys, which is "why would the performance
be worse". And this is not an argument for me.
We had done testing earlier and had found that performance had not
degraded. We are waiting for our performance team to publish the official
numbers to post to the jira. Unfortunately they are busy qualifying 2xx
releases currently. I will get the perf numbers and post them.

2. I assume that merging requires a vote. I am sure people who know the
bylaws better than I do will correct me if it is not true.
Did I miss the vote?

    --
    Regards,
    Suresh
  • Konstantin Shvachko at Apr 28, 2011 at 5:18 am
Suresh,
Showing no degradation in performance on a one-node cluster is a good
start for benchmarking.
You still have a dev cluster to run benchmarks on, don't you?
--Konstantin
  • Hairong at Apr 27, 2011 at 5:47 pm
Nice performance data! The federation branch definitely adds code
complexity to HDFS, but this is a long-awaited feature to improve HDFS
scalability and a step toward separating namespace management from
storage management. I am for merging this to trunk.

Hairong
  • Konstantin Shvachko at Apr 28, 2011 at 4:57 am
Yes, I can talk about append as an example.
Some differences from the federation project are:
- append had a comprehensive test plan document, which was designed and
executed;
- append was independently evaluated by the HBase guys;
- it introduced a new benchmark for append;
- we ran both DFSIO and NNThroughput. DFSIO was executed on a relatively
small cluster. I couldn't find where I posted the results, my bad. But you
may be able to find these tasks in our scrum records.

--Konstantin

  • Konstantin Boudnik at Apr 28, 2011 at 1:45 pm
+1. Having an open QE process would be a tremendous value-add to the
overall quality of the feature. Append was an exemplary development in
this sense. Would it be possible to have the federation test plan (if one
exists) published along with the specs on the JIRA (similar to HDFS-265),
at least for reference?

Cos
  • Suresh srinivas at Apr 28, 2011 at 6:02 pm
As Eli suggested, I have uploaded a new patch to the jira. Merging new
trunk changes and testing them took several hours! It passes all the tests
except for two unit test failures. These failures do not happen on my
machine - if they are real failures we will address them after merging the
patch to trunk.

Please review the patch and post your comments on the jira.

    Regards,
    Suresh
  • Owen O'Malley at Apr 27, 2011 at 8:54 pm

    On Apr 26, 2011, at 11:34 PM, suresh srinivas wrote:

2. I assume that merging requires a vote. I am sure people who know the
bylaws better than I do will correct me if it is not true.
Did I miss the vote?

As regards voting, since I was not sure about the procedure, I had
consulted Owen about it. He had indicated that voting is not necessary. If
the right procedure is to call for a vote, I will do so. Owen, any comments?
    Merging a branch back in doesn't require an explicit vote. It is just a code commit. This discussion thread is enough to establish that there is consensus in the dev community.

    -- Owen
  • Suresh srinivas at Apr 27, 2011 at 9:44 pm
    If there are no further issues by tonight, I will merge the branch into
    trunk.

    Regards,
    Suresh
    --
    Regards,
    Suresh
  • Konstantin Shvachko at Apr 28, 2011 at 5:12 am
Owen,

The question is whether this is a
* Code Change,
which requires Lazy consensus of active committers, or an
* Adoption of New Codebase,
which needs a Lazy 2/3 majority of PMC members.

Lazy consensus requires 3 binding +1 votes and no binding vetoes.

When I look at the current bylaws, they tell me this needs a vote.
Did I miss anything?

Konstantin

  • Owen O'Malley at Apr 28, 2011 at 8:34 pm

    On Apr 27, 2011, at 10:12 PM, Konstantin Shvachko wrote:

The question is whether this is a
* Code Change,
which requires Lazy consensus of active committers, or an
* Adoption of New Codebase,
which needs a Lazy 2/3 majority of PMC members.
    This is a code change, just like all of our jiras. The standard rules of at least one +1 on the jira and no -1's apply.

    Adoption of new codebase is adopting a new subproject or completely replacing trunk.
    Lazy consensus requires 3 binding +1 votes and no binding vetoes.
    This was clarified in the bylaws back in November.

    http://mail-archives.apache.org/mod_mbox/hadoop-general/201011.mbox/%3C159E99C4-B71C-437E-9640-AA24C50D636E@apache.org%3E

    Where it was modified to:

    Lazy consensus of active committers, but with a minimum of
    one +1. The code can be committed after the first +1.

    -- Owen
  • Suresh srinivas at Apr 28, 2011 at 10:12 pm
Owen, thanks for the clarification.

I have attached the patch to the jira HDFS-1052. Please use the jira to
cast your vote or post objections. If you have objections, please be
specific about how I can address them and move forward with this issue.

    Regards,
    Suresh
    --
    Regards,
    Suresh
  • Konstantin Shvachko at Apr 29, 2011 at 6:31 am
Thanks for clarifying, Owen.
Should we have the bylaws somewhere on the wiki?
--Konstantin

  • Todd Lipcon at May 2, 2011 at 9:45 pm
    Apparently this merge wasn't tested against MapReduce trunk at all -- MR
    trunk has been failing to compile for several days. Please see
    MAPREDUCE-2465. I attempted to fix it myself but don't have enough
    background in the new federation code or in RAID.

    -Todd

    --
    Todd Lipcon
    Software Engineer, Cloudera
  • Suresh srinivas at May 3, 2011 at 2:17 am
We have been testing federation regularly with MapReduce on the
yahoo-merge branches. With trunk we missed the contrib (raid). The
dependencies across the project splits have been crazy; it is not clear
how large changes can keep on top of all these things.

I am working on fixing the raid contrib.
    --
    Regards,
    Suresh
  • Sanjay Radia at Apr 27, 2011 at 12:27 am

    On Apr 25, 2011, at 2:36 PM, Doug Cutting wrote:
A couple of questions:

1. Can you please describe the significant advantages this approach
has over a symlink-based approach?
It seems to me that one could run multiple namenodes on separate boxes
and run multiple datanode processes per storage box configured with
something like:
.......
Doug,

There are two separate issues; your email seems to suggest that these
are joined:
(1) creating (or not) a unified namespace
(2) sharing the storage and the block storage layer across NameNodes -
the architecture document covers this layering in great detail.
This separation reflects the architecture of HDFS (derived from GFS),
where the namespace layer is separate from the block storage layer
(although the HDFS implementation violates the layers in many places).


    HDFS-1052 deals with (2) - allowing multiple NameNodes to share the
    block storage layer.

As for (1), creating a unified namespace, federation does NOT
dictate how you create a unified namespace or whether you even create
a unified namespace in the first place. Indeed, you may want to share
the physical storage but keep independent namespaces. For example, you
may want to run a private namespace for HBase files within the same
Hadoop cluster. Two different tenants sharing a cluster may choose to
have their own independent namespaces for isolation.

Of course, in many situations one wants to create a unified namespace.
One could create a unified namespace using symbolic links, as you
suggest. The federation work has also added client-side mount tables
(HDFS-1053), implemented as a FileSystem and an AbstractFileSystem. They
offer advantages over symbolic links, but this is separable and you can
use symbolic links if you like. HDFS-1053 (client-side mount tables)
makes no changes to any existing file system.
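
Since a client-side mount table is just another FileSystem implementation,
client code does not change; only the configuration does. A minimal sketch
(assuming a mount-table file system is configured as the default file
system; nothing here is specific to the HDFS-1053 implementation):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Ordinary HDFS client code, unchanged under federation:
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);    // returns the mount-table FS
    // The mount table routes the path to whichever namenode owns it:
    FSDataInputStream in = fs.open(new Path("/user/foo/part-00000"));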

Now getting to (2), sharing the physical storage and the block storage
layer.
The approach you describe (running multiple DNs on the same machine,
which is essentially multiple superimposed HDFS clusters) is the most
common reaction to this work and one which we also explored.
Unfortunately this approach runs into several issues, and when you
start exploring the details you realize that it is essentially a hack:
- Extra DN processes running on the same machine take precious
memory away from MR tasks.
- Independent pools of threads for each DN.
- No way to schedule disk operations across multiple DNs.
- No unified view of balancing or decommissioning. For example, one
could run multiple balancers, but this gives you less control of the
bandwidth used for balancing.
- The disk-fail-in-place work and the
balance-disks-on-introducing-a-new-disk work would become more
complicated to coordinate across DNs.
- Federation allows the cluster to be managed as a unit rather than as
a bunch of overlapping HDFS clusters. Overlapping HDFS clusters would
be operationally taxing.

On the other hand, the new architecture generalizes the block storage
layer and allows us to evolve it to address new needs. For example, it
will allow us to address issues like offering tmp storage for
intermediate MR output - one can allocate a block pool for MR tmp
storage on each DN. HBase could also use the block storage layer
directly without going through a namenode.
    2. .... The patch modifies much
    of the logic of Hadoop's central component, upon which the performance
    and reliability of most other components of the ecosystem depend.
Changes to the code base
- The fundamental code change is to extend the notion of block id to
include a block pool id (a sketch of this follows below).
- The NN had little change; the protocols did change to include the
block pool id.
- The DN code did change. Each data structure is now indexed by the
block pool id -- while this is a code change, it is architecturally
very simple and low risk.
- We also did a fair amount of cleanup of the threads used to send block
reports - while it was not strictly necessary, we took the extra effort
to pay down the technical debt. As Dhruba recently noted, adding support
to send block reports to primary and secondary NNs for HA will now be
much easier to do.
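
To illustrate the shape of that change (a minimal sketch only; the class
and field names are illustrative, not necessarily those used in the
branch), the block identifier grows from a single numeric id into a
(block pool id, block id) pair, and datanode maps gain one level of
indexing:

    // A block id qualified by the pool (i.e. the namespace) that owns it.
    public class ExtendedBlock {
      private final String blockPoolId; // identifies the owning namespace
      private final long blockId;       // unique within its block pool

      public ExtendedBlock(String blockPoolId, long blockId) {
        this.blockPoolId = blockPoolId;
        this.blockId = blockId;
      }

      public String getBlockPoolId() { return blockPoolId; }
      public long getBlockId() { return blockId; }
    }

    // Datanode-side structures then index by pool first, e.g.:
    // Map<String /* blockPoolId */, Map<Long /* blockId */, Replica>>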

The write and read pipelines - which are performance critical - have
NOT changed.
It seems to me that such an invasive change should be well tested
before it is merged to trunk. Can you please tell me how this has been
tested beyond unit tests?

Risk, Quality & Testing
Besides the amount of code change, one has to ask the fundamental
questions: how good is the design, and how is the project managed?
Conceptually, federation is very simple: pools of blocks are owned by
a service (a NN in this case) and the block id is extended by an
identifier called the block pool id.
First and foremost, we wrote a very extensive architecture document -
more comprehensive than any other document in Hadoop in the past.
This was published very early: version 1 in March 2010 and version 5
in April 2010, based on feedback we received from the community. We
sought and incorporated feedback from HDFS developers outside of
Yahoo.

The project was managed as a separate branch rather than introducing
the code to trunk incrementally.
The branch has also been tested as a separate unit by us - this
ensures that it does not destabilize trunk.

More details on testing:
The same QA process that drove and tested key stable Apache Hadoop
releases (0.16, 0.17, 0.18, 0.20, 0.20-security) is being used for
testing the federation feature. We have been running integrated tests
with federation for a few months and continue to do so.
We will not deploy a Hadoop release with the federation feature on
Yahoo clusters until we are confident that it is stable and reliable
for our clusters. Indeed, the level of testing is significantly more
than in previous releases.

    Hopefully the above addresses your concerns.

    regards
    sanjay
  • Konstantin Boudnik at Apr 27, 2011 at 1:00 am
    Sanjay,

    I assume the outlined changes won't an earlier version of HDFS from
    upgrads to the federation version, right?

    Cos
  • Dhruba Borthakur at Apr 27, 2011 at 4:27 am
I feel that making the datanode talk to multiple namenodes is very
valuable, especially when there is plenty of storage available on a single
datanode machine (think 24 TB to 36 TB) and a single namenode does not
have enough memory to hold all the file metadata for such a large cluster.
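
(A rough illustration with assumed figures: at 128 MB blocks and on the
order of 150-200 bytes of namenode heap per block object, ignoring
replication, a 2000-node cluster with 30 TB per node stores about 60 PB,
i.e. roughly 500 million blocks, which works out to 75-100 GB of heap for
block objects alone - more than a single namenode can comfortably hold.)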

    This is a feature that we are in dire need of, and could put it to good use
    starting "yesterday"!

    thanks,
    dhruba
    --
    Connect to me at http://www.facebook.com/dhruba
  • Tsz Wo \(Nicholas\), Sze at Apr 27, 2011 at 5:16 am
Agree. It is a step forward toward a distributed namespace.

    Regards,
    Nicholas





    ________________________________
    From: Dhruba Borthakur <dhruba@gmail.com>
    To: hdfs-dev@hadoop.apache.org
    Cc: sradia@yahoo-inc.com; Doug Cutting <cutting@apache.org>
    Sent: Wed, April 27, 2011 12:27:30 AM
    Subject: Re: [Discuss] Merge federation branch HDFS-1052 into trunk

    I feel that making the datanode talk to multiple namenodes is very valuable,
    especially when there is plenty of storage available on a single datanode
    machine (think 24 TB to 36 TB) and a single namenode does not have enough
    memory to hold all file metadata for such a large cluster in memory.

    This is a feature that we are in dire need of, and could put it to good use
    starting "yesterday"!

    thanks,
    dhruba
    On Tue, Apr 26, 2011 at 5:59 PM, Konstantin Boudnik wrote:

    Sanjay,

    I assume the outlined changes won't prevent an earlier version of HDFS
    from upgrading to the federation version, right?

    Cos
    On Tue, Apr 26, 2011 at 17:26, Sanjay Radia wrote:

    Changes to the code base
    - The fundamental code change is to extend the notion of block id to now
    include a block pool id.
    - The NN had little change; the protocols did change to include the block
    pool id.
    - The DN code did change. Each data structure is now indexed by the block
    pool id -- while this is a code change, it is architecturally very simple
    and low risk.
    - We also did a fair amount of cleanup of the threads used to send block
    reports - while it was not strictly necessary to do the cleanup, we took
    the extra effort to pay down the technical debt. As Dhruba recently noted,
    adding support to send block reports to primary and secondary NN for HA
    will now be much easier to do.


    --
    Connect to me at http://www.facebook.com/dhruba
  • Konstantin Shvachko at Apr 27, 2011 at 5:37 am
    Dhruba,

    It would be very valuable for the community to share your experience
    if you performed any independent testing of the federation branch.

    Thanks,
    --Konstantin
    On Tue, Apr 26, 2011 at 9:27 PM, Dhruba Borthakur wrote:

    I feel that making the datanode talk to multiple namenodes is very valuable,
    especially when there is plenty of storage available on a single datanode
    machine (think 24 TB to 36 TB) and a single namenode does not have enough
    memory to hold all the file metadata for such a large cluster.

    This is a feature that we are in dire need of, and could put it to good use
    starting "yesterday"!

    thanks,
    dhruba
    On Tue, Apr 26, 2011 at 5:59 PM, Konstantin Boudnik wrote:

    Sanjay,

    I assume the outlined changes won't prevent an earlier version of HDFS
    from upgrading to the federation version, right?

    Cos
    On Tue, Apr 26, 2011 at 17:26, Sanjay Radia wrote:

    Changes to the code base
    - The fundamental code change is to extend the notion of block id to now
    include a block pool id.
    - The NN had little change; the protocols did change to include the block
    pool id.
    - The DN code did change. Each data structure is now indexed by the block
    pool id -- while this is a code change, it is architecturally very simple
    and low risk.
    - We also did a fair amount of cleanup of the threads used to send block
    reports - while it was not strictly necessary to do the cleanup, we took
    the extra effort to pay down the technical debt. As Dhruba recently noted,
    adding support to send block reports to primary and secondary NN for HA
    will now be much easier to do.


    --
    Connect to me at http://www.facebook.com/dhruba
  • Konstantin Boudnik at Apr 27, 2011 at 5:40 am
    Oops, the message came out garbled. I meant to say

    I assume the outlined changes won't prevent an earlier version of HDFS
    from upgrading to the federation version, right?

    Thanks in advance,
    Cos
    On Tue, Apr 26, 2011 at 17:59, Konstantin Boudnik wrote:
    Sanjay,

    I assume the outlined changes won't an earlier version of HDFS from
    upgrads to the federation version, right?

    Cos
    On Tue, Apr 26, 2011 at 17:26, Sanjay Radia wrote:

    Changes to the code base
    - The fundamental code change is to extend the notion of block id to now
    include a block pool id.
    - The NN had little change; the protocols did change to include the block
    pool id.
    - The DN code did change. Each data structure is now indexed by the block
    pool id -- while this is a code change, it is architecturally very simple
    and low risk.
    - We also did a fair amount of cleanup of the threads used to send block
    reports - while it was not strictly necessary to do the cleanup, we took
    the extra effort to pay down the technical debt. As Dhruba recently noted,
    adding support to send block reports to primary and secondary NN for HA
    will now be much easier to do.
  • Suresh srinivas at Apr 27, 2011 at 6:28 am
    Upgrades from earlier versions are supported. The existing configuration
    should run without any change (see the sketch after this message).
    On Tue, Apr 26, 2011 at 10:40 PM, Konstantin Boudnik wrote:

    Oops, the message came out garbled. I meant to say

    I assume the outlined changes won't prevent an earlier version of HDFS
    from upgrading to the federation version, right?

    Thanks in advance,
    Cos
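
    (To picture the compatibility story: an existing single-namenode
    configuration keeps working unchanged, while a federated deployment
    adds per-nameservice keys. The federation key names below are
    illustrative, patterned on the design doc, and should be verified
    against the branch.)

        # Existing deployment -- unchanged after upgrade
        fs.default.name = hdfs://nn1/
        dfs.data.dir = /drive1/,/drive2/

        # Federated deployment -- additional, illustrative keys
        dfs.federation.nameservices = ns1,ns2
        dfs.namenode.rpc-address.ns1 = nn1:8020
        dfs.namenode.rpc-address.ns2 = nn2:8020
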
  • Sanjay Radia at Apr 27, 2011 at 2:03 pm

    On Apr 26, 2011, at 10:40 PM, Konstantin Boudnik wrote:

    Oops, the message came out garbled. I meant to say

    I assume the outlined changes won't prevent an earlier version of HDFS
    from upgrading to the federation version, right?

    Yes, absolutely. We have tested upgrades.
    Besides, our ops will throw us out of the window if we even hint that
    there isn't an automatic upgrade for the next release :-)

    sanjay
    Thanks in advance,
    Cos
    On Tue, Apr 26, 2011 at 17:59, Konstantin Boudnik wrote:
    Sanjay,

    I assume the outlined changes won't prevent an earlier version of HDFS
    from upgrading to the federation version, right?

    Cos

    On Tue, Apr 26, 2011 at 17:26, Sanjay Radia <sradia@yahoo-inc.com>
    wrote:
    Changes to the code base
    - The fundamental code change is to extend the notion of block id to now
    include a block pool id.
    - The NN had little change; the protocols did change to include the block
    pool id.
    - The DN code did change. Each data structure is now indexed by the block
    pool id -- while this is a code change, it is architecturally very simple
    and low risk.
    - We also did a fair amount of cleanup of the threads used to send block
    reports - while it was not strictly necessary to do the cleanup, we took
    the extra effort to pay down the technical debt. As Dhruba recently noted,
    adding support to send block reports to primary and secondary NN for HA
    will now be much easier to do.
  • Eli Collins at Apr 27, 2011 at 9:37 pm
    Hey Suresh,

    Do you plan to update the patch on HDFS-1052 soon? Trunk has moved on
    a little bit since the last patch. I assume we vote on the patch
    there. I think additional review feedback (beyond what's already been
    done) can be handled after the code is merged; I know what a pain it
    is to keep a patch out of mainline. What I've looked at so far looks
    great, btw.

    For those of you who missed the design doc you should check it out:
    https://issues.apache.org/jira/secure/attachment/12442372/Mulitple+Namespaces5.pdf

    Thanks,
    Eli
    On Fri, Apr 22, 2011 at 9:48 AM, Suresh Srinivas wrote:
    A few weeks ago, I had sent an email about the progress of HDFS federation development in HDFS-1052 branch. I am happy to announce that all the tasks related to this feature development is complete and it is ready to be integrated into trunk.

    I have a merge patch attached to HDFS-1052 jira. All Hudson tests pass except for two test failures. We will fix these unit test failures in trunk, post merge. I plan on completing merge to trunk early next week. I would like to do this ASAP to avoid having to keep the patch up to date (which has been time consuming). This also avoids need for re-merging, due to SVN changes proposed by Nigel, scheduled late next week. Comments are welcome.

    Regards,
    Suresh
  • Suresh srinivas at Apr 28, 2011 at 12:23 am
    Thanks Eli.

    Merging the latest changes from trunk is not straightforward. I will get it
    done tonight and post a new patch. That means the earliest the merge can
    happen is tomorrow.

    On Wed, Apr 27, 2011 at 2:36 PM, Eli Collins wrote:

    Hey Suresh,

    Do you plan to update the patch on HDFS-1052 soon? Trunk has moved on
    a little bit since the last patch. I assume we vote on the patch
    there. I think additional review feedback (beyond what's already been
    done) can be handled after the code is merged; I know what a pain it
    is to keep a patch out of mainline. What I've looked at so far looks
    great, btw.

    For those of you who missed the design doc you should check it out:

    https://issues.apache.org/jira/secure/attachment/12442372/Mulitple+Namespaces5.pdf

    Thanks,
    Eli
    On Fri, Apr 22, 2011 at 9:48 AM, Suresh Srinivas wrote:
    A few weeks ago, I had sent an email about the progress of HDFS
    federation development in HDFS-1052 branch. I am happy to announce that all
    the tasks related to this feature development is complete and it is ready to
    be integrated into trunk.
    I have a merge patch attached to HDFS-1052 jira. All Hudson tests pass
    except for two test failures. We will fix these unit test failures in trunk,
    post merge. I plan on completing merge to trunk early next week. I would
    like to do this ASAP to avoid having to keep the patch up to date (which has
    been time consuming). This also avoids need for re-merging, due to SVN
    changes proposed by Nigel, scheduled late next week. Comments are welcome.
    Regards,
    Suresh


    --
    Regards,
    Suresh
