Hi,

We want to use HDFS-RAID in our production cluster
(http://wiki.apache.org/hadoop/HDFS-RAID).
I am not able to find the source, binaries, or configs for it in the official
Apache Hadoop distribution (I checked 0.20.1 and 0.20.2).

Can somebody please tell me where I can find it, and what the installation
procedure is?
Also, is the HDFS-RAID implementation stable enough to use in production?

thanks,
Ajit.


  • Harsh J at Sep 15, 2011 at 11:36 am
    Hey Ajit,

    HDFS-RAID was never part of the 0.20 release. It made its debut in the
    0.21 release [1]. I know that Facebook uses it (and also developed
    it), but I am unsure of users beyond Facebook.

    While 0.21 overall is not entirely deemed production-usable yet
    (and is, in fact, possibly abandoned in favor of efforts on 0.22+), you
    can give that release a whirl on a test cluster and see for yourself
    whether your need outweighs the stability concerns.

    Just curious though - why are you looking to use this specifically?

    [1] - http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.21/mapreduce/src/contrib/raid/

    --
    Harsh J
  • Ajit Ratnaparkhi at Sep 15, 2011 at 12:32 pm
    Hi,

    We were planning to use it for archiving past data (instead of moving it
    to an archival store).
    Archiving it in HDFS has the advantage of keeping it easily available for
    processing whenever required.

    Is there any archival solution in the Hadoop ecosystem?

    thanks,
    Ajit.

  • Dhruba Borthakur at Sep 15, 2011 at 5:07 pm
    We use HDFS RAID in a big way. Data older than 12 days is RAIDed using XOR
    encoding (effective replication of 2.5). Data older than a few months is
    RAIDed using Reed-Solomon (effective observed replication factor of 1.5).
    This has been running on our 60 PB cluster for about a year now.

    thanks
    dhruba

    --
    Connect to me at http://www.facebook.com/dhruba
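For readers wondering how an "effective replication" figure like those quoted above is derived, here is a minimal sketch of the arithmetic. The stripe lengths, parity counts, and per-block replication factors below are illustrative assumptions, not values confirmed in this thread, and `effective_replication` is a hypothetical helper, not part of any Hadoop API.

```python
def effective_replication(stripe_len, parity_len, data_repl, parity_repl):
    """Return stored blocks per block of user data for one full stripe.

    stripe_len  -- number of data blocks per stripe (assumed value)
    parity_len  -- number of parity blocks per stripe (assumed value)
    data_repl   -- replication factor kept on data blocks after RAIDing
    parity_repl -- replication factor of the parity blocks
    """
    if stripe_len <= 0:
        raise ValueError("stripe_len must be positive")
    stored_blocks = stripe_len * data_repl + parity_len * parity_repl
    return stored_blocks / stripe_len


# Assumed XOR layout: 10 data blocks + 1 parity block, everything kept at 2x.
xor_factor = effective_replication(stripe_len=10, parity_len=1,
                                   data_repl=2, parity_repl=2)

# Assumed Reed-Solomon layout: 10 data blocks + 4 parity blocks, all at 1x.
rs_factor = effective_replication(stripe_len=10, parity_len=4,
                                  data_repl=1, parity_repl=1)

print(xor_factor, rs_factor)  # prints "2.2 1.4"
```

Under these assumed layouts the full-stripe figures come out somewhat below the 2.5 and 1.5 reported in the thread. One plausible reason is that files which do not fill a whole stripe still carry the full parity overhead, which pushes the observed factor up.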
  • Andrew Purtell at Sep 15, 2011 at 5:09 pm
    But that is the HDFS RAID effectively in 0.22+, not 0.21, right Dhruba?


    Best regards,


    - Andy

    Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

  • Dhruba Borthakur at Sep 15, 2011 at 5:14 pm
    That's right, Andy: 0.22+. We are running an HDFS-RAID code base that is
    pretty close to what is available in Apache HDFS trunk.

    -dhruba
  • Ajit Ratnaparkhi at Sep 15, 2011 at 5:55 pm
    Thanks for the info!
    So can I use HDFS-RAID taken from Apache HDFS trunk as is with
    hadoop-0.20.1/hadoop-0.20.2? It seems to be under branch 0.21; will it
    work with 0.20.*?

    thanks,
    -Ajit.
  • Andrew Purtell at Sep 15, 2011 at 6:02 pm
    HDFS RAID from 0.21 will work if backported to 0.20. Only a minor fixup is needed.

    HDFS RAID from 0.22 relies on new HDFS APIs not available in 0.20.


    Best regards,


    - Andy

    Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

  • Ajit Ratnaparkhi at Sep 16, 2011 at 5:44 am
    Thanks!
  • Andrew Purtell at Sep 17, 2011 at 4:17 pm
    Hi Dhruba,

    Would you consider a contribution of this to branch-0.20-security aka 0.20.2xx.x?

    If I am mistaken and you do not have a 0.22-ish HDFS RAID backported to an 0.20-ish platform, please disregard.

    Best regards,


    - Andy

    Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

  • Dhruba Borthakur at Sep 20, 2011 at 9:20 am
    Hi Andy,

    We do run a version of HDFS RAID that is backported from Apache trunk to a
    0.20-based release. Our code is at
    https://github.com/facebook/hadoop-20-warehouse/tree/master/src/contrib/raid
    But I do not have an elegant way to contribute this code to Apache
    0.20.2xx.x.

    thanks,
    dhruba
  • Ajit Ratnaparkhi at Sep 20, 2011 at 1:50 pm
    Thanks Dhruba!

    Can I try using it? Is it open for use?

    -Ajit.
  • Andrew Purtell at Sep 20, 2011 at 4:03 pm
    Hi Dhruba,

    Thanks for the pointer. I'm going to try to pull this code into our internal 0.20-ish distro. Would you object if I contribute the result, should it be successful?


    Best regards,


    - Andy

    Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
    ________________________________
    From: Dhruba Borthakur <dhruba@gmail.com>
    To: Andrew Purtell <apurtell@apache.org>
    Cc: "hdfs-user@hadoop.apache.org" <hdfs-user@hadoop.apache.org>
    Sent: Tuesday, September 20, 2011 2:18 AM
    Subject: Re: Need help regarding HDFS-RAID


    Hi Andy,


    We do run a version of HDFS RAID that is backported from Apache trunk to a 0.20-based release. Our code is at https://github.com/facebook/hadoop-20-warehouse/tree/master/src/contrib/raid
    However, I do not have an elegant way to contribute this code to Apache 0.20.2xx.x.


    thanks,
    dhruba


    On Sat, Sep 17, 2011 at 9:16 AM, Andrew Purtell wrote:

    Hi Dhruba,

    Would you consider a contribution of this to branch-0.20-security aka 0.20.2xx.x?


    If I am mistaken and you do not have a 0.22-ish HDFS RAID backported to a 0.20-ish platform, please disregard.


    Best regards,


    - Andy

    Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

    ________________________________
    From: Dhruba Borthakur <dhruba@gmail.com>
    To: hdfs-user@hadoop.apache.org; Andrew Purtell <apurtell@apache.org>
    Sent: Thursday, September 15, 2011 10:14 AM
    Subject: Re: Need help regarding HDFS-RAID



    That's right, Andy: 0.22+. We are running an HDFS-RAID code base that is pretty close to what is available in Apache HDFS trunk.


    -dhruba


    On Thu, Sep 15, 2011 at 10:08 AM, Andrew Purtell wrote:

    But that is the HDFS RAID effectively in 0.22+, not 0.21, right Dhruba?

    Best regards,


    - Andy

    Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

    ________________________________
    From: Dhruba Borthakur <dhruba@gmail.com>
    To: hdfs-user@hadoop.apache.org
    Sent: Thursday, September 15, 2011 10:06 AM
    Subject: Re: Need help regarding HDFS-RAID



    We use HDFS RAID in a big way. Data older than 12 days is RAIDed using XOR encoding (effective replication of 2.5). Data older than a few months is RAIDed using Reed-Solomon (effective observed replication factor of 1.5). This has been running on our 60 PB cluster for about a year now.


    thanks
    dhruba
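
The replication figures quoted above can be reproduced with simple arithmetic. This is a minimal sketch, assuming effective replication equals the source replication plus the parity copies amortized over each stripe. The function name and every parameter value are illustrative assumptions, chosen only because they yield 2.5 and 1.5; the thread does not state the stripe lengths or replication settings actually in use.

```python
def effective_replication(source_repl, parity_repl, stripe_len, parity_blocks):
    """Copies stored per logical block: source replicas plus the stripe's share of parity."""
    return source_repl + parity_repl * parity_blocks / stripe_len


# Hypothetical XOR layout: 2 source replicas, 1 parity block per 4-block stripe,
# with the parity block itself stored twice.
xor = effective_replication(source_repl=2, parity_repl=2, stripe_len=4, parity_blocks=1)

# Hypothetical Reed-Solomon layout: 1 source replica, 5 parity blocks per
# 10-block stripe, with each parity block stored once.
rs = effective_replication(source_repl=1, parity_repl=1, stripe_len=10, parity_blocks=5)

print(xor, rs)
```

Taking the default HDFS replication factor of 3 as the baseline, factors of 2.5 and 1.5 mean storing roughly 17% and 50% fewer bytes respectively.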




  • Dhruba Borthakur at Sep 20, 2011 at 4:49 pm
    Hi Andy,

    I will be very grateful to you if you merge and contribute it to Apache
    Hadoop 0.20.2xx.x.

    thanks,
    dhruba
  • Andrew Purtell at Sep 20, 2011 at 11:11 pm
    Dhruba wrote: "I will be very grateful to you if you merge and contribute it to Apache Hadoop 0.20.2xx.x."

    Hmm... I see what you mean. I was naive about what "branch-20-warehouse" is. I was looking for an updated HDFS RAID that incorporated R-S coding but ran against a 0.20-ish HDFS. I suppose it is relatively easy to have an HDFS RAID close to what is in trunk if HDFS has evolved in your branch. :-)


    It looks like the changes to HDFS can be teased apart as:

    - BlockMissingException
    - Listing file status and block locations: LocatedFileStatus, FileSystem.listLocatedStatus
    - Corrupt file reporting
      - Changes to FSNamesystem and UnderReplicatedBlocks for tracking and reporting corrupt blocks
      - Update to the ClientProtocol for listing corrupt file blocks: listCorruptFileBlocks()
      - DFSUtil.getCorruptFiles
    - Change visibility and constructor for datanode.BlockSender so RAID can send repaired blocks without needing to be a DataNode and without reimplementing the packet protocol
    - A set of quite invasive changes to the NameNode dealing with pluggable block placement policies. RAID could possibly live without these, but the PlacementMonitor would have more work to do in that case


    I suppose the upside to any consideration of backporting all of this into a 0.20.2xx release is that all of the above has already gone through trunk.
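
For readers unfamiliar with why corrupt-block reporting and block repair matter to RAID, here is a toy sketch of XOR parity repair. It is not the HDFS RAID implementation: the helper name, the byte strings standing in for blocks, and the three-block stripe are all hypothetical, and real blocks are far larger and handled as streams.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy stripe of three equal-sized source blocks and its parity block.
stripe = [b"blk0", b"blk1", b"blk2"]
parity = xor_blocks(stripe)

# Suppose block 1 is reported corrupt or missing: XOR the surviving blocks
# with the parity block to rebuild it.
survivors = [stripe[0], stripe[2]]
repaired = xor_blocks(survivors + [parity])

print(repaired == stripe[1])
```

XOR parity can rebuild only one lost block per stripe, which is one reason a scheme such as Reed-Solomon, with several parity blocks per stripe, is attractive when the source replication is lowered further.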


    Best regards,

    - Andy

    Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)


Discussion Overview
group: hdfs-user
categories: hadoop
posted: Sep 15, '11 at 11:07a
active: Sep 20, '11 at 11:11p
posts: 15
users: 4
website: hadoop.apache.org...
irc: #hadoop
