Hi!

I am very happy to announce a new package that can perform
Reed-Solomon erasure code creation at several gigabytes per second. This
will allow you to create real-time parity sets on your file storage.

The original intention with the package was to port the recently announced
Reed-Solomon library created by Backblaze, and it remains compatible with
that. The package does not use cgo, but uses AMD64 SSE3 assembler to
achieve the fast encoding.

Package home: https://github.com/klauspost/reedsolomon
Godoc: https://godoc.org/github.com/klauspost/reedsolomon
Development Log:
http://blog.klauspost.com/blazingly-fast-reed-solomon-coding/


I hope the package interface is as easy and straightforward to use as I
intended it to be, so I am very interested in feedback on that. There are
some underlying "rules" for using RS that cannot be conveyed easily in the
interface, but I hope the documentation and examples can help with that.

Feedback, questions and comments are very welcome!


/Klaus


--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


  • Axel Wagner at Jun 22, 2015 at 4:04 pm
    Awesome! I wanted something like this for a while now.

    May I ask what kind of decoder you use? I assume from the
    description you use a systematic code? And do you plan to add an
    error decoder (as opposed to just an erasure decoder, like now)?
    Because in theory you can detect and correct up to parity/2 errors
    too, and I would consider that a very useful thing.

    Lastly, something to clarify in the docs is the total number of
    shards, i.e. with
    New(42, 23)
    will the encoder use a) 42 data shards plus 23 parity shards, so 65 in
    total, or b) 42 shards of which 23 are used for parity, so 42 in total,
    of which 19 are usable for data? I assume you do the first, but the
    literature diverges in notation on this, so some clarification is
    useful, I think.

    Best,

    Axel

  • Klaus Post at Jun 22, 2015 at 5:32 pm
    Hi!
    On Monday, 22 June 2015 18:04:32 UTC+2, Axel Wagner wrote:

    Awesome! I wanted something like this for a while now.
    Great to hear!

    May I ask what kind of decoder you use? I assume from the
    description you use a systematic code?

    I am not deeply into the math; I mostly did a lot of tests to ensure that
    it aligns with the Java client.

    You can read more about it here:
    https://www.backblaze.com/blog/reed-solomon/

    From what I understand of it - and please forgive me for my layman terms, I
    only understand it on an operational level: it computes an (invertible)
    Vandermonde matrix, which it uses with an 8-bit Galois field to encode the
    erasure codes. I believe this paper was the basis of the original
    implementation: http://web.eecs.utk.edu/~plank/plank/papers/SPE-9-97.html
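
    As a rough sketch of what that field arithmetic looks like - this is an
    illustration, not code from the package, and the reducing polynomial
    0x11D is just the common choice for storage codes:

    ```go
    package main

    import "fmt"

    // gfMul multiplies two elements of GF(2^8) using the reducing
    // polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D), a common choice
    // for Reed-Solomon storage codes (the package may differ).
    func gfMul(a, b byte) byte {
    	var p byte
    	for b > 0 {
    		if b&1 == 1 {
    			p ^= a // "addition" is XOR in GF(2^8)
    		}
    		carry := a & 0x80
    		a <<= 1
    		if carry != 0 {
    			a ^= 0x1D // reduce: 0x11D with the shifted-out x^8 term dropped
    		}
    		b >>= 1
    	}
    	return p
    }

    func main() {
    	fmt.Println(gfMul(2, 128)) // 29: 2*128 = x^8, which reduces mod 0x11D
    	fmt.Println(gfMul(7, 1))   // 7: 1 is the multiplicative identity
    }
    ```

    Every entry of the encoding matrix is multiplied and added this way, so
    encoding is just XORs and table lookups - which is also why it can be
    vectorised with SSE3 so effectively.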


    And do you plan to add an
    error decoder (as opposed to just an erasure decoder, like now)?
    Because in theory you can detect and correct up to parity/2 errors
    too, and I would consider that a very useful thing.
    I don't have any current plans, but that is because I don't know what it
    is - could you perhaps give an example?


    Lastly, something to clarify in the docs, is the total number of [...]
    I have adjusted the docs to hopefully make that a bit more clear. 'Data
    shards' should always refer to the data you want to encode and 'parity
    shards' to the calculated parity. It might be cleaner to wrap this into
    a separate (struct) type, but on the other hand I like the directness
    of just supplying byte arrays.
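
    For what it's worth, the New(42, 23) case from the question would look
    roughly like this with the package API (a sketch based on the godoc:
    interpretation a, i.e. 42 data shards plus 23 parity shards, 65 total):

    ```go
    package main

    import (
    	"fmt"

    	"github.com/klauspost/reedsolomon"
    )

    func main() {
    	// 42 data shards plus 23 parity shards: 65 shards in total.
    	enc, err := reedsolomon.New(42, 23)
    	if err != nil {
    		panic(err)
    	}

    	// Split pads the input into 42 equal data shards and allocates
    	// 23 empty parity shards.
    	shards, err := enc.Split(make([]byte, 42*1024))
    	if err != nil {
    		panic(err)
    	}
    	fmt.Println(len(shards)) // 65

    	// Encode fills the parity shards from the data shards;
    	// Verify recomputes the parity and checks that it matches.
    	if err := enc.Encode(shards); err != nil {
    		panic(err)
    	}
    	ok, err := enc.Verify(shards)
    	if err != nil {
    		panic(err)
    	}
    	fmt.Println(ok) // true
    }
    ```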


    Best,

    Axel
    Thanks for the feedback!

    /Klaus
  • Axel Wagner at Jun 22, 2015 at 5:52 pm

    Klaus Post writes:
    I am not deeply into the math, I mostly did a lot of tests to ensure that
    it aligns with the java client.

    You can read more about it here:
    https://www.backblaze.com/blog/reed-solomon/
    Yipp, I have read that in the meantime, interesting read, though
    they indeed don't go very far into the math.
    From what I understand of it - and please forgive me for my layman terms, I
    only understand it on an operational level: it computes an (invertible)
    Vandermonde matrix, which it uses with an 8-bit Galois field to encode the
    erasure codes.
    Yipp, that's how I read it too :)
    I don't have any current plans, but that is because I don't know what it
    is - could you perhaps give an example?
    It is a pretty hard problem, which is why I think there aren't a
    lot of implementations out there. The difference is basically that the
    erasure decoder assumes you know which blocks are corrupted. But in
    theory, an algorithm can find that out itself, as long as the number of
    errors is below a certain threshold. Think of it as losing a harddrive
    vs. bitrot.

    Reed and Solomon described a theoretical decoder that explains pretty
    well why that works in theory, which is also described on Wikipedia
    [0]. It's completely impractical though, so other, more efficient (but
    also more expensive) correction algorithms were developed later.

    In any case, I didn't really expect you to implement that (but it would
    be cool if you did), because it's pretty non-trivial and involves a lot of
    math. I actually tried to do it once and failed miserably :) I guess for
    now another way to detect errors will have to do :)
    I have adjusted the docs to hopefully make that a bit more clear.
    Cool.

    So again, very nice package, I will keep it bookmarked for later use :)

  • Klaus Post at Jun 22, 2015 at 6:14 pm

    On Monday, 22 June 2015 19:53:10 UTC+2, Axel Wagner wrote:
    It is a pretty hard problem, which is why I think there aren't a
    lot of implementations out there. The difference is basically that the
    erasure decoder assumes you know which blocks are corrupted. But in
    theory, an algorithm can find that out itself, as long as the number of
    errors is below a certain threshold. Think of it as losing a harddrive
    vs. bitrot.
    Ah, ok. I haven't heard about that. Definitely sounds interesting.

    I considered adding a per-shard hash as part of the library to help keep
    track of shard number and order, but I couldn't come up with something I
    felt could easily be used in a lot of different scenarios, so for now it is
    up to the library user to ensure the validity of the individual shards.
    Either way, I think that will go into a separate package - the one thing
    I might do is add a simpler way of "streaming" data to the encoder.
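
    For anyone who wants to do this in user code in the meantime, a
    per-shard checksum is straightforward with the standard library
    (hypothetical helper names; SHA-256 chosen arbitrarily):

    ```go
    package main

    import (
    	"bytes"
    	"crypto/sha256"
    	"fmt"
    )

    // checksums returns one SHA-256 digest per shard, to be stored
    // alongside the shards.
    func checksums(shards [][]byte) [][]byte {
    	sums := make([][]byte, len(shards))
    	for i, s := range shards {
    		h := sha256.Sum256(s)
    		sums[i] = h[:]
    	}
    	return sums
    }

    // corrupted reports the indices of shards that no longer match their
    // stored digest; those can then be nilled out and handed to Reconstruct.
    func corrupted(shards, sums [][]byte) []int {
    	var bad []int
    	for i, s := range shards {
    		h := sha256.Sum256(s)
    		if !bytes.Equal(h[:], sums[i]) {
    			bad = append(bad, i)
    		}
    	}
    	return bad
    }

    func main() {
    	shards := [][]byte{[]byte("data-0"), []byte("data-1"), []byte("parity")}
    	sums := checksums(shards)
    	shards[1][0] ^= 0xFF // simulate bitrot in shard 1
    	fmt.Println(corrupted(shards, sums)) // [1]
    }
    ```

    This turns the bitrot case back into the erasure case the package
    already handles.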

    /Klaus


  • Nick Craig-Wood at Jun 23, 2015 at 8:45 am

    On 22/06/15 19:14, Klaus Post wrote:
    On Monday, 22 June 2015 19:53:10 UTC+2, Axel Wagner wrote:

    It is a pretty hard problem, which is why I think there aren't a
    lot of implementations out there. The difference is basically that the
    erasure decoder assumes you know which blocks are corrupted. But in
    theory, an algorithm can find that out itself, as long as the number of
    errors is below a certain threshold. Think of it as losing a harddrive
    vs. bitrot.

    Ah, ok. I haven't heard about that. Definitely sounds interesting.
    Erasure codes are usually used when you know you don't have the data at
    all. That knowledge is extra information, provided by e.g. a checksum
    on a packet or the disk being missing. Reed-Solomon is a fine erasure
    encoder (it is optimal).

    For a Reed-Solomon code, if you can deal with 2k+1 erasures then you can
    only deal with k corrections. There are other codes which will do much
    better than this. It has its advantages though, as it is easy to
    implement in software over GF(256).

    If you are interested in RS coding, then Russ Cox wrote lots of stuff
    about it

    http://research.swtch.com/field

    And also a QR code (which uses RS codes) encoder in Go (but not a
    decoder?) here

    https://godoc.org/code.google.com/p/rsc/qr

    The general case of decoding Reed-Solomon codes is hard. Many years
    ago I implemented the Berlekamp-Massey decoder (which was used for
    decoding RS codes over satellite links)

    https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Berlekamp.E2.80.93Massey_decoder

    There may be better algorithms now though.

    If I were doing the same thing today, I'd consider a low-density
    parity-check (LDPC) code, which performs much better. See David MacKay's
    book for a good introduction

    http://www.inference.phy.cam.ac.uk/mackay/itila/
    I considered adding a per-shard hash as part of the library to help
    keep track of shard number and order, but I couldn't come up with
    something I felt could easily be used in a lot of different scenarios,
    so for now it is up to the library user to ensure the validity of the
    individual shards.
    And that is as it should be for a low level library. The user might be
    packing the data into Ethernet packets and relying on the 32 bit CRC to
    check for corruptions, or storing in files, you just don't know...
    Either way, I think that will go into a separate
    package - the only thing I have that I might do is to add a simpler way
    of "streaming" data to the encoder.
    In the docs you wrote on the Reconstruct function

         // The reconstructed shard set is complete, but integrity is not verified.
         // Use the Verify function to check if data set is ok.

    The Verify function should always pass after you do error decoding so it
    isn't an integrity check in any way.

    If you want an additional integrity check then the user of the library
    should add one with a CRC/MD5/SHA1 etc - calling Verify after
    Reconstruct is just a waste of CPU cycles!

    (In general you can use codes for detecting errors, or correcting
    errors, but not both!)

    --
    Nick Craig-Wood <nick@craig-wood.com> -- http://www.craig-wood.com/nick

  • Klaus Post at Jun 23, 2015 at 9:48 am

    On Tuesday, 23 June 2015 10:46:10 UTC+2, Nick Craig-Wood wrote:
    If you are interested in RS coding, then Russ Cox wrote lots of stuff
    about it

    http://research.swtch.com/field
    Very nice introduction, and by a Gopher too :)

    It is very interesting, even though much of the theory behind it is way
    above my math level.

    In terms of speed, it is unfair to compare Russ's 22MB/s directly to my
    1GB/s, since his benchmarks are on much smaller datasets.


    The Verify function should always pass after you do error decoding so it
    isn't an integrity check in any way.
    Good spot. And you are right, it cannot be assumed to be an integrity
    check. However, it is more complex than that.

    If you Reconstruct and you are missing FEWER shards than you have
    parity for, Verify will be able to detect data corruption, since the
    parity will not line up.

    If you run Reconstruct and you are missing the same number of shards as
    you have parity for, the Verify function will always succeed, even if the
    data has been corrupted, since the missing shards will be made to match
    the corrupted data.

    But I think it should be made much clearer in the documentation that
    'Verify' is sort of a red herring when running it after a 'Reconstruct',
    and that you should rely on checksums.
    par2 package [...]
    Yes, I have used par2 for many years for backups - especially when I did
    "long-time-storage" on DVDs. PAR2 is really good, although it has some
    rare failure states, which should be fixed in the algorithm used in this
    package.

    Great feedback! Thanks!

    /Klaus

  • Vasiliy Tolstov at Jun 22, 2015 at 9:38 pm

    2015-06-22 18:48 GMT+03:00 Klaus Post <klauspost@gmail.com>:

    Wow, I'm already very happy with pgzip, and now with this package I'm
    happy twice =)

    --
    Vasiliy Tolstov,
    e-mail: v.tolstov@selfip.ru

  • Nick Craig-Wood at Jun 23, 2015 at 8:36 am

    On 22/06/15 16:48, Klaus Post wrote:
    Very nice package!

    If you wanted to implement a command-line tool which uses it, you might
    want to check out the par tool (the par2 package in Ubuntu)

    http://parchive.sourceforge.net/

    There seems to be a specification for writing files to disk too...

    --
    Nick Craig-Wood <nick@craig-wood.com> -- http://www.craig-wood.com/nick


Discussion Overview
group: golang-nuts
categories: go
posted: Jun 22, '15 at 3:48p
active: Jun 23, '15 at 9:48a
posts: 9
users: 4
website: golang.org
