Hi!

This is discussion for CL
21021: https://go-review.googlesource.com/#/c/21021/

What I am proposing in the CL is to replace levels 1 to 3 with
specialized versions that are significantly faster. Throughput is usually
around 1.7x that of the standard encoder. This does come at a compression
cost: the loss is usually between 2 and 3 percent compared to the
original levels 1, 2 and 3.

This affects the following other packages: "compress/gzip", "compress/zlib",
"archive/zip", "image/png". My main target has been web servers, but I have
also had talks with the Docker crew, who would greatly appreciate faster
deflate for disk image creation. Russ Cox has also proposed zip for object
file archives, which would likewise be affected by this change.

My rationale for this is that levels 1 to 3 indicate that we want the best
speed. Typically the user has done this by selecting flate.BestSpeed (or
similar), so they have already indicated they are willing to sacrifice
compression efficiency for better speed.

My main discussion points:
* Is it worth the additional code complexity for a 1.7x speed up?
* Is the 2-3% compression loss worth the speedup?
* Do you see any other downsides, or do you see any places where this would
benefit you?


Other discussion points:
* In the future the remaining levels 4-8 should be re-adjusted, but that
isn't strictly needed now.


DETAILED TEST SET DESCRIPTION:

---> Web content. This benchmark compresses a selection of individual HTML,
JS, CSS, SVG and small individual JSON files. This was selected to give an
indication of typical web server performance on content that is likely to
be gzip encoded.

Web content, level 1: 2.5% less compression reduction. 2.6x speedup.
Web content, level 2: 2.1% less compression reduction. 2.3x speedup.
Web content, level 3: 2.1% less compression reduction. 2.4x speedup.

- Test set: http://files.klauspost.com/sites.7z


---> A huge JSON stream, containing highly repetitive JSON. Typically
compresses around 95%. This content could represent a database dump, a
network stream or similar content.

JSON, level 1: 1.3% less compression reduction. 2.6x speedup.
JSON, level 2: 0.1% more compression reduction. 1.9x speedup.
JSON, level 3: 0.1% more compression reduction. 1.9x speedup.

- Test set: http://5.9.40.76/static/dicts/json-testset.gz


---> enwik9 is a standard corpus, containing the first 10^9 bytes of the
XML text dump of the English version of Wikipedia on Mar. 3, 2006, used by
the "Large Text Compression Benchmark". See more
here: http://mattmahoney.net/dc/text.html

enwik9, level 1: 2.7% less compression reduction. 1.8x speedup.
enwik9, level 2: 2.8% less compression reduction. 1.9x speedup.
enwik9, level 3: 2.6% less compression reduction. 2.0x speedup.

- Test set: http://mattmahoney.net/dc/enwik9.zip


---> 10GB is a standard test set that represents a typical backup
scenario. The test data is designed to test archivers in realistic backup
scenarios with lots of already-compressed or hard-to-compress files and
lots of duplicate or nearly identical files. It consists of exactly 10 GB
(10^10 bytes) in 79,431 files in 4,006 directories. For this test, a TAR
file has been created. See http://mattmahoney.net/dc/10gb.html

10gb, level 1: 3.1% less compression reduction. 3.0x speedup.
10gb, level 2: 3.1% less compression reduction. 3.2x speedup.
10gb, level 3: 3.4% less compression reduction. 3.4x speedup.

- Test set: http://mattmahoney.net/dc/10gb.html


---> Random data, which should be incompressible. In the "real world" this
could be JPG, MP3, MP4 or MKV files.

random, level 1: 0.0% less compression reduction. 19.5x speedup.
random, level 2: 0.0% less compression reduction. 19.2x speedup.
random, level 3: 0.0% less compression reduction. 19.4x speedup.

- Test set: http://mattmahoney.net/dc/#sharnd - 200000000 bytes


Your input is appreciated.

/Klaus

--
You received this message because you are subscribed to the Google Groups "golang-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


  • Antoinegrondin at Mar 22, 2016 at 5:39 pm
    The compromise makes a lot of sense to me. I had to use the `cgzip`
    bindings in many projects, adding complexity to builds and preventing the
    use of many tools like gorename, so that I could get a faster "BestSpeed"
    story.

    If the default `compress/gzip` implementation is ~1.7x faster for at most
    3% less compression, that's a huge win in all the cases I have that require
    `BestSpeed`.
  • Russ Cox at Mar 22, 2016 at 5:54 pm
    Hi Klaus,

    Can you post a link the graphs you showed me a month or two ago? (Maybe it
    was a blog post, but maybe it was a spreadsheet, I forget.) I thought they
    made a very compelling case for shifting the levels.

    Thanks.
    Russ

  • Klauspost at Mar 22, 2016 at 6:17 pm

    On Tuesday, 22 March 2016 18:39:28 UTC+1, antoine...@gmail.com wrote:
    The compromise makes a lot of sense to me. I had to use the `cgzip`
    bindings in many projects, adding complexity to builds and preventing the
    use of many tools like gorename, so that I could get a faster "BestSpeed"
    story.
    In regards to cgzip, the new levels 1-3 pretty much match cgzip on
    easy-to-compress content, that being "web content", "JSON" and "enwik9" -
    especially with the SSA optimizations coming in. "Mixed content" (10GB) is
    approximately 2x the speed of cgzip, likely because the new code is
    quicker at skipping hard-to-compress sections.

    @rsc: I don't have it for this specific revision, but the overall picture
    is: http://imgur.com/a/5sWM4

    The horizontal axis is compression (note this is a zoomed-in
    representation), and the vertical axis is throughput in MB/s. Since some
    changes are left out, level 4-9 throughput is probably about 10-20% lower
    than these numbers.

    However, to me they indicate that there is little need for levels 7 and 8,
    and that level 6 (the current default) is not a good trade-off between
    speed and compression.

    Currently I have hit this target: http://i.imgur.com/dGvgnIf.png (10gb.tar
    testset) - I have written my considerations for how the compression/speed
    has been balanced
    here: https://blog.klauspost.com/rebalancing-deflate-compression-levels/


    /Klaus

  • Mark Kubacki at Mar 22, 2016 at 7:31 pm
    This mixes 'flavour' into the dimension described by fast–best. In
    light of zopfli being another potential candidate, perhaps we can find a
    way to express 'flavour/variant' as an additional distinct dimension?

    Similar to encoding/base64's StdEncoding and URLEncoding; only that there
    is no guarantee which one is used if none is explicitly asked for.
    (Think: 1–4 Klaus's, 7–9 zopfli or similar; subject to change with fashion
    and prevalent performance.)

    --
    Mark

  • Klauspost at Mar 23, 2016 at 8:39 am

    @Mark:

    In the light of zopfli being another potential candidate [...]
    I can see your point. However, I think the performance degradation of
    zopfli would be too big for it to be acceptable as a replacement for
    level 9. Luckily we don't have the same restrictions as with the lower
    levels, so zopfli could be implemented as level 10-12 or something
    similar. That way we don't explode CPU usage for people using level 9,
    while still maintaining the option to go to "11" for something like
    precompressed static files. That said, it probably should be in a
    third-party package.

    @minux
    Nice speedups indeed. Does this require any assembly code?
    No - I have experimented with replacing the matcher with SSE 4.2 code,
    but the call overhead usually outweighs the benefits, and only for the
    revised level 3 does it give a consistent speedup, in the area of 5% -
    which really doesn't justify the added complexity. The main problem is
    that SSE 4.2 only really speeds up the cases that are already fast (long
    matches).

    I am mostly interested in speeding up the "slowest" cases, and this CL
    applies a bunch of tricks that help those cases a lot. Also, the new SSA
    compiler will help eliminate most bounds checks, so I don't currently
    plan to add any assembler in the future.

    Most speedups come from simpler LZ77 matching based on the main loop
    from Snappy, some heuristic skipping and selective Huffman encoding.
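
    The heuristic-skipping idea can be illustrated with a toy loop (a sketch
    of the principle only, not the CL's actual matcher): as misses accumulate,
    the step size grows, so a 64KB block of hard-to-compress data is probed
    far fewer times than once per byte.

    ```go
    package main

    import "fmt"

    func main() {
    	// Sketch of the skip heuristic: advance the probe position s by
    	// 1 + ((s - lit) >> 4), where lit marks the start of the current
    	// run of unmatched (literal) bytes. After 16 consecutive misses
    	// the step size starts growing, so incompressible data is skipped
    	// over quickly instead of being probed at every byte.
    	const blockSize = 64 * 1024 // deflate blocks are at most 64KB

    	lit := 0 // start of the literal run (all misses in this sketch)
    	s := 0   // probe position
    	probes := 0
    	for s < blockSize {
    		s += 1 + ((s - lit) >> 4)
    		probes++
    	}
    	fmt.Println(probes) // far fewer than the 65536 byte positions
    }
    ```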


    /Klaus

  • Minux at Mar 22, 2016 at 11:25 pm
    Nice speedups indeed. Does this require any assembly code?

  • Marvin Stenger at Mar 23, 2016 at 12:07 am

    This is discussion for CL 21021:
    https://go-review.googlesource.com/#/c/21021/
    ...no.


    Am Mittwoch, 23. März 2016 00:25:04 UTC+1 schrieb minux:
    Nice speedups indeed. Does this require any assembly code?
  • Peter Waller at Mar 23, 2016 at 9:51 am

    On 22 March 2016 at 10:53, wrote:

    What I am proposing in the CL is to replace levels 1 to 3 with
    specialized versions that are significantly faster. Throughput is usually
    around 1.7x that of the standard encoder. This does come at a compression
    cost: the loss is usually between 2 and 3 percent compared to the
    original levels 1, 2 and 3.
    This sounds great. I have also noticed the slowness of BestSpeed and have
    been tempted to use cgzip at times, so a big +1 from me.

    One question: you mention SSA "will bring improvements", so it is unclear
    to me whether you mean future improvements to SSA, or that the benchmarks
    you quote are without SSA. I assume from the fact that you mention numbers
    in the CL that they are compared against the existing performance with SSA.

    From Keith's "SSA performance numbers" mailing list post, he showed Gzip
    (compress) was ~1/(1 - 0.1397) = 1.16x faster, if I understand correctly.
    It is unclear to me which speed parameter that covers, though.

    So, does that mean the performance benefit we will get compared to go1.6
    may be even greater? (Optimistic ballpark 1.16 * 1.7 is ~2, which feels
    like a big win!)

  • Nigel Tao at Mar 23, 2016 at 11:28 am
    I'm OK with this in principle, with a few remarks below.

    Re the overall pictures at http://imgur.com/a/5sWM4 and at
    https://blog.klauspost.com/rebalancing-deflate-compression-levels/
    it's not obvious to me whether the graphs are showing the old or new
    algorithms for levels 1, 2 and 3. In fact, I think the best graph
    would show both.

    Flate is normally Huffman + LZ77. In
    https://go-review.googlesource.com/#/c/20982/ you are also proposing a
    Huffman-only compression level. How does that relate (if at all) to
    this proposal? Should we also have a LZ77-only compression level?
    Should we have more than one such level? Conversely, when introducing
    Snappy style matching, should we replace levels 1, 2 and 3, or just
    level 1 aka BestSpeed?

    https://go-review.googlesource.com/#/c/21021/ is marked "DO NOT
    SUBMIT", but it still has code like this:

    // Skip 1 byte for every 16 consecutive misses.
    s += 1 + ((s - lit) >> 4)

    Please don't do that. We have had problems with this before:
    https://github.com/golang/snappy/commit/d1d908a252c22fd7afd36190d5cffb144aa8f777

    Also, that CL has three separate copy-pasted-and-specialized methods
    for levels 1, 2 and 3. Yes, each method has a sentence or two of
    comment, but it would be easier to understand the difference between
    the three actual algorithms if there were just a single method, with
    some "if e.level > 2" style blocks. Sure, copy-paste-and-specialize
    can increase performance (at a cost of having more code), but it can
    easily be a follow-up CL.

  • Klauspost at Mar 23, 2016 at 12:43 pm

    On Wednesday, 23 March 2016 12:28:36 UTC+1, Nigel Tao wrote:
    I'm OK with this in principle, with a few remarks below.
    I'm glad to hear that.

    Re the overall pictures at http://imgur.com/a/5sWM4 and at
    https://blog.klauspost.com/rebalancing-deflate-compression-levels/
    it's not obvious to me whether the graphs are showing the old or new
    algorithms for levels 1, 2 and 3. In fact, I think the best graph
    would show both.
    I didn't want to go into details about the re-balancing, since we can take
    that discussion separately, but the point was to show that:

    * There is a big gap from the proposed level 3 to level 4, which will be
    * Default level (6) does not represent a "sweet spot" between
    CPU/compression ratio.
    * Level 7-8 provide no significant compression ratio increase over level 6.

    Actual numbers are from my personal repo. With CL 20929 merged, I expect
    numbers to be close, but a bit lower because of the slower token slice and
    combined lazy/no-lazy code. Russ asked for the numbers I had shown him, and
    here they are.

    Once we have the pending things merged, I will submit the re-balance
    proposal and I will be able to do the "real" numbers.

    Flate is normally Huffman + LZ77. In
    https://go-review.googlesource.com/#/c/20982/ you are also proposing a
    Huffman-only compression level. How does that relate (if at all) to
    this proposal?

    It is not directly related, so I separated it out into its own CL and
    didn't mention it here, to keep the discussion a bit more focused. It is
    more of a special case that definitely has its uses: it doesn't provide a
    good generic compressor, but it can be used on Snappy output,
    non-artificial PNG images and content where even this proposed level 1
    is too slow.

    Should we also have a LZ77-only compression level?
    Unfortunately that is not possible with the deflate format; LZ77
    (matching) data must be Huffman compressed. One thing I will investigate
    is the possibility of only recalculating the Huffman tree every n deflate
    blocks (which are up to 64k each), since tree calculation takes ~50% of
    the CPU time in this proposed level 1. If we calculate trees every second
    block, that would be a 25% speed increase.
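
    The arithmetic behind that estimate can be checked directly (the 50%
    share is the figure stated above, not a new measurement):

    ```go
    package main

    import "fmt"

    func main() {
    	// If Huffman tree calculation is ~50% of level-1 CPU time and
    	// trees are rebuilt only every second 64k block, tree time halves
    	// and total CPU time drops to 75% of the original.
    	const treeShare = 0.50    // assumed fraction of CPU time spent on trees
    	const rebuildEvery = 2.0  // rebuild the tree every N blocks

    	relative := (1 - treeShare) + treeShare/rebuildEvery
    	fmt.Printf("relative CPU time: %.2f\n", relative) // 0.75, i.e. 25% less
    }
    ```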

    It will probably only be added to my personal package, but if it works
    well, who knows if it will find a place in the standard library.

    Should we have more than one such level? Conversely, when introducing
    Snappy style matching, should we replace levels 1, 2 and 3, or just
    level 1 aka BestSpeed?
    Level 2+3 offer good progression. The "generic" deflater leaves a big gap
    to level 1, and hits a performance ceiling. This is what you see in my blog
    post, which was written before I wrote the specialized level 2+3.

    Please don't do that. We have had problems with this before:
    Don't worry, I know what I am doing. Each deflate block is never longer
    than 64KB, so skipping is reset for every 64KB of data processed. This
    works as intended. If it didn't, 10GB.tar would end up considerably worse.

    Btw, now that you mention it, you still haven't fixed Snappy, which will
    completely stop compressing data once it encounters 1MB of incompressible
    data: https://github.com/golang/snappy/pull/23#issuecomment-183889455 -
    maybe you should add changes as PRs in the future instead of committing
    directly to master. You are welcome to ping me for review, and maybe
    others also have some input.


    Also, that CL has three separate copy-pasted-and-specialized methods
    for levels 1, 2 and 3. Yes, each method has a sentence or two in
    comment, but it would be easier to understand the difference between
    the three actual algorithms if there was just a single method, with
    some "if e.level > 2" style blocks. Sure, copy-paste-and-specialize
    can increase performance (at a cost of having more code), but it can
    easily be a follow-up CL.
    Sure - I will probably go the route of trying to factor out the common
    code into smaller functions. That way the compiler can inline it, and we
    don't have the branching overhead.


    /Klaus

  • Joe Tsai at Mar 23, 2016 at 8:06 pm
    Personally, I think the adjustments to levels 1-3 are worth doing, although
    I am worried about the amount of added logic. The loss of compression ratio
    does not bother me.

    In several of the recent CLs you posted, there have been significant
    amounts of duplicated logic. I am wondering if we could write one generic
    function that does the logic of all three levels, but with the portions
    that differ segregated using an if constValueX {...} block. Then we could
    write a code generator that simply outputs another source file containing
    three renamed versions of that generic function, replacing each instance
    of constValueX with true/false, and let the compiler deal with getting
    rid of the dead code. Thoughts?
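
    A tiny sketch of that shape (function and constant names are hypothetical,
    not from the CLs): one generic body with the level-specific part behind a
    boolean, which a generator would clone with the boolean fixed to a
    constant so the compiler drops the dead branch.

    ```go
    package main

    import "fmt"

    // encodeGeneric holds the shared logic; level-specific behaviour sits
    // behind useLazy. A code generator would emit renamed copies (e.g.
    // encodeLevel1, encodeLevel3) with useLazy replaced by a constant
    // true/false, letting the compiler eliminate the dead branch.
    func encodeGeneric(data []byte, useLazy bool) (pairs int) {
    	for i := 0; i+1 < len(data); i++ {
    		if data[i] == data[i+1] {
    			pairs++
    			if useLazy {
    				// Extra per-match work only the higher level does:
    				// consume the matched byte as well.
    				i++
    			}
    		}
    	}
    	return pairs
    }

    func main() {
    	d := []byte("aaaa")
    	// prints "3 2": the lazy branch consumes an extra byte per match.
    	fmt.Println(encodeGeneric(d, false), encodeGeneric(d, true))
    }
    ```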

    JT
  • Nigel Tao at Mar 24, 2016 at 12:10 pm

    On Wed, Mar 23, 2016 at 11:43 PM, wrote:
    Re the overall pictures at http://imgur.com/a/5sWM4 and at
    https://blog.klauspost.com/rebalancing-deflate-compression-levels/
    it's not obvious to me whether the graphs are showing the old or new
    algorithms for levels 1, 2 and 3. In fact, I think the best graph
    would show both.
    I didn't want to go into details about the re-balancing, since we can take
    that discussion separately, but the point was to show that:

    * There is a big gap from the proposed level 3 to level 4, which will be
    Is part of this bullet point missing? The sentence appears incomplete.

    In any case, I will repeat myself in case there was a
    mis-communication: whether or not there is a big gap between level 3 and
    level 4, it would be great to see both the old and new algorithms for
    levels 1-3 on the graph.

    Flate is normally Huffman + LZ77. In
    https://go-review.googlesource.com/#/c/20982/ you are also proposing a
    Huffman-only compression level. How does that relate (if at all) to
    this proposal?
    It is not directly related, so I separated it out in a separate CL, and
    didn't mention it here to have the discussion a bit more focused.
    OK. It is an API change (an addition), so I think it's worth at least
    mentioning at some point on golang-dev. But it's mentioned now, and
    the change LGTM (modulo some minor issues).

    Should we have more than one such level? Conversely, when introducing
    Snappy style matching, should we replace levels 1, 2 and 3, or just
    level 1 aka BestSpeed?
    Levels 2 and 3 offer good progression. The "generic" deflater leaves a big
    gap to level 1 and hits a performance ceiling. This is what you see in my
    blog post, which was written before I wrote the specialized levels 2 and 3.
    They may offer good progression (again, seeing both the before and the
    after for levels 1-3 on the graphs would help inform the discussion),
    but in practice, do people use anything other than BestSpeed,
    BestCompression, NoCompression or DefaultCompression? I don't know the
    answer, but if not, it might not be worth having three copy/pastes of
    the new algorithm if nobody uses levels 2 or 3.

    Please don't do that. We have had problems with this before:
    Don't worry, I know what I am doing. Each deflate block is never longer
    than 64KB, so skipping is reset for every 64KB of data processed. This
    works as intended. If it didn't, 10GB.tar would end up considerably worse.
    I'm sorry, but I don't find "don't worry, I know what I am doing"
    persuasive. That sort of reassurance would not have prevented the bug
    that https://github.com/golang/snappy/commit/d1d908a252c22fd7afd36190d5cffb144aa8f777
    fixed.

    Even if you are now resetting every 64KB, that still means that the "s
    += 1 + ((s - lit) >> 4)" skipping can be very aggressive within each
    64KB block. Specifically, it is exponential instead of quadratic (see,
    for example, the table after "It prints" on the golang/snappy commit
    linked to immediately above), which I think is too aggressive.

    We clearly disagree on where to pick the trade-off for speed vs size,
    but I feel that "do more or less what C++ snappy does" is a safe
    choice, and using a novel algorithm has some risk: I discovered and
    fixed one bug in d1d908a2 but there may be others, on other classes of
    input.

    I know that the compression experts that made google/snappy test
    algorithmic tweaks on large corpora of test data, whether samples of a
    web crawl or of RPC traffic, although it's clearly not practical to
    check in 100s of megabytes of test data into the google/snappy git
    repository. If you can convince the C++ google/snappy maintainers to
    change their skipping algorithm to yours, then I'm happy to adopt your
    skipping algorithm. But until then, I don't think I've seen enough
    evidence to move off the safe choice.

    For example, https://go-review.googlesource.com/#/c/20513/ points to a
    spreadsheet at https://docs.google.com/spreadsheets/d/1VLxi-ac0BAtf735HyH3c1xRulbkYYUkFecKdLPH7NIQ/edit?usp=sharing
    but to repeat what I said earlier on that code review: if I'm reading
    the spreadsheet linked from the CL description right, there are only
    three test files used to compare the before and after size. I don't
    see links to the actual test files, but all three of them ("very
    compressible", "random data", "xml") sound like they're either all
    compressible or all incompressible. Three isn't a large sample size,
    and it doesn't cover things like PDF files, which, IIUC, are like a
    textual format (i.e. compressible) that can embed binary resources
    such as JPEG images (i.e. incompressible) all in the one file.

    Btw, now that you mention it, you still haven't fixed Snappy, which will
    completely stop compressing data once it encounters 1MB of uncompressible data
    This is drifting off-topic, but that was fixed in
    https://github.com/golang/snappy/commit/bf2ded9d81f5c22b62cf76363673c6f9765a6703
    which was committed a month ago. The test cases added in that commit
    include a 2MB input of which the first 1MB is uncompressible.


    Finally, I hope that I haven't come across as being too negative in
    the various flate discussions. I have made a number of criticisms, but
    I aspire to be constructive. Please let me know if you are not finding
    that to be the case. I do appreciate the work you've done, and I do
    think that it will significantly improve Go 1.7 and beyond.

  • Klauspost at Mar 28, 2016 at 2:55 pm
    Hi!

    Thanks for the feedback.

    I have adjusted the CL, since the benchmark numbers changed after CL
    20929 was merged.

    I have added several more benchmarks to the spreadsheet (A disk image,
    10GB.tar, GOB stream, Go Object files, Go source files, web server
    benchmarks), as well as the revision numbers used for the benchmark. Files
    for each benchmark can now be downloaded.

    https://docs.google.com/spreadsheets/d/1VLxi-ac0BAtf735HyH3c1xRulbkYYUkFecKdLPH7NIQ/edit?usp=sharing

    It should show the strengths and weaknesses of the revised algorithm,
    including where it doesn't perform so well compared to the current master.
    However, the "bad" cases are mostly on content that is already very fast
    to compress, so this CL should hopefully reduce the "worst case" (in terms
    of speed) compression scenarios, where it provides a very good speedup. I
    want to be transparent about this tradeoff.


    I have made some small adjustments to the code:
    * I have taken out most of the shared code, so each function now only has
    the main encoding loop.
    * I have removed an optimization that may be too aggressive for a standard
    package, which stored a 64KB block uncompressed if no matches were found
    in it. This would have a negative impact on encoding random base64-encoded
    data, for instance, so to avoid this case it now always attempts a Huffman
    encode.
    * Hash values are now 16 bits instead of 14 bits. This does not impact
    performance negatively and allows for better compression than the previous
    version.
    * Minor tweaks.
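
    For context, the hash-bits change affects the size of the table that a
    multiply-shift hash indexes into. A hedged sketch follows; the multiplier
    (Knuth's constant) and byte-load order are illustrative assumptions, not
    necessarily what the CL uses:

```go
package main

import "fmt"

const hashBits = 16 // the CL moved from 14 to 16 bits

// hash4 maps four input bytes to a table index with a multiply-shift,
// the usual approach in LZ77-style matchers. The multiplier here is
// illustrative; the CL may use a different one.
func hash4(b []byte) uint32 {
	u := uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
	return (u * 2654435761) >> (32 - hashBits)
}

func main() {
	// More bits means a bigger table (1<<16 entries instead of 1<<14),
	// so fewer collisions and better match finding, at the cost of a
	// larger table to clear/maintain.
	fmt.Println(hash4([]byte("gola")), hash4([]byte("lang")))
}
```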

    On Thursday, 24 March 2016 13:10:46 UTC+1, Nigel Tao wrote:


    In any case, I will repeat myself in case there was a
    mis-communication: whether or not there is a big gap between level 3 and
    level 4, it would be great to see both the old and new algorithms for
    levels 1-3 on the graph.
    I will do that when we get to re-balancing the levels. I have updated the
    spreadsheet with the new numbers, which you are welcome to grab if you
    have the time.

    They may offer good progression (again, seeing both the before and the
    after for levels 1-3 on the graphs would help inform the discussion),
    but in practice, do people use anything other than BestSpeed,
    BestCompression, NoCompression or DefaultCompression? I don't know the
    answer, but if not, it might not be worth having three copy/pastes of
    the new algorithm if nobody uses levels 2 or 3.
    You do have a good point. I cannot be the judge of whether it should be
    included. I have tried to factor out the common code in the CL now - at
    least the things that could be done without sacrificing performance in a
    measurable way.

    If it is still "too much", I think I would go for adding "encodeL2" as
    BestSpeed. It is often very close to "encodeL1" in terms of speed (within
    5-10%), but offers 1% better compression. If we should only replace level
    1, I would choose that. Maybe other people have input on whether it should
    be a choice, or we should just implement 1 level.


    Even if you are now resetting every 64KB, that still means that the "s
    += 1 + ((s - lit) >> 4)" skipping can be very aggressive within each
    64KB block. Specifically, it is exponential instead of quadratic (see,
    for example, the table after "It prints" on the golang/snappy commit
    linked to immediately above), which I think is too aggressive.
    I have some "mixed content" data, a typical backup set and a virtual disk
    image. I do not see any significant compression loss from the aggressive
    skipping. IMO the risk is small, since only up to 64KB will be affected. In
    my optinion that is a reasonable trade for level 1-3, where 2&3 are set
    less aggressive.


    I know that the compression experts that made google/snappy test
    algorithmic tweaks on large corpora of test data, whether samples of a
    web crawl or of RPC traffic, although it's clearly not practical to
    check in 100s of megabytes of test data into the google/snappy git
    repository. If you can convince the C++ google/snappy maintainers to
    change their skipping algorithm to yours, then I'm happy to adopt your
    skipping algorithm. But until then, I don't think I've seen enough
    evidence to move off the safe choice.
    I have added a bunch more test sets, which you can download and test. I
    cannot "compete" with a secret test set, plus we are talking about an
    algorithm where the user has selected "I want the fastest option". If we
    were talking "DefaultCompression", I would definitely go for the safe
    option.


    compressible or all incompressible. Three isn't a large sample size,
    and it doesn't cover things like PDF files which, IIUC, is like a
    textual format (i.e. compressible) that can embed binary resources
    such as JPEG images (i.e. incompressible) all in the one file.
    The 10GB test set is good "mixed content", as is the disk image. If
    you have any more specific ones, I would love to test them.


    Finally, I hope that I haven't come across as being too negative in
    the various flate discussions. I have made a number of criticisms, but
    I aspire to be constructive. Please let me know if you are not finding
    that to be the case. I do appreciate the work you've done, and I do
    think that it will significantly improve Go 1.7 and beyond.
    And I also love a good discussion. For issues like this it is important to
    discuss tradeoffs, but I don't like to be presented with "facts" where you
    subtly imply I don't know what I am doing. We both do sub-optimal things
    from time to time, but bringing up the same thing for the 4th or 5th time
    and implying "you made a mistake there, therefore this is garbage" does
    not feel like a constructive way of discussing this.

    I don't think we have wildly different views on this. I think I try to
    cater a bit more toward the typical use case. Even if we as "compression
    experts" know that gzip-compressing a JPEG is usually a silly thing, it
    will happen quite often (think backups, disk images), so having the
    compressor quickly (5x faster) skip such content is important in my
    design. That is a much more likely scenario than the tiny chance of
    slightly sub-optimal compression in a 64KB block, so I optimize for the
    former.

    The optimization I just removed is another case. In cases where no matches
    were found in a 64KB block, it was always skipped (stored). For *most*
    content that is fine, but for something like random data that was base64
    encoded, it would never be able to reduce the entropy. To me this is a
    clear case of the worst case being bad, and a realistic usage scenario -
    yes, people will compress base64-encoded data, and it should be
    compressed.

    We are not dealing with "perfect" compression here, which makes this
    discussion harder than average. We are dealing with a "give me a fast"
    (BestSpeed), "give me a reasonable" (Default) and "give me a close to
    optimal" (BestCompression) compression spectrum. For the "fast", you *will*
    be able to create worst cases (both in terms of speed and compression),
    however, they should be rare. To me (and hopefully to a users) it is
    obvious that BestSpeed will not deliver the best compression because it
    makes certain choices for speed. To me a worst case where speed is very low
    is bad when you select a "fast" option, and my choices in the code should
    reflect that.


    /Klaus


  • Nigel Tao at Mar 31, 2016 at 12:08 pm

    On Tue, Mar 29, 2016 at 1:54 AM, wrote:
    Even if you are now resetting every 64KB, that still means that the "s
    += 1 + ((s - lit) >> 4)" skipping can be very aggressive within each
    64KB block. Specifically, it is exponential instead of quadratic (see,
    for example, the table after "It prints" on the golang/snappy commit
    linked to immediately above), which I think is too aggressive.
    I have some "mixed content" data, a typical backup set and a virtual disk
    image. I do not see any significant compression loss from the aggressive
    skipping. IMO the risk is small, since only up to 64KB will be affected. In
    my optinion that is a reasonable trade for level 1-3, where 2&3 are set less
    aggressive.
    We're repeating ourselves, but IMO the risk is non-zero, and the
    existing snappy algorithm is plenty fast enough.

    I asked the snappy-compression mailing list (where the C++ snappy code
    is discussed) for their thoughts. I'll crunch some C++ numbers
    tomorrow (it's getting late here in Sydney):

    https://groups.google.com/forum/#!topic/snappy-compression/3Qa3fASLkNA

  • Nigel Tao at Apr 3, 2016 at 1:45 am
    A couple of responses below, but note that for
    github.com/golang/snappy, I have just updated the Go encoder to have
    byte-for-byte identical output to the C++ encoder. There were a few
    smaller commits recently to set this up, and further optimization work
    remains, but the punchline commit at
    https://github.com/golang/snappy/commit/8939696c2214cde5dda896a76f5bf56e80a16855
    shows both faster throughput (on GOARCH=amd64) and smaller output.

    On Tue, Mar 29, 2016 at 1:54 AM, wrote:
    They may offer good progression (again, seeing both the before and the
    after for levels 1-3 on the graphs would help inform the discussion),
    but in practice, do people use anything other than BestSpeed,
    BestCompression, NoCompression or DefaultCompression? I don't know the
    answer, but if not, it might not be worth having three copy/pastes of
    the new algorithm if nobody uses levels 2 or 3.
    You do have a good point. I cannot be the judge of whether it should be
    included. I have tried to factor out the common code in the CL now - at
    least the things that could be done without sacrificing performance in a
    measurable way.

    If it is still "too much", I think I would go for adding "encodeL2" as
    BestSpeed. It is often very close to "encodeL1" in terms of speed (within
    5-10%), but offers 1% better compression. If we should only replace level 1,
    I would choose that. Maybe other people have input on whether it should be a
    choice, or we should just implement 1 level.
    I'd prefer only replacing level 1, and leaving 2 and 3 alone,
    especially since, AFAICT,
    https://go-review.googlesource.com/#/c/21021/4/src/compress/flate/deflatefast.go
    still has a lot of copy/paste between encodeL1, encodeL2 and encodeL3.

    In any case, I suspect that being based on the C++ encoder algorithm
    (as mentioned above) will help.

    On Thu, Mar 31, 2016 at 11:08 PM, Nigel Tao wrote:
    I asked the snappy-compression mailing list (where the C++ snappy code
    is discussed) for their thoughts. I'll crunch some C++ numbers
    tomorrow (it's getting late here in Sydney):

    https://groups.google.com/forum/#!topic/snappy-compression/3Qa3fASLkNA
    FWIW, my reading of that discussion thread is that Steinar Gunderson
    (one of the C++ snappy authors) reckons that while it's much better
    throughput on binary data and slightly worse on textual data, the "1 +
    ((s - lit) >> 5)" skipping algorithm is a net negative for the C++
    implementation, so I'm less inclined to adopt it for the Go
    implementation, whether in golang/snappy or in compress/flate.

  • Klauspost at Apr 3, 2016 at 10:56 am

    On Sunday, 3 April 2016 03:45:24 UTC+2, Nigel Tao wrote:

    I'd prefer only replacing level 1, and leaving 2 and 3 alone,
    especially since, AFAICT,

    https://go-review.googlesource.com/#/c/21021/4/src/compress/flate/deflatefast.go
    still has a lot of copy/paste between encodeL1, encodeL2 and encodeL3.
    If you prefer that, I will keep the "encodeL2".

    encodeL3 does not have any duplicate code, except the first few lines.
    "encodeL1" and "encodeL2" share some code, and encodeL1 could be achieved
    by just not copying the buffer on each call and calling encodeL2. However,
    you would lose all the speed gained, except for the memcpy. We could look
    at the possibility of swapping two encoding buffers in the future, so we
    can keep a reference to the previous block and avoid the memcpy.

    I will remove all but encodeL2, and make that level 1, and restore the
    previous level 2+3. I am in the middle of starting a new job, so it may
    take a little time, but I hope we can get it finalized before the 1.7
    lockdown.

    FWIW, my reading of that discussion thread is that Steinar Gunderson
    (one of the C++ snappy authors) reckons that while it's much better
    throughput on binary data and slightly worse on textual data, the "1 +
    ((s - lit) >> 5)" skipping algorithm is a net negative for the C++
    implementation, so I'm less inclined to adopt it for the Go
    implementation, whether in golang/snappy or in compress/flate.
    Yes, it is not a clear-cut win/lose. On some archs, I would expect the
    subtraction to be faster than a separate variable, not so much on others.
    It really depends on register allocation, etc.

    I will add the "conservative" skipping. Another change I have to make is
    to make writeBlockDynamic able to output raw bytes if the bitcount is
    smaller for storing it uncompressed. That should also make random data
    faster to make up for the conservative skipping.
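
    That fallback amounts to a simple bit-count comparison. A sketch (the
    5-byte figure is the approximate header cost of a stored deflate block):

```go
package main

import "fmt"

// chooseStored reports whether emitting n bytes as a raw "stored"
// deflate block beats a Huffman encoding estimated at huffmanBits
// bits. A stored block costs roughly 5 header bytes plus the payload.
func chooseStored(n, huffmanBits int) bool {
	storedBits := (n + 5) * 8
	return storedBits < huffmanBits
}

func main() {
	// Random bytes need ~8 bits/byte plus Huffman table overhead:
	// storing raw wins.
	fmt.Println(chooseStored(65535, 65535*8+400)) // true
	// Base64 text needs only ~6 bits/byte: Huffman wins.
	fmt.Println(chooseStored(65535, 65535*6)) // false
}
```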

    /Klaus

  • Nigel Tao at Apr 5, 2016 at 12:23 am

    On Sun, Apr 3, 2016 at 11:45 AM, Nigel Tao wrote:
    On Thu, Mar 31, 2016 at 11:08 PM, Nigel Tao wrote:
    I asked the snappy-compression mailing list (where the C++ snappy code
    is discussed) for their thoughts. I'll crunch some C++ numbers
    tomorrow (it's getting late here in Sydney):

    https://groups.google.com/forum/#!topic/snappy-compression/3Qa3fASLkNA
    FWIW, my reading of that discussion thread is that Steinar Gunderson
    (one of the C++ snappy authors) reckons that while it's much better
    throughput on binary data and slightly worse on textual data, the "1 +
    ((s - lit) >> 5)" skipping algorithm is a net negative for the C++
    implementation, so I'm less inclined to adopt it for the Go
    implementation, whether in golang/snappy or in compress/flate.
    Ah, the sentiment seems to be turning on that thread. As I said
    earlier, if the C++ google/snappy maintainers can be convinced to
    change their skipping algorithm to yours, then I'm happy to adopt your
    skipping algorithm. Kudos to you.

  • Giovanni Bajo at Apr 5, 2016 at 9:51 pm

    On Tuesday, April 5, 2016 at 2:23:50 AM UTC+2, Nigel Tao wrote:
    On Sun, Apr 3, 2016 at 11:45 AM, Nigel Tao <nige...@golang.org> wrote:
    On Thu, Mar 31, 2016 at 11:08 PM, Nigel Tao <nige...@golang.org> wrote:
    I asked the snappy-compression mailing list (where the C++ snappy code
    is discussed) for their thoughts. I'll crunch some C++ numbers
    tomorrow (it's getting late here in Sydney):

    https://groups.google.com/forum/#!topic/snappy-compression/3Qa3fASLkNA
    FWIW, my reading of that discussion thread is that Steinar Gunderson
    (one of the C++ snappy authors) reckons that while it's much better
    throughput on binary data and slightly worse on textual data, the "1 +
    ((s - lit) >> 5)" skipping algorithm is a net negative for the C++
    implementation, so I'm less inclined to adopt it for the Go
    implementation, whether in golang/snappy or in compress/flate.
    Ah, the sentiment seems to be turning on that thread. As I said
    earlier, if the C++ google/snappy maintainers can be convinced to
    change their skipping algorithm to yours, then I'm happy to adopt your
    skipping algorithm. Kudos to you.
    Just for the sake of the history records, the original idea comes from the
    LZO compressor, which I ported to Go a few months ago:
    https://github.com/rasky/go-lzo/blob/master/compress.go#L59

    Klaus spotted it while doing benchmarks between LZO and Snappy, extracted
    it, and adapted it, so all credits to him anyway. I just love the fact that
    this little, uncommented compression trick went through several packages
    now, to be fully re-analyzed some 20 years after it was first conceived :)

    Giovanni

  • Klauspost at Apr 10, 2016 at 2:06 pm


    Just for the sake of history records, the original idea comes from the LZO
    compressor, that I ported to Go a few months ago:
    https://github.com/rasky/go-lzo/blob/master/compress.go#L59

    Klaus spotted it while doing benchmarks between LZO and Snappy, extracted
    it, and adapted it, so all credits to him anyway. I just love the fact that
    this little, uncommented compression trick went through several packages
    now, to be fully re-analyzed some 20 years after it was first conceived :)
    Ah yes - thanks for reminding me! It is a very neat trick indeed.

    I have updated the CL, which I now think is ready for review. I have
    removed two of the modes, so we now only replace "BestSpeed". As I
    mentioned, I would go for the proposed level 2, which is a tiny bit slower,
    but offers better compression than the originally proposed level 1. If we
    are only to replace one, that one has the best trade-off.

    I have updated the benchmarks for those
    interested: https://docs.google.com/spreadsheets/d/1VLxi-ac0BAtf735HyH3c1xRulbkYYUkFecKdLPH7NIQ/edit?usp=sharing

    I have also submitted a separate change, which also improves compression on
    some edge cases - mainly base64 encoded content and
    similar: https://go-review.googlesource.com/#/c/21757/

    Thanks for all the feedback and testing!

    /Klaus


Discussion Overview
group: golang-dev
categories: go
posted: Mar 22, '16 at 10:53a
active: Apr 10, '16 at 2:06p
posts: 20
users: 10
website: golang.org
