This is a discussion thread for the CL.
What I am proposing in the CL is to replace levels 1 to 3 with
specialized versions that are significantly faster. Throughput is usually
around 1.7x that of the standard encoder. This does come at a compression
cost; the output is typically 2 to 3 percent less well compressed than with
the original levels 1, 2, and 3.
This affects the following other packages: "compress/gzip", "compress/zlib",
"archive/zip", and "image/png". My main target has been web servers, but I
have also talked with the Docker crew, who would greatly appreciate faster
deflate for disk image creation. Russ Cox has also proposed zip for object
file archives, which would likewise be affected by this change.
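For context, here is a minimal sketch of how callers of the affected packages
typically opt into the fast levels today; the sample data is arbitrary and not
part of the CL, and the point is only that the level passed to gzip/zlib is
forwarded to compress/flate, so those callers would pick up the new encoders:

    package main

    import (
    	"bytes"
    	"compress/gzip"
    	"compress/zlib"
    	"log"
    )

    func main() {
    	payload := bytes.Repeat([]byte("example web content "), 1000)

    	// compress/gzip: the level is forwarded to compress/flate, so a
    	// gzip writer at BestSpeed would use the proposed level-1 encoder.
    	var gzBuf bytes.Buffer
    	gw, err := gzip.NewWriterLevel(&gzBuf, gzip.BestSpeed)
    	if err != nil {
    		log.Fatal(err)
    	}
    	if _, err := gw.Write(payload); err != nil {
    		log.Fatal(err)
    	}
    	if err := gw.Close(); err != nil {
    		log.Fatal(err)
    	}

    	// compress/zlib works the same way.
    	var zBuf bytes.Buffer
    	zw, err := zlib.NewWriterLevel(&zBuf, zlib.BestSpeed)
    	if err != nil {
    		log.Fatal(err)
    	}
    	if _, err := zw.Write(payload); err != nil {
    		log.Fatal(err)
    	}
    	if err := zw.Close(); err != nil {
    		log.Fatal(err)
    	}

    	log.Printf("gzip: %d bytes, zlib: %d bytes", gzBuf.Len(), zBuf.Len())
    }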
My rationale is that levels 1 to 3 indicate that the user wants the best
speed. Typically the user has done this by selecting flate.BestSpeed (or
similar), so they have already indicated they are willing to sacrifice
compression efficiency for better speed.
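As a reminder of what that selection looks like when using compress/flate
directly, a minimal sketch (the repeated sample string is just filler):

    package main

    import (
    	"bytes"
    	"compress/flate"
    	"io"
    	"log"
    	"strings"
    )

    func main() {
    	// flate.BestSpeed is level 1; levels 1 to 3 are the ones the CL
    	// would replace with the specialized encoders.
    	var buf bytes.Buffer
    	fw, err := flate.NewWriter(&buf, flate.BestSpeed)
    	if err != nil {
    		log.Fatal(err)
    	}
    	src := strings.NewReader(strings.Repeat("speed over ratio ", 500))
    	if _, err := io.Copy(fw, src); err != nil {
    		log.Fatal(err)
    	}
    	if err := fw.Close(); err != nil {
    		log.Fatal(err)
    	}
    	log.Printf("compressed to %d bytes", buf.Len())
    }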
My main discussion points:
* Is it worth the additional code complexity for a 1.7x speedup?
* Is the 2-3% compression loss worth the speedup?
* Do you see any other downsides, or any places where this would cause
problems?
Other discussion points:
* In the future the remaining levels 4-8 should be re-adjusted, but that
isn't strictly needed now.
DETAILED TEST SET DESCRIPTION:
---> Web content. This benchmark compresses a selection of individual HTML,
JS, CSS, SVG, and small JSON files. It was selected to give an indication of
typical web server performance on content that is likely to be gzip encoded.
Web content, level 1: 2.5% less compression reduction. 2.6x speedup.
Web content, level 2: 2.1% less compression reduction. 2.3x speedup.
Web content, level 3: 2.1% less compression reduction. 2.4x speedup.
- Test set: http://files.klauspost.com/sites.7z
---> A huge JSON stream, containing highly repetitive JSON. Typically
compresses around 95%. This content could represent a database dump, a
network stream or similar content.
JSON, level 1: 1.3% less compression reduction. 2.6x speedup.
JSON, level 2: 0.1% more compression reduction. 1.9x speedup.
JSON, level 3: 0.1% more compression reduction. 1.9x speedup.
- Test set: http://18.104.22.168/static/dicts/json-testset.gz
---> enwik9 is a standard corpus containing the first 10^9 bytes of the
English Wikipedia XML text dump from Mar. 3, 2006, as used by the
"Large Text Compression Benchmark".
enwik9, level 1: 2.7% less compression reduction. 1.8x speedup.
enwik9, level 2: 2.8% less compression reduction. 1.9x speedup.
enwik9, level 3: 2.6% less compression reduction. 2.0x speedup.
- Test set: http://mattmahoney.net/dc/enwik9.zip
---> 10GB is a standard test set that represents a typical backup scenario.
The test data is designed to test archivers in realistic backup scenarios
with lots of already-compressed or hard-to-compress files and lots of
duplicate or nearly identical files. It consists of exactly 10 GB
(10^10 bytes) in 79,431 files in 4,006 directories. For this test, a TAR
file was created from the set.
10gb, level 1: 3.1% less compression reduction. 3.0x speedup.
10gb, level 2: 3.1% less compression reduction. 3.2x speedup.
10gb, level 3: 3.4% less compression reduction. 3.4x speedup.
- Test set: http://mattmahoney.net/dc/10gb.html
---> Random data, which should be incompressible. In the "real world" this
could be JPG, MP3, MP4, or MKV files.
random, level 1: 0.0% less compression reduction. 19.5x speedup.
random, level 2: 0.0% less compression reduction. 19.2x speedup.
random, level 3: 0.0% less compression reduction. 19.4x speedup.
- Test set: http://mattmahoney.net/dc/#sharnd - 200,000,000 bytes
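The "less compression reduction" figures above compare the new encoders
against the current encoder at the same level, and "speedup" is the ratio of
compression throughput. A rough sketch of how such numbers can be gathered
follows; "testdata.bin" is a placeholder for one of the test sets linked
above, and running it once with the current encoder and once with the CL
applied gives the deltas reported:

    package main

    import (
    	"bytes"
    	"compress/flate"
    	"fmt"
    	"io/ioutil"
    	"log"
    	"time"
    )

    // measure reports the compression reduction (1 - compressed/original)
    // and throughput in MB/s for one flate level over the given data.
    func measure(level int, data []byte) (reduction, mbPerSec float64) {
    	var buf bytes.Buffer
    	w, err := flate.NewWriter(&buf, level)
    	if err != nil {
    		log.Fatal(err)
    	}
    	start := time.Now()
    	if _, err := w.Write(data); err != nil {
    		log.Fatal(err)
    	}
    	if err := w.Close(); err != nil {
    		log.Fatal(err)
    	}
    	elapsed := time.Since(start).Seconds()
    	reduction = 1 - float64(buf.Len())/float64(len(data))
    	mbPerSec = float64(len(data)) / (1e6 * elapsed)
    	return reduction, mbPerSec
    }

    func main() {
    	// Placeholder file name; substitute one of the test sets above.
    	data, err := ioutil.ReadFile("testdata.bin")
    	if err != nil {
    		log.Fatal(err)
    	}
    	for level := 1; level <= 3; level++ {
    		red, speed := measure(level, data)
    		fmt.Printf("level %d: %.1f%% reduction, %.1f MB/s\n",
    			level, red*100, speed)
    	}
    }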
Your input is appreciated.