FAQ
Is the goal to merge this into the standard library?
Yep, the intent was to fulfill #4828
<https://github.com/golang/go/issues/4828>, which asks for a bzip2 encoder.
In an indirect way, it was also trying to fulfill #6754
<https://github.com/golang/go/issues/6754>, which is regarding bzip2
performance. The specific API I ended up with is slightly different from
the standard library, but it should be pretty easy to adapt my
implementation to match the existing API. I don't know if we want to merge
both encoder and decoder at the same time, or just do one of them at a time.


To answer some questions I have been getting in private messages:
Q: How is the decoder much faster?
I did make a mistake in reporting that the my implementation was 4x faster.
The standard library benchmark reports the compression rate relative to the
size of the compressed file, rather than the uncompressed size. Thus, the
standard library will appear to perform slower than it really is. With the
corrected benchmarks, my version is anywhere between 1.5x to 3x faster.
Here's some data for decoding (ds is my version, std is standard library,
cgo is the C library):
BENCHMARK: bz2:decRate
benchmark std MB/s delta ds MB/s delta cgo MB/s
delta
digits.txt:1:1e4 11.51 1.00x 22.04 1.91x 28.28
2.46x
digits.txt:1:1e5 13.85 1.00x 20.00 1.44x 27.02
1.95x
digits.txt:1:1e6 14.36 1.00x 19.91 1.39x 27.26
1.90x
digits.txt:6:1e4 5.98 1.00x 21.85 3.65x 28.43
4.75x
digits.txt:6:1e5 11.92 1.00x 20.62 1.73x 26.35
2.21x
digits.txt:6:1e6 12.86 1.00x 18.00 1.40x 24.24
1.89x
digits.txt:9:1e4 5.39 1.00x 21.39 3.97x 28.68
5.32x
digits.txt:9:1e5 11.36 1.00x 20.14 1.77x 27.21
2.39x
digits.txt:9:1e6 12.50 1.00x 17.90 1.43x 22.84
1.83x
twain.txt:1:1e4 10.22 1.00x 22.26 2.18x 27.64
2.70x
twain.txt:1:1e5 15.91 1.00x 24.17 1.52x 30.59
1.92x
twain.txt:1:1e6 16.45 1.00x 23.65 1.44x 31.39
1.91x
twain.txt:6:1e4 5.64 1.00x 22.29 3.95x 27.93
4.95x
twain.txt:6:1e5 13.40 1.00x 23.76 1.77x 30.49
2.28x
twain.txt:6:1e6 15.99 1.00x 21.76 1.36x 30.39
1.90x
twain.txt:9:1e4 5.14 1.00x 21.73 4.22x 27.89
5.42x
twain.txt:9:1e5 12.99 1.00x 23.64 1.82x 30.34
2.33x
twain.txt:9:1e6 15.61 1.00x 20.10 1.29x 28.21
1.81x

As for implementation details, my decoder has two major improvements over
the standard library version:

    - The huffman decoder isn't represented as a tree, which requires a loop
    iteration for every bit. Instead, it uses the LUT method that the zlib
    library uses <http://www.gzip.org/algorithm.txt>. For most prefix codes,
    it can decode them in a single table lookup. It also uses careful coding to
    make sure critical functions are inlined by the compiler.
    - In decoding, the ReadByte method becomes a bottleneck because it is
    called for every byte. The bit reader I have avoids using ReadByte if the
    input Reader is a bufio.Reader (or something that looks like it). Thus, the
    decoder can grab more bytes in a batch, rather going at them one at a time.

Q: Why won't the library build?
The library depends on at least Go1.5.

JT
On Thursday, December 10, 2015 at 8:46:30 AM UTC-8, bradfitz wrote:

Is the goal to merge this into the standard library?


On Tue, Dec 8, 2015 at 5:48 PM, <thebroke...@gmail.com <javascript:>>
wrote:
Just an update, I have completed a working decoder. It decompresses at
~20MB/s, which about 25% faster than the C library (~16MB/s), and 200%
faster than the Go standard library (~5MB/s).

https://godoc.org/github.com/dsnet/compress/bzip2

Enjoy.

JT

On Monday, December 7, 2015 at 11:55:41 PM UTC-8, thebroke...@gmail.com
wrote:
Hey all,

github: https://github.com/dsnet/compress
godoc: https://godoc.org/github.com/dsnet/compress/bzip2

I finished writing a bzip2 encoder. I haven't heavily tested it, and
there's still alot to do in the area of optimizations. However, it still
performs relatively well relative to the C implementation and is actually
faster in encoding than the standard Go library is at in decoding. The
speed is about half that of the C version, but I think I can close them gap
to be about 1.5x or even 1x. The compression ratio is about 10% worse since
I haven't implemented a K-means based huffman encoder yet. I have plans to
add a decoder eventually that will outperform the standard library version.

BENCHMARK: bz2:encRate
benchmark ds MB/s delta cgo MB/s delta
digits.txt:1:1e4 5.44 1.00x 7.55 1.39x
digits.txt:1:1e5 5.57 1.00x 12.51 2.25x
digits.txt:1:1e6 5.76 1.00x 12.88 2.24x
digits.txt:6:1e4 4.86 1.00x 7.47 1.54x
digits.txt:6:1e5 5.54 1.00x 12.46 2.25x
digits.txt:6:1e6 5.39 1.00x 12.46 2.31x
digits.txt:9:1e4 4.66 1.00x 7.54 1.62x
digits.txt:9:1e5 5.48 1.00x 12.60 2.30x
digits.txt:9:1e6 5.02 1.00x 12.11 2.41x
twain.txt:1:1e4 5.26 1.00x 7.89 1.50x
twain.txt:1:1e5 6.02 1.00x 12.89 2.14x
twain.txt:1:1e6 6.19 1.00x 12.83 2.07x
twain.txt:6:1e4 4.75 1.00x 8.07 1.70x
twain.txt:6:1e5 6.02 1.00x 12.93 2.15x
twain.txt:6:1e6 6.09 1.00x 12.51 2.06x
twain.txt:9:1e4 4.54 1.00x 7.93 1.75x
twain.txt:9:1e5 6.00 1.00x 12.50 2.08x
twain.txt:9:1e6 5.73 1.00x 12.13 2.12x
BENCHMARK: bz2:ratio
benchmark ds ratio delta cgo ratio delta
digits.txt:1:1e4 2.20x 1.00x 2.28x 1.04x
digits.txt:1:1e5 2.23x 1.00x 2.32x 1.04x
digits.txt:1:1e6 2.22x 1.00x 2.31x 1.04x
digits.txt:6:1e4 2.20x 1.00x 2.28x 1.04x
digits.txt:6:1e5 2.23x 1.00x 2.32x 1.04x
digits.txt:6:1e6 2.12x 1.00x 2.23x 1.05x
digits.txt:9:1e4 2.20x 1.00x 2.28x 1.04x
digits.txt:9:1e5 2.23x 1.00x 2.32x 1.04x
digits.txt:9:1e6 2.05x 1.00x 2.16x 1.05x
twain.txt:1:1e4 2.07x 1.00x 2.25x 1.09x
twain.txt:1:1e5 2.68x 1.00x 2.90x 1.08x
twain.txt:1:1e6 2.68x 1.00x 2.89x 1.08x
twain.txt:6:1e4 2.07x 1.00x 2.25x 1.09x
twain.txt:6:1e5 2.68x 1.00x 2.90x 1.08x
twain.txt:6:1e6 2.87x 1.00x 3.12x 1.09x
twain.txt:9:1e4 2.07x 1.00x 2.25x 1.09x
twain.txt:9:1e5 2.68x 1.00x 2.90x 1.08x
twain.txt:9:1e6 2.87x 1.00x 3.12x 1.09x

This package is still in development, but I would appreciate if people
can try using it and tell me about any bugs. I'll make a bigger
announcement about it when I feel that the package is stable enough. I
eventually plan on merging it into the standard library, but that
definitely won't happen until go1.7 or later.

JT
--
You received this message because you are subscribed to the Google Groups
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to golang-nuts...@googlegroups.com <javascript:>.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Search Discussions

Discussion Posts

Previous

Follow ups

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 7 of 11 | next ›
Discussion Overview
groupgolang-nuts @
categoriesgo
postedDec 8, '15 at 7:55a
activeDec 13, '15 at 12:53p
posts11
users5
websitegolang.org

People

Translate

site design / logo © 2021 Grokbase