The format of the benchmarks (I should have labelled them) is

<test_file>:<compression_level>:<test_filesize>, as you correctly surmised.

I haven't really tested many large files yet and plan to do so for both

the encoder and decoder. Fortunately, other people running benchmarks and

tests have reported an out-of-bounds error on the encoder side, which I'll

look into later.

As for the decoder, I just ran a benchmark on a go1.5.2.linux-amd64.tar,

which is 270MiB uncompressed, and 64MiB compressed. Here are the results:

BenchmarkStd-4   1  14094699464 ns/op  20.07 MB/s    15858536 B/op  10118 allocs/op
BenchmarkDS-4    1   8983998803 ns/op  31.48 MB/s  2524556680 B/op  15996 allocs/op

Currently, my implementation allocates a lot of garbage, since I don't

cache the slices; it allocates roughly 150x more memory per operation (B/op)

than the standard library. This is relatively low-hanging fruit in terms of

optimization, and I plan on addressing it soon.

I also looked at both implementations under pprof. Here are the top

functions for the standard library implementation:

      flat  flat%   sum%        cum   cum%
    4470ms 29.96% 29.96%     5810ms 38.94%  compress/bzip2.(*huffmanTree).Decode
    4430ms 29.69% 59.65%     4430ms 29.69%  compress/bzip2.(*reader).readFromBlock
    1830ms 12.27% 71.92%     9320ms 62.47%  compress/bzip2.(*reader).readBlock
    1000ms  6.70% 78.62%     1000ms  6.70%  compress/bzip2.updateCRC
     810ms  5.43% 84.05%      810ms  5.43%  compress/bzip2.inverseBWT

We can see that Huffman tree decoding tops this list. The second function

that does the most work is readFromBlock, probably because the

de-scrambling phase of the inverse BWT is performed there along with RLE.

Here's my version under the same load:

    4790ms 50.32% 50.32%     4960ms 52.10%  github.com/dsnet/compress/bzip2.(*burrowsWheelerTransform).Decode
    1060ms 11.13% 79.20%     1060ms 11.13%  github.com/dsnet/compress/bzip2.(*runLengthEncoding).Read
     800ms  8.40% 87.61%     1860ms 19.54%  github.com/dsnet/compress/bzip2.(*moveToFront).Decode
     460ms  4.83% 92.44%     1380ms 14.50%  github.com/dsnet/compress/bzip2.(*Reader).decodePrefix
     110ms  1.16% 95.48%      760ms  7.98%  github.com/dsnet/compress/internal/prefix.(*Reader).ReadSymbol

Currently, the burrowsWheelerTransform tops the list (which makes sense

given that it is the heart of bzip2).

I realize that these two lists are not exactly comparable since the

implementations differ. The standard library has a single large function

called readBlock, while mine is broken up into many smaller functions, and

it's not easy to determine a one-to-one relationship between them. An

additional difficulty in comparing why the two perform differently is that

my BWT decoding occurs in a single function call, while the standard

library currently spreads the work across multiple stages, making it

difficult to track how much CPU time is spent on each component.

JT

On Saturday, December 12, 2015 at 6:24:01 AM UTC-8, alb.do...@gmail.com

wrote:

It would be great if we could merge the encoder, since the bzip2 package

currently does not have one!

I have a few questions about the decoder benchmarks, since I benchmarked

the standard library one when I mailed CL 13853, which closed issue #6754

(the one you linked above). What are the sizes of the archives you are

decoding? I see numbers at the end of your benchmark names, but I'm not

sure how to read them. If my assumption is correct that the benchmarks are

ordered by size (i.e. the size is the last number in the name), it appears

that your approach is much faster (4x) than the current one on very small

files, and about 1.4x faster on the biggest files you are benchmarking. I

think you'll agree that being faster on a big file (which takes a lot of

time to decode) is better than being faster on a 0.1 millisecond decode

operation : )

So I'm asking how big the biggest archive you benchmarked is, and whether

you tested your decoder on a really big archive (say, 100MB of English

text: you can download, for example, a 100MB archive of Wikipedia

articles, which is what I did when I mailed my CL).

On Saturday, December 12, 2015 at 2:16:49 AM UTC+1, thebroke...@gmail.com

wrote:


Is the goal to merge this into the standard library?


Yep, the intent was to fulfill #4828

<https://github.com/golang/go/issues/4828>, which asks for a bzip2

encoder. In an indirect way, it was also trying to fulfill #6754

<https://github.com/golang/go/issues/6754>, which is regarding bzip2

performance. The specific API I ended up with is slightly different from

the standard library, but it should be pretty easy to adapt my

implementation to match the existing API. I don't know if we want to merge

both encoder and decoder at the same time, or just do one of them at a time.

To answer some questions I have been getting in private messages:

Q: How is the decoder much faster?

I did make a mistake in reporting that my implementation was 4x faster.

The standard library benchmark reports the decompression rate relative to

the size of the compressed file, rather than the uncompressed size, so the

standard library appears to perform slower than it really is. With the

corrected benchmarks, my version is anywhere between 1.5x and 3x faster.

Here's some data for decoding (ds is my version, std is the standard

library, cgo is the C library):

BENCHMARK: bz2:decRate
benchmark          std MB/s  delta  ds MB/s  delta  cgo MB/s  delta
digits.txt:1:1e4      11.51  1.00x    22.04  1.91x     28.28  2.46x
digits.txt:1:1e5      13.85  1.00x    20.00  1.44x     27.02  1.95x
digits.txt:1:1e6      14.36  1.00x    19.91  1.39x     27.26  1.90x
digits.txt:6:1e4       5.98  1.00x    21.85  3.65x     28.43  4.75x
digits.txt:6:1e5      11.92  1.00x    20.62  1.73x     26.35  2.21x
digits.txt:6:1e6      12.86  1.00x    18.00  1.40x     24.24  1.89x
digits.txt:9:1e4       5.39  1.00x    21.39  3.97x     28.68  5.32x
digits.txt:9:1e5      11.36  1.00x    20.14  1.77x     27.21  2.39x
digits.txt:9:1e6      12.50  1.00x    17.90  1.43x     22.84  1.83x
twain.txt:1:1e4       10.22  1.00x    22.26  2.18x     27.64  2.70x
twain.txt:1:1e5       15.91  1.00x    24.17  1.52x     30.59  1.92x
twain.txt:1:1e6       16.45  1.00x    23.65  1.44x     31.39  1.91x
twain.txt:6:1e4        5.64  1.00x    22.29  3.95x     27.93  4.95x
twain.txt:6:1e5       13.40  1.00x    23.76  1.77x     30.49  2.28x
twain.txt:6:1e6       15.99  1.00x    21.76  1.36x     30.39  1.90x
twain.txt:9:1e4        5.14  1.00x    21.73  4.22x     27.89  5.42x
twain.txt:9:1e5       12.99  1.00x    23.64  1.82x     30.34  2.33x
twain.txt:9:1e6       15.61  1.00x    20.10  1.29x     28.21  1.81x


As for implementation details, my decoder has two major improvements over

the standard library version:

- The Huffman decoder isn't represented as a tree, which requires a
loop iteration for every bit. Instead, it uses the LUT method that
the zlib library uses <http://www.gzip.org/algorithm.txt>. For most
prefix codes, it can decode them in a single table lookup. It also uses
careful coding to make sure critical functions are inlined by the compiler.
- In decoding, the ReadByte method becomes a bottleneck because it is
called for every byte. The bit reader I have avoids using ReadByte if the
input Reader is a bufio.Reader (or something that looks like it). Thus, the
decoder can grab more bytes in a batch, rather than going at them one at a time.

Q: Why won't the library build?

The library requires at least Go 1.5.

JT

On Thursday, December 10, 2015 at 8:46:30 AM UTC-8, bradfitz wrote:

Is the goal to merge this into the standard library?


On Tue, Dec 8, 2015 at 5:48 PM, wrote:

Just an update: I have completed a working decoder. It decompresses at

~20MB/s, which is about 25% faster than the C library (~16MB/s) and 200%

faster than the Go standard library (~5MB/s).

https://godoc.org/github.com/dsnet/compress/bzip2

Enjoy.

JT

On Monday, December 7, 2015 at 11:55:41 PM UTC-8, thebroke...@gmail.com

wrote:



Hey all,

github: https://github.com/dsnet/compress

godoc: https://godoc.org/github.com/dsnet/compress/bzip2


I finished writing a bzip2 encoder. I haven't heavily tested it, and

there's still a lot to do in the area of optimization. However, it

performs reasonably well relative to the C implementation, and its

encoding is actually faster than the standard Go library's decoding. The

speed is about half that of the C version, but I think I can close the gap

to about 1.5x or even 1x. The compression ratio is about 10% worse since

I haven't implemented a K-means based Huffman encoder yet. I have plans to

eventually add a decoder that will outperform the standard library version.

BENCHMARK: bz2:encRate

benchmark          ds MB/s  delta  cgo MB/s  delta
digits.txt:1:1e4      5.44  1.00x      7.55  1.39x
digits.txt:1:1e5      5.57  1.00x     12.51  2.25x
digits.txt:1:1e6      5.76  1.00x     12.88  2.24x
digits.txt:6:1e4      4.86  1.00x      7.47  1.54x
digits.txt:6:1e5      5.54  1.00x     12.46  2.25x
digits.txt:6:1e6      5.39  1.00x     12.46  2.31x
digits.txt:9:1e4      4.66  1.00x      7.54  1.62x
digits.txt:9:1e5      5.48  1.00x     12.60  2.30x
digits.txt:9:1e6      5.02  1.00x     12.11  2.41x
twain.txt:1:1e4       5.26  1.00x      7.89  1.50x
twain.txt:1:1e5       6.02  1.00x     12.89  2.14x
twain.txt:1:1e6       6.19  1.00x     12.83  2.07x
twain.txt:6:1e4       4.75  1.00x      8.07  1.70x
twain.txt:6:1e5       6.02  1.00x     12.93  2.15x
twain.txt:6:1e6       6.09  1.00x     12.51  2.06x
twain.txt:9:1e4       4.54  1.00x      7.93  1.75x
twain.txt:9:1e5       6.00  1.00x     12.50  2.08x
twain.txt:9:1e6       5.73  1.00x     12.13  2.12x

BENCHMARK: bz2:ratio
benchmark          ds ratio  delta  cgo ratio  delta
digits.txt:1:1e4      2.20x  1.00x      2.28x  1.04x
digits.txt:1:1e5      2.23x  1.00x      2.32x  1.04x
digits.txt:1:1e6      2.22x  1.00x      2.31x  1.04x
digits.txt:6:1e4      2.20x  1.00x      2.28x  1.04x
digits.txt:6:1e5      2.23x  1.00x      2.32x  1.04x
digits.txt:6:1e6      2.12x  1.00x      2.23x  1.05x
digits.txt:9:1e4      2.20x  1.00x      2.28x  1.04x
digits.txt:9:1e5      2.23x  1.00x      2.32x  1.04x
digits.txt:9:1e6      2.05x  1.00x      2.16x  1.05x
twain.txt:1:1e4       2.07x  1.00x      2.25x  1.09x
twain.txt:1:1e5       2.68x  1.00x      2.90x  1.08x
twain.txt:1:1e6       2.68x  1.00x      2.89x  1.08x
twain.txt:6:1e4       2.07x  1.00x      2.25x  1.09x
twain.txt:6:1e5       2.68x  1.00x      2.90x  1.08x
twain.txt:6:1e6       2.87x  1.00x      3.12x  1.09x
twain.txt:9:1e4       2.07x  1.00x      2.25x  1.09x
twain.txt:9:1e5       2.68x  1.00x      2.90x  1.08x
twain.txt:9:1e6       2.87x  1.00x      3.12x  1.09x


This package is still in development, but I would appreciate it if people

could try using it and tell me about any bugs. I'll make a bigger

announcement about it when I feel the package is stable enough. I

eventually plan on merging it into the standard library, but that

definitely won't happen until Go 1.7 or later.


You received this message because you are subscribed to the Google Groups "golang-nuts" group.

To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.