FAQ
Hi.

Played with go-snappy (http://code.google.com/p/snappy-go/)
performance improving using sparse arrays
(http://research.swtch.com/sparse). Got mixed results, some numbers
are better while some others are worse:

(17:14) jnml@fsc-r550:~/go/misc$ ./benchcmp ~/tmp/old-tip.txt ~/tmp/new-tip.txt
benchmark old ns/op new ns/op delta
BenchmarkDecodeWords1e3 6887 6891 +0.06%
BenchmarkDecodeWords1e4 67504 67037 -0.69%
BenchmarkDecodeWords1e5 691593 692624 +0.15%
BenchmarkDecodeWords1e6 6108067 6020448 -1.43%
BenchmarkEncodeWords1e3 24139 16070 -33.43%
BenchmarkEncodeWords1e4 157504 148469 -5.74%
BenchmarkEncodeWords1e5 1464519 1683214 +14.93%
BenchmarkEncodeWords1e6 11771157 13947030 +18.48%

benchmark old MB/s new MB/s speedup
BenchmarkDecodeWords1e3 98.58 98.52 1.00x
BenchmarkDecodeWords1e4 89.24 89.86 1.01x
BenchmarkDecodeWords1e5 89.77 89.64 1.00x
BenchmarkDecodeWords1e6 85.36 86.60 1.01x
BenchmarkEncodeWords1e3 41.43 62.23 1.50x
BenchmarkEncodeWords1e4 63.49 67.35 1.06x
BenchmarkEncodeWords1e5 68.28 59.41 0.87x
BenchmarkEncodeWords1e6 84.95 71.70 0.84x

benchmark old allocs new allocs delta
BenchmarkDecodeWords1e3 0 0 n/a%
BenchmarkDecodeWords1e4 0 0 n/a%
BenchmarkDecodeWords1e5 0 0 n/a%
BenchmarkDecodeWords1e6 0 0 n/a%
BenchmarkEncodeWords1e3 1 0 -100.00%
BenchmarkEncodeWords1e4 1 0 -100.00%
BenchmarkEncodeWords1e5 1 0 -100.00%
BenchmarkEncodeWords1e6 1 0 -100.00%

benchmark old bytes new bytes delta
BenchmarkDecodeWords1e3 0 0 n/a%
BenchmarkDecodeWords1e4 0 5 n/a%
BenchmarkDecodeWords1e5 0 59 n/a%
BenchmarkDecodeWords1e6 0 598 n/a%
BenchmarkEncodeWords1e3 135168 2 -100.00%
BenchmarkEncodeWords1e4 135168 29 -99.98%
BenchmarkEncodeWords1e5 135168 299 -99.78%
BenchmarkEncodeWords1e6 135168 2990 -97.79%
(17:16) jnml@fsc-r550:~/go/misc$

Additionally, there is an API change. The sparse array (used for the
hash table) is now an Encoder struct and Encode is its method. Reason:
No need to (re)initialize the (reusable) Encoder (less allocations,
zero additional initialization overhead). Docs:

----

type Encoder struct {
// contains filtered or unexported fields
}

Encoder provides snappy encoding. A zero value of Encoder is ready for
use. After an invocation of Encode, Encoder is reusable for another
invocation of Encode.

func (e *Encoder) Encode(dst, src []byte) ([]byte, error)

Encode returns the encoded form of src. The returned slice may be a
sub- slice of dst if dst was large enough to hold the entire encoded
block. Otherwise, a newly allocated slice will be returned. It is
valid to pass a nil dst.

----

Not able to evaluate it properly, seeking opinions of others. Is it worth a CL?

Thanks.

-j

Search Discussions

  • Nigel Tao at Jan 18, 2013 at 1:54 am

    On Fri, Jan 18, 2013 at 4:05 AM, Jan Mercl wrote:
    Not able to evaluate it properly, seeking opinions of others. Is it worth a CL?
    I'd hesistate to diverge from the C++ Snappy implementation [0] unless
    there's good reason to, and the benchmarks seem inconclusive.

    [0] https://code.google.com/p/snappy/source/browse/trunk/snappy.cc
  • Jan Mercl at Jan 18, 2013 at 10:09 am

    On Fri, Jan 18, 2013 at 2:54 AM, Nigel Tao wrote:
    I'd hesistate to diverge from the C++ Snappy implementation [0] unless
    there's good reason to, and the benchmarks seem inconclusive.
    Agreed. Round 2: No API changes, no slowdowns (above a 0.5% noise
    level), minor speedups from small tweaks.

    (10:44) jnml@fsc-r550:~$ benchcmp ~/tmp/old-tip.txt ~/tmp/new-tip.txt
    benchmark old ns/op new ns/op delta
    BenchmarkDecodeWords1e3 6835 6868 +0.48%
    BenchmarkDecodeWords1e4 66808 66886 +0.12%
    BenchmarkDecodeWords1e5 688943 689114 +0.02%
    BenchmarkDecodeWords1e6 5991699 5998360 +0.11%
    BenchmarkEncodeWords1e3 24150 23089 -4.39%
    BenchmarkEncodeWords1e4 157335 138771 -11.80%
    BenchmarkEncodeWords1e5 1464949 1437098 -1.90%
    BenchmarkEncodeWords1e6 11773207 11611877 -1.37%

    benchmark old MB/s new MB/s speedup
    BenchmarkDecodeWords1e3 99.33 98.86 1.00x
    BenchmarkDecodeWords1e4 90.17 90.06 1.00x
    BenchmarkDecodeWords1e5 90.11 90.09 1.00x
    BenchmarkDecodeWords1e6 87.01 86.92 1.00x
    BenchmarkEncodeWords1e3 41.41 43.31 1.05x
    BenchmarkEncodeWords1e4 63.56 72.06 1.13x
    BenchmarkEncodeWords1e5 68.26 69.58 1.02x
    BenchmarkEncodeWords1e6 84.94 86.12 1.01x

    benchmark old allocs new allocs delta
    BenchmarkDecodeWords1e3 0 0 n/a%
    BenchmarkDecodeWords1e4 0 0 n/a%
    BenchmarkDecodeWords1e5 0 0 n/a%
    BenchmarkDecodeWords1e6 0 0 n/a%
    BenchmarkEncodeWords1e3 1 1 0.00%
    BenchmarkEncodeWords1e4 1 1 0.00%
    BenchmarkEncodeWords1e5 1 1 0.00%
    BenchmarkEncodeWords1e6 1 1 0.00%

    benchmark old bytes new bytes delta
    BenchmarkDecodeWords1e3 0 0 n/a%
    BenchmarkDecodeWords1e4 0 0 n/a%
    BenchmarkDecodeWords1e5 0 0 n/a%
    BenchmarkDecodeWords1e6 0 0 n/a%
    BenchmarkEncodeWords1e3 135168 135168 0.00%
    BenchmarkEncodeWords1e4 135168 135168 0.00%
    BenchmarkEncodeWords1e5 135168 135168 0.00%
    BenchmarkEncodeWords1e6 135168 135168 0.00%
    (10:46) jnml@fsc-r550:~$

    (10:58) jnml@fsc-r550:~/src/code.google.com/p/snappy-go/snappy$ diff
    -u encode.go ../../snappy-go-fork/snappy/encode.go
    --- encode.go 2013-01-17 17:05:40.307321778 +0100
    +++ ../../snappy-go-fork/snappy/encode.go 2013-01-18 10:43:35.530802616 +0100
    @@ -110,9 +110,6 @@
    tableSize *= 2
    }
    var table [maxTableSize]int
    - for i := 0; i < tableSize; i++ {
    - table[i] = -1
    - }

    // Iterate over the source bytes.
    var (
    @@ -123,8 +120,8 @@
    for s+3 < len(src) {
    // Update the hash table.
    h := uint32(src[s]) | uint32(src[s+1])<<8 | uint32(src[s+2])<<16 |
    uint32(src[s+3])<<24
    - h = (h * 0x1e35a7bd) >> shift
    - t, table[h] = table[h], s
    + p := &table[(h*0x1e35a7bd)>>shift]
    + t, *p = *p-1, s+1
    // If t is invalid or src[s:s+4] differs from src[t:t+4], accumulate
    a literal byte.
    if t < 0 || s-t >= maxOffset || !equal4(src, t, s) {
    s++
    (10:59) jnml@fsc-r550:~/src/code.google.com/p/snappy-go/snappy$

    -j
  • Nigel Tao at Feb 10, 2013 at 4:27 am
    CC'ing maruel@chromium.org, who was looking recently at snappy-go
    benchmarks. Marc-Antoine, if you missed the earlier conversation, it's
    at https://groups.google.com/d/topic/golang-dev/LS7ynr4dhPY/discussion

    The diff below looks sane (albeit no longer anything to do with sparse
    arrays), but the -1/+1 will need some comments somewhere. Feel free to
    send out a CL after https://codereview.appspot.com/7271049/ lands.

    On Fri, Jan 18, 2013 at 9:09 PM, Jan Mercl wrote:
    (10:58) jnml@fsc-r550:~/src/code.google.com/p/snappy-go/snappy$ diff
    -u encode.go ../../snappy-go-fork/snappy/encode.go
    --- encode.go 2013-01-17 17:05:40.307321778 +0100
    +++ ../../snappy-go-fork/snappy/encode.go 2013-01-18 10:43:35.530802616 +0100
    @@ -110,9 +110,6 @@
    tableSize *= 2
    }
    var table [maxTableSize]int
    - for i := 0; i < tableSize; i++ {
    - table[i] = -1
    - }

    // Iterate over the source bytes.
    var (
    @@ -123,8 +120,8 @@
    for s+3 < len(src) {
    // Update the hash table.
    h := uint32(src[s]) | uint32(src[s+1])<<8 | uint32(src[s+2])<<16 |
    uint32(src[s+3])<<24
    - h = (h * 0x1e35a7bd) >> shift
    - t, table[h] = table[h], s
    + p := &table[(h*0x1e35a7bd)>>shift]
    + t, *p = *p-1, s+1
    // If t is invalid or src[s:s+4] differs from src[t:t+4], accumulate
    a literal byte.
    if t < 0 || s-t >= maxOffset || !equal4(src, t, s) {
    s++
    --

    ---
    You received this message because you are subscribed to the Google Groups "golang-dev" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-dev+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/groups/opt_out.
  • Jan Mercl at Feb 10, 2013 at 9:07 am

    On Sun, Feb 10, 2013 at 5:27 AM, Nigel Tao wrote:
    The diff below looks sane (albeit no longer anything to do with sparse
    arrays), but the -1/+1 will need some comments somewhere. Feel free to
    send out a CL after https://codereview.appspot.com/7271049/ lands.
    Ack. Will do.

    -j

    --

    ---
    You received this message because you are subscribed to the Google Groups "golang-dev" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-dev+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/groups/opt_out.
  • Jan Mercl at Feb 18, 2013 at 1:11 pm

    On Sun, Feb 10, 2013 at 5:27 AM, Nigel Tao wrote:

    CC'ing maruel@chromium.org, who was looking recently at snappy-go
    benchmarks. Marc-Antoine, if you missed the earlier conversation, it's
    at https://groups.google.com/d/topic/golang-dev/LS7ynr4dhPY/discussion

    The diff below looks sane (albeit no longer anything to do with sparse
    arrays), but the -1/+1 will need some comments somewhere. Feel free to
    send out a CL after https://codereview.appspot.com/7271049/ lands.

    https://codereview.appspot.com/7346051

    -j

    --

    ---
    You received this message because you are subscribed to the Google Groups "golang-dev" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-dev+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/groups/opt_out.
  • Dmitry Chestnykh at Feb 10, 2013 at 4:29 pm

    On Friday, January 18, 2013 11:09:27 AM UTC+1, Jan Mercl wrote:

    // Iterate over the source bytes.
    var (
    @@ -123,8 +120,8 @@
    for s+3 < len(src) {
    // Update the hash table.
    h := uint32(src[s]) | uint32(src[s+1])<<8 | uint32(src[s+2])<<16 |
    uint32(src[s+3])<<24
    - h = (h * 0x1e35a7bd) >> shift
    - t, table[h] = table[h], s
    + p := &table[(h*0x1e35a7bd)>>shift]
    + t, *p = *p-1, s+1
    // If t is invalid or src[s:s+4] differs from src[t:t+4], accumulate
    a literal byte.
    if t < 0 || s-t >= maxOffset || !equal4(src, t, s) {
    s++
    (10:59) jnml@fsc-r550:~/src/code.google.com/p/snappy-go/snappy$
    You can also eliminate duplicate loads from src, which equal4 does:

    b0 := src[s+0]
    b1 := src[s+1]
    b2 := src[s+2]
    b3 := src[s+3]
    h := ((uint32(b0) | uint32(b1)<<8 | uint32(b2)<<16 |
    uint32(b3)<<24)
    p := &table[(h*0x1e35a7bd)>>shift]
    t, *p = *p-1, s+1
    // If t is invalid or src[s:s+4] differs from src[t:t+4],
    accumulate a literal byte.
    if t < 0 || s-t >= maxOffset || b0 != src[t+0] || b1 !=
    src[t+1] || b2 != src[t+2] || b3 != src[t+3] {
    s++
    continue
    }

    --

    ---
    You received this message because you are subscribed to the Google Groups "golang-dev" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-dev+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/groups/opt_out.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupgolang-dev @
categoriesgo
postedJan 17, '13 at 6:09p
activeFeb 18, '13 at 1:11p
posts7
users3
websitegolang.org

People

Translate

site design / logo © 2021 Grokbase