FAQ
Hello golang-dev@googlegroups.com (cc: golang-dev@googlegroups.com),

Please take another look.


http://codereview.appspot.com/6820096/

Search Discussions

  • Notcarl at Nov 6, 2012 at 3:22 am
    Reviewers: golang-dev_googlegroups.com,

    Message:
    Hello golang-dev@googlegroups.com,

    I'd like you to review this change to
    https://code.google.com/p/go/


    Description:
    crypto/sha1: Make sha-1 do block mixup in place

    Please review this at http://codereview.appspot.com/6820096/

    Affected files:
    M src/pkg/crypto/sha1/sha1block.go


    Index: src/pkg/crypto/sha1/sha1block.go
    ===================================================================
    --- a/src/pkg/crypto/sha1/sha1block.go
    +++ b/src/pkg/crypto/sha1/sha1block.go
    @@ -16,7 +16,7 @@
    )

    func block(dig *digest, p []byte) {
    - var w [80]uint32
    + var w [16]uint32

    h0, h1, h2, h3, h4 := dig.h[0], dig.h[1], dig.h[2], dig.h[3], dig.h[4]
    for len(p) >= chunk {
    @@ -26,42 +26,54 @@
    j := i * 4
    w[i] = uint32(p[j])<<24 | uint32(p[j+1])<<16 | uint32(p[j+2])<<8 |
    uint32(p[j+3])
    }
    - for i := 16; i < 80; i++ {
    - tmp := w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16]
    - w[i] = tmp<<1 | tmp>>(32-1)
    - }

    a, b, c, d, e := h0, h1, h2, h3, h4

    // Each of the four 20-iteration rounds
    // differs only in the computation of f and
    // the choice of K (_K0, _K1, etc).
    - for i := 0; i < 20; i++ {
    + for i := 0; i < 16; i++ {
    f := b&c | (^b)&d
    a5 := a<<5 | a>>(32-5)
    b30 := b<<30 | b>>(32-30)
    t := a5 + f + e + w[i] + _K0
    a, b, c, d, e = t, a, b30, c, d
    }
    + for i := 16; i < 20; i++ {
    + tmp := w[(i-3)&0xf] ^ w[(i-8)&0xf] ^ w[(i-14)&0xf] ^ w[(i)&0xf]
    + w[i&0xf] = tmp<<1 | tmp>>(32-1)
    +
    + f := b&c | (^b)&d
    + a5 := a<<5 | a>>(32-5)
    + b30 := b<<30 | b>>(32-30)
    + t := a5 + f + e + w[i&0xf] + _K0
    + a, b, c, d, e = t, a, b30, c, d
    + }
    for i := 20; i < 40; i++ {
    + tmp := w[(i-3)&0xf] ^ w[(i-8)&0xf] ^ w[(i-14)&0xf] ^ w[(i)&0xf]
    + w[i&0xf] = tmp<<1 | tmp>>(32-1)
    f := b ^ c ^ d
    a5 := a<<5 | a>>(32-5)
    b30 := b<<30 | b>>(32-30)
    - t := a5 + f + e + w[i] + _K1
    + t := a5 + f + e + w[i&0xf] + _K1
    a, b, c, d, e = t, a, b30, c, d
    }
    for i := 40; i < 60; i++ {
    + tmp := w[(i-3)&0xf] ^ w[(i-8)&0xf] ^ w[(i-14)&0xf] ^ w[(i)&0xf]
    + w[i&0xf] = tmp<<1 | tmp>>(32-1)
    f := b&c | b&d | c&d
    a5 := a<<5 | a>>(32-5)
    b30 := b<<30 | b>>(32-30)
    - t := a5 + f + e + w[i] + _K2
    + t := a5 + f + e + w[i&0xf] + _K2
    a, b, c, d, e = t, a, b30, c, d
    }
    for i := 60; i < 80; i++ {
    + tmp := w[(i-3)&0xf] ^ w[(i-8)&0xf] ^ w[(i-14)&0xf] ^ w[(i)&0xf]
    + w[i&0xf] = tmp<<1 | tmp>>(32-1)
    f := b ^ c ^ d
    a5 := a<<5 | a>>(32-5)
    b30 := b<<30 | b>>(32-30)
    - t := a5 + f + e + w[i] + _K3
    + t := a5 + f + e + w[i&0xf] + _K3
    a, b, c, d, e = t, a, b30, c, d
    }
  • Dave at Nov 6, 2012 at 4:01 am

    On 2012/11/06 02:12:48, notcarl wrote:
    Hello mailto:golang-dev@googlegroups.com (cc:
    mailto:golang-dev@googlegroups.com),
    Please take another look.
    lucky(~/go/src/pkg/crypto/sha1) % ~/go/misc/benchcmp {old,new}.txt
    benchmark old ns/op new ns/op delta
    BenchmarkHash8Bytes 863 831 -3.71%
    BenchmarkHash1K 10049 9693 -3.54%
    BenchmarkHash8K 74555 72132 -3.25%

    benchmark old MB/s new MB/s speedup
    BenchmarkHash8Bytes 9.27 9.62 1.04x
    BenchmarkHash1K 101.90 105.64 1.04x
    BenchmarkHash8K 109.88 113.57 1.03x

    I think this can be made a little faster.

    https://codereview.appspot.com/6820096/
  • Dave at Nov 6, 2012 at 4:06 am
    https://codereview.appspot.com/6820096/diff/6001/src/pkg/crypto/sha1/sha1block.go
    File src/pkg/crypto/sha1/sha1block.go (right):

    https://codereview.appspot.com/6820096/diff/6001/src/pkg/crypto/sha1/sha1block.go#newcode35
    src/pkg/crypto/sha1/sha1block.go:35: for i := 0; i < 16; i++ {
    lifting i outside the for loop, then not initalising it again lower down
    gets another 2% on amd64

    i := 0
    for ; i < 16 ; i++

    lucky(~/go/src/pkg/crypto/sha1) % ~/go/misc/benchcmp {old,new}.txt
    benchmark old ns/op new ns/op delta
    BenchmarkHash8Bytes 863 819 -5.10%
    BenchmarkHash1K 10049 9499 -5.47%
    BenchmarkHash8K 74555 70395 -5.58%

    benchmark old MB/s new MB/s speedup
    BenchmarkHash8Bytes 9.27 9.76 1.05x
    BenchmarkHash1K 101.90 107.80 1.06x
    BenchmarkHash8K 109.88 116.37 1.06x

    https://codereview.appspot.com/6820096/diff/6001/src/pkg/crypto/sha1/sha1block.go#newcode42
    src/pkg/crypto/sha1/sha1block.go:42: for i := 16; i < 20; i++ {
    for ; i < 20; i++ { .. }
    and so forth

    https://codereview.appspot.com/6820096/
  • Rsc at Nov 6, 2012 at 6:52 pm
    looks good to me. will leave for dfc to lgtm and submit



    https://codereview.appspot.com/6820096/diff/6001/src/pkg/crypto/sha1/sha1block.go
    File src/pkg/crypto/sha1/sha1block.go (right):

    https://codereview.appspot.com/6820096/diff/6001/src/pkg/crypto/sha1/sha1block.go#newcode39
    src/pkg/crypto/sha1/sha1block.go:39: t := a5 + f + e + w[i] + _K0
    i&0xf will get you something here.

    https://codereview.appspot.com/6820096/diff/6001/src/pkg/crypto/sha1/sha1block.go#newcode42
    src/pkg/crypto/sha1/sha1block.go:42: for i := 16; i < 20; i++ {
    On 2012/11/06 04:06:47, dfc wrote:
    for ; i < 20; i++ { .. }
    and so forth
    FWIW this is unlikely to matter.

    https://codereview.appspot.com/6820096/
  • Rsc at Nov 6, 2012 at 7:39 pm
    Please put the current benchcmp numbers in the CL description.
    Leaving for Dave.

    Thanks.
    Russ


    https://codereview.appspot.com/6820096/
  • Notcarl at Nov 6, 2012 at 7:39 pm
    Thanks for the quick review!


    https://codereview.appspot.com/6820096/diff/6001/src/pkg/crypto/sha1/sha1block.go
    File src/pkg/crypto/sha1/sha1block.go (right):

    https://codereview.appspot.com/6820096/diff/6001/src/pkg/crypto/sha1/sha1block.go#newcode39
    src/pkg/crypto/sha1/sha1block.go:39: t := a5 + f + e + w[i] + _K0
    On 2012/11/06 18:52:06, rsc wrote:
    i&0xf will get you something here.
    I am not sure I understand. Since the loop only goes from [0, 15],
    anding with 0xf should be a no-op.

    https://codereview.appspot.com/6820096/diff/6001/src/pkg/crypto/sha1/sha1block.go#newcode42
    src/pkg/crypto/sha1/sha1block.go:42: for i := 16; i < 20; i++ {
    On 2012/11/06 18:52:06, rsc wrote:
    On 2012/11/06 04:06:47, dfc wrote:
    for ; i < 20; i++ { .. }
    and so forth
    FWIW this is unlikely to matter.

    I was hoping that maybe the compiler saw this. I didn't unroll the
    loops because the compiler should be doing it and it decreases
    readability. In addition, the comments at the top of this file hint at
    it not being a priority to make this as fast as possible.

    https://codereview.appspot.com/6820096/
  • Russ Cox at Nov 6, 2012 at 7:08 pm

    i&0xf will get you something here.
    I am not sure I understand. Since the loop only goes from [0, 15],
    anding with 0xf should be a no-op.
    The compiler isn't smart enough to recognize that i is guaranteed to
    be in range here, but &0xf will force it to notice that and avoid the
    bounds check. It is unlikely to matter much.
    On 2012/11/06 04:06:47, dfc wrote:
    for ; i < 20; i++ { .. }
    and so forth
    FWIW this is unlikely to matter.
    I was hoping that maybe the compiler saw this. I didn't unroll the
    loops because the compiler should be doing it and it decreases
    readability. In addition, the comments at the top of this file hint at
    it not being a priority to make this as fast as possible.
    Dave suggested a single i instead of 5 different i variables. My
    comment was only that from a performance standpoint it is unlikely to
    matter. The compiler does not unroll the loops, and in general I think
    it is unreasonable to expect it to. The big win you can get from the
    SHA1 block function is if you unroll the loops 5x and do the variable
    renamings in each iteration, so that all the rotating assignments
    disappear. If you want to do this it should be done in a separate CL
    and the speed gain should justify the increase in code. The last time
    I looked at this I think it was not quite enough. crypto/md5/gen.go is
    the program that generates the MD5 block routine. You could use
    something similar for SHA1.

    Russ
  • Notcarl at Nov 6, 2012 at 7:39 pm

    On 2012/11/06 19:08:54, rsc wrote:
    i&0xf will get you something here.
    I am not sure I understand. Since the loop only goes from [0, 15],
    anding with 0xf should be a no-op.
    The compiler isn't smart enough to recognize that i is guaranteed to
    be in range here, but &0xf will force it to notice that and avoid the
    bounds check. It is unlikely to matter much.

    I adding this change, and got a small increase in speed.

    On 2012/11/06 04:06:47, dfc wrote:
    for ; i < 20; i++ { .. }
    and so forth
    FWIW this is unlikely to matter.
    I was hoping that maybe the compiler saw this. I didn't unroll the
    loops because the compiler should be doing it and it decreases
    readability. In addition, the comments at the top of this file hint
    at
    it not being a priority to make this as fast as possible.
    Dave suggested a single i instead of 5 different i variables. My
    comment was only that from a performance standpoint it is unlikely to
    matter. The compiler does not unroll the loops, and in general I think
    it is unreasonable to expect it to. The big win you can get from the
    SHA1 block function is if you unroll the loops 5x and do the variable
    renamings in each iteration, so that all the rotating assignments
    disappear. If you want to do this it should be done in a separate CL
    and the speed gain should justify the increase in code. The last time
    I looked at this I think it was not quite enough. crypto/md5/gen.go is
    the program that generates the MD5 block routine. You could use
    something similar for SHA1.
    Russ
    The fear is that unrolling the loop means potentially more i-cache
    misses, so I agree it should be in a future CL. Hoisting i out of the
    loop in the meantime showed a little extra gain.



    https://codereview.appspot.com/6820096/
  • Notcarl at Nov 6, 2012 at 7:39 pm
    Hello dave@cheney.net, rsc@golang.org (cc: golang-dev@googlegroups.com),

    Please take another look.


    http://codereview.appspot.com/6820096/
  • Notcarl at Nov 6, 2012 at 7:57 pm

    On 2012/11/06 19:38:57, rsc wrote:
    Please put the current benchcmp numbers in the CL description.
    Leaving for Dave.
    Thanks.
    Russ
    Done.


    https://codereview.appspot.com/6820096/
  • Dave at Nov 6, 2012 at 9:46 pm
    hmmm, i can't replicate your benchmark results.

    linux/amd64: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz

    benchmark old ns/op new ns/op delta
    BenchmarkHash8Bytes 866 824 -4.85%
    BenchmarkHash1K 10058 9520 -5.35%
    BenchmarkHash8K 74567 70953 -4.85%

    benchmark old MB/s new MB/s speedup
    BenchmarkHash8Bytes 9.24 9.71 1.05x
    BenchmarkHash1K 101.81 107.56 1.06x
    BenchmarkHash8K 109.86 115.46 1.05x

    https://codereview.appspot.com/6820096/
  • Notcarl at Nov 6, 2012 at 9:50 pm
    $ cat /proc/cpuinfo
    processor : 0
    vendor_id : GenuineIntel
    cpu family : 6
    model : 44
    model name : Intel(R) Xeon(R) CPU W3690 @ 3.47GHz
    ...



    https://codereview.appspot.com/6820096/
  • Dave at Nov 6, 2012 at 10:04 pm
    Looks pretty good to me as well. I'll run some benchmarks on 32 bit
    platforms once mine get through building the overnight changes.


    https://codereview.appspot.com/6820096/diff/4002/src/pkg/crypto/sha1/sha1block.go
    File src/pkg/crypto/sha1/sha1block.go (right):

    https://codereview.appspot.com/6820096/diff/4002/src/pkg/crypto/sha1/sha1block.go#newcode26
    src/pkg/crypto/sha1/sha1block.go:26: j := i * 4
    j := i << 2 avoids the imul on intel. The compiler should be smarter
    about this.

    https://codereview.appspot.com/6820096/
  • Notcarl at Nov 6, 2012 at 10:51 pm

    On 2012/11/06 22:04:24, dfc wrote:
    Looks pretty good to me as well. I'll run some benchmarks on 32 bit platforms
    once mine get through building the overnight changes.

    https://codereview.appspot.com/6820096/diff/4002/src/pkg/crypto/sha1/sha1block.go
    File src/pkg/crypto/sha1/sha1block.go (right):

    https://codereview.appspot.com/6820096/diff/4002/src/pkg/crypto/sha1/sha1block.go#newcode26
    src/pkg/crypto/sha1/sha1block.go:26: j := i * 4
    j := i << 2 avoids the imul on intel. The compiler should be smarter
    about this.

    I tried changing the variables to unsigned to see if it would change
    anything. I also tried adding in the shift instead of the mult, but
    neither of these two changes affected the results of my benchmarks. I
    think it would be safe to leave this alone for now.

    https://codereview.appspot.com/6820096/
  • Dave at Nov 6, 2012 at 11:07 pm
    Fair enough. The compiler should be able to do this for us eventually.

    I know you're a Googler, so most of the CLA process does not apply to
    you, but have you done whatever is needed on your side ?

    https://codereview.appspot.com/6820096/
  • Ian Lance Taylor at Nov 6, 2012 at 11:18 pm

    On Tue, Nov 6, 2012 at 3:07 PM, wrote:
    Fair enough. The compiler should be able to do this for us eventually.

    I know you're a Googler, so most of the CLA process does not apply to
    you, but have you done whatever is needed on your side ?
    To be clear, are you approving this patch? Let me know.

    We just need to get him into CONTRIBUTORS.

    Ian
  • Dave Cheney at Nov 6, 2012 at 11:18 pm
    Yes, LGTM.
    On Wed, Nov 7, 2012 at 10:18 AM, Ian Lance Taylor wrote:
    On Tue, Nov 6, 2012 at 3:07 PM, wrote:
    Fair enough. The compiler should be able to do this for us eventually.

    I know you're a Googler, so most of the CLA process does not apply to
    you, but have you done whatever is needed on your side ?
    To be clear, are you approving this patch? Let me know.

    We just need to get him into CONTRIBUTORS.

    Ian
  • Dave at Nov 6, 2012 at 11:24 pm
    Some results from

    linux/arm:

    benchmark old ns/op new ns/op delta
    BenchmarkHash8Bytes 6487 6497 +0.15%
    BenchmarkHash1K 69794 70773 +1.40%
    BenchmarkHash8K 517529 525604 +1.56%

    benchmark old MB/s new MB/s speedup
    BenchmarkHash8Bytes 1.23 1.23 1.00x
    BenchmarkHash1K 14.67 14.47 0.99x
    BenchmarkHash8K 15.83 15.59 0.98x

    Wasn't expecting much, and it looks like my expectations were met. This
    is within the margin of error.

    https://codereview.appspot.com/6820096/
  • Dave at Nov 7, 2012 at 2:41 am
    *** Submitted as
    http://code.google.com/p/go/source/detail?r=1369d7bb329d ***

    crypto/sha1: Make sha-1 do block mixup in place

    Benchmarks:

    benchmark old ns/op new ns/op delta
    BenchmarkHash8Bytes 762 674 -11.55%
    BenchmarkHash1K 8791 7375 -16.11%
    BenchmarkHash8K 65094 54881 -15.69%

    benchmark old MB/s new MB/s speedup
    BenchmarkHash8Bytes 10.50 11.86 1.13x
    BenchmarkHash1K 116.48 138.84 1.19x
    BenchmarkHash8K 125.85 149.27 1.19x

    R=dave, rsc, iant
    CC=golang-dev
    http://codereview.appspot.com/6820096

    Committer: Dave Cheney <dave@cheney.net>


    http://codereview.appspot.com/6820096/

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupgolang-dev @
categoriesgo
postedNov 6, '12 at 3:22a
activeNov 7, '12 at 2:41a
posts20
users4
websitegolang.org

People

Translate

site design / logo © 2022 Grokbase