FAQ
Reviewers: golang-dev_googlegroups.com,

Message:
Hello golang-dev@googlegroups.com (cc: golang-dev@googlegroups.com),

I'd like you to review this change to
https://go.googlecode.com/hg/


Description:
cmd/gc, cmd/6g: add division rewrite to walk pass.

This allows 5g and 8g to benefit from the rewrite as shifts.
The shift optimization is removed from 6g but the magic
multiply is not yet implemented in gc.

Update issue 2230.

Please review this at http://codereview.appspot.com/6819123/

Affected files:
M src/cmd/6g/ggen.c
M src/cmd/gc/walk.c

Search Discussions

  • Remyoudompheng at Nov 11, 2012 at 11:20 am
    Hello golang-dev@googlegroups.com (cc: golang-dev@googlegroups.com),

    Please take another look.


    http://codereview.appspot.com/6819123/
  • Remyoudompheng at Nov 11, 2012 at 11:24 am
    Hello golang-dev@googlegroups.com (cc: golang-dev@googlegroups.com),

    Please take another look.


    http://codereview.appspot.com/6819123/
  • Remyoudompheng at Nov 11, 2012 at 11:25 am
    The latest patch adds support for the /= %= case.
    Results on benchamrks from CL 6819087

    Before (arm):
    BenchmarkDiv1 20000000 25.3 ns/op
    BenchmarkDiv2 20000000 38.2 ns/op
    BenchmarkDiv4 20000000 38.2 ns/op
    BenchmarkMod1 20000000 26.5 ns/op
    BenchmarkMod2 20000000 36.5 ns/op
    BenchmarkMod4 20000000 36.5 ns/op

    After (arm):
    BenchmarkDiv1 500000000 1.77 ns/op
    BenchmarkDiv2 500000000 2.65 ns/op
    BenchmarkDiv4 100000000 3.53 ns/op
    BenchmarkMod1 500000000 2.06 ns/op
    BenchmarkMod2 100000000 5.29 ns/op
    BenchmarkMod4 100000000 7.64 ns/op


    http://codereview.appspot.com/6819123/
  • Dave Cheney at Nov 11, 2012 at 12:03 pm
    Wow.
    On 11/11/2012, at 22:25, remyoudompheng@gmail.com wrote:

    The latest patch adds support for the /= %= case.
    Results on benchamrks from CL 6819087

    Before (arm):
    BenchmarkDiv1 20000000 25.3 ns/op
    BenchmarkDiv2 20000000 38.2 ns/op
    BenchmarkDiv4 20000000 38.2 ns/op
    BenchmarkMod1 20000000 26.5 ns/op
    BenchmarkMod2 20000000 36.5 ns/op
    BenchmarkMod4 20000000 36.5 ns/op

    After (arm):
    BenchmarkDiv1 500000000 1.77 ns/op
    BenchmarkDiv2 500000000 2.65 ns/op
    BenchmarkDiv4 100000000 3.53 ns/op
    BenchmarkMod1 500000000 2.06 ns/op
    BenchmarkMod2 100000000 5.29 ns/op
    BenchmarkMod4 100000000 7.64 ns/op


    http://codereview.appspot.com/6819123/
  • Rémy Oudompheng at Nov 11, 2012 at 12:15 pm
    On 386 (actually a Core Quad with GOARCH=386)

    benchmark old ns/op new ns/op delta
    BenchmarkDiv1 7 0.43 -93.97%
    BenchmarkDiv2 7 1.79 -75.41%
    BenchmarkDiv4 7 2.71 -61.99%
    BenchmarkDiv10 7 7 -2.47%
    BenchmarkMod1 7 0.86 -88.79%
    BenchmarkMod2 7 3.69 -51.19%
    BenchmarkMod4 7 3.90 -49.28%
    BenchmarkMod10 7 7 -1.33%
  • Remyoudompheng at Nov 11, 2012 at 1:08 pm
    Hello golang-dev@googlegroups.com, dave@cheney.net (cc:
    golang-dev@googlegroups.com),

    Please take another look.


    http://codereview.appspot.com/6819123/
  • Remyoudompheng at Nov 11, 2012 at 1:17 pm
    Added unsigned magic multiply treatment:
    * 32-bit case is suboptimal because it needs a 32*32->64 bit multiply
    for which we don't have a portable AST representation (other than uint64
    product) but would be faster (64bit*64bit needs 3 MUL, 32*32->64 needs
    one MUL).
    * 64-bit case is not implemented and left to 6g (it would need a 128-bit
    value).

    In the benchmarks (UDiv is uint32, UDivh is uint16):

    386
    benchmark old ns/op new ns/op delta
    BenchmarkUDiv10 6 3 -54.78%
    BenchmarkUDiv123456 6 6 +0.29%
    BenchmarkUDivh10 7 2 -67.00%
    BenchmarkUDivh123 7 3 -50.25%

    arm
    benchmark old ns/op new ns/op delta
    BenchmarkUDiv10 43 10 -74.65%
    BenchmarkUDiv123456 37 13 -64.10%
    BenchmarkUDivh10 38 9 -74.40%
    BenchmarkUDivh123 38 13 -64.66%

    5g generates codes like this for 16-bit division (R3 /= 10).

    0572 (divmul_test.go:129) MOVHU R3,R0
    0573 (divmul_test.go:129) MOVHU R0,R1
    0574 (divmul_test.go:129) MOVW R1,R0
    0575 (divmul_test.go:129) MOVW $52429,R1
    0576 (divmul_test.go:129) MULU R1,R0,R0
    0577 (divmul_test.go:129) MOVW R0>>19,R0
    0578 (divmul_test.go:129) MOVHU R0,R1
    0579 (divmul_test.go:129) MOVHU R1,R3

    It could be simplified to
    MOVHU R3, R0
    MOVW $52429, R1
    MULU R1, R0, R0
    MOVW R0>>19, R0
    MOVHU R0, R3


    http://codereview.appspot.com/6819123/
  • Remyoudompheng at Nov 11, 2012 at 1:43 pm
    Hello golang-dev@googlegroups.com, dave@cheney.net (cc:
    golang-dev@googlegroups.com),

    Please take another look.


    http://codereview.appspot.com/6819123/
  • Remyoudompheng at Nov 11, 2012 at 1:43 pm
    Added the signed magic multiply. Feel free to benchmark. Adding support
    for constants in cgen64.c for OMUL could be useful to improve further.


    http://codereview.appspot.com/6819123/
  • Dave at Nov 11, 2012 at 10:51 pm
    Some minor suggestions. I don't feel qualified to review the gc changes,
    apart from observing the benchmark numbers are impressive.

    pando(~/go/test/bench/go) % ~/go/misc/benchcmp {old,new}.txt
    benchmark old ns/op new ns/op delta
    BenchmarkDiv1 43 3 -92.68%
    BenchmarkDiv2 65 4 -93.07%
    BenchmarkDiv3 65 21 -67.80%
    BenchmarkDiv4 65 6 -90.79%
    BenchmarkDiv10 65 24 -63.17%
    BenchmarkUDiv1 45 3 -93.34%
    BenchmarkUDiv2 65 3 -94.61%
    BenchmarkUDiv3 65 17 -73.77%
    BenchmarkUDiv4 65 3 -94.61%
    BenchmarkUDiv10 65 17 -73.77%
    BenchmarkMod1 45 3 -92.23%
    BenchmarkMod2 62 9 -85.49%
    BenchmarkMod3 62 25 -59.71%
    BenchmarkMod4 62 13 -79.61%
    BenchmarkMod10 62 29 -53.24%



    https://codereview.appspot.com/6819123/diff/13001/src/cmd/gc/walk.c
    File src/cmd/gc/walk.c (right):

    https://codereview.appspot.com/6819123/diff/13001/src/cmd/gc/walk.c#newcode894
    src/cmd/gc/walk.c:894: * on 386, rewrite float ops into l = l op r.
    should this be 386/arm ?

    https://codereview.appspot.com/6819123/diff/13001/src/cmd/gc/walk.c#newcode961
    src/cmd/gc/walk.c:961: }
    why not move the block into this switch then push the default down to
    the bottom of the switch and make goto ret; unconditional.

    https://codereview.appspot.com/6819123/diff/13001/src/cmd/gc/walk.c#newcode2899
    src/cmd/gc/walk.c:2899: // by a constant
    nit: maybe move this comment to like 2908

    https://codereview.appspot.com/6819123/diff/13001/src/cmd/gc/walk.c#newcode2910
    src/cmd/gc/walk.c:2910: // just sign bit
    i know this came from 6g, but can the comment be improved ? ie.

    // nothing to do because divisor is larger than operand ?

    https://codereview.appspot.com/6819123/
  • Michael Jones at Nov 11, 2012 at 11:19 pm
    This is all awesome, but, don't we already know that some of these are more
    than special...

    div 1 means noop
    div 2 means shift by 1
    div 4 means shift by 2
    mod 1 means 0
    mod 2 means x & 1
    mod 4 means x & 3
    On Sun, Nov 11, 2012 at 5:51 PM, wrote:

    Some minor suggestions. I don't feel qualified to review the gc changes,
    apart from observing the benchmark numbers are impressive.

    pando(~/go/test/bench/go) % ~/go/misc/benchcmp {old,new}.txt

    benchmark old ns/op new ns/op delta
    BenchmarkDiv1 43 3 -92.68%
    BenchmarkDiv2 65 4 -93.07%
    BenchmarkDiv3 65 21 -67.80%
    BenchmarkDiv4 65 6 -90.79%
    BenchmarkDiv10 65 24 -63.17%
    BenchmarkUDiv1 45 3 -93.34%
    BenchmarkUDiv2 65 3 -94.61%
    BenchmarkUDiv3 65 17 -73.77%
    BenchmarkUDiv4 65 3 -94.61%
    BenchmarkUDiv10 65 17 -73.77%
    BenchmarkMod1 45 3 -92.23%
    BenchmarkMod2 62 9 -85.49%
    BenchmarkMod3 62 25 -59.71%
    BenchmarkMod4 62 13 -79.61%
    BenchmarkMod10 62 29 -53.24%



    https://codereview.appspot.**com/6819123/diff/13001/src/**cmd/gc/walk.c<https://codereview.appspot.com/6819123/diff/13001/src/cmd/gc/walk.c>
    File src/cmd/gc/walk.c (right):

    https://codereview.appspot.**com/6819123/diff/13001/src/**
    cmd/gc/walk.c#newcode894<https://codereview.appspot.com/6819123/diff/13001/src/cmd/gc/walk.c#newcode894>
    src/cmd/gc/walk.c:894: * on 386, rewrite float ops into l = l op r.
    should this be 386/arm ?

    https://codereview.appspot.**com/6819123/diff/13001/src/**
    cmd/gc/walk.c#newcode961<https://codereview.appspot.com/6819123/diff/13001/src/cmd/gc/walk.c#newcode961>
    src/cmd/gc/walk.c:961: }
    why not move the block into this switch then push the default down to
    the bottom of the switch and make goto ret; unconditional.

    https://codereview.appspot.**com/6819123/diff/13001/src/**
    cmd/gc/walk.c#newcode2899<https://codereview.appspot.com/6819123/diff/13001/src/cmd/gc/walk.c#newcode2899>
    src/cmd/gc/walk.c:2899: // by a constant
    nit: maybe move this comment to like 2908

    https://codereview.appspot.**com/6819123/diff/13001/src/**
    cmd/gc/walk.c#newcode2910<https://codereview.appspot.com/6819123/diff/13001/src/cmd/gc/walk.c#newcode2910>
    src/cmd/gc/walk.c:2910: // just sign bit
    i know this came from 6g, but can the comment be improved ? ie.

    // nothing to do because divisor is larger than operand ?

    https://codereview.appspot.**com/6819123/<https://codereview.appspot.com/6819123/>


    --
    Michael T. Jones | Chief Technology Advocate | mtj@google.com | +1
    650-335-5765
  • Remyoudompheng at Nov 12, 2012 at 12:38 am
    Hello golang-dev@googlegroups.com, dave@cheney.net, mtj@google.com (cc:
    golang-dev@googlegroups.com),

    Please take another look.


    http://codereview.appspot.com/6819123/
  • Dave at Nov 12, 2012 at 4:13 am
    There may be an error with patch set #8. ./all.bash fails the second
    phase with

    # runtime/cgo
    cannot load imported symbols from ELF file
    $WORK/runtime/cgo/_obj/_cgo_.o: length of symbol section is not a
    multiple of SymSize




    https://codereview.appspot.com/6819123/
  • Remyoudompheng at Nov 12, 2012 at 6:55 am
    http://codereview.appspot.com/6819123/diff/13001/src/cmd/gc/walk.c
    File src/cmd/gc/walk.c (right):

    http://codereview.appspot.com/6819123/diff/13001/src/cmd/gc/walk.c#newcode894
    src/cmd/gc/walk.c:894: * on 386, rewrite float ops into l = l op r.
    I don't know.

    http://codereview.appspot.com/6819123/diff/13001/src/cmd/gc/walk.c#newcode961
    src/cmd/gc/walk.c:961: }
    It made the diff smaller. but it's weird indeed.

    http://codereview.appspot.com/6819123/diff/13001/src/cmd/gc/walk.c#newcode2899
    src/cmd/gc/walk.c:2899: // by a constant
    Makes a good description for the function.

    http://codereview.appspot.com/6819123/diff/13001/src/cmd/gc/walk.c#newcode2910
    src/cmd/gc/walk.c:2910: // just sign bit
    i'm not even sure what it means. We could add a special treatment here.

    http://codereview.appspot.com/6819123/
  • Remyoudompheng at Nov 12, 2012 at 6:56 am
    Hello golang-dev@googlegroups.com, dave@cheney.net, mtj@google.com (cc:
    golang-dev@googlegroups.com),

    Please take another look.


    http://codereview.appspot.com/6819123/
  • Dave at Nov 12, 2012 at 7:34 am
    Thank you, patch set #9 works well.

    linux/arm:

    benchmark old ns/op new ns/op delta
    BenchmarkBinaryTree17 45683930000 46160431000 +1.04%
    BenchmarkFannkuch11 31897766000 32261871000 +1.14%
    BenchmarkGobDecode 114538550 116271950 +1.51%
    BenchmarkGobEncode 57059940 55352780 -2.99%
    BenchmarkGzip 5586822000 5547333000 -0.71%
    BenchmarkGunzip 1000793000 999066200 -0.17%
    BenchmarkJSONEncode 674365200 660949600 -1.99%
    BenchmarkJSONDecode 1574219000 1392120000 -11.57%
    BenchmarkMandelbrot200 45697620 45659180 -0.08%
    BenchmarkParse 55767820 56271380 +0.90%
    BenchmarkRevcomp 118286100 129815600 +9.75%
    BenchmarkTemplate 1715119000 1657440000 -3.36%

    benchmark old MB/s new MB/s speedup
    BenchmarkGobDecode 6.70 6.60 0.99x
    BenchmarkGobEncode 13.45 13.87 1.03x
    BenchmarkGzip 3.47 3.50 1.01x
    BenchmarkGunzip 19.39 19.42 1.00x
    BenchmarkJSONEncode 2.88 2.94 1.02x
    BenchmarkJSONDecode 1.23 1.39 1.13x
    BenchmarkParse 1.04 1.03 0.99x
    BenchmarkRevcomp 21.49 19.58 0.91x
    BenchmarkTemplate 1.13 1.17 1.04x



    https://codereview.appspot.com/6819123/
  • Remyoudompheng at Nov 13, 2012 at 6:51 am
    I'm not sure why JSONDecode moves. Does it use division or is it an
    effect of stack size changes?

    http://codereview.appspot.com/6819123/
  • Dave Cheney at Nov 13, 2012 at 6:53 am
    I'll add in some debugging and see.

    Also, https://codereview.appspot.com/6842045/
    On Tue, Nov 13, 2012 at 5:51 PM, wrote:
    I'm not sure why JSONDecode moves. Does it use division or is it an
    effect of stack size changes?

    http://codereview.appspot.com/6819123/
  • Dave at Nov 13, 2012 at 9:20 pm
    here is an additional datapoint from linux/386

    220887(~/go/test/bench/go1) % ~/go/misc/benchcmp {old,new}.txt
    benchmark old ns/op new ns/op delta
    BenchmarkBinaryTree17 2147483647 2147483647 +0.24%
    BenchmarkFannkuch11 2147483647 2147483647 -0.84%
    BenchmarkGobDecode 108116600 108685200 +0.53%
    BenchmarkGobEncode 45225340 46541140 +2.91%
    BenchmarkGzip 2147483647 2147483647 +0.30%
    BenchmarkGunzip 603468800 600955000 -0.42%
    BenchmarkJSONEncode 421999200 765123600 +81.31%
    BenchmarkJSONDecode 1043933000 1255016000 +20.22%
    BenchmarkMandelbrot200 54298660 54281840 -0.03%
    BenchmarkParse 37363320 37803220 +1.18%
    BenchmarkRevcomp 2147483647 2147483647 +3.66%
    BenchmarkTemplate 1478883000 1601209000 +8.27%

    benchmark old MB/s new MB/s speedup
    BenchmarkGobDecode 7.10 7.06 0.99x
    BenchmarkGobEncode 16.97 16.49 0.97x
    BenchmarkGzip 5.83 5.81 1.00x
    BenchmarkGunzip 32.16 32.29 1.00x
    BenchmarkJSONEncode 4.60 2.54 0.55x
    BenchmarkJSONDecode 1.86 1.55 0.83x
    BenchmarkParse 1.55 1.53 0.99x
    BenchmarkRevcomp 52.19 50.35 0.96x
    BenchmarkTemplate 1.31 1.21 0.92x

    Perhaps walkdiv should be skipped for 386 ?

    https://codereview.appspot.com/6819123/
  • Dave at Nov 15, 2012 at 11:37 am
    patch set 11 fails for me on linux/386

    220887(~/go/src) % go run ~/go/test/ken/modconst.go
    u8 121 121 60 1
    panic: fail

    goroutine 1 [running]:
    main.u8test(0x23c7979, 0xb0bff12)
    /home/dfc/go/test/ken/modconst.go:578 +0xee
    main.u8run()
    /home/dfc/go/test/ken/modconst.go:613 +0x313
    main.main()
    /home/dfc/go/test/ken/modconst.go:630 +0x3f

    goroutine 2 [syscall]:
    created by runtime.main
    /home/dfc/go/src/pkg/runtime/proc.c:225
    exit status 2

    https://codereview.appspot.com/6819123/
  • Dave at Nov 15, 2012 at 11:45 am
    The previous comment notwithstanding, the 386 regression is gone in
    patchset 11

    220887(~/go/test/bench/go1) % ~/go/misc/benchcmp {old,new}.txt
    benchmark old ns/op new ns/op delta
    BenchmarkBinaryTree17 2147483647 2147483647 -0.06%
    BenchmarkFannkuch11 2147483647 2147483647 -0.11%
    BenchmarkGobDecode 107863800 107488150 -0.35%
    BenchmarkGobEncode 46741000 45501680 -2.65%
    BenchmarkGzip 2147483647 2147483647 -0.11%
    BenchmarkGunzip 601875800 602400400 +0.09%
    BenchmarkJSONEncode 726187800 722774600 -0.47%
    BenchmarkJSONDecode 1154932000 1045558000 -9.47%
    BenchmarkMandelbrot200 54279720 54285060 +0.01%
    BenchmarkParse 38234860 36985280 -3.27%
    BenchmarkRevcomp 2147483647 2147483647 -3.04%
    BenchmarkTemplate 1498506000 1470787000 -1.85%

    benchmark old MB/s new MB/s speedup
    BenchmarkGobDecode 7.12 7.14 1.00x
    BenchmarkGobEncode 16.42 16.87 1.03x
    BenchmarkGzip 5.83 5.84 1.00x
    BenchmarkGunzip 32.24 32.21 1.00x
    BenchmarkJSONEncode 2.67 2.68 1.00x
    BenchmarkJSONDecode 1.68 1.86 1.11x
    BenchmarkParse 1.51 1.57 1.04x
    BenchmarkRevcomp 50.79 52.39 1.03x
    BenchmarkTemplate 1.29 1.32 1.02x

    https://codereview.appspot.com/6819123/
  • Remyoudompheng at Nov 15, 2012 at 3:45 pm
    Sorry, the latest patchset is a work in progress and it misses ARM
    support. It cannot work properly due to a bug in 8l. See CL 6846057.

    I'll upload a complete patch later.

    http://codereview.appspot.com/6819123/

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupgolang-dev @
categoriesgo
postedNov 11, '12 at 11:02a
activeNov 15, '12 at 3:45p
posts23
users3
websitegolang.org

People

Translate

site design / logo © 2022 Grokbase