FAQ
Reviewers: remyoudompheng, minux, rsc,

Message:
Hello remyoudompheng@gmail.com, minux.ma@gmail.com, rsc@golang.org (cc:
golang-dev@googlegroups.com),

I'd like you to review this change to
https://go.googlecode.com/hg/


Description:
cmd/5g: use MOVB for fixed array nil check

Fixes issue 4396.

For fixed arrays larger than the unmapped page, agenr would general a
nil check by loading the first word of the array. However there is no
requirement for the first element of a byte array to be word aligned, so
this check causes a trap on ARMv5 hardware (ARMv6 since relaxed that
restriction, but it probably still comes at a cost).

Switching the check to MOVB ensures alignment is not an issue. This
check is only invoked in a few places in the code where large fixed
arrays are embedded into structs, compress/lzw is the biggest offender,
and switching to MOVB has no observable performance penalty.

Thanks to Rémy and Daniel Morsing for helping me debug this on IRC last
night.

Please review this at http://codereview.appspot.com/6854063/

Affected files:
M src/cmd/5g/cgen.c


Index: src/cmd/5g/cgen.c
===================================================================
--- a/src/cmd/5g/cgen.c
+++ b/src/cmd/5g/cgen.c
@@ -951,7 +951,7 @@
if(isfixedarray(nl->type) && nl->type->width >= unmappedzero) {
regalloc(&n4, types[tptr], N);
gmove(&n3, &n4);
- p1 = gins(AMOVW, &n4, &n4);
+ p1 = gins(AMOVB, &n4, &n4);
p1->from.type = D_OREG;
p1->from.offset = 0;
regfree(&n4);

Search Discussions

  • Remyoudompheng at Nov 17, 2012 at 10:48 pm
    6g uses TESTB for that purpose, so LGTM
    I think 6g calls gins with nodes of the correct type, the current
    sequence in 5g is cheaty.

    http://codereview.appspot.com/6854063/
  • Dave at Nov 18, 2012 at 4:20 am
    PTAL. I tried to make the check less cheaty and in the process it now
    produces one less instruction.

    before:

    d.output[d.o] = uint8(code)
    42ec4: e1a00806 lsl r0, r6, #16
    42ec8: e1a00820 lsr r0, r0, #16
    42ecc: e20010ff and r1, r0, #255 ; 0xff
    42ed0: e20150ff and r5, r1, #255 ; 0xff
    42ed4: e5941000 ldr r1, [r4]
    42ed8: e59fb8c0 ldr fp, [pc, #2240] ; 437a0
    <compress/lzw.(*decoder).decode+0xa00>
    42edc: e794300b ldr r3, [r4, fp]
    42ee0: e5941000 ldr r1, [r4]
    42ee4: e5942000 ldr r2, [r4]
    42ee8: e59fb8b4 ldr fp, [pc, #2228] ; 437a4
    <compress/lzw.(*decoder).decode+0xa04>
    42eec: e084100b add r1, r4, fp
    42ef0: e1d120d0 ldrsb r2, [r1]
    42ef4: e1a00003 mov r0, r3
    42ef8: e3a02a02 mov r2, #8192 ; 0x2000
    42efc: e1530002 cmp r3, r2
    42f00: 3a000000 bcc 42f08
    <compress/lzw.(*decoder).decode+0x168>
    42f04: ebff7d8d bl 22540 <runtime.panicindex>
    42f08: e0811000 add r1, r1, r0
    42f0c: e20500ff and r0, r5, #255 ; 0xff
    42f10: e5c10000 strb r0, [r1]

    after:

    d.output[d.o] = uint8(code)
    42ec4: e1a00807 lsl r0, r7, #16
    42ec8: e1a00820 lsr r0, r0, #16
    42ecc: e20010ff and r1, r0, #255 ; 0xff
    42ed0: e20150ff and r5, r1, #255 ; 0xff
    42ed4: e5941000 ldr r1, [r4]
    42ed8: e59fb8a4 ldr fp, [pc, #2212] ; 43784
    <compress/lzw.(*decoder).decode+0x9e4>
    42edc: e794000b ldr r0, [r4, fp]
    42ee0: e5941000 ldr r1, [r4]
    42ee4: e5942000 ldr r2, [r4]
    42ee8: e59fb898 ldr fp, [pc, #2200] ; 43788
    <compress/lzw.(*decoder).decode+0x9e8>
    42eec: e084100b add r1, r4, fp
    42ef0: e5d13000 ldrb r3, [r1]
    42ef4: e3a02a02 mov r2, #8192 ; 0x2000
    42ef8: e1500002 cmp r0, r2
    42efc: 3a000000 bcc 42f04
    <compress/lzw.(*decoder).decode+0x164>
    42f00: ebff7d8e bl 22540 <runtime.panicindex>
    42f04: e0811000 add r1, r1, r0
    42f08: e20500ff and r0, r5, #255 ; 0xff
    42f0c: e5c10000 strb r0, [r1]

    The important part is ~ 42ef0.

    https://codereview.appspot.com/6854063/
  • Dave at Nov 18, 2012 at 4:21 am
    Hello remyoudompheng@gmail.com, minux.ma@gmail.com, rsc@golang.org (cc:
    golang-dev@googlegroups.com),

    Please take another look.


    http://codereview.appspot.com/6854063/
  • Minux Ma at Nov 18, 2012 at 8:21 am
    https://codereview.appspot.com/6854063/diff/2002/src/cmd/5g/cgen.c
    File src/cmd/5g/cgen.c (right):

    https://codereview.appspot.com/6854063/diff/2002/src/cmd/5g/cgen.c#newcode953
    src/cmd/5g/cgen.c:953: gmove(&n3, &n4);
    i think this suffices:
    regalloc(&n4, types[TUINT8], N);
    gmove(&n3, &n4);
    regfree(&n4);
    but this might be regarded as a hack.

    https://codereview.appspot.com/6854063/diff/2002/test/fixedbugs/issue4396.go
    File test/fixedbugs/issue4396.go (right):

    https://codereview.appspot.com/6854063/diff/2002/test/fixedbugs/issue4396.go#newcode21
    test/fixedbugs/issue4396.go:21: g [4096]uint8
    i think you can slightly simplify this struct to:
    struct {
    _ uint32 // force &a[0] to align to 4-byte boundary
    a [4095]byte // &b[0] won't ever be aligned to 4-byte
    b [4096]byte
    }

    https://codereview.appspot.com/6854063/
  • Dave Cheney at Nov 18, 2012 at 8:24 am
    Thanks for the suggestion, I'll try it.

    As for the test, the types come from lzw.decoder, I should probably mention that in the comment
    On 18/11/2012, at 19:15, minux.ma@gmail.com wrote:


    https://codereview.appspot.com/6854063/diff/2002/src/cmd/5g/cgen.c
    File src/cmd/5g/cgen.c (right):

    https://codereview.appspot.com/6854063/diff/2002/src/cmd/5g/cgen.c#newcode953
    src/cmd/5g/cgen.c:953: gmove(&n3, &n4);
    i think this suffices:
    regalloc(&n4, types[TUINT8], N);
    gmove(&n3, &n4);
    regfree(&n4);
    but this might be regarded as a hack.

    https://codereview.appspot.com/6854063/diff/2002/test/fixedbugs/issue4396.go
    File test/fixedbugs/issue4396.go (right):

    https://codereview.appspot.com/6854063/diff/2002/test/fixedbugs/issue4396.go#newcode21
    test/fixedbugs/issue4396.go:21: g [4096]uint8
    i think you can slightly simplify this struct to:
    struct {
    _ uint32 // force &a[0] to align to 4-byte boundary
    a [4095]byte // &b[0] won't ever be aligned to 4-byte
    b [4096]byte
    }

    https://codereview.appspot.com/6854063/
  • Dave at Nov 18, 2012 at 9:57 am
    PTAL


    https://codereview.appspot.com/6854063/diff/2002/src/cmd/5g/cgen.c
    File src/cmd/5g/cgen.c (right):

    https://codereview.appspot.com/6854063/diff/2002/src/cmd/5g/cgen.c#newcode953
    src/cmd/5g/cgen.c:953: gmove(&n3, &n4);
    On 2012/11/18 08:15:41, minux wrote:
    i think this suffices:
    regalloc(&n4, types[TUINT8], N);
    gmove(&n3, &n4);
    regfree(&n4);
    I do not think that is sufficient as it just computes the address of
    a[0], then masks it to a UINT8

    42ee8: e59fb8b4 ldr fp, [pc, #2228] ; 437a4
    <compress/lzw.(*decoder).decode+0xa04>
    42eec: e084100b add r1, r4, fp
    42ef0: e20120ff and r2, r1, #255 ; 0xff
    42ef4: e1a00003 mov r0, r3
    42ef8: e3a02a02 mov r2, #8192 ; 0x2000
    42efc: e1530002 cmp r3, r2

    https://codereview.appspot.com/6854063/
  • Minux Ma at Nov 18, 2012 at 10:21 am
    you might also need to change gsubr.c
    ine 1196 and 1969.

    On 2012/11/18 09:57:01, dfc wrote:
    https://codereview.appspot.com/6854063/diff/2002/src/cmd/5g/cgen.c#newcode953
    src/cmd/5g/cgen.c:953: gmove(&n3, &n4);
    On 2012/11/18 08:15:41, minux wrote:
    i think this suffices:
    regalloc(&n4, types[TUINT8], N);
    gmove(&n3, &n4);
    regfree(&n4);
    I do not think that is sufficient as it just computes the address of
    a[0], then
    masks it to a UINT8
    oops, you're right. i missed something obvious.


    https://codereview.appspot.com/6854063/diff/2002/src/cmd/5g/cgen.c
    File src/cmd/5g/cgen.c (right):

    https://codereview.appspot.com/6854063/diff/2002/src/cmd/5g/cgen.c#newcode703
    src/cmd/5g/cgen.c:703: p1 = gins(AMOVW, &n1, &n1);
    you also need to change here?

    https://codereview.appspot.com/6854063/diff/2002/src/cmd/5g/cgen.c#newcode784
    src/cmd/5g/cgen.c:784: p1 = gins(AMOVW, &n1, &n1);
    and here.

    https://codereview.appspot.com/6854063/
  • Remyoudompheng at Nov 18, 2012 at 10:30 am
    It might be useful to use checkref (defined in gsubr.c) to the task.

    https://codereview.appspot.com/6854063/
  • Dave Cheney at Nov 18, 2012 at 11:07 am
    I'd like to leave this proposal as is for the moment, so I can enable
    alignment traps on the arm5 builder, then take it from there.
    On Sun, Nov 18, 2012 at 9:30 PM, wrote:
    It might be useful to use checkref (defined in gsubr.c) to the task.

    https://codereview.appspot.com/6854063/
  • Remyoudompheng at Nov 24, 2012 at 7:43 pm
    http://codereview.appspot.com/6854063/diff/5003/src/cmd/5g/cgen.c
    File src/cmd/5g/cgen.c (right):

    http://codereview.appspot.com/6854063/diff/5003/src/cmd/5g/cgen.c#newcode957
    src/cmd/5g/cgen.c:957: regalloc(&tmp, types[TUINT8], N);
    i would use &n4 as third argument here so that we need one register
    instead of 2.

    Then the op=INDREG part must move below the regalloc.

    http://codereview.appspot.com/6854063/
  • Dave at Nov 25, 2012 at 11:59 am
    https://codereview.appspot.com/6854063/diff/5003/src/cmd/5g/cgen.c
    File src/cmd/5g/cgen.c (right):

    https://codereview.appspot.com/6854063/diff/5003/src/cmd/5g/cgen.c#newcode957
    src/cmd/5g/cgen.c:957: regalloc(&tmp, types[TUINT8], N);
    On 2012/11/24 19:43:45, remyoudompheng wrote:
    i would use &n4 as third argument here so that we need one register
    instead of
    2.
    Then the op=INDREG part must move below the regalloc.
    I think that gets us back to where we started, using gins(AMOVB, ...)

    https://codereview.appspot.com/6854063/
  • Remyoudompheng at Nov 25, 2012 at 12:06 pm
    https://codereview.appspot.com/6854063/diff/5003/src/cmd/5g/cgen.c
    File src/cmd/5g/cgen.c (right):

    https://codereview.appspot.com/6854063/diff/5003/src/cmd/5g/cgen.c#newcode957
    src/cmd/5g/cgen.c:957: regalloc(&tmp, types[TUINT8], N);
    On 2012/11/25 11:59:15, dfc wrote:
    On 2012/11/24 19:43:45, remyoudompheng wrote:
    i would use &n4 as third argument here so that we need one register
    instead of
    2.

    Then the op=INDREG part must move below the regalloc.
    I think that gets us back to where we started, using gins(AMOVB, ...)
    I am thinking of:

    regalloc(&n4, types[tptr], N);
    regalloc(&tmp, types[TUINT8], &n4);
    n4.op = OINDREG;
    n4.type = types[TUINT8];
    n4.xoffset = 0;
    gmove(&n4, &tmp);

    It's not cheaty like as gins(MOVB, &n4, &n4), but it saves a register.

    https://codereview.appspot.com/6854063/
  • Dave at Nov 25, 2012 at 12:11 pm
    Hello remyoudompheng@gmail.com, minux.ma@gmail.com, rsc@golang.org (cc:
    golang-dev@googlegroups.com),

    Please take another look.


    http://codereview.appspot.com/6854063/
  • Remyoudompheng at Nov 25, 2012 at 12:21 pm
    http://codereview.appspot.com/6854063/diff/4002/test/fixedbugs/issue4396.go
    File test/fixedbugs/issue4396.go (right):

    http://codereview.appspot.com/6854063/diff/4002/test/fixedbugs/issue4396.go#newcode24
    test/fixedbugs/issue4396.go:24:
    can you add the following test:

    type T struct {
    U uint16
    V T2
    }

    type T2 struct {
    T [4096]byte
    A, B byte
    }

    var s, t = new(T), new(T)

    func main() {
    var u, v *T2 = &s.V, &t.V
    u.B = v.B
    }

    http://codereview.appspot.com/6854063/
  • Dave at Nov 25, 2012 at 12:38 pm
    Hello remyoudompheng@gmail.com, minux.ma@gmail.com, rsc@golang.org (cc:
    golang-dev@googlegroups.com),

    Please take another look.


    http://codereview.appspot.com/6854063/
  • Remyoudompheng at Nov 25, 2012 at 12:52 pm
    http://codereview.appspot.com/6854063/diff/4003/src/cmd/5g/cgen.c
    File src/cmd/5g/cgen.c (right):

    http://codereview.appspot.com/6854063/diff/4003/src/cmd/5g/cgen.c#newcode783
    src/cmd/5g/cgen.c:783: gmove(a, &n1);
    hum, the test should not have passed unless this part is fixed.

    If you want to only fix the arrays part, the previous patch looks good
    to me

    http://codereview.appspot.com/6854063/
  • Rsc at Nov 26, 2012 at 5:28 pm
    LGTM once you're happy about the testing



    https://codereview.appspot.com/6854063/diff/4003/test/fixedbugs/issue4396.go
    File test/fixedbugs/issue4396.go (right):

    https://codereview.appspot.com/6854063/diff/4003/test/fixedbugs/issue4396.go#newcode24
    test/fixedbugs/issue4396.go:24: T [4096]byte
    maybe call this something other than T, like Pad or Buf?

    https://codereview.appspot.com/6854063/

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupgolang-dev @
categoriesgo
postedNov 17, '12 at 10:12p
activeNov 26, '12 at 5:28p
posts18
users4
websitegolang.org

People

Translate

site design / logo © 2022 Grokbase