0x00dd 00221 (main.go:37) MOVQ DI, CX
0x00e0 00224 (main.go:37) SHRQ $5, DI
0x00e4 00228 (main.go:37) MOVL (AX)(DI*4), R9
0x00e8 00232 (main.go:37) MOVQ CX, R10
0x00eb 00235 (main.go:37) ANDQ $31, CX
0x00ef 00239 (main.go:37) MOVL R8, R11
0x00f2 00242 (main.go:37) SHLL CX, R8
0x00f5 00245 (main.go:37) ORL R8, R9
0x00f8 00248 (main.go:37) MOVL R9, (AX)(DI*4)
0x00fc 00252 (main.go:36) LEAQ 3(R10)(SI*2), DI
0x0101 00257 (main.go:37) MOVL R11, R8
0x0104 00260 (main.go:36) CMPQ DI, DX
0x0107 00263 (main.go:36) JLS $0, 221
It is now almost as fast as C/C++ code. What gap remains is there for the same reasons as explained before: it uses more registers than it needs to hold values, and it does not use the read/modify/write form of the instruction (which would also save a register).
The current beta does reasonably well on amd64, but it still doesn't use registers efficiently enough for x86 code, because it uses too many registers. Optimized C/C++ code uses only six or at most seven registers for this loop, which the x86 architecture has, but x86 does not have the nine registers that the above listing requires.
So for this tight loop, Go is still slower than optimized C/C++ code, but not by very much if array bounds checks are disabled.
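For readers following along, here is a minimal sketch of the kind of Go source that could produce a loop like the listing above. The actual main.go is not shown in this message, so this is a guess from the instructions: the shift by 5, the mask of 31, and the 32-bit load/OR/store suggest setting bits in a []uint32, and the LEAQ suggests a step of 2*i+3, as in an odds-only sieve. The names sieveOdds and countPrimes and the bit layout are my assumptions, not the original code.

```go
package main

import "fmt"

// sieveOdds is a hypothetical reconstruction, not the original main.go.
// Bit j of the returned buffer stands for the odd number 2*j+3;
// a set bit means that number is composite.
func sieveOdds(limit uint) []uint32 {
	buf := make([]uint32, limit/32+1)
	for i := uint(0); ; i++ {
		p := 2*i + 3            // candidate base prime for bit index i
		start := (p*p - 3) / 2  // bit index of p*p
		if start > limit {
			break
		}
		if buf[i>>5]&(1<<(i&31)) != 0 {
			continue // p is already marked composite
		}
		// The tight loop discussed above: load the word, shift 1 into
		// place, OR it in, store the word back, then step j by 2*i+3.
		for j := start; j <= limit; j += 2*i + 3 {
			buf[j>>5] |= 1 << (j & 31)
		}
	}
	return buf
}

// countPrimes counts 2 plus every odd number whose bit is still clear.
func countPrimes(buf []uint32, limit uint) int {
	count := 1 // the prime 2 is not represented in the buffer
	for j := uint(0); j <= limit; j++ {
		if buf[j>>5]&(1<<(j&31)) == 0 {
			count++
		}
	}
	return count
}

func main() {
	const limit = 61 // highest bit index, i.e. the odd number 125
	fmt.Println(countPrimes(sieveOdds(limit), limit))
}
```

In the listing, the separate MOVL load, ORL, and MOVL store correspond to the single `|=` statement; a read/modify/write encoding would OR the register directly into the memory operand and free the register holding the loaded word. To look at the output on your own machine, something like `go tool compile -S -B main.go` should print the assembly with bounds checks disabled, though the exact instructions will vary by compiler version.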