golang version 1.7beta1 does indeed help: the run time is now not much worse than C#/Java, but it is still not as good as C/C++ because of the one remaining array bounds check.

Using the same procedure as before to obtain an assembly listing (go tool compile -S PrimeTest.go > PrimeTest.s):

line 36 for j := (p*p - 3) >> 1; j <= lmtndx; j += p {
line 37 cmpsts[j>>5] |= 1 << (j & 31)
line 38 }

  0x00f1 00241 (Main.go:37) MOVQ R8, CX ;; move 'j' in r8 to cx
  0x00f4 00244 (Main.go:37) SHRQ $5, R8 ;; shift r8 right by 5 to get the word index
  0x00f8 00248 (Main.go:37) CMPQ R8, DX ;; array bounds check to array length stored in dx
  0x00fb 00251 (Main.go:37) JCC $0, 454 ;; panic if fail bounds check
  0x0101 00257 (Main.go:37) MOVL (AX)(R8*4), R10 ;; get element to r10 in one step
  0x0105 00261 (Main.go:37) MOVQ CX, R11 ;; save 'j' for later in r11
  0x0108 00264 (Main.go:37) ANDQ $31, CX ;; leave 'j' & 31 in cx
  0x010c 00268 (Main.go:37) MOVL R9, R12 ;; save r9 to r12 to preserve the 1 it contains - WHY NOT JUST MAKE R12 CONTAIN 1 AT ALL TIMES IF USING IT IS QUICKER THAN AN IMMEDIATE LOAD
  0x010f 00271 (Main.go:37) SHLL CX, R9 ;; R9 SHOULD JUST BE LOADED WITH 1 ABOVE - now r9 contains 1 << ('j' & 31)
  0x0112 00274 (Main.go:37) ORL R10, R9 ;; r9 contains cmpsts[j >> 5] | (1 << ('j' & 31)) - the bit or is done here
  0x0115 00277 (Main.go:37) MOVL R9, (AX)(R8*4) ;; element now contains the modified value
  0x0119 00281 (Main.go:36) LEAQ 3(R11)(DI*2), R8 ;; tricky way to calculate 'j' + 2 * di + 3 where 2 * di + 3 is p (di appears to hold the index of the current prime), answer to r8, saves a register
  0x011e 00286 (Main.go:37) MOVL R12, R9 ;; RESTORE R9 FROM R12 - SHOULD NOT BE NECESSARY, but doesn't really cost in time as CPU is waiting for results of LEAQ operation
  0x0121 00289 (Main.go:36) CMPQ R8, BX ;; check if 'j' in r8 is up to limit stored in bx
  0x0124 00292 (Main.go:36) JLS $0, 241 ;; loop if not complete
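
To make the listing easier to follow, here is line 37 written out one step per statement in Go. This is only an illustrative sketch: the names setBit, w, v and m are mine, not from the original program, and the instruction named in each comment is my reading of the listing above, not something the compiler guarantees.

```go
package main

import "fmt"

// setBit spells out cmpsts[j>>5] |= 1 << (j & 31) as separate steps.
// setBit, w, v and m are hypothetical names used only for illustration.
func setBit(cmpsts []uint32, j uint) {
	w := j >> 5                // SHRQ $5: index of the 32-bit word
	v := cmpsts[w]             // CMPQ/JCC bounds check, then MOVL load
	m := uint32(1) << (j & 31) // ANDQ $31 then SHLL: single-bit mask
	cmpsts[w] = v | m          // ORL, then MOVL store back to the same word
}

func main() {
	cmpsts := make([]uint32, 2)
	setBit(cmpsts, 37) // word 1, bit 5
	fmt.Println(cmpsts) // prints [0 32]
}
```

The load and the store use the same index w, which is why one bounds check is enough for both.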

This is much better than the 1.6.2 code in that it no longer does the array bounds check twice. There is still a minor inefficiency: an extra register, r12, is used to hold the constant 1 and restore it into r9 on every pass, instead of simply loading an immediate 1 into r9. That register could have held 'p' instead, which would avoid the tricky LEAQ that recalculates 'p' every loop (the LEAQ is quick, but still about half a cycle slower than just adding a pre-calculated 'p' value).

The C/C++ code will still be quicker, mainly because it has no array bounds check, which saves a couple of CPU clock cycles, but also because it uses the single read/modify/write form of the ORL instruction instead of a MOVL from the array element to a register, an ORL with the bit mask, and then a MOVL from the register back to the array element. It seems the compiler is now almost trying too hard to save registers at the cost of time in the tricky 'p' calculation, while spending a register for no gain, or an actual loss, on saving the 1.
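
For anyone who wants to generate a comparable listing, here is a minimal sketch of the kind of odds-only, bit-packed sieve that lines 36 to 38 would sit in. It is not the original PrimeTest.go: countPrimes, limit, i and start are names I made up, and I am assuming uint indices because the listing uses unsigned compares and Go of this vintage requires unsigned shift counts. Only cmpsts, lmtndx, p, j and the inner loop itself come from the code quoted above.

```go
package main

import "fmt"

// countPrimes counts the primes <= limit with an odds-only sieve.
// Bit i of cmpsts stands for the odd number 2*i + 3; a set bit means composite.
// (Hypothetical reconstruction, not the original program.)
func countPrimes(limit uint) int {
	if limit < 2 {
		return 0
	}
	if limit < 3 {
		return 1
	}
	lmtndx := (limit - 3) >> 1 // highest bit index in use
	cmpsts := make([]uint32, lmtndx/32+1)
	for i := uint(0); ; i++ {
		p := i + i + 3 // computed once per prime, outside the inner loop
		start := (p*p - 3) >> 1
		if start > lmtndx {
			break
		}
		if cmpsts[i>>5]&(1<<(i&31)) != 0 {
			continue // 2*i + 3 is already marked composite
		}
		for j := start; j <= lmtndx; j += p {
			cmpsts[j>>5] |= 1 << (j & 31)
		}
	}
	count := 1 // the prime 2
	for i := uint(0); i <= lmtndx; i++ {
		if cmpsts[i>>5]&(1<<(i&31)) == 0 {
			count++
		}
	}
	return count
}

func main() {
	fmt.Println(countPrimes(100)) // prints 25
}
```

Since p only changes in the outer loop, the inner loop needs nothing more than j, p, lmtndx and the slice base and length live in registers.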

So it is good to see that golang compiler optimization is taking some steps forward, but it isn't quite there yet.


Discussion Overview
group: golang-nuts
posted: Jun 14, '16 at 12:31p
active: Jun 18, '16 at 10:54a
