You may also find that swapping the order of some of the instructions such as the second and the third in the loop may also reduce backend latency further.
I am not on a high end Intel CPU now, but when I was I found that with a buffer size adjusted to the L1 cache size (8192 32-bit words or 32 Kilobytes) that eratspeed ran on an Intel 2700K @ 3.5 GHz at about 3.5 clock cycles per loop (about 405,000,000 loops for this range).
My current AMD Bulldozer CPU has a severe cache bottleneck and can't come close to this speed by a factor of about two.
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to email@example.com.
For more options, visit https://groups.google.com/d/optout.