Modern x86 CPUs don't work like that.
In general, optimally scheduled assembly code which uses more registers has higher performance than optimally scheduled assembly code which uses a smaller number of registers, assuming both assembly listings correspond to the same source code.
Register renaming has been available since the Intel Pentium Pro and AMD K5.
Suggestion for reading: http://www.agner.org/optimize/microarchitecture.pdf
An excerpt from the above PDF document (Section 10 about Haswell and Broadwell pipeline): "... the register file has 168 integer registers and 168 vector registers ..."
I am aware of all of the above and have already read Agner Fog's publications. In addition, modern CPUs do Out of Order Execution (OOE), rearranging instructions to reduce instruction latencies and increase throughput given that there are parallel execution pipelines and ahead-of-time execution, so the actual execution order is almost certainly not the order of the assembly listing.

Yes, both assembly listings are from the same tight loop code, but the "C/C++" one has been converted from another assembly format to the golang assembly format.

Daniel Bernstein, the author of "primegen", wrote it for the Pentium 3 in x86 (32-bit) code, as the Pentium Pro processor wasn't commonly available at that time and 64-bit code didn't exist. His hand optimized C code for the Sieve of Eratosthenes (in "eratspeed.c", the "while k < B" loop in the "doit()" function) uses six registers for the inner culling loop being discussed, and takes about 3.5 CPU clock cycles per loop on a modern CPU (Haswell).
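
For readers who haven't looked at this kind of loop, here is a minimal Go sketch of a bit-packed culling loop of the general shape being discussed. It is not Bernstein's code and not the code from either assembly listing; the names "cullBits" and "countPrimesOdds" and the odds-only layout (bit i stands for the number 2*i + 3) are my own assumptions for illustration.

```go
package main

import "fmt"

// cullBits marks every step-th bit, beginning at bit index start, in a
// bit-packed buffer holding 32 candidates per word. The loop body is the
// "inner culling loop": a shift, a mask, an OR into memory and an add.
func cullBits(buf []uint32, start, step int) {
	limit := len(buf) * 32
	for k := start; k < limit; k += step {
		buf[k>>5] |= 1 << uint(k&31)
	}
}

// countPrimesOdds counts the primes up to 2*nbits + 1 with an odds-only
// sieve in which bit i represents the odd number 2*i + 3.
// (Hypothetical helper, present only to exercise cullBits.)
func countPrimesOdds(nbits int) int {
	buf := make([]uint32, (nbits+31)/32)
	top := 2*nbits + 1 // largest number represented in the buffer
	for i := 0; ; i++ {
		p := 2*i + 3
		if p*p > top {
			break
		}
		if buf[i>>5]&(1<<uint(i&31)) != 0 {
			continue // p is already marked composite
		}
		// Culling starts at p*p; a step of p bits is a step of 2*p numbers.
		cullBits(buf, (p*p-3)/2, p)
	}
	count := 1 // the prime 2 is not represented in the buffer
	for i := 0; i < nbits; i++ {
		if buf[i>>5]&(1<<uint(i&31)) == 0 {
			count++
		}
	}
	return count
}

func main() {
	fmt.Println(countPrimesOdds(64)) // primes up to 129: prints 31
}
```

Each pass through the loop in "cullBits" does one bounds-checked read-modify-write, which is where the register allocation and the bounds check show up in the generated assembly.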

The number of internal CPU registers actually used by the CPU to effect OOE is beside the point, as they have to do with the CPU's internal optimizations and not compiler optimizations; my point is that the compiler's incorrect use of registers still costs time.

While I don't expect golang, with its philosophy of preserving "safe" paradigms by doing array bounds checks by default, to run as fast as C/C++ code that doesn't have that philosophy, I do expect it to run at least as fast as C#/Java code, which is Just In Time (JIT) compiled and does have the "safe" philosophy. The golang compiler version 1.7beta1 is not quite there yet for the indicated reasons: inconsistent use of registers, using one too many in one place in order to avoid an immediate load which doesn't cost any execution time, and saving one register by use of the "trick", which does cost execution time as compared to the use of a single register.

However, there is hope, as version 1.7 has made great advances since version 1.6; surely version 1.8, which is said to be intended to improve this further, will be faster yet. At any rate, version 1.7 speed is "adequate" for many purposes, as it at least comes close to C#/Java speed (probably within about 15% to 20% or less) in many of the most demanding tight loop algorithms, and thus is quite usable as compared to previous versions. But even the most avid golang proponents must admit that it isn't the language in which to re-write Kim Walisch's "primesieve", with its extreme loop unrolling that takes an average of about 1.4 CPU clock cycles per composite number cull for small ranges of primes, as even with array bounds checking turned off, golang would still take at least twice and more likely three times as long.
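
To make "loop unrolling" concrete, here is a rough Go sketch of a plain 4x manual unroll of a bit-packed culling loop. This is far simpler than what "primesieve" actually does and is not taken from it; the function names are hypothetical, and I make no claim about how much (if anything) this particular unroll gains with the golang compiler.

```go
package main

import "fmt"

// cullSimple is the straightforward loop: one loop test per cull.
func cullSimple(buf []uint32, start, step int) {
	limit := len(buf) * 32
	for k := start; k < limit; k += step {
		buf[k>>5] |= 1 << uint(k&31)
	}
}

// cullUnrolled4 sets the same bits as cullSimple but does four culls per
// pass of the main loop, so the loop test and branch are paid once per
// four culls. Assumes start >= 0 and step > 0.
func cullUnrolled4(buf []uint32, start, step int) {
	limit := len(buf) * 32
	k := start
	// k < last guarantees that k + 3*step is still below limit.
	for last := limit - 3*step; k < last; k += 4 * step {
		k1, k2, k3 := k+step, k+2*step, k+3*step
		buf[k>>5] |= 1 << uint(k&31)
		buf[k1>>5] |= 1 << uint(k1&31)
		buf[k2>>5] |= 1 << uint(k2&31)
		buf[k3>>5] |= 1 << uint(k3&31)
	}
	// Tail: at most three culls remain.
	for ; k < limit; k += step {
		buf[k>>5] |= 1 << uint(k&31)
	}
}

func main() {
	a := make([]uint32, 4)
	b := make([]uint32, 4)
	cullSimple(a, 4, 3)
	cullUnrolled4(b, 4, 3)
	fmt.Println(a[0] == b[0] && a[1] == b[1] && a[2] == b[2] && a[3] == b[3])
}
```

Note that each of the four stores in the unrolled body still carries its own bounds check unless checking is turned off, which is part of why unrolling alone doesn't close the gap.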

That is also why I first started posting to this thread: the only reason the golang version of "primegen" is reasonably comparable in speed to C/C++ "primegen" is that it uses multi-threading on a multi-core processor, which wasn't available to Daniel Bernstein when he wrote "primegen". My point was that one should compare like with like.
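
As an illustration of where a multi-threaded version gets its advantage, here is a small Go sketch that splits a sieve range into segments and culls each segment in its own goroutine. It is not the golang "primegen" port under discussion; the function names, the one-bool-per-odd-number layout and the worker count are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// basePrimes returns the odd primes p with p*p <= limit (trial division).
func basePrimes(limit int) []int {
	var ps []int
	for p := 3; p*p <= limit; p += 2 {
		isPrime := true
		for _, q := range ps {
			if q*q > p {
				break
			}
			if p%q == 0 {
				isPrime = false
				break
			}
		}
		if isPrime {
			ps = append(ps, p)
		}
	}
	return ps
}

// countSegment counts the primes among the odd numbers in [lo, hi).
// lo must be odd and >= 3; ps must hold every odd prime p with p*p < hi.
func countSegment(lo, hi int, ps []int) int {
	n := (hi - lo + 1) / 2 // odd candidates lo, lo+2, ... below hi
	composite := make([]bool, n)
	for _, p := range ps {
		if p*p >= hi {
			break
		}
		s := p * p // first multiple worth culling
		if s < lo {
			s = (lo + p - 1) / p * p // first multiple of p that is >= lo
			if s%2 == 0 {
				s += p // skip to the next odd multiple
			}
		}
		for k := (s - lo) / 2; k < n; k += p {
			composite[k] = true
		}
	}
	count := 0
	for _, c := range composite {
		if !c {
			count++
		}
	}
	return count
}

// countPrimesParallel counts the primes below limit using one goroutine
// per segment. workers must be at least 1.
func countPrimesParallel(limit, workers int) int {
	if limit <= 2 {
		return 0
	}
	ps := basePrimes(limit)
	span := (limit - 3 + workers - 1) / workers
	span += span & 1 // keep the span even so every segment starts on an odd number
	counts := make([]int, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo := 3 + w*span
		hi := lo + span
		if hi > limit {
			hi = limit
		}
		if lo >= hi {
			continue
		}
		wg.Add(1)
		go func(w, lo, hi int) {
			defer wg.Done()
			counts[w] = countSegment(lo, hi, ps) // each goroutine writes only its own slot
		}(w, lo, hi)
	}
	wg.Wait()
	total := 1 // the prime 2
	for _, c := range counts {
		total += c
	}
	return total
}

func main() {
	fmt.Println(countPrimesParallel(1000000, 8)) // prints 78498
}
```

Running the same code with a worker count of 1 is the "like with like" comparison against a single-threaded C/C++ program.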


Discussion overview: group golang-nuts, posted Jun 14, '16 at 12:31p, active Jun 18, '16 at 10:54a