This does seem a lot more expensive than it should be, at least for such a
simple case. I'm not sure how high priority this is, but go ahead and file
an issue so we can at least keep track of it.
On Wed, Mar 23, 2016 at 8:41 PM, minux wrote:

On Wed, Mar 23, 2016 at 8:35 PM, Austin Clements wrote:

Here's the perf output I get for BenchmarkCgoCall on my laptop:

13.04% cgo cgo [.] runtime/internal/atomic.Cas
10.26% cgo cgo [.] runtime.deferreturn
7.79% cgo cgo [.] runtime.newdefer
6.23% cgo cgo [.] runtime.cgocall
5.86% cgo cgo [.] runtime.systemstack
5.65% cgo cgo [.] runtime.casgstatus
5.55% cgo cgo [.] main.BenchmarkCgoCall
4.87% cgo cgo [.] runtime.reentersyscall
4.71% cgo cgo [.] runtime.freedefer
4.38% cgo cgo [.] runtime/internal/atomic.Store
3.92% cgo cgo [.] main._Cfunc_rand
3.83% cgo cgo [.] runtime.deferproc
3.24% cgo cgo [.] runtime.exitsyscallfast
3.01% cgo cgo [.] runtime.exitsyscall
2.88% cgo cgo [.] runtime.deferproc.func1
1.99% cgo cgo [.] runtime.getcallerpc
1.76% cgo cgo [.] runtime.memmove
1.66% cgo cgo [.] runtime.asmcgocall
1.62% cgo cgo [.] runtime.entersyscall
1.46% cgo cgo [.] runtime.getcallersp
1.41% cgo cgo [.] runtime.unlockOSThread

Less than 2% of BenchmarkCgoCall's time is spent in the actual stack
switch (asmcgocall). Most of the time appears to be spent dealing with the
defer in cgocall (deferreturn, newdefer, systemstack, freedefer,
deferproc). Most of the remaining time is doing atomic CAS and store
operations, which is probably from manipulating scheduler state.

So, optimizing the stack switch is unlikely to help. However, it's
probably possible to improve the overhead from the defer. It may also be
possible to improve the overhead from scheduler interaction, but that's
probably harder to do without changing semantics.
Indeed, if I remove the "defer endcgo(mp)" and instead call endcgo(mp) at the
end of the function, the cgocall time is reduced from 144ns/op to
(We can't simply remove the defer this way, though: it would break the
Go->C->Go call sequence.)
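To illustrate the trade-off, here is a minimal sketch in ordinary Go, not the runtime's actual code. It assumes the defer is there so that cleanup still runs when a callback panics and the stack unwinds through the calling frame. The names inFlight, enter, leave, callWithDefer and callWithoutDefer are hypothetical stand-ins for illustration.

```go
package main

import "fmt"

// inFlight is a hypothetical counter standing in for per-thread state
// that has to be restored when a call finishes.
var inFlight int

func enter() { inFlight++ }
func leave() { inFlight-- }

// callWithDefer restores the counter even if callback panics.
func callWithDefer(callback func()) {
	enter()
	defer leave()
	callback()
}

// callWithoutDefer is cheaper, but never reaches leave() if callback panics.
func callWithoutDefer(callback func()) {
	enter()
	callback()
	leave()
}

// leakAfterPanic runs call with a panicking callback, recovers in an
// outer frame, and reports the counter value left behind.
func leakAfterPanic(call func(func())) int {
	inFlight = 0
	func() {
		defer func() { _ = recover() }()
		call(func() { panic("callback failed") })
	}()
	return inFlight
}

func main() {
	fmt.Println("with defer:   ", leakAfterPanic(callWithDefer))    // 0
	fmt.Println("without defer:", leakAfterPanic(callWithoutDefer)) // 1
}
```

With the defer the counter returns to 0 after the recovered panic; without it the counter stays at 1, which is the kind of stale state the trailing-call version risks.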

Why is defer this slow? This benchmark:

func defers() (r int) {
	defer func() {
		r = 42
	}()
	return 0
}

func BenchmarkDefer(b *testing.B) {
	for i := 0; i < b.N; i++ {
		defers()
	}
}

shows that calling defers() takes 77.7ns/op on my system. Should I file an
issue?
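To put that figure in context, one could compare it against a baseline that produces the same result without a defer. The sketch below is a standalone program that uses testing.Benchmark so it runs without "go test". The direct function and the sink variable are hypothetical additions for illustration, and the numbers will vary with Go version and hardware.

```go
package main

import (
	"fmt"
	"testing"
)

// defers is the function from the benchmark in this thread: the deferred
// closure overwrites the named result just before the function returns.
func defers() (r int) {
	defer func() {
		r = 42
	}()
	return 0
}

// direct is a hypothetical baseline that yields the same result without
// a defer. The noinline directive keeps the call itself in the measurement.
//
//go:noinline
func direct() (r int) {
	r = 42
	return r
}

// sink keeps the compiler from discarding the results.
var sink int

func main() {
	withDefer := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = defers()
		}
	})
	baseline := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = direct()
		}
	})
	fmt.Println("defers():", withDefer)
	fmt.Println("direct():", baseline)
}
```

The difference between the two ns/op figures gives a rough estimate of the per-call cost of the defer itself.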
Discussion overview: group golang-dev, posted Mar 23, '16 at 9:32p, last active Mar 24, '16 at 8:34a