I have this code that allocates a lot of data when starting up and, then,
it is called to do some processing and at each call it allocates around 50k
(which gets garbage collected eventually as expected).
I wrote a simple test benchmark that loads all data on init() and then, on
the benchmark loop, only calls the method that does the processing
mentioned above.
The benchmark results in around 0.0002 seconds per request. I was trying to
debug it to see if I can improve things but I am a bit confused about the
profiling output.
First, here is how I obtained it:
go test -bench . -benchtime 10s -cpuprofile cpu.out
When looking at the top offenders, I see this:
(pprof) top20
30140ms of 38880ms total (77.52%)
Dropped 140 nodes (cum <= 194.40ms)
Showing top 20 nodes out of 104 (cum >= 360ms)
flat flat% sum% cum cum%
6380ms 16.41% 16.41% 12550ms 32.28% runtime.scang
3660ms 9.41% 25.82% 3660ms 9.41% [REDACTED]
3200ms 8.23% 34.05% 3200ms 8.23% runtime.cas
2640ms 6.79% 40.84% 7620ms 19.60% runtime.scanobject
2070ms 5.32% 46.17% 2070ms 5.32% runtime.atomicor8
1770ms 4.55% 50.72% 2790ms 7.18% runtime.readgstatus
1650ms 4.24% 54.96% 3950ms 10.16% runtime.greyobject
1480ms 3.81% 58.77% 1500ms 3.86% runtime.heapBitsForObject
1080ms 2.78% 61.55% 1080ms 2.78% runtime.futex
1030ms 2.65% 64.20% 6110ms 15.72% [REDACTED]
900ms 2.31% 66.51% 1250ms 3.22% runtime.atomicload
660ms 1.70% 68.21% 2520ms 6.48% runtime.mallocgc
630ms 1.62% 69.83% 930ms 2.39% runtime.heapBitsSweepSpan
530ms 1.36% 71.19% 2970ms 7.64% runtime.findrunnable
490ms 1.26% 72.45% 510ms 1.31% runtime.newstack
430ms 1.11% 73.56% 430ms 1.11% runtime.memclr
410ms 1.05% 74.61% 410ms 1.05% [REDACTED]
390ms 1.00% 75.62% 2310ms 5.94% runtime.makeslice
380ms 0.98% 76.59% 4040ms 10.39% [REDACTED]
360ms 0.93% 77.52% 360ms 0.93% runtime.nop
The [REDACTED] entries are simply references to call into my code but the
vast majority of the time seems to be the actual GC doing its work.
So, my question is: Does it really mean that from those 0.0002 seconds, GC
is taking around 32% of this time? Or is it not really the case as the GC
runs in a separate goroutine/thread? Again, I only allocated around 50 Kb
per processing call (as per benchmem) so I am wondering why would the GC be
doing that much work. Could it simply be an after-effect of loading the
initial data?
Thanks in advance.
--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.