I am curious.

I have this code that allocates a lot of data when starting up and, then,
it is called to do some processing and at each call it allocates around 50k
(which gets garbage collected eventually as expected).

I wrote a simple test benchmark that loads all data on init() and then, on
the benchmark loop, only calls the method that does the processing
mentioned above.

The benchmark results in around 0.0002 seconds per request. I was trying to
debug it to see if I can improve things but I am a bit confused about the
profiling output.

First, here is how I obtained it:

go test -bench . -benchtime 10s -cpuprofile cpu.out

When looking at the top offenders, I see this:

(pprof) top20
30140ms of 38880ms total (77.52%)
Dropped 140 nodes (cum <= 194.40ms)
Showing top 20 nodes out of 104 (cum >= 360ms)
       flat flat% sum% cum cum%
     6380ms 16.41% 16.41% 12550ms 32.28% runtime.scang
     3660ms 9.41% 25.82% 3660ms 9.41% [REDACTED]
     3200ms 8.23% 34.05% 3200ms 8.23% runtime.cas
     2640ms 6.79% 40.84% 7620ms 19.60% runtime.scanobject
     2070ms 5.32% 46.17% 2070ms 5.32% runtime.atomicor8
     1770ms 4.55% 50.72% 2790ms 7.18% runtime.readgstatus
     1650ms 4.24% 54.96% 3950ms 10.16% runtime.greyobject
     1480ms 3.81% 58.77% 1500ms 3.86% runtime.heapBitsForObject
     1080ms 2.78% 61.55% 1080ms 2.78% runtime.futex
     1030ms 2.65% 64.20% 6110ms 15.72% [REDACTED]
      900ms 2.31% 66.51% 1250ms 3.22% runtime.atomicload
      660ms 1.70% 68.21% 2520ms 6.48% runtime.mallocgc
      630ms 1.62% 69.83% 930ms 2.39% runtime.heapBitsSweepSpan
      530ms 1.36% 71.19% 2970ms 7.64% runtime.findrunnable
      490ms 1.26% 72.45% 510ms 1.31% runtime.newstack
      430ms 1.11% 73.56% 430ms 1.11% runtime.memclr
      410ms 1.05% 74.61% 410ms 1.05% [REDACTED]
      390ms 1.00% 75.62% 2310ms 5.94% runtime.makeslice
      380ms 0.98% 76.59% 4040ms 10.39% [REDACTED]
      360ms 0.93% 77.52% 360ms 0.93% runtime.nop

The [REDACTED] entries are simply references to call into my code but the
vast majority of the time seems to be the actual GC doing its work.

So, my question is: Does it really mean that from those 0.0002 seconds, GC
is taking around 32% of this time? Or is it not really the case as the GC
runs in a separate goroutine/thread? Again, I only allocated around 50 Kb
per processing call (as per benchmem) so I am wondering why would the GC be
doing that much work. Could it simply be an after-effect of loading the
initial data?

Thanks in advance.

You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Search Discussions

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupgolang-nuts @
postedAug 27, '15 at 10:20a
activeAug 27, '15 at 10:20a

1 user in discussion

Bruno Albuquerque: 1 post



site design / logo © 2022 Grokbase