machine) than in serial, so the concurrency seems to be pretty effective.
However, when I look at the CPU profile from go tool pprof, the top few
entries are all runtime wait functions.
I don't understand how that can be. How can I get an almost linear speedup
while more than half of the sampled time is spent waiting?
Parallel output
(pprof) top10
87.37s of 92.98s total (93.97%)
Dropped 215 nodes (cum <= 0.46s)
Showing top 10 nodes out of 73 (cum >= 0.91s)
      flat  flat%   sum%        cum   cum%
    54.94s 59.09% 59.09%     54.94s 59.09%  runtime.mach_semaphore_wait
    18.35s 19.74% 78.82%     18.35s 19.74%  runtime.mach_semaphore_timedwait
    10.92s 11.74% 90.57%     10.92s 11.74%  runtime.usleep
     1.84s  1.98% 92.55%      1.84s  1.98%  runtime.mach_semaphore_signal
Serial output
(pprof) top5
18.96s of 79.70s total (23.79%)
Dropped 153 nodes (cum <= 0.40s)
Showing top 5 nodes out of 124 (cum >= 7.54s)
      flat  flat%   sum%        cum   cum%
     5.92s  7.43%  7.43%     11.94s 14.98%  runtime.mallocgc
     3.84s  4.82% 12.25%     11.38s 14.28%  runtime.makeslice
     3.46s  4.34% 16.59%     10.77s 13.51%  github.com/gonum/lapack/native.Implementation.Dlatrs
     3.32s  4.17% 20.75%      3.32s  4.17%  github.com/gonum/blas/native.Implementation.Dtrsv
     2.42s  3.04% 23.79%      7.54s  9.46%  runtime.newarray
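
For concreteness, here is a minimal sketch of the kind of serial versus parallel comparison described above. It is not the actual program: work, runSerial, runParallel, the task sizes, and the cpu.prof file name are all hypothetical stand-ins, and the real workload (gonum/lapack calls) is replaced by a simple CPU-bound loop. It only shows one way to time both variants and record a CPU profile with runtime/pprof.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"runtime"
	"runtime/pprof"
	"sync"
	"time"
)

// work is a hypothetical stand-in for one CPU-bound task.
func work(n int) float64 {
	s := 0.0
	for i := 1; i <= n; i++ {
		x := float64(i)
		s += 1.0 / (x * x)
	}
	return s
}

// runSerial runs every task on the calling goroutine.
func runSerial(tasks, n int) []float64 {
	out := make([]float64, tasks)
	for i := range out {
		out[i] = work(n)
	}
	return out
}

// runParallel fans the tasks out to a fixed pool of worker goroutines.
// Each worker writes only to its own index of out, so no lock is needed.
func runParallel(tasks, n, workers int) []float64 {
	out := make([]float64, tasks)
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				out[i] = work(n)
			}
		}()
	}
	for i := 0; i < tasks; i++ {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out
}

func main() {
	// Assumed output file name; the profile covers both phases below.
	f, err := os.Create("cpu.prof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	const tasks, n = 64, 2000000

	start := time.Now()
	runSerial(tasks, n)
	serial := time.Since(start)

	start = time.Now()
	runParallel(tasks, n, runtime.NumCPU())
	parallel := time.Since(start)

	fmt.Printf("serial %v, parallel %v, speedup %.2fx\n",
		serial, parallel, serial.Seconds()/parallel.Seconds())
}
```

The resulting cpu.prof can be inspected with go tool pprof. To get separate profiles like the two shown above, one would run only one of the two phases per invocation.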