FAQ
I'm new to Go but here's my best guess: Each worker() pauses after each
request to get a lock on a counter and increment it. This causes
contention. Instead, have them sum their request count locally and write
it to a channel when they exit. (Oh, and they have to exit, too. :) Spawn
a separate goroutine to read the channel and update the counter.
I've tried this version: https://gist.github.com/3550291 . It shows similar
performance: it is still ~40% slower on a 2-core machine. But thanks for the
suggestion; I thought it would be much slower because of using select.

That wasn't meant to be a hard and fast truth. My *assumption* was that
people understand that multithreading can often slow things down. My intent
was to point out that setting it to 1 was the same as the default and if he
wanted to experiment then he should raise it to the number of CPUs.
Thanks. I did not know that GOMAXPROCS(1) is the default value.

So far I am not able to achieve even similar performance on a 2-core machine.

On Thu, Aug 30, 2012 at 1:41 PM, Kyle Lemons wrote:
On Thu, Aug 30, 2012 at 8:56 AM, Sam Freiberg wrote:

GOMAXPROCS should be set to the number of processors on the machine. I
usually put something like this in my init function:
Not always.

faq: Why Does Using GOMAXPROCS > 1 Sometimes Make My Program Slower? -
http://golang.org/doc/go_faq.html#Why_GOMAXPROCS

runtime.GOMAXPROCS(runtime.NumCPU())

On Aug 30, 2012, at 8:11 AM, Vladimir Mihailenco wrote:

As I understand it, you suggest using runtime.GOMAXPROCS(1)? It did not
help:

(pprof) top10
Total: 570 samples
284 49.8% 49.8% 525 92.1% syscall.Syscall
259 45.4% 95.3% 259 45.4% runtime.futex
4 0.7% 96.0% 4 0.7% syscall.RawSyscall6
2 0.4% 96.3% 544 95.4% main.worker
2 0.4% 96.7% 2 0.4% mget
2 0.4% 97.0% 2 0.4% runtime.MCache_Alloc
2 0.4% 97.4% 2 0.4% runtime.memmove
2 0.4% 97.7% 2 0.4% sync/atomic.AddUint32
1 0.2% 97.9% 147 25.8% net.(*netFD).Read
1 0.2% 98.1% 5 0.9% net.(*pollServer).AddFD

I guess the next suggestion is to not use multiple goroutines, but
that obviously makes the benchmark useless...
On Thu, Aug 30, 2012 at 5:42 PM, Rob Pike wrote:

http://golang.org/doc/go_faq.html#Why_GOMAXPROCS

I bet there's no parallelism in the test, so you're adding overhead
(inter-CPU synchronization) to a sequential computation.

-rob

--
Sam Freiberg
+ samueldean@gmail.com
+- http://www.darkaslight.com/


  • Tomwilde at Aug 31, 2012 at 12:01 pm
    exits := make([]chan int, 0, *w)
    What's the 3rd argument doing here?

    From your gist: `counters` is an unbuffered channel, so when you broadcast
    `exit`, the workers will still block on line 35 (`counters <- counter`)
    until you actually read from the channel on line 90 (`counter +=
    <-counters`).

  • Vladimir Mihailenco at Aug 31, 2012 at 12:29 pm

exits := make([]chan int, 0, *w)
What's the 3rd argument doing here?
It preallocates capacity, so appending to the slice is slightly faster.

    From your gist: `counters` is an unbuffered channel, so when you broadcast
    `exit`, the workers will still block on line 35 (`counters <- counter`)
    until you actually read from the channel on line 90
    (`counter += <-counters`).
Thanks for the note, but I don't see how this will make the program any faster
on 2 VCPUs.

  • Tomwilde at Aug 31, 2012 at 1:42 pm

On Friday, August 31, 2012 14:29:10 UTC+2, Vladimir Mihailenco wrote:
    exits := make([]chan int, 0, *w)
    What's the 3rd argument doing here?
Oh, duh, I totally overlooked the braces there.
  • Tomwilde at Aug 31, 2012 at 1:44 pm
    I would re-implement this with connection pooling and perform the benchmark
    again.
  • Vladimir Mihailenco at Aug 31, 2012 at 2:23 pm
Firstly I noticed this behaviour with a Go Redis client that has connection
pooling (https://github.com/vmihailenco/redis). Then I wrote this program (
https://gist.github.com/3529263) to prove to myself that the problem is not in
my client. Also, there are only 10 open connections, so that should not be a
problem.
Perhaps your 2-core machine is just slower than the 1-core machine in
general?
No: redis-benchmark, which is written in C, shows better results on the 2-core
machine (AFAIR 60k on 2-core vs 40k on 1-core). Also, the test program (
https://gist.github.com/3529263) is very short and simple, so you can try to
reproduce my results yourself. Basically what it does:
- Start 10 goroutines that send requests to Redis.
- Let them run for 10 seconds.
- Print the number of requests sent in those 10 seconds.
- Exit.
  • Erwin at Aug 31, 2012 at 2:48 pm
I missed some info that you might have given earlier in this thread:
does your program run with GOMAXPROCS set to 1 on both your single-core
and dual-core machines?
  • Paul at Aug 31, 2012 at 6:36 pm
You should be aware that there is currently development work going on to
fix contention bottlenecks in the runtime. For example, see this change:
http://codereview.appspot.com/6454142/ makes a change to the net package
to fix a contention issue, and your Redis benchmark uses that package. (Though
I am not sure what this Redis benchmark attempts to prove...) As another
commenter already pointed out in this thread, you are highly likely to have
a lot of contention on &counter. I would redesign that as described earlier,
just from a common-sense point of view.

Are you using the latest version of Go? Do you have the latest patches
applied?

There is a new contention profiler; did you use it? It will probably give
significantly more detailed diagnostics of where a contention bottleneck is.

That's just all that comes to mind so far; I hope some of this is useful to
you.



  • John Asmuth at Aug 31, 2012 at 1:23 pm
    Perhaps your 2-core machine is just slower than the 1-core machine in
    general?


Discussion Overview
group: golang-nuts
categories: go
posted: Aug 31, '12 at 8:50a
active: Aug 31, '12 at 6:36p
posts: 9
users: 5
website: golang.org
