Hi guys.

*Foreword:*
I'm trying to find out why the following sample shows a drop in
performance when the fd is in non-blocking mode.

*Code:* http://play.golang.org/p/Itb9Gs7RCp
*Go-version:* go1.2 linux/amd64

*What the handler does:*

    1. Accept connection
    2. Do a number of reads/writes
    3. Close connection

The getFile() function is a *trick* to put the underlying fd into
blocking mode.
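
A minimal sketch of what this trick can look like (illustrative only;
the real code is in the playground link above). In Go 1.2,
(*net.TCPConn).File() sets the underlying fd to blocking mode and
returns a dup'd *os.File:

    // Sketch of the getFile() trick (requires "net"): File() is
    // called only for its side effect of switching the connection's
    // fd to blocking mode.
    func getFile(conn *net.TCPConn) {
        f, err := conn.File() // side effect: conn's fd is now blocking
        if err != nil {
            return
        }
        f.Close() // close the duplicate descriptor; conn stays open
    }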

I ran some tests; here are the results:

*In non-blocking mode* (getFile() not called):

*CPU:* ~40% (peaks at 45%)


*Top:*

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10650 root 19 0 63596 8488 1548 S 39.3 0.5 0:03.63 hello


*Prof:*

(pprof) top
Total: 2670 samples
     1362 51.0% 51.0% 1437 53.8% syscall.Syscall
      122 4.6% 55.6% 122 4.6% runtime.futex
      119 4.5% 60.0% 119 4.5% runtime.epollctl
      118 4.4% 64.5% 118 4.4% runtime.epollwait
       61 2.3% 66.7% 136 5.1% syscall.Syscall6
       53 2.0% 68.7% 109 4.1% runtime.mallocgc
       38 1.4% 70.1% 38 1.4% runtime.xchg
       35 1.3% 71.5% 35 1.3% scanblock
       23 0.9% 72.3% 151 5.7% runtime.entersyscall
       22 0.8% 73.1% 22 0.8% runtime.markallocated


*Prof (sorted by cumulative samples):*

(pprof) top20 -cum
Total: 2512 samples
        0 0.0% 0.0% 1562 62.2% System
     1161 46.2% 46.2% 1278 50.9% syscall.Syscall
        0 0.0% 46.2% 945 37.6% runtime.gosched0
        2 0.1% 46.3% 532 21.2% main.main
        0 0.0% 46.3% 532 21.2% runtime.main
        1 0.0% 46.3% 512 20.4% net.(*TCPListener).Accept
       10 0.4% 46.7% 511 20.3% net.(*TCPListener).AcceptTCP
        9 0.4% 47.1% 467 18.6% net.(*netFD).accept
       11 0.4% 47.5% 413 16.4% main.handleConnection
       20 0.8% 48.3% 271 10.8% runtime.entersyscall
        4 0.2% 48.5% 243 9.7% net.accept
        1 0.0% 48.5% 234 9.3% runtime.notewakeup
        2 0.1% 48.6% 231 9.2% runtime.futexwakeup
      229 9.1% 57.7% 229 9.1% runtime.futex
       61 2.4% 60.2% 216 8.6% syscall.Syscall6
        4 0.2% 60.3% 190 7.6% syscall.Accept4
        2 0.1% 60.4% 157 6.2% syscall.accept4
        1 0.0% 60.4% 150 6.0% net.(*conn).Read
       11 0.4% 60.9% 149 5.9% net.(*netFD).Read
      130 5.2% 66.0% 130 5.2% runtime.epollwait


*In blocking mode* (getFile() called):

*CPU:* ~20% (peaks at 25%)


*Top:*

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10708 root 20 0 153m 8880 1540 S 22.3 0.5 0:03.33 hello


*Prof:*

(pprof) top
Total: 1996 samples
     1225 61.4% 61.4% 1254 62.8% syscall.Syscall
       92 4.6% 66.0% 92 4.6% runtime.epollctl
       40 2.0% 68.0% 72 3.6% runtime.mallocgc
       28 1.4% 69.4% 28 1.4% syscall.RawSyscall
       25 1.3% 70.6% 25 1.3% scanblock
       24 1.2% 71.8% 29 1.5% syscall.Syscall6
       22 1.1% 72.9% 22 1.1% runtime.memmove
       20 1.0% 73.9% 20 1.0% runtime.exitsyscall
       19 1.0% 74.9% 29 1.5% runtime.entersyscall
       15 0.8% 75.7% 45 2.3% runtime.deferreturn


*Prof (sorted by cumulative samples):*

(pprof) top20 -cum
Total: 1996 samples
        0 0.0% 0.0% 1370 68.6% System
     1225 61.4% 61.4% 1254 62.8% syscall.Syscall
        0 0.0% 61.4% 623 31.2% runtime.gosched0
       10 0.5% 61.9% 328 16.4% main.handleConnection
        1 0.1% 61.9% 295 14.8% main.main
        0 0.0% 61.9% 295 14.8% runtime.main
        0 0.0% 61.9% 169 8.5% net.(*TCPListener).Accept
        3 0.2% 62.1% 169 8.5% net.(*TCPListener).AcceptTCP
        5 0.3% 62.3% 149 7.5% net.(*netFD).accept
       11 0.6% 62.9% 128 6.4% fmt.Fprintf
        3 0.2% 63.0% 114 5.7% main.getFile
        1 0.1% 63.1% 106 5.3% net.(*conn).Close
        2 0.1% 63.2% 105 5.3% net.(*netFD).Close
        2 0.1% 63.3% 96 4.8% net.(*conn).File
        0 0.0% 63.3% 94 4.7% net.(*netFD).decref
        1 0.1% 63.3% 94 4.7% net.(*netFD).dup
        0 0.0% 63.3% 92 4.6% net.(*netFD).destroy
       92 4.6% 67.9% 92 4.6% runtime.epollctl
       40 2.0% 69.9% 72 3.6% runtime.mallocgc
        6 0.3% 70.2% 71 3.6% net.(*conn).Write



Based on the top output, non-blocking mode takes much more CPU than
blocking mode, but blocking mode is a bit slower in total execution
time.
I think the reason could be the scheduler (switching tasks within a
single OS thread), but I'm not sure.

Performance is very important to me. CPU usage jumps to 100% when the
handler does a lot of reads/writes (in non-blocking mode).
I don't know what to do. I can't accept CPU usage that high in
non-blocking mode, and I can't use blocking mode, because:

    1. I don't know what happens to the scheduler after setting the fd
    to blocking mode.
    2. Total execution time is a bit slower.


I would be glad of any help. Thanks.



  • Dmitry Vyukov at Mar 25, 2014 at 12:43 pm
    Check the return value from conn.Read; the program can be reading
    different amounts of data in blocking vs. non-blocking mode.
    Make sure that you read 1024 bytes in both modes.
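
    For instance, a minimal sketch of such a check (illustrative, not
    the playground code): io.ReadFull retries short reads, which a bare
    conn.Read does not.

        // Illustrative helper (requires "io" and "net"): read exactly
        // 1024 bytes per message, so blocking and non-blocking runs
        // provably transfer the same amount of data. A bare conn.Read
        // may legally return fewer bytes than requested.
        func readMessage(conn net.Conn) ([]byte, error) {
            buf := make([]byte, 1024)
            if _, err := io.ReadFull(conn, buf); err != nil {
                return nil, err // io.ErrUnexpectedEOF on a short read
            }
            return buf, nil
        }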

    Post the output when running with the GODEBUG=schedtrace=100
    environment variable.

    Generally, if you want to do lots of work as fast as possible, you
    need to pay with CPU consumption. There is no magical way to work
    without consuming CPU.



  • Ilarkin at Mar 26, 2014 at 2:33 am
    Thanks for the reply.

    On Tuesday, March 25, 2014 7:43:14 PM UTC+7, Dmitry Vyukov wrote:

    Check the return value from conn.Read; the program can be reading
    different amounts of data in blocking vs. non-blocking mode.
    Make sure that you read 1024 bytes in both modes.
        - The return value is the same in both modes.


    Post the output when running with the GODEBUG=schedtrace=100
    environment variable.
        - The output is fairly large, so I put it in the attachment.

    Can you please explain what happens with the scheduler in both
    modes? (Or, maybe, point me to a design doc, if it changed after
    this one: <https://docs.google.com/a/crystalnix.com/document/d/1TTj4T2JO42uD5ID9e89oa0sLKhJYD0Y_kqxDv3I3XMw/edit?pli=1>.)



    Generally, if you want to do lots of work as fast as possible, you
    need to pay with CPU consumption. There is no magical way to work
    without consuming CPU.

        - A few words about the sender's side: the sender is a Go app
        that tries to create as many clients as possible. Each client
        has time to make a single Read/Write.

    I'm wondering why CPU usage on the recipient's side (hello) grows
    with the number of clients on the sender's side (in non-blocking
    mode).

    The sender and recipient are placed on different nodes. CPU usage
    is around 40% when working with a single sender.

    But if I create an additional sender (also on a different node),
    CPU usage grows to 60%.



  • Ilarkin at Mar 26, 2014 at 2:53 am
    Here is the output when I use two nodes.

    *Top:*

      PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
      23657 root 18 0 70484 13m 1584 S 62.8 0.8 0:08.27 hello

    *Debug:*

    The output is in the attachment.

  • Dmitry Vyukov at Mar 26, 2014 at 8:03 am

    On Wed, Mar 26, 2014 at 6:33 AM, wrote:

    Can you please explain what happens with the scheduler in both
    modes? (Or, maybe, point me to a design doc, if it changed after
    this one.)

    In non-blocking mode, the scheduler deschedules a goroutine waiting
    for IO, arms the epoll/kqueue/IOCP notification mechanism for it,
    and continues executing other goroutines on the same thread.
    In blocking mode, the scheduler "loses" the thread that calls the
    blocking syscall.Read; so after ~20us it creates another thread to
    run goroutines.
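
    For illustration, a minimal standalone program that makes this
    thread growth observable via the runtime's threadcreate profile
    (a sketch, not from the original reply):

        package main

        import (
            "fmt"
            "runtime/pprof"
            "time"
        )

        func main() {
            // Print how many OS threads the runtime has created,
            // sampled every 100ms.
            for {
                fmt.Println("threads created:",
                    pprof.Lookup("threadcreate").Count())
                time.Sleep(100 * time.Millisecond)
            }
        }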



    generally, if you want to do lots of work as fast as possible, you
    need to pay with CPU consumption. there is no magical way to work w/o
    consuming CPU
    A few words about the sender's side: the sender is a Go app that
    tries to create as many clients as possible. Each client has time
    to make a single Read/Write.

    I'm wondering why CPU usage on the recipient's side (hello) grows
    with the number of clients on the sender's side?

    This looks obvious: the server needs to do more work to serve more
    clients, so it consumes more CPU.

    (using non-blocking mode)

    The sender and recipient are placed on different nodes. CPU usage
    is around 40% when working with a single sender.

    But if I create an additional sender (also on a different node),
    CPU usage grows to 60%.

    I would suggest using non-blocking mode. Blocking mode is basically
    unsupported performance-wise and can require an unbounded number of
    threads.

    Also, when this break happens:
    if i >= 300000 {
         break
    }
    it's not guaranteed that all connections have been served; there
    can be an unbounded number of unserved connections.
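
    A minimal sketch of one way to wait for every accepted connection
    (assuming the accept loop's shape from the playground sample;
    handleConnection's body and the port are placeholders):

        package main

        import (
            "net"
            "sync"
        )

        func handleConnection(c net.Conn) {
            defer c.Close()
            // ... reads/writes as in the playground sample ...
        }

        func main() {
            ln, err := net.Listen("tcp", ":8000") // port is a placeholder
            if err != nil {
                panic(err)
            }
            var wg sync.WaitGroup
            for i := 0; i < 300000; i++ {
                conn, err := ln.Accept()
                if err != nil {
                    continue
                }
                wg.Add(1)
                go func(c net.Conn) {
                    defer wg.Done()
                    handleConnection(c)
                }(conn)
            }
            wg.Wait() // every accepted connection has been handled
        }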


  • Ilarkin at Mar 26, 2014 at 8:41 am
    Now it's clear enough to me what's happening on the scheduler's
    side. Thanks for the advice.
