FAQ
Hi guys,

I'm porting bindings for libvips (a super fast image processor) over CGO.
See: https://github.com/daddye/vips

There aren't many test on this repo since is part of a proprietary code
which already have a test suite, in future I'll port more of them.

So, my problem:

I spent all the searching for a solution of a problem with *CGO and
Channels*.

I have a quite big app, while my code works and is very fast, when I ran my
benchmark suite results *1000x worse*. So I'm not sure if I'm doing
something wrong or is a know problem.

Here a simplified version of the code:

func main() {
         runtime.GOMAXPROCS(0)

         // prepare communication protocol
         urls := make(chan string)
         wg := new(sync.WaitGroup)

         // fire a bunch of workers
         for i := 0; i < env.MaxProcs; i++ {
                 wg.Add(1)
                 go func(i int) {
                         fmt.Printf("starting worker %d...\n", i)
                         defer wg.Done()
                         for url := range urls {
                                 C.resize_image(url) // takes around 30ms
                         }
                 }(i)
         }

         // db connection (native go)....

         var url string
         for i := 0; iter.Scan(&url) && i < 1e4; i++ { // Scan takes around 8ms
                 // with a time.Sleep(10 * time.Millisecond) results are 1000x better!
                 urls <- url
         }
         err = iter.Close()

         // wait
         close(urls)
         wg.Wait()
}


When I run pprof:

     3512 76.4% 76.4% 3512 76.4% runtime.mach_semaphore_wait
      968 21.1% 97.5% 968 21.1% runtime.cgocall
      107 2.3% 99.8% 107 2.3% runtime.kevent
        3 0.1% 99.9% 3 0.1% runtime.usleep
        2 0.0% 99.9% 2 0.0% syscall.Syscall
        1 0.0% 99.9% 1 0.0% math/rand.Float64
        1 0.0% 100.0% 1 0.0% runtime.MHeap_LookupMaybe
        1 0.0% 100.0% 1 0.0% runtime.MSpan_Sweep
        1 0.0% 100.0% 1 0.0% runtime.memclr
        0 0.0% 100.0% 6 0.1% GC
        0 0.0% 100.0% 3618 78.7% System
        0 0.0% 100.0% 968 21.1% github.com/daddye/vips.Resize
        0 0.0% 100.0% 28 0.6% github.com/daddye/vips._Cfunc_im_close
        0 0.0% 100.0% 278 6.0% github.com/daddye/vips._Cfunc_vips_affine_interpolator
        0 0.0% 100.0% 18 0.4% github.com/daddye/vips._Cfunc_vips_colourspace_0
        0 0.0% 100.0% 43 0.9% github.com/daddye/vips._Cfunc_vips_copy_0
        0 0.0% 100.0% 258 5.6% github.com/daddye/vips._Cfunc_vips_embed_extend
        0 0.0% 100.0% 35 0.8% github.com/daddye/vips._Cfunc_vips_image_new
        0 0.0% 100.0% 16 0.3% github.com/daddye/vips._Cfunc_vips_jpegload_buffer_seq
        0 0.0% 100.0% 15 0.3% github.com/daddye/vips._Cfunc_vips_jpegload_buffer_shrink


So, here things I tried:

1. Use another CGO binding: image magick - Same
2. Use runtime.Gosched() (tried to put it in different places) - Same
3. Use runtime.LockOSThread() (tried to put it in different places) - Same
4. Add a time.Sleep(10*time.Milliseconds) - the commented line - I get back
good results!
5. Test on linux (ubuntu lts 12) - Same and pretty much same pprof
6. Remove CGO and use a native call - Very fast.
7. Avoid channels just `go func()` - Very fast!

So, I know cgo ffi requires 40ns for a nop call, however since resize
images can take from 20ms to 1s (in my case), I think it's worth.
I tried all native go packages for image resizing, but unfortunately are
even slower than a CGO in stuck with the semaphore. :'(

As said, in "production" I don't see this issue because between a schedule
`urls <- url` and another there are few operations that takes some time, so
it's "fine".

However, now I'm curious, what is the problem here? How can I avoid to be
trapped in a similar scenario?

Thanks a lot!
DAddYE

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Search Discussions

  • James Bardin at Sep 2, 2014 at 8:08 pm

    On Tuesday, September 2, 2014 3:17:42 PM UTC-4, Davide D'Agostino wrote:

    When I run pprof:

    Your pprof profile is broken on osX. You can patch your kernel, or
    preferably use Linux.

    What version of Go are you using?

    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/d/optout.
  • Davide D'Agostino at Sep 2, 2014 at 8:13 pm
    I'm using go 1.3.1

    Here pprof on a linux box:

    Total: 6250 samples
         6188 99.0% 99.0% 6191 99.1% runtime.cgocall
           21 0.3% 99.3% 21 0.3% runtime.usleep
            7 0.1% 99.5% 7 0.1% retake
            7 0.1% 99.6% 7 0.1% runtime.rtsigprocmask
            4 0.1% 99.6% 4 0.1% syscall.Syscall
            3 0.0% 99.7% 3 0.0% runtime.epollwait
            3 0.0% 99.7% 3 0.0% runtime.futex
            2 0.0% 99.8% 2 0.0% ExternalCode
            2 0.0% 99.8% 2 0.0% runtime.memclr
            1 0.0% 99.8% 2 0.0% exitsyscallfast
            1 0.0% 99.8% 6192 99.1% github.com/daddye/vips.Resize
            1 0.0% 99.8% 1 0.0%
    github.com/gocql/gocql.(*frame).readBytes
            1 0.0% 99.9% 1 0.0% runqgrab
            1 0.0% 99.9% 1 0.0% runtime.atomicload64
            1 0.0% 99.9% 1 0.0% runtime.cas
            1 0.0% 99.9% 1 0.0% runtime.entersyscall
            1 0.0% 99.9% 1 0.0% runtime.fastrand1
            1 0.0% 99.9% 1 0.0% runtime.noteclear
            1 0.0% 100.0% 1 0.0% runtime.purgecachedstats
            1 0.0% 100.0% 1 0.0% runtime.unlock

    Same result I switch with a different CGO library i.e. image magick.
    On Tuesday, September 2, 2014 12:17:42 PM UTC-7, Davide D'Agostino wrote:

    Hi guys,

    I'm porting bindings for libvips (a super fast image processor) over CGO.
    See: https://github.com/daddye/vips

    There aren't many test on this repo since is part of a proprietary code
    which already have a test suite, in future I'll port more of them.

    So, my problem:

    I spent all the searching for a solution of a problem with *CGO and
    Channels*.

    I have a quite big app, while my code works and is very fast, when I ran
    my benchmark suite results *1000x worse*. So I'm not sure if I'm doing
    something wrong or is a know problem.

    Here a simplified version of the code:

    func main() {
    runtime.GOMAXPROCS(0)

    // prepare communication protocol
    urls := make(chan string)
    wg := new(sync.WaitGroup)

    // fire a bunch of workers
    for i := 0; i < env.MaxProcs; i++ {
    wg.Add(1)
    go func(i int) {
    fmt.Printf("starting worker %d...\n", i)
    defer wg.Done()
    for url := range urls {
    C.resize_image(url) // takes around 30ms
    }
    }(i)
    }

    // db connection (native go)....

    var url string
    for i := 0; iter.Scan(&url) && i < 1e4; i++ { // Scan takes around 8ms
    // with a time.Sleep(10 * time.Millisecond) results are 1000x better!
    urls <- url
    }
    err = iter.Close()

    // wait
    close(urls)
    wg.Wait()
    }


    When I run pprof:

    3512 76.4% 76.4% 3512 76.4% runtime.mach_semaphore_wait
    968 21.1% 97.5% 968 21.1% runtime.cgocall
    107 2.3% 99.8% 107 2.3% runtime.kevent
    3 0.1% 99.9% 3 0.1% runtime.usleep
    2 0.0% 99.9% 2 0.0% syscall.Syscall
    1 0.0% 99.9% 1 0.0% math/rand.Float64
    1 0.0% 100.0% 1 0.0% runtime.MHeap_LookupMaybe
    1 0.0% 100.0% 1 0.0% runtime.MSpan_Sweep
    1 0.0% 100.0% 1 0.0% runtime.memclr
    0 0.0% 100.0% 6 0.1% GC
    0 0.0% 100.0% 3618 78.7% System
    0 0.0% 100.0% 968 21.1% github.com/daddye/vips.Resize
    0 0.0% 100.0% 28 0.6% github.com/daddye/vips._Cfunc_im_close
    0 0.0% 100.0% 278 6.0% github.com/daddye/vips._Cfunc_vips_affine_interpolator
    0 0.0% 100.0% 18 0.4% github.com/daddye/vips._Cfunc_vips_colourspace_0
    0 0.0% 100.0% 43 0.9% github.com/daddye/vips._Cfunc_vips_copy_0
    0 0.0% 100.0% 258 5.6% github.com/daddye/vips._Cfunc_vips_embed_extend
    0 0.0% 100.0% 35 0.8% github.com/daddye/vips._Cfunc_vips_image_new
    0 0.0% 100.0% 16 0.3% github.com/daddye/vips._Cfunc_vips_jpegload_buffer_seq
    0 0.0% 100.0% 15 0.3% github.com/daddye/vips._Cfunc_vips_jpegload_buffer_shrink


    So, here things I tried:

    1. Use another CGO binding: image magick - Same
    2. Use runtime.Gosched() (tried to put it in different places) - Same
    3. Use runtime.LockOSThread() (tried to put it in different places) - Same
    4. Add a time.Sleep(10*time.Milliseconds) - the commented line - I get
    back good results!
    5. Test on linux (ubuntu lts 12) - Same and pretty much same pprof
    6. Remove CGO and use a native call - Very fast.
    7. Avoid channels just `go func()` - Very fast!

    So, I know cgo ffi requires 40ns for a nop call, however since resize
    images can take from 20ms to 1s (in my case), I think it's worth.
    I tried all native go packages for image resizing, but unfortunately are
    even slower than a CGO in stuck with the semaphore. :'(

    As said, in "production" I don't see this issue because between a schedule
    `urls <- url` and another there are few operations that takes some time, so
    it's "fine".

    However, now I'm curious, what is the problem here? How can I avoid to be
    trapped in a similar scenario?

    Thanks a lot!
    DAddYE
    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/d/optout.
  • James Bardin at Sep 2, 2014 at 8:20 pm
    That looks more reasonable, and nearly all your pprof sample are in the cgo
    call; which would be expected. Do you still have the same change in
    execution time on Linux when you add/remove the sleep?

    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/d/optout.
  • Davide D'Agostino at Sep 2, 2014 at 8:30 pm

    On Tuesday, September 2, 2014 1:20:42 PM UTC-7, James Bardin wrote:
    That looks more reasonable, and nearly all your pprof sample are in the
    cgo call; which would be expected. Do you still have the same change in
    execution time on Linux when you add/remove the sleep?
    Sure here also benchmark results:

    1. Without sleep:

    resize speed mean: 327.630885ms, min: 100.150452ms, max: 462.495835ms, 99%:
    429.954171ms
    Total: 6250 samples
         6188 99.0% 99.0% 6191 99.1% runtime.cgocall
           21 0.3% 99.3% 21 0.3% runtime.usleep
            7 0.1% 99.5% 7 0.1% retake
            7 0.1% 99.6% 7 0.1% runtime.rtsigprocmask
            4 0.1% 99.6% 4 0.1% syscall.Syscall
            3 0.0% 99.7% 3 0.0% runtime.epollwait
            3 0.0% 99.7% 3 0.0% runtime.futex
            2 0.0% 99.8% 2 0.0% ExternalCode
            2 0.0% 99.8% 2 0.0% runtime.memclr
            1 0.0% 99.8% 2 0.0% exitsyscallfast
            1 0.0% 99.8% 6192 99.1% github.com/daddye/vips.Resize
            1 0.0% 99.8% 1 0.0%
    github.com/gocql/gocql.(*frame).readBytes
            1 0.0% 99.9% 1 0.0% runqgrab
            1 0.0% 99.9% 1 0.0% runtime.atomicload64
            1 0.0% 99.9% 1 0.0% runtime.cas
            1 0.0% 99.9% 1 0.0% runtime.entersyscall
            1 0.0% 99.9% 1 0.0% runtime.fastrand1
            1 0.0% 99.9% 1 0.0% runtime.noteclear
            1 0.0% 100.0% 1 0.0% runtime.purgecachedstats
            1 0.0% 100.0% 1 0.0% runtime.unlock

      2. With 10ms sleep

    resize speed mean: 28.551381ms, min: 4.666236ms, max: 61.589934ms, 99%:
    52.274779ms
    Total: 1265 samples
         1101 87.0% 87.0% 1105 87.4% runtime.cgocall
           40 3.2% 90.2% 40 3.2% runtime.usleep
           40 3.2% 93.4% 41 3.2% syscall.Syscall
           23 1.8% 95.2% 23 1.8% runtime.futex
            7 0.6% 95.7% 7 0.6% runtime.epollwait
            4 0.3% 96.0% 4 0.3% runtime.MSpan_Sweep
            3 0.2% 96.3% 3 0.2% ExternalCode
            3 0.2% 96.5% 3 0.2% findrunnable
            3 0.2% 96.8% 3 0.2% runtime.markscan
            3 0.2% 97.0% 3 0.2% runtime.rtsigprocmask
            2 0.2% 97.2% 2 0.2% pidleget
            2 0.2% 97.3% 2 0.2% runtime.atomicload
            2 0.2% 97.5% 3 0.2% runtime.entersyscall
            2 0.2% 97.6% 2 0.2% runtime.fastrand1
            2 0.2% 97.8% 2 0.2% runtime.gogo
            2 0.2% 97.9% 2 0.2% runtime.memmove
            2 0.2% 98.1% 2 0.2% sysmon
            1 0.1% 98.2% 1 0.1% MHeap_AllocLocked
            1 0.1% 98.3% 1 0.1% addstackroots
            1 0.1% 98.3% 16 1.3% bufio.(*Reader).Read

    As you can see in the second one, the speed is much much better, and even
    the standard deviation is much more reliable.

    Yep, the cgo calls are still here in pprof but the line `
    github.com/daddye/vips.Resize` now it's gone.

    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/d/optout.
  • James Bardin at Sep 2, 2014 at 8:45 pm

    On Tue, Sep 2, 2014 at 4:30 PM, Davide D'Agostino wrote:

    As you can see in the second one, the speed is much much better, and even
    the standard deviation is much more reliable.

    Yep, the cgo calls are still here in pprof but the line `
    github.com/daddye/vips.Resize` <http://github.com/daddye/vips.Resize> now
    it's gone.
    That may be just a side-effect of sampling. If you show he cumulative time
    in calls I bet it re-appears.

    To try and eliminate something else, try stubbing out the cgo call
    altogether. Maybe this has something to do with the db (you could also load
    the URLs into a static []string to eliminate the db from the equation).

    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/d/optout.
  • Davide D'Agostino at Sep 2, 2014 at 9:09 pm

    That may be just a side-effect of sampling. If you show he cumulative time
    in calls I bet it re-appears.
    To try and eliminate something else, try stubbing out the cgo call
    altogether. Maybe this has something to do with the db (you could also load
    the URLs into a static []string to eliminate the db from the equation).
    So, I added the -cum:

    Total: 1279 samples
            0 0.0% 0.0% 1190 93.0% runtime.gosched0
            1 0.1% 0.1% 1149 89.8% main.func·003
            0 0.0% 0.1% 1116 87.3%
    github.com/triggit/feeds/processor.ResizeVips
            1 0.1% 0.2% 1114 87.1% github.com/daddye/vips.Resize
         1111 86.9% 87.0% 1111 86.9% runtime.cgocall
            0 0.0% 87.0% 488 38.2%
    github.com/daddye/vips._Cfunc_vips_jpegsave_custom
            0 0.0% 87.0% 267 20.9%
    github.com/daddye/vips._Cfunc_vips_affine_interpolator
            0 0.0% 87.0% 233 18.2%
    github.com/daddye/vips._Cfunc_vips_embed_extend
            0 0.0% 87.0% 82 6.4% System
           38 3.0% 90.0% 40 3.1% syscall.Syscall
           34 2.7% 92.7% 34 2.7% runtime.usleep
           27 2.1% 94.8% 27 2.1% runtime.futex
            0 0.0% 94.8% 26 2.0% io/ioutil.WriteFile
            0 0.0% 94.8% 25 2.0%
    github.com/daddye/vips._Cfunc_im_close
            0 0.0% 94.8% 25 2.0%
    github.com/daddye/vips._Cfunc_vips_image_new
            0 0.0% 94.8% 21 1.6%
    github.com/daddye/vips._Cfunc_vips_jpegload_buffer_seq
            0 0.0% 94.8% 21 1.6% github.com/gocql/gocql.(*Conn).recv
            0 0.0% 94.8% 21 1.6% github.com/gocql/gocql.(*Conn).serve
            1 0.1% 94.8% 19 1.5% bufio.(*Reader).Read
            0 0.0% 94.8% 19 1.5%
    github.com/daddye/vips._Cfunc_vips_copy_0

      So with sleep:
    1 0.1% 0.2% 1114 87.1% github.com/daddye/vips.Resize

    Without the sleep:
    1 0.0% 99.8% 6192 99.1% github.com/daddye/vips.Resize
    To try and eliminate something else, try stubbing out the cgo call
    altogether. Maybe this has something to do with the db (you could also load
    the URLs into a static []string to eliminate the db from the equation).

    I removed all the db part already, it's a bit worse.


    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/d/optout.
  • Tamás Gulácsi at Sep 2, 2014 at 8:27 pm
    2014. szeptember 2., kedd 21:17:42 UTC+2 időpontban Davide D'Agostino a
    következőt írta:
    Hi guys,

    So, I know cgo ffi requires 40ns for a nop call, however since resize
    images can take from 20ms to 1s (in my case), I think it's worth.
    I tried all native go packages for image resizing, but unfortunately are
    even slower than a CGO in stuck with the semaphore. :'(

    As said, in "production" I don't see this issue because between a schedule
    `urls <- url` and another there are few operations that takes some time, so
    it's "fine".
    Just shot in the dark: have you tried buffered channels (with GOMAXPROCS
    size, or 1000)? Explicit GOMAXPROCS? GOMAXPROCS=1 vs. GOMAXPROCS=2 ?
    But yes, this seems strange.

    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/d/optout.
  • Davide D'Agostino at Sep 2, 2014 at 8:34 pm

    Just shot in the dark: have you tried buffered channels (with GOMAXPROCS size, or 1000)? Explicit GOMAXPROCS? GOMAXPROCS=1 vs. GOMAXPROCS=2 ?
    But yes, this seems strange.
    I totally forgot to mention that, yep I tried both:

    1. with GOMAXPROCS=1 => FAST
    2. with buffered channels => SAME

    Basically you can cheat the benchmark just doing some like 0.1ms * GOMAXPROCS

    In this way seems it can scale horizontally.

    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/d/optout.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupgolang-nuts @
categoriesgo
postedSep 2, '14 at 7:17p
activeSep 2, '14 at 9:09p
posts9
users3
websitegolang.org

People

Translate

site design / logo © 2022 Grokbase