Hi, I'm trying to use Go as the communications frontend for a C++-based
network server mesh, and as part of my explorations I've come across what
looks like a massive performance problem when calling C from multiple
concurrent goroutines.

As some background, I was trying to determine the relative performance and
maintenance benefits of managing a string-to-anonymous-ID mapping on the Go
side vs. just passing the converted C string in, so I have this test code
to check the timing:

func getId(string) int {
	// details are irrelevant here
}

func TimeTest_strings(id int, ch chan int) {
	var count = 10000

	fmt.Printf("%d wrap start\n", id)
	startWrap := time.Now()
	for i := 0; i < count; i++ {
		cs := C.CString(fmt.Sprintf("%d", i%100))
		defer C.free(unsafe.Pointer(cs))

		C.timeTest_stringWrap(cs)
	}
	stopWrap := time.Now()
	fmt.Printf("%d marshall start\n", id)
	startMarshall := time.Now()
	for i := 0; i < count; i++ {
		cs := getId(fmt.Sprintf("%d", i%100))
		C.timeTest_stringMarshall(C.int(cs))
	}
	stopMarshall := time.Now()

	fmt.Printf("%d Wrap: %f/sec\tMarshall: %f/sec\n", id,
		float64(count)/stopWrap.Sub(startWrap).Seconds(),
		float64(count)/stopMarshall.Sub(startMarshall).Seconds())

	ch <- 1
}

func main() {
	ch := make(chan int)
	go TimeTest_strings(1, ch)
	go TimeTest_strings(2, ch)
	<-ch
	<-ch
}


If I only have a single 'go TimeTest_strings' call, this performs very well
- about 1.5M operations per second per thread for stringWrap, and about
2M/s/t for stringMarshall. However, if I have the second goroutine
invocation, performance plummets to about 4.3K and 8K, respectively - that
is, a factor of 250!

Even if I change the time test inner loops to only use a fixed ID (instead
of calling getId), or have the C-string version only create and delete the
CString, the performance is still pretty much in the toilet (same order of
magnitude, certainly not the level of performance I need in any case).

I have a few theories about what might be going on:

1. All C calls are wrapped by a global mutex
2. All C calls are handed off to a single C worker thread via a hidden
channel
3. C calls count as a blocking operation in the context of a goroutine

Are any of these even remotely accurate, and if so, is there any way to
avoid the performance hit?

I'd really rather not have to write the backend portions of this service in
Go for various reasons (mostly that I can't find an equivalent to
boost::multi_index_container), but it's looking like I might have to write
the network stack in C++ as well - also something I'd rather not do.
Although if someone *does* know of a boost::multi_index_container
equivalent that has a license compatible with a proprietary codebase, I'd
be willing to try that instead.

Thanks.

--


  • Kyle Lemons at Oct 9, 2012 at 7:52 pm
    Have you tried profiling <http://blog.golang.org/2011/06/profiling-go-programs.html>
    your code in both cases and seeing where the proportions differ?
    On Tue, Oct 9, 2012 at 11:02 AM, j shagam wrote:

    Hi, I'm trying to use Go as the communications frontend for a C++-based
    network server mesh, and as part of my explorations I've come across what
    looks like a massive performance problem when calling C from multiple
    concurrent goroutines.

    As some background, I was trying to determine the relative performance and
    maintenance benefits of managing a string-to-anonymous-ID mapping on the Go
    side vs. just passing the converted C string in, so I have this test code
    to check the timing:

    func getId(string) int {
    	// details are irrelevant here
    }

    func TimeTest_strings(id int, ch chan int) {
    	var count = 10000
    Don't do benchmarks like this. Use the "testing" framework for benchmarks,
    either by making a BenchmarkXxx function or by passing a func(b *testing.B)
    into the testing.Benchmark function and printing the result.

    	fmt.Printf("%d wrap start\n", id)
    	startWrap := time.Now()
    	for i := 0; i < count; i++ {
    		cs := C.CString(fmt.Sprintf("%d", i%100))
    		defer C.free(unsafe.Pointer(cs))
    I agree with Brad; either free immediately after your call or use unsafe so
    that you can re-use the same cstring.

    		C.timeTest_stringWrap(cs)
    	}
    	stopWrap := time.Now()
    	fmt.Printf("%d marshall start\n", id)
    	startMarshall := time.Now()
    	for i := 0; i < count; i++ {
    		cs := getId(fmt.Sprintf("%d", i%100))
    		C.timeTest_stringMarshall(C.int(cs))
    	}
    	stopMarshall := time.Now()

    	fmt.Printf("%d Wrap: %f/sec\tMarshall: %f/sec\n", id,
    		float64(count)/stopWrap.Sub(startWrap).Seconds(),
    		float64(count)/stopMarshall.Sub(startMarshall).Seconds())

    	ch <- 1
    }

    func main() {
    	ch := make(chan int)
    	go TimeTest_strings(1, ch)
    	go TimeTest_strings(2, ch)
    	<-ch
    	<-ch
    }


    If I only have a single 'go TimeTest_strings' call, this performs very
    well - about 1.5M operations per second per thread for stringWrap, and
    about 2M/s/t for stringMarshall. However, if I have the second goroutine
    invocation, performance plummets to about 4.3K and 8K, respectively - that
    is, a factor of 250!

    Even if I change the time test inner loops to only use a fixed ID (instead
    of calling getId), or have the C-string version only create and delete the
    CString, the performance is still pretty much in the toilet (same order of
    magnitude, certainly not the level of performance I need in any case).

    I have a few theories about what might be going on:

    1. All C calls are wrapped by a global mutex
    I believe this is not the case. They are, however, wrapped in a stub that
    translates the Go to the C calling convention, which makes it heavier than
    a normal function call.

    2. All C calls are handed off to a single C worker thread via a hidden
    channel
    This is not the case.

    3. C calls count as a blocking operation in the context of a goroutine
    All function calls are blocking unless they have "go" in front of them.

    Are any of these even remotely accurate, and if so, is there any way to
    avoid the performance hit?
    If I had to make a wild unsubstantiated guess, I would guess that the
    malloc in CString is the bottleneck.

    I'd really rather not have to write the backend portions of this service in
    Go for various reasons (mostly that I can't find an equivalent to
    boost::multi_index_container), but it's looking like I might have to write
    the network stack in C++ as well - also something I'd rather not do.
    Although if someone *does* know of a boost::multi_index_container
    equivalent that has a license compatible with a proprietary codebase, I'd
    be willing to try that instead.
    You can roll "multi-index containers" in Go pretty easily, though it won't
    be as pretty or as convenient. It will be more explicit and possibly more
    performant, because it will be tailored to your requirements.

    --
  • J shagam at Oct 9, 2012 at 8:44 pm

    On Tuesday, October 9, 2012 11:51:10 AM UTC-7, Kyle Lemons wrote:
    Have you tried profiling <http://blog.golang.org/2011/06/profiling-go-programs.html> your code in both cases and seeing where the proportions differ?
    I have not. I will give that a try, thanks.

    Don't do benchmarks like this. Use the "testing" framework for
    benchmarks, either by making a BenchmarkXxx function or by passing a func(b
    *testing.B) into the testing.Benchmark function and printing the result.
    This wasn't intended as a formal benchmark, it was just a quick test I
    could throw into some existing code to test out a theory.

    	fmt.Printf("%d wrap start\n", id)
    	startWrap := time.Now()
    	for i := 0; i < count; i++ {
    		cs := C.CString(fmt.Sprintf("%d", i%100))
    		defer C.free(unsafe.Pointer(cs))
    I agree with Brad; either free immediately after your call or use unsafe
    so that you can re-use the same cstring.
    That's helpful, thanks.
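
    (A hedged sketch of the re-use idea above: allocate one C buffer up front
    and overwrite it in place each iteration, instead of calling C.CString
    (and thus malloc) inside the loop. The buffer size and the inline C stub
    are assumptions for illustration, not from the thread:)

    package main

    /*
    #include <stdlib.h>
    // Stub standing in for the real C function under test.
    static void timeTest_stringWrap(char *s) {}
    */
    import "C"

    import (
    	"fmt"
    	"unsafe"
    )

    func main() {
    	const bufSize = 16 // plenty for "0".."99" plus a NUL
    	buf := (*C.char)(C.malloc(bufSize))
    	defer C.free(unsafe.Pointer(buf))
    	b := (*[bufSize]byte)(unsafe.Pointer(buf))[:]
    	for i := 0; i < 10000; i++ {
    		n := copy(b, fmt.Sprintf("%d", i%100))
    		b[n] = 0 // NUL-terminate for C
    		C.timeTest_stringWrap(buf)
    	}
    }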

    1. All C calls are wrapped by a global mutex
    I believe this is not the case. They are, however, wrapped in a stub that
    translates the Go to the C calling convention, which makes it heavier than
    a normal function call.
    Yes, I'm expecting C calls to be a bit heavier, but I'm trying to determine
    why, in a multiply-threaded context, they take roughly 250x as long to
    execute as in a singly-threaded context when there are no synchronization
    primitives involved.

    2. All C calls are handed off to a single C worker thread via a hidden
    channel
    This is not the case.

    3. C calls count as a blocking operation in the context of a goroutine
    All function calls are blocking unless they have "go" in front of them.
    I apologize for misusing the term 'blocking' - what I meant was that it
    causes a context switch. I was thinking too far down in the UNIX systems
    stack, where blocking calls were at least traditionally used as a shorthand
    for 'this should sched_yield() first.'

    Are any of these even remotely accurate, and if so, is there any way to
    avoid the performance hit?
    If I had to make a wild unsubstantiated guess, I would guess that the
    malloc in CString is the bottleneck.
    That would be a performance bottleneck, but would that explain why there's
    this massive order of magnitude difference in performance between single-
    and multi-threaded? That's what I'm trying to understand.

    I'd really rather not have to write the backend portions of this service
    in Go for various reasons (mostly that I can't find an equivalent to
    boost::multi_index_container), but it's looking like I might have to write
    the network stack in C++ as well - also something I'd rather not do.
    Although if someone *does* know of a boost::multi_index_container
    equivalent that has a license compatible with a proprietary codebase, I'd
    be willing to try that instead.
    You can roll "multi-index containers" in Go pretty easily, though it won't
    be as pretty or as convenient. It will be more explicit and possibly more
    performant, because it will be tailored to your requirements.
    How do you go about making an indexed container at all without going
    through the rigamarole of rolling your own binary tree? (Also, boost does a
    damned good job of being very optimizable, and in this case I'm trying to
    optimize programmer time more than runtime.)

    --
  • Kyle Lemons at Oct 9, 2012 at 7:14 pm
    On Tue, Oct 9, 2012 at 12:00 PM, j shagam wrote:
    On Tuesday, October 9, 2012 11:51:10 AM UTC-7, Kyle Lemons wrote:

    Have you tried profiling <http://blog.golang.org/2011/06/profiling-go-programs.html> your code in both cases and seeing where the proportions differ?
    I have not. I will give that a try, thanks.

    Don't do benchmarks like this. Use the "testing" framework for
    benchmarks, either by making a BenchmarkXxx function or by passing a func(b
    *testing.B) into the testing.Benchmark function and printing the result.
    This wasn't intended as a formal benchmark, it was just a quick test I
    could throw into some existing code to test out a theory.
    The second, passing a func(b *testing.B), is intended for exactly that
    purpose. It does a good job of choosing a count that gets a representative
    sample, without you having to manually raise or lower the count so that
    the run completes in a reasonable amount of time. If you do the first,
    however, and put it in a _test file, you get the profiling options in go
    test for free.
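
    (A minimal sketch of that second approach; "work" is a hypothetical
    stand-in for the cgo call being timed. testing.Benchmark picks b.N itself
    and the result reports ns/op:)

    package main

    import (
    	"fmt"
    	"testing"
    )

    func main() {
    	// work is a hypothetical stand-in for the cgo call being timed.
    	work := func(s string) {}

    	res := testing.Benchmark(func(b *testing.B) {
    		for i := 0; i < b.N; i++ {
    			work(fmt.Sprintf("%d", i%100))
    		}
    	})
    	fmt.Println("wrap:", res)
    }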

    	fmt.Printf("%d wrap start\n", id)
    	startWrap := time.Now()
    	for i := 0; i < count; i++ {
    		cs := C.CString(fmt.Sprintf("%d", i%100))
    		defer C.free(unsafe.Pointer(cs))
    I agree with Brad; either free immediately after your call or use unsafe
    so that you can re-use the same cstring.
    That's helpful, thanks.

    1. All C calls are wrapped by a global mutex
    I believe this is not the case. They are, however, wrapped in a stub
    that translates the Go to the C calling convention, which makes it heavier
    than a normal function call.
    Yes, I'm expecting C calls to be a bit heavier, but I'm trying to
    determine why, in a multiply-threaded context, they take roughly 250x as
    long to execute as in a singly-threaded context when there are no
    synchronization primitives involved.

    2. All C calls are handed off to a single C worker thread via a hidden
    channel
    This is not the case.

    3. C calls count as a blocking operation in the context of a goroutine
    All function calls are blocking unless they have "go" in front of them.
    I apologize for misusing the term 'blocking' - what I meant was that it
    causes a context switch. I was thinking too far down in the UNIX systems
    stack, where blocking calls were at least traditionally used as a shorthand
    for 'this should sched_yield() first.'
    Any memory allocation, including any function call because of segmented
    stacks, could potentially incur a context switch. Your defers are also
    allocating space and could (if memory serves) cause a stack frame
    reallocation and context switch.

    Are any of these even remotely accurate, and if so, is there any way to
    avoid the performance hit?
    If I had to make a wild unsubstantiated guess, I would guess that the
    malloc in CString is the bottleneck.
    That would be a performance bottleneck, but would that explain why there's
    this massive order of magnitude difference in performance between single-
    and multi-threaded? That's what I'm trying to understand.

    I'd really rather not have to write the backend portions of this service
    in Go for various reasons (mostly that I can't find an equivalent to
    boost::multi_index_container), but it's looking like I might have to write
    the network stack in C++ as well - also something I'd rather not do.
    Although if someone *does* know of a boost::multi_index_container
    equivalent that has a license compatible with a proprietary codebase, I'd
    be willing to try that instead.
    You can roll "multi-index containers" in Go pretty easily, though it
    won't be as pretty or as convenient. It will be more explicit and possibly
    more performant, because it will be tailored to your requirements.
    How do you go about making an indexed container at all without going
    through the rigamarole of rolling your own binary tree? (Also, boost does a
    damned good job of being very optimizable, and in this case I'm trying to
    optimize programmer time more than runtime.)

    --

    --
  • Bryanturley at Oct 9, 2012 at 8:18 pm
    You know you have access to the source, right? No need for theories on how
    it might work: http://golang.org/src/pkg/runtime/cgocall.c?h=cgo among a few
    other files.
    Why not just write your own boost::multi_index_container replacement?

    Also, what is your MAXPROCS? Is your C++ code thread safe?


    --
  • J shagam at Oct 9, 2012 at 7:58 pm

    On Tuesday, October 9, 2012 11:41:27 AM UTC-7, bryanturley wrote:
    You know you have access to the source right? No need for theories on how
    it might work http://golang.org/src/pkg/runtime/cgocall.c?h=cgo among a
    few other files.
    Why not just write your own boost::multi_index_container replacement?
    I figure people on this list will be more familiar with the code - I'm not
    really interested in trying to parse through code when someone else's
    understanding will do. And multi_index_container is a very complicated
    piece of template-instantiated code that I'm not familiar enough with Go to
    make a suitable workalike for just yet.

    Also what is your MAXPROCS? Is your c++ code thread safe?
    I have not explicitly set MAXPROCS, and the C++ code is thread safe (in
    this case the functions that are being called don't even do anything).


    --
  • Bryanturley at Oct 9, 2012 at 9:39 pm

    I have not explicitly set MAXPROCS, and the C++ code is thread safe (in
    this case the functions that are being called don't even do anything).
    To my knowledge, without setting http://golang.org/pkg/runtime/#GOMAXPROCS
    or the environment variable, your code will only use a single CPU.
    Though I have read that this will change in the not-so-distant future; from
    the docs: "This call will go away when the scheduler improves."
    Probably won't help if you are trying to do nothing really fast, though...
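
    (A minimal sketch of raising it, assuming you want one OS thread per CPU;
    GOMAXPROCS returns the previous setting:)

    package main

    import (
    	"fmt"
    	"runtime"
    )

    func main() {
    	prev := runtime.GOMAXPROCS(runtime.NumCPU()) // default was 1 in this era
    	fmt.Println("GOMAXPROCS raised from", prev, "to", runtime.NumCPU())
    }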

    If all your code is doing is malloc/free, then your allocator not being
    thread-happy might be your slowdown; that is speculation, though.
    Definitely profile, as has already been stated.

    --
  • Kyle Lemons at Oct 10, 2012 at 2:17 am

    On Tue, Oct 9, 2012 at 11:55 AM, j shagam wrote:
    On Tuesday, October 9, 2012 11:41:27 AM UTC-7, bryanturley wrote:

    You know you have access to the source, right? No need for theories on how
    it might work: http://golang.org/src/pkg/runtime/cgocall.c?h=cgo among a few other files.
    Why not just write your own boost::multi_index_container replacement?
    I figure people on this list will be more familiar with the code - I'm not
    really interested in trying to parse through code when someone else's
    understanding will do. And multi_index_container is a very complicated
    piece of template-instantiated code that I'm not familiar enough with Go to
    make a suitable workalike for just yet.
    Just make the code do what you want it to do. If you want O(1) access by
    name and easy iteration by age, then store pointers in a map and a slice
    and make an add function that stores the data in each "index":

    type stuff struct {
    	mapByName  map[string]*data
    	orderByAge []*data
    }

    func (s *stuff) add(d *data) {...}
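
    (A hedged completion of the elided add, assuming data carries the name and
    age fields the two indexes imply; the thread never defines them:)

    type data struct {
    	name string
    	age  int
    }

    func (s *stuff) add(d *data) {
    	s.mapByName[d.name] = d
    	s.orderByAge = append(s.orderByAge, d) // or insert in sorted position
    }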

    Also what is your MAXPROCS? Is your c++ code thread safe?
    I have not explicitly set MAXPROCS, and the C++ code is thread safe (in
    this case the functions that are being called don't even do anything).


    --

    --
  • J shagam at Oct 9, 2012 at 7:57 pm

    On Tuesday, October 9, 2012 12:01:44 PM UTC-7, Kyle Lemons wrote:
    On Tue, Oct 9, 2012 at 11:55 AM, j shagam <jsh...@gmail.com> wrote:
    On Tuesday, October 9, 2012 11:41:27 AM UTC-7, bryanturley wrote:

    You know you have access to the source, right? No need for theories on how
    it might work: http://golang.org/src/pkg/runtime/cgocall.c?h=cgo among a few other files.
    Why not just write your own boost::multi_index_container replacement?
    I figure people on this list will be more familiar with the code - I'm
    not really interested in trying to parse through code when someone else's
    understanding will do. And multi_index_container is a very complicated
    piece of template-instantiated code that I'm not familiar enough with Go to
    make a suitable workalike for just yet.
    Just make the code do what you want it to do. If you want O(1) access by
    name and easy iteration by age, then store pointers in a map and a slice
    and make an add function that stores the data in each "index":
    An unordered index is easy to do in Go, but I need ordered indices.

    --
  • Kyle Lemons at Oct 9, 2012 at 7:11 pm

    On Tue, Oct 9, 2012 at 12:04 PM, j shagam wrote:
    On Tuesday, October 9, 2012 12:01:44 PM UTC-7, Kyle Lemons wrote:
    On Tue, Oct 9, 2012 at 11:55 AM, j shagam wrote:
    On Tuesday, October 9, 2012 11:41:27 AM UTC-7, bryanturley wrote:

    You know you have access to the source, right? No need for theories on how
    it might work: http://golang.org/src/pkg/runtime/cgocall.c?h=cgo among a few other files.
    Why not just write your own boost::multi_index_container replacement?
    I figure people on this list will be more familiar with the code - I'm
    not really interested in trying to parse through code when someone else's
    understanding will do. And multi_index_container is a very complicated
    piece of template-instantiated code that I'm not familiar enough with Go to
    make a suitable workalike for just yet.
    Just make the code do what you want it to do. If you want O(1) access by
    name and easy iteration by age, then store pointers in a map and a slice
    and make an add function that stores the data in each "index":
    An unordered index is easy to do in Go, but I need ordered indices.
    A slice with a binary search insert or a binary tree is easy to roll.
    There are also any number of implementations out there you can use, though
    many use interface{}, which has the overhead of a type assertion to extract
    data. See http://go-lang.cat-v.org/pure-go-libs under Data Structures for
    a few examples. It turns out the code for them is short enough that you
    are often better off just dropping in one that is specific to the type in
    question.
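
    (A hedged sketch of the slice-with-binary-search-insert idea, using a
    hypothetical data type keyed on age; sort.Search finds the insertion
    point in O(log n):)

    package main

    import (
    	"fmt"
    	"sort"
    )

    type data struct {
    	name string
    	age  int
    }

    // byAge keeps *data sorted by age via binary-search insert.
    type byAge []*data

    func (s *byAge) insert(d *data) {
    	i := sort.Search(len(*s), func(i int) bool { return (*s)[i].age >= d.age })
    	*s = append(*s, nil)       // grow by one
    	copy((*s)[i+1:], (*s)[i:]) // shift the tail right
    	(*s)[i] = d
    }

    func main() {
    	var idx byAge
    	idx.insert(&data{"bob", 42})
    	idx.insert(&data{"amy", 7})
    	idx.insert(&data{"cal", 19})
    	for _, d := range idx {
    		fmt.Println(d.name, d.age) // amy 7, cal 19, bob 42
    	}
    }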

    --
  • J shagam at Oct 9, 2012 at 7:24 pm

    On Tuesday, October 9, 2012 12:11:47 PM UTC-7, Kyle Lemons wrote:
    A slice with a binary search insert or a binary tree is easy to roll.
    There are also any number of implementations out there you can use, though
    many use interface{}, which has the overhead of a type assertion to extract
    data. See http://go-lang.cat-v.org/pure-go-libs under Data Structures
    for a few examples. It turns out the code for them is short enough that
    you are often better off just dropping in one that is specific to the type
    in question.

    Okay, go-btree should do what I want. However, I do still need to
    potentially interface Go code with existing C++ code, so it would be nice
    to know what the root cause of this massive performance decrease is. I'll
    try profiling it and see what turns up.
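
    (For reference, a minimal sketch of CPU profiling a standalone binary with
    runtime/pprof, per the blog post linked above; the file name and workload
    placeholder are arbitrary. In a _test file, go test -cpuprofile gives the
    same thing for free:)

    package main

    import (
    	"log"
    	"os"
    	"runtime/pprof"
    )

    func main() {
    	f, err := os.Create("cpu.prof")
    	if err != nil {
    		log.Fatal(err)
    	}
    	if err := pprof.StartCPUProfile(f); err != nil {
    		log.Fatal(err)
    	}
    	defer pprof.StopCPUProfile()

    	// ... run the workload here, then inspect with: go tool pprof ...
    }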

    --
  • Brad Fitzpatrick at Oct 9, 2012 at 11:34 pm

    On Tue, Oct 9, 2012 at 11:02 AM, j shagam wrote:

    Hi, I'm trying to use Go as the communications frontend for a C++-based
    network server mesh, and as part of my explorations I've come across what
    looks like a massive performance problem when calling C from multiple
    concurrent goroutines.

    As some background, I was trying to determine the relative performance and
    maintenance benefits of managing a string-to-anonymous-ID mapping on the Go
    side vs. just passing the converted C string in, so I have this test code
    to check the timing:

    func getId(string) int {
    	// details are irrelevant here
    }

    func TimeTest_strings(id int, ch chan int) {
    	var count = 10000

    	fmt.Printf("%d wrap start\n", id)
    	startWrap := time.Now()
    	for i := 0; i < count; i++ {
    		cs := C.CString(fmt.Sprintf("%d", i%100))
    		defer C.free(unsafe.Pointer(cs))
    Aside: be aware that defer is not block-scoped, so you won't be freeing
    these 10,000 things until your function returns.

    		C.timeTest_stringWrap(cs)
    	}
    	stopWrap := time.Now()
    	fmt.Printf("%d marshall start\n", id)
    	startMarshall := time.Now()
    	for i := 0; i < count; i++ {
    		cs := getId(fmt.Sprintf("%d", i%100))
    		C.timeTest_stringMarshall(C.int(cs))
    	}
    	stopMarshall := time.Now()

    	fmt.Printf("%d Wrap: %f/sec\tMarshall: %f/sec\n", id,
    		float64(count)/stopWrap.Sub(startWrap).Seconds(),
    		float64(count)/stopMarshall.Sub(startMarshall).Seconds())

    	ch <- 1
    }

    func main() {
    	ch := make(chan int)
    	go TimeTest_strings(1, ch)
    	go TimeTest_strings(2, ch)
    	<-ch
    	<-ch
    }


    If I only have a single 'go TimeTest_strings' call, this performs very
    well - about 1.5M operations per second per thread for stringWrap, and
    about 2M/s/t for stringMarshall. However, if I have the second goroutine
    invocation, performance plummets to about 4.3K and 8K, respectively - that
    is, a factor of 250!

    Even if I change the time test inner loops to only use a fixed ID (instead
    of calling getId), or have the C-string version only create and delete the
    CString, the performance is still pretty much in the toilet (same order of
    magnitude, certainly not the level of performance I need in any case).

    I have a few theories about what might be going on:

    1. All C calls are wrapped by a global mutex
    2. All C calls are handed off to a single C worker thread via a hidden
    channel
    3. C calls count as a blocking operation in the context of a goroutine
    I believe 3) is correct at least. But that just means the thread that the
    goroutine is running on can't run another goroutine until your C function
    returns. So worst case you'll have 2 threads (1 per goroutine), but that
    shouldn't explain your performance problems.

    Are any of these even remotely accurate, and if so, is there any way to
    avoid the performance hit?

    I'd really rather not have to write the backend portions of this service
    in Go for various reasons (mostly that I can't find an equivalent to
    boost::multi_index_container), but it's looking like I might have to write
    the network stack in C++ as well - also something I'd rather not do.
    Although if someone *does* know of a boost::multi_index_container
    equivalent that has a license compatible with a proprietary codebase, I'd
    be willing to try that instead.

    Thanks.

    --

    --
  • J shagam at Oct 9, 2012 at 8:43 pm

    On Tuesday, October 9, 2012 11:09:38 AM UTC-7, Brad Fitzpatrick wrote:
    On Tue, Oct 9, 2012 at 11:02 AM, j shagam <jsh...@gmail.com> wrote:
    	fmt.Printf("%d wrap start\n", id)
    	startWrap := time.Now()
    	for i := 0; i < count; i++ {
    		cs := C.CString(fmt.Sprintf("%d", i%100))
    		defer C.free(unsafe.Pointer(cs))
    Aside: be aware that defer is not block-scoped, so you won't be freeing
    these 10,000 things until your function returns.
    Good to know, thanks. That won't matter in the context of my actual
    service (where the Go function will just be calling the C function once)
    but for the sake of this test all that should do is skew my timing numbers
    somewhat.
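
    (For the record, a minimal rework of that inner loop along the lines Brad
    suggests, freeing each string right after the call instead of deferring
    all 10,000 frees:)

    for i := 0; i < count; i++ {
    	cs := C.CString(fmt.Sprintf("%d", i%100))
    	C.timeTest_stringWrap(cs)
    	C.free(unsafe.Pointer(cs)) // free immediately; defers would pile up
    }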


    1. All C calls are wrapped by a global mutex
    2. All C calls are handed off to a single C worker thread via a hidden
    channel
    3. C calls count as a blocking operation in the context of a goroutine
    I believe 3) is correct at least. But that just means the thread that the
    goroutine is running on can't run another goroutine until your C function
    returns. So worst case you'll have 2 threads (1 per goroutine), but that
    shouldn't explain your performance problems.
    Yeah, that would be at worst an impact of 2-ish when running two threads,
    not 250-ish. But what I was trying to ask was whether it would also cause
    a goroutine context switch.

    --
  • Gijs at Oct 10, 2012 at 1:43 pm
    I have a few theories about what might be going on:
    1. All C calls are wrapped by a global mutex
    2. All C calls are handed off to a single C worker thread via a hidden
    channel
    3. C calls count as a blocking operation in the context of a goroutine

    Are any of these even remotely accurate, and if so, is there any way to
    avoid the performance hit?
    A couple of months ago, while trying to learn all the ins and outs of cgo,
    I came across an article/wikipage/note/bug ticket (can't remember)
    containing a small patch for cgo which would significantly increase the
    performance of calls to C code, at the expense of potential stability when
    calling the C code concurrently. I'm kicking myself for not being able to
    find it anymore, but maybe someone else still knows of this (assuming it's
    still relevant).

    --
  • Minux at Oct 10, 2012 at 2:29 pm

    On Wed, Oct 10, 2012 at 4:38 PM, Gijs wrote:

    A couple of months ago, while trying to learn all the ins and outs of cgo,
    I came across an article/wikipage/note/bug ticket (can't remember)
    containing a small patch for cgo which would significantly increase the
    performance of calls to C code, at the expense of potential stability when
    calling the C code concurrently. I'm kicking myself for not being able to
    find it anymore, but maybe someone else still knows of this (assuming it's
    still relevant).
    I'm not aware of any patches like what you described.
    I'd like to see it, if possible, thanks.

    --
  • Gijs at Oct 10, 2012 at 7:26 pm

    I'm not aware of any patches like what you described.
    I'd like to see it, if possible, thanks.

    I spent a couple of hours searching, almost thinking I had imagined the
    whole thing, when I finally found it, on this very list. It's not exactly
    what I thought it was, and considering it's from 2010 (practically
    prehistoric by Go standards), I'm not sure it's still relevant.

    https://groups.google.com/forum/#!topic/golang-nuts/NNaluSgkLSU/discussion

    --
  • Rémy Oudompheng at Oct 11, 2012 at 6:19 am
    It is still relevant and will not work for various reasons, like
    different calling convention differences, and very small stack space,
    and unnecessarily blocking Go threads, which are why cgo exists.

    Rémy.


    2012/10/10, Gijs <gwkunze@gmail.com>:
    I'm not aware of any patches like what you described.
    I'd like to see it, if possible, thanks.

    I spent a couple of hours searching, almost thinking I had imagined the
    whole thing, when I finally found it, on this very list. It's not exactly
    what I thought it was, and considering it's from 2010 (practically
    prehistoric by Go standards), I'm not sure it's still relevant.

    https://groups.google.com/forum/#!topic/golang-nuts/NNaluSgkLSU/discussion

    --

    --
