FAQ
Since I can't read C code... just wondered whether "pure getters" are
inlined:

type MyRec struct {
	foo string
}

func (me *MyRec) Foo() string {
	return me.foo
}

If another package imports this and does a someRec.Foo() call, will that
still be a CALL in the final assembled machine executable, or would it be
rewritten as the equivalent direct field read, since "exporting" is no
longer an existing concept at the machine-code level?

Obviously this is only for "pure getters", i.e. those methods that do
nothing but return a field value, with no other instructions or pointer
dealings.

More generally speaking, is there a doc outlining what gets inlined under
what conditions, including info on cross-package inlining when the final
machine-code executable is assembled?


  • Dave Cheney at Dec 22, 2012 at 11:43 am
Leaf calls are generally inlined; see for yourself:

go build -gcflags -m
  • Minux at Dec 22, 2012 at 11:47 am

There is a good way to find out: pass -gcflags -m to go build, and you will
see what gets inlined. Example output:
    ./test.go:16: can inline (*T).Get
    ./test.go:23: inlining call to (*T).Get

  • Chen Yufei at Dec 28, 2012 at 3:09 am
    Where can I find the detailed description of the optimization decisions
    printed out by -m?

For example, I'm seeing messages like "leaking param" for function
parameters; what does that mean?

    --
    Best regards,
    Chen Yufei

  • Minux at Dec 27, 2012 at 5:27 pm

On Thu, Dec 27, 2012 at 8:01 PM, Chen Yufei wrote:

Where can I find the detailed description of the optimization decisions
printed out by -m?
I'm not aware of any.

For example, I'm seeing messages like "leaking param" for function
parameters, what does that mean?
It means that the function somehow keeps its parameter alive even after it
returns. For example, when I compile this package:

package test
func f(x []byte) func() []byte { // x leaks: it's closed over in the returned closure
	return func() []byte { return x }
}

6g -m gives:
x.go:3: can inline func·001
x.go:2: leaking param: x
x.go:3: func literal escapes to heap
x.go:2: moved to heap: x
x.go:3: &x escapes to heap

  • Chen Yufei at Jan 6, 2013 at 3:18 am
Thanks for your explanation. I forgot to set an excluding rule in my gmail
filter, so this reply didn't go into my inbox.

It seems that the "leaking param" message is reported for some functions
which do not keep the parameter alive after return.

Here's an example:

package leak
import "net"
func hostIsIP(host string) bool {
	h, _, _ := net.SplitHostPort(host)
	return net.ParseIP(h) != nil
}

6g -m gives:
leak.go:4: leaking param: host

I can't see why "host" is kept alive after return here.

  • Minux at Jan 6, 2013 at 5:32 am

    On Sun, Jan 6, 2013 at 11:17 AM, Chen Yufei wrote:

    I can't see why "host" is kept alive after return here.
Because the escape analysis part of gc failed to determine that the
argument to net.SplitHostPort doesn't escape. I don't know why.

  • Kevin Gillette at Jan 6, 2013 at 3:33 pm
    Is there a document, such as on the wiki, that tracks what gc's (and
    gccgo's) escape analysis detects and does not detect, and what is and is
    not optimized?
  • Jan Mercl at Jan 6, 2013 at 4:36 pm
IMO not a good idea, as people will optimize their Go code even more
against an implementation detail of one specific compiler implementation
(ever changing).

    -j

  • Kevin Gillette at Jan 6, 2013 at 5:38 pm

Normally I'd agree in principle (with some reservations), but disagree in
this circumstance. In this case, the worst case is heap allocation, and
people have long been optimizing against what the implementation _can't_ do
(or couldn't do, depending on what the nonexistent docs say), ever since
the profiling article was published. With such documentation, if the
compiler will stack-allocate some non-escaping pointer literals, people can
confidently write code the way we recommend they do "until they know that
the GC is the bottleneck" (since the naive way would be optimal).

    I'm also thinking of non-escaping, read-only use of string([]byte) and
    []byte(string) conversions that could be optimized into reuse of the same
    allocation; I've seen (and have done) some dirty tricks to avoid
    reallocations of particularly large strings, or deviated from the
    straightforward approach to reuse byte buffers.

    In both these scenarios, the simplest code (and thus the code we'd be most
    likely to write when performance is not [yet] a concern) is optimal. At
    this time, Go programmers are investing effort into constructing
    app-specific "second order allocators" which would prove irrelevant and
    suboptimal as soon as such optimizations exist (not only will those
    optimizations be bypassed, in a bad way, but the programmer will
    inadvertently be wasting their time based on no longer accurate knowledge
    of the compiler tools).

    I would suggest that any time a compiler or runtime optimization can allow
    a programmer to _avoid_ manually optimizing/complexifying their own code,
    it should be documented and made easily accessible.

  • Jan Mercl at Jan 6, 2013 at 6:17 pm

    On Sun, Jan 6, 2013 at 6:31 PM, Kevin Gillette wrote:
That's IMHO still an unfortunate approach. Neither stack nor heap is ever
mentioned by the spec. OK, heap is perhaps inevitable, but (the HW) stack
is not in any way really required, and there's actually an experimental
project (probably one of those which will never get finished) of a
stack-less Go. What may sometimes help produce better code in one compiler
might hurt the code produced by another one. (Another idea might be a Go
VM, where the "optimization" problem may show up as well, as the cost of
creating the invocation record might be the same as the cost of a heap
alloc, for example.)

Most of the badly performing Go code I see is, AFAICS, caused by coders
with absolutely no idea about any implementation details whatsoever -
insane [ab]use of reflection, `interface{}`, ... Where any single
inner-loop cycle matters, the respective, competent and responsible
programmers already know quite well how to shave it off, I believe.

    -j

  • Kevin Gillette at Jan 7, 2013 at 3:39 am
    Which brings us back to my main point: giving programmers any reason to use
    the straightforward approach is ideal, since it's more readable and what
    the compiler writers will likely focus their efforts on.
  • Dave Cheney at Jan 6, 2013 at 11:23 pm

    because the escape analysis part of gc failed to determine that the argument
    to net.SplitHostPort doesn't escape.
    I don't know why.
host escapes because it is passed to net.SplitHostPort as hostport, which
passes it to net.splitHostPort as hostport, which slices it into two
strings and returns them back to the caller. hostport leaks because the
memory backing that string (and now the substrings) escapes from
splitHostPort.

    Cheers

    Dave

  • Niklas Schnelle at Jan 7, 2013 at 12:17 am
While looking into this, I find it quite hard to understand why things
sometimes leak. For example, in a current little project of mine I've got
this code:

func ColorToGrayValue(c color.Color) uint8 {
	r, g, b, _ := c.RGBA()
	return uint8(((299*r + 587*g + 114*b + 500) / 1000) >> 8)
}

and gc says:
./hornschunk.go:71: leaking param: c
line 71 being the function declaration. Does leaking in this case merely
mean that the call to c.RGBA() needs it on the stack?

By the way, why would anyone want a stackless Go? Unlike ideal malloc(),
which is pretty much NP-hard, stack allocation is a trivial thing, so why
would anyone want the worse allocation scheme?

  • Dave Cheney at Jan 7, 2013 at 12:25 am

Yes, I'd like to know what Stackless Go is. Wikipedia says Stackless
Python is:

"Stackless Python, or Stackless, is a Python programming language
interpreter, so named because it avoids depending on the C call stack
for its own stack."

I thought that this was a long-winded description of Python with coroutines.

  • Jan Mercl at Jan 7, 2013 at 9:15 am

On Mon, Jan 7, 2013 at 1:20 AM, Dave Cheney wrote:

Yes, i'd like to know what Stackless-Go is. ...
I hope the overview below succeeds in communicating the ideas behind it.

The primary goal: to allow _a lot_ of goroutines to exist at any given time.

The limiting factor: memory used by the stacks.

The naive approach: a goroutine is a thread (like in early gccgo).
Considering the default thread stack size on Linux (4M IIRC), one can
pack up to about 256 goroutines in 1GB of RAM.

The current approach: goroutines use split stacks (currently the
granularity is 4KB, IIRC). One can pack up to about 256K goroutines
in 1GB of RAM.
Pros: that's a lot of goroutines.
Cons:
- Checked use of stack space (split on demand) costs some cycles on
every call invocation.
- Not compatible with C stacks (cgo has to start a new thread at the
world's borders because of this).

The (out-of-curiosity, experimental) stack-less approach: allocate
function/method invocation records in the heap space instead of the
stack space. Considering that the size of any given invocation record
is statically known, they can hopefully be allocated/freed in time
roughly comparable to the amortized cost of a stack check plus
on-demand split-stack allocation/freeing. Measurement beats any
speculation, hence the implementation attempt.
Pros:
- Finer "stack" granularity has the potential to fit even more
goroutines into the same amount of memory, as long as they're on
average lighter than the currently used split-stack size, or when the
unused portions of the split stacks (on average half their size) sum
up to a lot of claimed but wasted RAM. Probably not a common
situation, but it can be nice in some corner-case/less common
scenarios.
- The HW stack is left for easy C interoperability (only in the first
approximation; there are other already known issues in this).
Cons: even the fastest feasible invocation record allocation (roughly
a handful of instructions) can be too slow compared to the current
approach.

The primary goal above might, in some cases, even be worth paying the
cost of some, if not substantial, performance degradation (say
single-digit percents only), I think.

    -j

  • Russ Cox at Jan 23, 2013 at 4:08 am
    I spent a while on the Stackless Python email lists around 2004, so I like
    to think I understand that project's goals and implementation decently.
    I've also been on this list long enough that I like to think I understand
    the same for Go.

    Stackless Go does not make much sense.

    The goal of Stackless Python was to make it possible to block a Python
    coroutine at any point in a computation and start running another. This was
    non-trivial because the Python interpreter implemented language constructs
    like function calls by invoking the interpreter loop recursively, not by
    maintaining an explicit separate stack. That is, each Python call frame
    ended up being a C call frame in the interpreter. This is a perfectly fine
    way to write an interpreter, and it no doubt simplifies some things, but it
    means you can't just pause one execution context (task, goroutine,
    coroutine, lightweight thread, fiber, whatever name you like) and start
    another, because some of the execution state is on the C stack, which is
    hard to save and restore. To address this, the Stackless Python project
    moved all the Python execution state into separate allocation chains: an
    explicit Python stack per execution context. Once that's been done, the C
    interpreter can run one execution context for a while, and then stop and
    run another. Because all the execution state is in data structures, it's
    now trivial to flip between them.

    Even at the time, there were other possible implementation strategies. For
    example, swtch.com/libtask is a simple coroutine library for C. It would
    probably have been less work to flip between multiple Python stacks managed
    by that library, but not portably: it would have required
    architecture-specific assembly and also guessing at the size of the stack
    required, since arbitrary C library calls are being made. The approach
    taken by Stackless Python does make a lot of sense. If it had been done
    from the start of the Python interpreter it would have been pretty easy,
    and it's completely portable. Because there was a large existing
    interpreter not written that way, it turned out to be a lot of work.

    Go obviously shares with Stackless Python the goal of having many
    lightweight execution contexts (goroutines). However, Go also accepts that
    the underlying machine hardware is optimized for use of call and return
    instructions on a hardware-managed stack. It uses architecture-specific
    assembly to do stack switches, and it avoids guessing at the size of the
    stack by using a sequence of stack segments, allocated on demand and freed
    when no longer in use. In the 'Stackless Python' sense, Go is already
    stackless: it does not put any per-goroutine state on the main C stack, and
    as such can flip between goroutines with ease.

    The cons you list for Go's current approach are not fixed by explicit
    allocations for each function frame. The first is "checked use of stack
    space (split on demand) costs some cycles on every call invocation." It
    costs one or two machine instructions. Explicit allocation and deallocation
    on every function frame is like splitting at every call. It is guaranteed
    to be more expensive. Also, since you don't get to use the hardware's jump
    predictor for call and return, those transitions will be slower even
    ignoring the frame management. The second is "not compatible with C stacks
(cgo has to start a new thread at the world's borders because of this)."
    This is not true. Cgo does not create new threads. It switches from the
    per-goroutine stack to the significantly larger per-thread stack, so that
    the C program will not run out of space. The stack switch is super cheap,
    just saving and loading a pair of registers on the x86. The calling of the
    C routine does require some scheduler bookkeeping to avoid deadlocks. That
    is inherent to the scheduling, regardless of stack management.

    If you want to do something about the 4 kB per goroutine stack overhead
    (and remember that most C thread implementations measure stack size in
    megabytes), you don't need to throw the whole thing away and start over.
    That's almost certainly counterproductive. Put some logic into the linker
    or compiler to analyze the call graph and decide on smaller frames for
    computations that cannot possibly use 4 kB. Or ratchet the default stack
    size down to 2 kB or 1 kB.

Rewriting everything to do explicit per-call frame allocation may be fun
and instructive, but it is unlikely to be faster.

    Russ

  • Dave Cheney at Dec 28, 2012 at 4:22 am

On 27/12/2012, at 23:01, Chen Yufei wrote:

Where can I find the detailed description of the optimization decisions printed out by -m?
Look in $GOROOT/src/cmd/gc for debug['m']

Discussion Overview
group: golang-nuts
categories: go
posted: Dec 22, '12 at 11:40a
active: Jan 23, '13 at 4:08a
posts: 18
users: 8
website: golang.org
