Hello everyone,

Our business has been suffering from an annoying problem. We are
developing an iMessage-like service in Go. The server can serve
hundreds of thousands of concurrent TCP connections per process, and
it's robust (it has been running for about a month), which is awesome.
However, the process quickly consumes 16GB of memory: with so many
connections there are also a lot of goroutines and buffered memory in
use. I extended the memory limit to 64GB by changing runtime/malloc.h
and runtime/malloc.goc. That works, but it brings a big problem too:
garbage collection becomes extremely slow. It stops the world for
about 10 seconds every 2 minutes, which causes problems that are very
hard to trace; for example, while the world is stopped, messages being
delivered may be lost. This is a disaster, since ours is a real-time
service that must deliver messages as fast as possible, with no pauses
and no message loss at all.

I'm planning to split the "big server process" into many "small
processes" to avoid this problem (a smaller memory footprint results
in a shorter stop-the-world pause), and to wait for Go's new GC
implementation.

Or do you have any suggestions for improving our service in the
meantime? I don't know when Go's new latency-free garbage collection
will arrive.

Thanks.

--
Best regards,
Jingcheng Zhang
Beijing, P.R.China

--


  • Christoph Hack at Nov 17, 2012 at 8:00 am
    Avoid the garbage in the first place. So, for example instead of allocating
    and returning a new strings in the String() methods of your object, you
    might want to implement the WriteTo method (or a similar interface). The
    standard library doesn't produce much garbage, so it's probably your
    program that allocates all those objects. Use the memory profiler to find
    those parts.
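    Christoph's suggestion might look something like this sketch (the
    Message type and its fields are hypothetical, not from the original
    poster's code):

    ```go
    package main

    import (
    	"bytes"
    	"fmt"
    	"io"
    )

    type Message struct {
    	From, To string
    	Seq      int
    }

    // String builds and returns a new string on every call, producing
    // garbage for the collector.
    func (m *Message) String() string {
    	return fmt.Sprintf("%s->%s#%d", m.From, m.To, m.Seq)
    }

    // WriteTo writes the same representation directly into w, so a caller
    // can reuse one buffer across many messages instead of allocating a
    // fresh string each time.
    func (m *Message) WriteTo(w io.Writer) (int64, error) {
    	n, err := fmt.Fprintf(w, "%s->%s#%d", m.From, m.To, m.Seq)
    	return int64(n), err
    }

    func main() {
    	m := &Message{From: "alice", To: "bob", Seq: 42}
    	var buf bytes.Buffer // reusable; call buf.Reset() between messages
    	m.WriteTo(&buf)
    	fmt.Println(buf.String()) // alice->bob#42
    }
    ```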

    -christoph

    --
  • Dmitry Vyukov at Nov 17, 2012 at 8:14 am

    On Saturday, November 17, 2012 12:00:29 PM UTC+4, Christoph Hack wrote:

    Avoid the garbage in the first place. So, for example instead of
    allocating and returning a new strings in the String() methods of your
    object, you might want to implement the WriteTo method (or a similar
    interface). The standard library doesn't produce much garbage, so it's
    probably your program that allocates all those objects. Use the memory
    profiler to find those parts.
    This won't help. GC duration is not a function of garbage generation speed.
    GC frequency is a function of garbage generation speed. So instead of 10sec
    every 2min, you can get 10sec every 4min. But I guess it won't solve the
    problem.

    --
  • Rémy Oudompheng at Nov 17, 2012 at 8:19 am

    On 2012/11/17 Dmitry Vyukov wrote:
    This won't help. GC duration is not a function of garbage generation speed.
    GC frequency is a function of garbage generation speed. So instead of 10sec
    every 2min, you can get 10sec every 4min. But I guess it won't solve the
    problem.
    Two minutes is the interval of the scavenger's forced GC. Maybe your
    application doesn't need it, and you can increase that hardcoded
    interval to something larger.

    But you should definitely profile and reduce the number of live
    objects you keep around (go tool pprof --inuse_objects).
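    One minimal way to capture the profile Rémy mentions, assuming nothing
    about the poster's code, is to dump a heap profile with runtime/pprof
    and feed it to the tool:

    ```go
    package main

    import (
    	"fmt"
    	"os"
    	"runtime"
    	"runtime/pprof"
    )

    func main() {
    	// Allocate something so the profile has content to show.
    	data := make([][]byte, 1000)
    	for i := range data {
    		data[i] = make([]byte, 1024)
    	}

    	runtime.GC() // get up-to-date allocation statistics into the profile

    	f, err := os.Create("heap.pprof")
    	if err != nil {
    		panic(err)
    	}
    	defer f.Close()
    	if err := pprof.WriteHeapProfile(f); err != nil {
    		panic(err)
    	}
    	fmt.Println("wrote heap.pprof")
    	_ = data
    }
    ```

    Then inspect which call sites hold the most live objects with
    `go tool pprof --inuse_objects <binary> heap.pprof`; long-running
    servers usually expose the same data over HTTP via net/http/pprof
    instead.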

    Rémy.

    --
  • Jingcheng Zhang at Nov 19, 2012 at 1:59 am

    On Sat, Nov 17, 2012 at 4:19 PM, Rémy Oudompheng wrote:
    2 minutes is the frequency of the scavenger's forced GC. Maybe your
    application doesn't need it and you can increase the hardcoded
    frequency to something larger.
    I set GOGC=200 to reduce the GC frequency, but it seems the scavenger
    triggered a GC before GOGC=200 could play its role.
    I will give this a try.
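    For reference, GOGC can also be adjusted from inside the program; this
    sketch shows the programmatic equivalent of GOGC=200:

    ```go
    package main

    import (
    	"fmt"
    	"runtime/debug"
    )

    func main() {
    	// GOGC=200 in the environment and SetGCPercent(200) mean the same
    	// thing: the next collection triggers once the heap has grown by
    	// 200% over the live data left after the previous collection.
    	old := debug.SetGCPercent(200)
    	fmt.Println("previous GC percent:", old)
    }
    ```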
    But you should definitely profile and reduce the number of objects you
    are needing (go tool pprof --inuse_objects).
    Thanks, we are working on this too, but haven't tried the pprof.
    Rémy.

    --


    --
    Best regards,
    Jingcheng Zhang
    Beijing, P.R.China

    --
  • Jingcheng Zhang at Nov 19, 2012 at 1:44 am
    On 2012-11-17 at 4:14 PM, "Dmitry Vyukov" <dvyukov@google.com> wrote:
    This won't help. GC duration is not a function of garbage
    generation speed.
    GC frequency is a function of garbage generation speed. So instead of
    10sec every 2min, you can get 10sec every 4min. But I guess it won't solve
    the problem.

    Thanks for your explanation. So what determines the GC duration? The
    memory arena size? I changed the arena limit to 64GB; before the
    change, GC completed quickly. It is fine for our business to stop for
    about 2 seconds, but bad to stop for 10.
    --
    --
  • Ian Lance Taylor at Nov 19, 2012 at 3:23 am

    On Sun, Nov 18, 2012 at 5:44 PM, Jingcheng Zhang wrote:
    Thanks for your explanation, so what determines the GC duration? The memory
    arena size? I changed the arena limit to 64GB, before the change it is fast
    to complete the GC. It is fine for our business to stop for about 2 seconds
    but bad to stop for 10 seconds.
    The time it takes to run a GC is approximately proportional to the
    size of live memory that may contain pointers. The total size of the
    memory arena has a relatively small effect on the time it takes to run
    a GC.
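    Ian's point can be observed directly: hold a similar amount of live
    memory first in a pointer-free form and then in a pointer-rich form,
    and time a forced collection (the sizes and the node type here are
    arbitrary; exact pause times will vary by machine):

    ```go
    package main

    import (
    	"fmt"
    	"runtime"
    	"time"
    )

    // node is 64 bytes on a 64-bit machine, one word of which is a pointer
    // the GC must follow.
    type node struct {
    	next *node
    	pad  [56]byte
    }

    func timeGC() time.Duration {
    	start := time.Now()
    	runtime.GC()
    	return time.Since(start)
    }

    func main() {
    	// 64 MB of pointer-free memory: the GC does not scan its contents.
    	flat := make([]byte, 64<<20)
    	fmt.Println("GC with pointer-free heap:", timeGC())

    	// Roughly the same amount of memory as a million linked nodes:
    	// every node must be scanned and its pointer followed.
    	var head *node
    	for i := 0; i < 1<<20; i++ {
    		head = &node{next: head}
    	}
    	fmt.Println("GC with pointer-rich heap:", timeGC())

    	runtime.KeepAlive(flat)
    	runtime.KeepAlive(head)
    }
    ```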

    Ian

    --
  • ⚛ at Nov 17, 2012 at 11:42 am

    On Saturday, November 17, 2012 8:29:12 AM UTC+1, Jingcheng Zhang wrote:

    1. There haven't been reports about major memory leaks on 64-bit CPUs. Are
    you suspecting that the application may be generating memory leaks on a
    64-bit CPU?

    2. The performance of the newer GC implementation (CL 6114046,
    relative to the current GC) depends on the structure of the data in the heap.
    For example, if all structs in the heap contain pointer fields and no
    integer fields, then the new GC is slower than the current one. This
    slowdown is unavoidable, because in this case the new GC is processing more
    bits of information per machine word. On the other hand, if the structs
    contain a mix of field types (pointers, integers, etc), the new GC may be
    faster. So, the new GC may be faster or slower than the current one. In
    the cases I have seen so far, the performance in regular Go programs is
    approximately the same.

    --
  • Dmitry Vyukov at Nov 17, 2012 at 11:49 am

    On Sat, Nov 17, 2012 at 3:42 PM, ⚛ wrote:
    1. There haven't been reports about major memory leaks on 64-bit CPUs. Are
    you suspecting that the application may be generating memory leaks on a
    64-bit CPU?

    2. The performance of the newer GC implementation (CL 6114046, in respect
    to the current GC) depends on the structure of data existing in the heap.
    For example, if all structs in the heap contain pointer fields and no
    integer fields, then the new GC is slower than the current one. This
    slowdown is unavoidable, because in this case the new GC is processing more
    bits of information per machine word. On the other hand, if the structs
    contain a mix of field types (pointers, integers, etc), the new GC may be
    faster. So, the new GC may be faster or slower than the current one. In
    the cases I have seen so far, the performance in regular Go programs is
    approximately the same.

    I am curious: have you considered reordering fields in structs so that
    pointers are packed together? I understand that it's impossible in the
    general case (when there are sub-structs), but otherwise I think it's
    OK to reorder fields arbitrarily. Then the metainfo can say: in this
    object of size 128, scan only the first 4 words. This, of course,
    complicates things.
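    A toy illustration of the layout Dmitry describes (the struct here is
    hypothetical): with all pointer fields packed at the front, a single
    (start, count) pair of metainfo would describe everything the GC has
    to scan.

    ```go
    package main

    import (
    	"fmt"
    	"unsafe"
    )

    // packed groups its pointer fields first, so hypothetical GC metainfo
    // could say: "scan the first 2 words, skip the rest".
    type packed struct {
    	a, b *int  // pointers: words 0 and 1
    	x, y int64 // scalars the collector could skip entirely
    }

    func main() {
    	fmt.Println("a at offset", unsafe.Offsetof(packed{}.a))
    	fmt.Println("b at offset", unsafe.Offsetof(packed{}.b))
    	fmt.Println("x at offset", unsafe.Offsetof(packed{}.x))
    	fmt.Println("total size ", unsafe.Sizeof(packed{}))
    }
    ```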

    --
  • ⚛ at Nov 17, 2012 at 12:57 pm

    On Saturday, November 17, 2012 12:49:40 PM UTC+1, Dmitry Vyukov wrote:

    I am curious have you considered reordering fields in structs so that
    pointers are packed together? I understand that it's impossible in general
    case (when there are sub-structs), but in otherwise I think it's OK to
    arbitrary reorder fields. Then you can say in the metainfo -- in this
    object of size 128 scan only 4 words. This is, of course, complicates
    things.
    There is cgo, so reordering fields is forbidden in general. The
    runtime also shares certain types between Go and C.

    In my opinion, if GC already knows where the pointer fields are in the
    struct (knows their byte-offsets in the struct) then reordering the fields
    will not improve performance.

    One reason for this is that in any case the GC needs to check the value of
    every pointer in the struct. The performance gain obtained by reordering
    fields seems negligible in comparison to checking the values.

    A second reason is that while walking the heap each step ("instruction") in
    the GC implementation should generate a finite number of pointers.
    Typically, 1 step generates at most 1 pointer that needs to be checked.
    There should be an upper bound on the number of pointers generated in one
    step, so considering N consecutive words in one step would be problematic
    if N can be an arbitrary number.

    Each pointer needs to be considered separately and there is absolutely no
    relation between any two pointers P and Q. This uncertainty appears to be
    the main reason why reordering fields wouldn't make the GC much faster.

    One area where reordering fields would help is when the structure is big,
    the non-pointer fields in the structure are forming a gap, and this gap is
    putting pointer fields on different cachelines.

    --
  • Dmitry Vyukov at Nov 17, 2012 at 1:06 pm

    On Sat, Nov 17, 2012 at 4:50 PM, ⚛ wrote:
    There is cgo so reordering fields in general is forbidden. Also the
    runtime is sharing certain Go types and C types.

    In my opinion, if GC already knows where the pointer fields are in the
    struct (knows their byte-offsets in the struct) then reordering the fields
    will not improve performance.

    One reason for this is that in any case the GC needs to check the value of
    every pointer in the struct. The performance gain obtained by reordering
    fields seems negligible in comparison to checking the values.

    A second reason is that while walking the heap each step ("instruction")
    in the GC implementation should generate a finite number of pointers.
    Typically, 1 step generates at most 1 pointer that needs to be checked.
    There should be an upper bound on the number of pointers generated in one
    step, so considering N consecutive words in one step would be problematic
    if N can be an arbitrary number.

    Each pointer needs to be considered separately and there is absolutely no
    relation between any two pointers P and Q. This uncertainty appears to be
    the main reason why reordering fields wouldn't make the GC much faster.

    One area where reordering fields would help is when the structure is big,
    the non-pointer fields in the structure are forming a gap, and this gap is
    putting pointer fields on different cachelines.

    I was thinking about an ideal case where you have only one instruction
    that says "the first N fields are pointers", and you just need a single
    for loop over 1..N. It should be not much slower than the current
    non-precise GC in the situation where structs contain only pointers.
    But I think your analysis is correct.

    Do I get it right that with type info there is no need to check
    whether the pointer points to an allocated chunk of memory? It can be
    either 0 or a valid pointer to a valid allocated memory chunk, with no
    other cases possible, right?
    And additionally, you know the size of the pointed-to object -- it's
    either determined by the type, or, if it's a slice, the size is in the
    subsequent word. Right?

    --
  • ⚛ at Nov 17, 2012 at 2:22 pm

    On Saturday, November 17, 2012 2:06:59 PM UTC+1, Dmitry Vyukov wrote:


    I was thinking about an ideal case where you have only 1 instruction than
    says "first N fields are pointers", and you just need a single for loop
    1..N. It should be not much slower than current non-precise gc in situation
    when structs contain only points.
    But I think your analysis is correct.

    Do I get it right that with type info there is no need check whether the
    pointer points to an allocated chunk of memory. It can be either 0 or a
    valid pointer to a valid allocated memory chunk, no other cases possible,
    right?
    It can be a pointer from C, or if the GC does not know the actual type of
    the object then it may be an integer (not a pointer).

    And additionally you know size of the pointed-to object -- it's either
    determined by type, or if it's a slice then size is in the subsequent word.
    Right?
    In general, no. The actual size of an object is unknown to GC, because in
    some cases it cannot be inferred from the type of the pointer. There are
    however cases when the GC knows the actual type of the object and thus
    knows the actual size.

    The GC code is robust in the sense that it will work even if there is no
    type information available about any object. If some type information is
    available, GC will use it and may be able to free more objects.

    It is impossible to decide whether to prefer the partial typeinfo about an
    object or the full typeinfo. Getting the full typeinfo is more costly than
    getting the partial one. In some cases the full typeinfo isn't available at
    all. The GC implementation is primarily using the partial typeinfo and
    tries to retrieve the full typeinfo only if something goes wrong.

    The length and capacity of a slice are insufficient to determine where the
    underlying array starts or ends. However, the knowledge that the object is
    a slice can be used to determine the full typeinfo of the slice's element
    (and thus the actual size of the slice's element). However, if the GC sees
    a pointer (Go type: *T) to any part of the underlying array prior to seeing
    that the block is actually of type [N]U, it will retrieve the full typeinfo
    U - if U can be determined and T is insufficient. Typeinfo T may be
    sufficient to process the whole array.

    The easiest values to GC are for example Go's maps because they are
    completely self-contained and it is impossible to get a pointer to their
    interior.

    --
  • Dmitry Vyukov at Nov 17, 2012 at 2:44 pm

    On Sat, Nov 17, 2012 at 6:22 PM, ⚛ wrote:

    I see. Thanks!

    --
  • Dmitry Vyukov at Nov 17, 2012 at 4:26 pm

    On Sat, Nov 17, 2012 at 6:44 PM, Dmitry Vyukov wrote:
    Hmm... can't type info help with the following issue?

    http://code.google.com/p/go/issues/detail?id=4246

    Basically, given an address in the heap or in .data/.bss, I need to
    output the variable name; it can be best-effort.

    --
  • Jingcheng Zhang at Nov 19, 2012 at 2:31 am

    On Sat, Nov 17, 2012 at 7:42 PM, ⚛ wrote:

    1. There haven't been reports about major memory leaks on 64-bit CPUs. Are
    you suspecting that the application may be generating memory leaks on a
    64-bit CPU?
    The runtime is stable with no memory leaks; our server processes all
    have uptime > 1 month.
    2. The performance of the newer GC implementation (CL 6114046, in respect to
    the current GC) depends on the structure of data existing in the heap. For
    example, if all structs in the heap contain pointer fields and no integer
    fields, then the new GC is slower than the current one. This slowdown is
    unavoidable, because in this case the new GC is processing more bits of
    information per machine word. On the other hand, if the structs contain a
    mix of field types (pointers, integers, etc), the new GC may be faster.
    So, the new GC may be faster or slower than the current one. In the cases I
    have seen so far, the performance in regular Go programs is approximately
    the same.
    The CL looks big. Does "precise GC" mean "latency-free GC"? Or is
    there still room for improvement between precise GC and latency-free
    GC (one of Go's goals)?
    Any suggestions for improving our service in the meantime? I don't
    know when Go's new latency-free garbage collection will arrive.

    Thanks.

    --
    Best regards,
    Jingcheng Zhang
    Beijing, P.R.China
    --


    --
    Best regards,
    Jingcheng Zhang
    Beijing, P.R.China

    --
  • Ian Lance Taylor at Nov 19, 2012 at 3:26 am

    On Sun, Nov 18, 2012 at 6:31 PM, Jingcheng Zhang wrote:
    The CL looks big. Does "precise GC" mean "latency-free GC"? Or is
    there still room for improvement between precise GC and latency-free
    GC (one of Go's goals)?
    Precise GC does not mean latency-free GC. It means a GC where only
    genuine pointers are considered. The opposite of precise GC is
    conservative GC, which is approximately what the Go runtime has now: a
    value that looks like a valid pointer value is treated as a valid
    pointer, even though in reality it may actually be, for example, a
    floating point number or a string. With precise GC, floating point or
    string values are never treated as pointers; only pointers are treated
    as pointers.

    Ian

    --
  • Jingcheng Zhang at Nov 19, 2012 at 7:07 am
    Thanks, Ian, for your explanation. So after precise GC, there is
    still another improvement needed to make it latency-free (ultimately,
    a precise, parallel, latency-free GC), right?
    On Mon, Nov 19, 2012 at 11:25 AM, Ian Lance Taylor wrote:
    On Sun, Nov 18, 2012 at 6:31 PM, Jingcheng Zhang wrote:

    The CL looks big. Does "precise GC" mean "latency-free GC"? Or is
    there still room for improvement between precise GC and latency-free
    GC (one of Go's goals)?
    Precise GC does not mean latency-free GC. It means a GC where only
    genuine pointers are considered. The opposite of precise GC is
    conservative GC, which is approximately what the Go runtime has now: a
    value that looks like a valid pointer value is treated as a valid
    pointer, even though in reality it may actually be, for example, a
    floating point number or a string. With precise GC, floating point or
    string values are never treated as pointers; only pointers are treated
    as pointers.

    Ian


    --
    Best regards,
    Jingcheng Zhang
    Beijing, P.R.China

    --
  • Rémy Oudompheng at Nov 19, 2012 at 7:15 am

    On 2012/11/19 Jingcheng Zhang wrote:
    Thanks, Ian, for your explanation. So after precise GC, there is
    still another improvement needed to make it latency-free (ultimately,
    a precise, parallel, latency-free GC), right?
    It has been discussed, but there is no plan as far as I know. It has
    been estimated that it would take months (probably a year) before a
    usable version would come out (if someone works on the subject, of
    course).

    Rémy.

    --
  • ⚛ at Nov 19, 2012 at 8:32 am

    On Monday, November 19, 2012 8:07:36 AM UTC+1, Jingcheng Zhang wrote:

    Thanks, Ian, for your explanation. So after precise GC, there is
    still another improvement needed to make it latency-free (ultimately,
    a precise, parallel, latency-free GC), right?
    Maybe. It depends on the data structures the program is using.

    Implementing perfect latency-free concurrent GC would cause a slowdown in
    the overall throughput of many Go programs. So it cannot be said that
    concurrent latency-free GC is the ultimate goal.

    The current GC implementation allows Go programs to avoid GC pauses if the
    Go code is managing memory allocations and deallocations on its own. That
    is: objects which are known to be no longer in use can be put into buffers
    and the buffers will serve forthcoming allocations. However, this may
    increase the total memory consumption so this approach isn't applicable to
    all Go programs.
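    The buffer-reuse approach described above can be sketched with a
    buffered channel acting as a free list (sync.Pool did not exist yet in
    this era of Go; the type name and buffer sizes below are illustrative,
    not from the thread):

    ```go
    package main

    import "fmt"

    // BufferPool is a fixed-capacity free list of byte buffers. Instead of
    // letting finished buffers become garbage, callers return them to the
    // pool and reuse them, so the GC has less work to do.
    type BufferPool struct {
    	free chan []byte
    	size int
    }

    func NewBufferPool(n, size int) *BufferPool {
    	return &BufferPool{free: make(chan []byte, n), size: size}
    }

    // Get returns a pooled buffer, or allocates a fresh one if the pool is empty.
    func (p *BufferPool) Get() []byte {
    	select {
    	case b := <-p.free:
    		return b
    	default:
    		return make([]byte, p.size)
    	}
    }

    // Put returns a buffer to the pool; if the pool is full, the buffer is
    // simply dropped and collected as usual. This is the trade-off the post
    // mentions: bounded extra memory in exchange for fewer allocations.
    func (p *BufferPool) Put(b []byte) {
    	select {
    	case p.free <- b[:p.size]:
    	default:
    	}
    }

    func main() {
    	pool := NewBufferPool(1024, 4096)
    	b := pool.Get()
    	copy(b, "hello")
    	pool.Put(b)
    	b2 := pool.Get() // reuses the same backing array
    	fmt.Println(len(b2), &b[0] == &b2[0])
    }
    ```
    
    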

    On Mon, Nov 19, 2012 at 11:25 AM, Ian Lance Taylor wrote:
    On Sun, Nov 18, 2012 at 6:31 PM, Jingcheng Zhang wrote:

    The CL looks big. Does "precise GC" means "latency-free GC"? Or there
    is still improvement space between precise GC and latency-free GC (One
    of Go's goal)?
    Precise GC does not mean latency-free GC. It means a GC where only
    genuine pointers are considered. The opposite of precise GC is
    conservative GC, which is approximately what the Go runtime has now: a
    value that looks like a valid pointer value is treated as a valid
    pointer, even though in reality it may actually be, for example, a
    floating point number or a string. With precise GC, floating point or
    string values are never treated as pointers; only pointers are treated
    as pointers.

    Ian


    --
    Best regards,
    Jingcheng Zhang
    Beijing, P.R.China
    --
  • Sugu Sougoumarane at Nov 17, 2012 at 6:37 pm
    For vtocc (vitess), we measured an overhead of about 40K per connection.
    So, 16G sounds a little high, even for 100k connections. You may want to
    profile your memory to get a better picture of what's going on. We
    typically run anywhere between 5-20k connections, and rarely exceed 1G.
    Are you using Go 1? If so, you should try out a newer build with parallel
    GC. It should give you a speedup proportional to the number of CPUs you have.
    If most of your memory is due to large buffer sizes, you should tune
    GOGC lower (try 50?). This will cause the garbage collector to run more
    often, with shorter pauses - shorter because the GC does not scan inside
    byte slices.
    --
  • Jingcheng Zhang at Nov 19, 2012 at 2:44 am

    On Sun, Nov 18, 2012 at 2:37 AM, Sugu Sougoumarane wrote:
    For vtocc (vitess), we measured an overhead of about 40K per connection. So,
    16G sounds a little high, even for 100k connections. You may want to profile
    your memory to get a better picture of what's going on. We typically run
    anywhere betwen 5-20k connections, and rarely exceed 1G.
    Are you using Go 1? If so, you should try out a newer build with parallel
    GC. It should give you a speed up proportional to the number CPUs you have.
    If most of your memory is due to large buffer sizes, you should tone down
    GOGC lower (try 50?). This will cause the garbage collector to run more
    often, with shorter pauses. This is because the GC does not scan inside byte
    slices.
    Currently we serve 600,000 concurrent, keep-alive TCP connections per
    process. The process consumes 16GB of resident memory, so about 28KB
    per connection.
    The Go version is 1.0.3, amd64, with GOGC set to 200.

    I'll tune GOGC and the scavenger's GC frequency to see if there is any
    room for improvement besides code optimization.
    Thanks for your help.


    --
    Best regards,
    Jingcheng Zhang
    Beijing, P.R.China

    --
  • Sugu Sougoumarane at Nov 19, 2012 at 4:37 am

    On Sunday, November 18, 2012 6:44:45 PM UTC-8, Jingcheng Zhang wrote:
    On Sun, Nov 18, 2012 at 2:37 AM, Sugu Sougoumarane wrote:
    For vtocc (vitess), we measured an overhead of about 40K per connection. So,
    16G sounds a little high, even for 100k connections. You may want to profile
    your memory to get a better picture of what's going on. We typically run
    anywhere betwen 5-20k connections, and rarely exceed 1G.
    Are you using Go 1? If so, you should try out a newer build with parallel
    GC. It should give you a speed up proportional to the number CPUs you have.
    If most of your memory is due to large buffer sizes, you should tone down
    GOGC lower (try 50?). This will cause the garbage collector to run more
    often, with shorter pauses. This is because the GC does not scan inside byte
    slices.
    Currently we serve 600,000 concurrent, keep-alive TCP connections, per
    process. The process consumes 16GB res memory, so each connection
    28KB.
    Go version is 1.0.3, amd64, with GOGC set to 200.

    I'll tune GOGC and Scavenger's GC frequency to see if there are any
    space to improve beside of code optimization.
    Thanks for your help.
    600k is a lot of connections :). However, a pause time of 10 seconds seems
    suspicious for 16G. It should be in the ballpark of 1-2 seconds on an
    8-core box. This makes me think that 1.0.3 doesn't have the parallel GC
    improvements. I assume you have GOMAXPROCS set correctly.

    --
  • Dmitry Vyukov at Nov 19, 2012 at 4:55 am

    On Mon, Nov 19, 2012 at 8:37 AM, Sugu Sougoumarane wrote:
    On Sun, Nov 18, 2012 at 2:37 AM, Sugu Sougoumarane wrote:

    For vtocc (vitess), we measured an overhead of about 40K per
    connection. So,
    16G sounds a little high, even for 100k connections. You may want to profile
    your memory to get a better picture of what's going on. We typically run
    anywhere betwen 5-20k connections, and rarely exceed 1G.
    Are you using Go 1? If so, you should try out a newer build with parallel
    GC. It should give you a speed up proportional to the number CPUs you have.
    If most of your memory is due to large buffer sizes, you should tone down
    GOGC lower (try 50?). This will cause the garbage collector to run more
    often, with shorter pauses. This is because the GC does not scan inside byte
    slices.
    Currently we serve 600,000 concurrent, keep-alive TCP connections, per
    process. The process consumes 16GB res memory, so each connection
    28KB.
    Go version is 1.0.3, amd64, with GOGC set to 200.

    I'll tune GOGC and Scavenger's GC frequency to see if there are any
    space to improve beside of code optimization.
    Thanks for your help.
    600k is a lot of connections :). However, a pause time of 10 seconds seems
    suspicious for 16G. It should be in the ballpark of 1-2 seconds for an
    8-core box. This makes me think that 1.0.3 doesn't have the parallel GC
    improvements. I assume you have GOMAXPROCS set correctly.

    Yes, Go 1.0.3 does not have the parallel GC improvements. With the improved
    GC and GOMAXPROCS=8, the pause can drop to 2 seconds.





    --
  • Jingcheng Zhang at Nov 19, 2012 at 7:26 am

    On Mon, Nov 19, 2012 at 12:55 PM, Dmitry Vyukov wrote:
    On Mon, Nov 19, 2012 at 8:37 AM, Sugu Sougoumarane wrote:

    On Sun, Nov 18, 2012 at 2:37 AM, Sugu Sougoumarane <sou...@google.com>
    wrote:
    For vtocc (vitess), we measured an overhead of about 40K per
    connection. So,
    16G sounds a little high, even for 100k connections. You may want to
    profile
    your memory to get a better picture of what's going on. We typically
    run
    anywhere betwen 5-20k connections, and rarely exceed 1G.
    Are you using Go 1? If so, you should try out a newer build with
    parallel
    GC. It should give you a speed up proportional to the number CPUs you
    have.
    If most of your memory is due to large buffer sizes, you should tone
    down
    GOGC lower (try 50?). This will cause the garbage collector to run more
    often, with shorter pauses. This is because the GC does not scan inside
    byte
    slices.
    Currently we serve 600,000 concurrent, keep-alive TCP connections, per
    process. The process consumes 16GB res memory, so each connection
    28KB.
    Go version is 1.0.3, amd64, with GOGC set to 200.

    I'll tune GOGC and Scavenger's GC frequency to see if there are any
    space to improve beside of code optimization.
    Thanks for your help.

    600k is a lot of connections :). However, a pause time of 10 seconds seems
    suspicious for 16G. It should be in the ballpark of 1-2 seconds for an
    8-core box. This makes me think that 1.0.3 doesn't have the parallel GC
    improvements. I assume you have GOMAXPROCS set correctly.


    Yes, Go1.0.3 does not have the parallel GC improvements. With the improved
    GC and GOMAXPROCS=8 it can drop to 2 seconds.
    I heard that dl.google.com is using the tip of Go, but I am worried about its stability.
    Could Brad tell us which revision dl.google.com is currently using?

    Thanks very much!



    --
    Best regards,
    Jingcheng Zhang
    Beijing, P.R.China

    --
  • Dave Cheney at Nov 19, 2012 at 7:21 am
    Tip.
    On 19 Nov 2012 18:20, "Jingcheng Zhang" wrote:
    On Mon, Nov 19, 2012 at 12:55 PM, Dmitry Vyukov wrote:
    On Mon, Nov 19, 2012 at 8:37 AM, Sugu Sougoumarane <sougou@google.com>
    wrote:
    On Sun, Nov 18, 2012 at 2:37 AM, Sugu Sougoumarane <sou...@google.com>
    wrote:
    For vtocc (vitess), we measured an overhead of about 40K per
    connection. So,
    16G sounds a little high, even for 100k connections. You may want to
    profile
    your memory to get a better picture of what's going on. We typically
    run
    anywhere betwen 5-20k connections, and rarely exceed 1G.
    Are you using Go 1? If so, you should try out a newer build with
    parallel
    GC. It should give you a speed up proportional to the number CPUs you
    have.
    If most of your memory is due to large buffer sizes, you should tone
    down
    GOGC lower (try 50?). This will cause the garbage collector to run
    more
    often, with shorter pauses. This is because the GC does not scan
    inside
    byte
    slices.
    Currently we serve 600,000 concurrent, keep-alive TCP connections, per
    process. The process consumes 16GB res memory, so each connection
    28KB.
    Go version is 1.0.3, amd64, with GOGC set to 200.

    I'll tune GOGC and Scavenger's GC frequency to see if there are any
    space to improve beside of code optimization.
    Thanks for your help.

    600k is a lot of connections :). However, a pause time of 10 seconds
    seems
    suspicious for 16G. It should be in the ballpark of 1-2 seconds for an
    8-core box. This makes me think that 1.0.3 doesn't have the parallel GC
    improvements. I assume you have GOMAXPROCS set correctly.


    Yes, Go1.0.3 does not have the parallel GC improvements. With the improved
    GC and GOMAXPROCS=8 it can drop to 2 seconds.
    I heard that dl.google.com is using tip of Go, but I am afraid of the
    stability.
    Could Brad tell which revision is dl.google.com currently using?

    Thanks very much!


    --
  • David Symonds at Nov 19, 2012 at 7:26 am

    On Mon, Nov 19, 2012 at 6:20 PM, Jingcheng Zhang wrote:

    I heard that dl.google.com is using tip of Go, but I am afraid of the stability.
    Could Brad tell which revision is dl.google.com currently using?
    It's not exactly tip, but it's pretty close, and it's got almost all
    the changes that would cause stability concerns (GC work, etc.).


    Dave.

    --
  • Jingcheng Zhang at Nov 19, 2012 at 7:45 am

    On Mon, Nov 19, 2012 at 3:26 PM, David Symonds wrote:
    On Mon, Nov 19, 2012 at 6:20 PM, Jingcheng Zhang wrote:

    I heard that dl.google.com is using tip of Go, but I am afraid of the stability.
    Could Brad tell which revision is dl.google.com currently using?
    It's not exactly tip, but it's pretty close, and it's got almost all
    the changes that would cause stability concerns (GC work, etc.).
    Does this mean that there is an internal branch of the tip?
    Or do you only update to the current tip when there are changes that
    improve stability?
    Dave.


    --
    Best regards,
    Jingcheng Zhang
    Beijing, P.R.China

    --
  • Sugu Sougoumarane at Nov 19, 2012 at 8:24 am

    Does this mean that there is an internal branch of the tip?
    Or do you only update to the current tip when there are changes that
    improve stability?
    For the longest time, we've run vtocc on this version of Go: 4fdf6aa4f602
    from a June snapshot.
    We also have other servers that use a more recent snapshot: 024dde07c08d
    from October.

    Both those versions have the parallel GC work. If you're skeptical, you can
    use the older one. But the newer snapshot may contain other improvements.

    --
  • Jingcheng Zhang at Nov 19, 2012 at 7:16 am

    On Mon, Nov 19, 2012 at 12:37 PM, Sugu Sougoumarane wrote:
    On Sunday, November 18, 2012 6:44:45 PM UTC-8, Jingcheng Zhang wrote:

    On Sun, Nov 18, 2012 at 2:37 AM, Sugu Sougoumarane <sou...@google.com>
    wrote:
    For vtocc (vitess), we measured an overhead of about 40K per connection.
    So,
    16G sounds a little high, even for 100k connections. You may want to
    profile
    your memory to get a better picture of what's going on. We typically run
    anywhere betwen 5-20k connections, and rarely exceed 1G.
    Are you using Go 1? If so, you should try out a newer build with
    parallel
    GC. It should give you a speed up proportional to the number CPUs you
    have.
    If most of your memory is due to large buffer sizes, you should tone
    down
    GOGC lower (try 50?). This will cause the garbage collector to run more
    often, with shorter pauses. This is because the GC does not scan inside
    byte
    slices.
    Currently we serve 600,000 concurrent, keep-alive TCP connections, per
    process. The process consumes 16GB res memory, so each connection
    28KB.
    Go version is 1.0.3, amd64, with GOGC set to 200.

    I'll tune GOGC and Scavenger's GC frequency to see if there are any
    space to improve beside of code optimization.
    Thanks for your help.

    600k is a lot of connections :). However, a pause time of 10 seconds seems
    suspicious for 16G. It should be in the ballpark of 1-2 seconds for an
    8-core box. This makes me think that 1.0.3 doesn't have the parallel GC
    improvements. I assume you have GOMAXPROCS set correctly.
    Yes, I set GOMAXPROCS with:

    runtime.GOMAXPROCS(runtime.NumCPU())

    But runtime.NumCPU() is 24 on our box, while Go's MaxGcproc is 8 by default.


    --
    Best regards,
    Jingcheng Zhang
    Beijing, P.R.China

    --
  • Rémy Oudompheng at Nov 19, 2012 at 7:12 am

    On 2012/11/19 Jingcheng Zhang wrote:
    Currently we serve 600,000 concurrent, keep-alive TCP connections, per
    process. The process consumes 16GB res memory, so each connection
    28KB.
    Go version is 1.0.3, amd64, with GOGC set to 200.

    I'll tune GOGC and Scavenger's GC frequency to see if there are any
    space to improve beside of code optimization.
    Thanks for your help.
    Resident memory is not an accurate way of measuring your process's
    memory use. Can you run the server with GOGCTRACE=1 and give the GC
    statistics that come out?

    Please also run "go tool pprof --inuse_objects
    http://myserver:myport/debug/pprof/heap" on your server with
    net/http/pprof enabled (or get memory profiling by other means). It is
    really essential for obtaining improvements.
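    Enabling those profiling endpoints is essentially a one-line import. A
    minimal sketch (the "localhost:6060" listen address and the self-check
    at the end are illustrative choices, not from the thread):

    ```go
    package main

    import (
    	"fmt"
    	"net/http"
    	_ "net/http/pprof" // registers the /debug/pprof/ handlers on DefaultServeMux
    	"time"
    )

    func main() {
    	// Serve the profiling endpoints on a side port, so they don't mix
    	// with production traffic; in a real server this goroutine runs
    	// alongside the normal listeners.
    	go http.ListenAndServe("localhost:6060", nil)
    	time.Sleep(200 * time.Millisecond) // give the listener time to start

    	// A heap profile can now be fetched, e.g. with:
    	//   go tool pprof --inuse_objects http://localhost:6060/debug/pprof/heap
    	// Here we just verify that the endpoint answers.
    	resp, err := http.Get("http://localhost:6060/debug/pprof/heap?debug=1")
    	if err != nil {
    		panic(err)
    	}
    	defer resp.Body.Close()
    	fmt.Println("heap profile endpoint status:", resp.StatusCode)
    }
    ```
    
    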

    --
  • Jingcheng Zhang at Nov 19, 2012 at 7:35 am

    On Mon, Nov 19, 2012 at 3:12 PM, Rémy Oudompheng wrote:
    On 2012/11/19 Jingcheng Zhang wrote:
    Currently we serve 600,000 concurrent, keep-alive TCP connections, per
    process. The process consumes 16GB res memory, so each connection
    28KB.
    Go version is 1.0.3, amd64, with GOGC set to 200.

    I'll tune GOGC and Scavenger's GC frequency to see if there are any
    space to improve beside of code optimization.
    Thanks for your help.
    Resident memory is not an accurate way of measuring your process used
    memory. Can you run the server with GOGCTRACE=1 and give the GC
    statistics that come out?

    Please also run "go tool pprof --inuse_objects
    http://myserver:myport/debug/pprof/heap" on your server with
    net/http/pprof enabled (or get memory profiling by other means). It is
    really essential to obtain improvements.
    Would enabling this behavior kill the server's performance?
    If not, I'll try it on one of the servers.

    Thanks!



    --
    Best regards,
    Jingcheng Zhang
    Beijing, P.R.China

    --
  • Rémy Oudompheng at Nov 19, 2012 at 7:50 am

    On 2012/11/19 Jingcheng Zhang wrote:
    On Mon, Nov 19, 2012 at 3:12 PM, Rémy Oudompheng
    wrote:
    On 2012/11/19 Jingcheng Zhang wrote:
    Resident memory is not an accurate way of measuring your process's
    memory usage. Can you run the server with GOGCTRACE=1 and give the GC
    statistics that come out?

    Please also run "go tool pprof --inuse_objects
    http://myserver:myport/debug/pprof/heap" on your server with
    net/http/pprof enabled (or get memory profiling by other means). It is
    really essential to obtain improvements.
    Would enabling this behavior kill the server's performance?
    If not, I'll try it on one of the servers.
    Memory profiling is enabled by default even if you don't ask for it.
    Requesting the memory profile through HTTP can be a costly operation,
    but the cost is incurred only at the moment you use it, and it's only
    extra CPU consumption.

    GOGCTRACE=1 enables debug printing after each GC: it's only a print of
    2 lines per GC, which is probably much cheaper than the 10-second
    pause.

    Rémy.

    --
  • ⚛ at Nov 19, 2012 at 8:51 am

    On Monday, November 19, 2012 3:44:45 AM UTC+1, Jingcheng Zhang wrote:
    On Sun, Nov 18, 2012 at 2:37 AM, Sugu Sougoumarane wrote:
    For vtocc (vitess), we measured an overhead of about 40K per connection. So,
    16G sounds a little high, even for 100k connections. You may want to profile
    your memory to get a better picture of what's going on. We typically run
    anywhere between 5-20k connections, and rarely exceed 1G.
    Are you using Go 1? If so, you should try out a newer build with parallel
    GC. It should give you a speedup proportional to the number of CPUs you have.
    If most of your memory is due to large buffer sizes, you should tune GOGC
    lower (try 50?). This will cause the garbage collector to run more
    often, with shorter pauses. This is because the GC does not scan inside byte
    slices.
    Currently we serve 600,000 concurrent, keep-alive TCP connections per
    process. The process consumes 16GB of resident memory, so each
    connection takes about 28KB.
    Within a 5 minute time window, how many of those 600,000 connections are
    receiving or sending data?

    --
  • Bryanturley at Nov 19, 2012 at 3:29 pm
    Could manually running the GC more often help in this case? Fewer dead
    objects to scan, perhaps.

    --
  • Han-Wen Nienhuys at Nov 19, 2012 at 3:38 pm

    On Mon, Nov 19, 2012 at 4:29 PM, bryanturley wrote:
    Could manually running the gc more often help in this case? Less dead
    objects to scan perhaps.
    Dead objects are not scanned. They are only swept.

    --
    Han-Wen Nienhuys
    Google Munich
    hanwen@google.com

    --
  • ⚛ at Nov 19, 2012 at 4:22 pm

    On Monday, November 19, 2012 4:38:46 PM UTC+1, Han-Wen Nienhuys wrote:
    On Mon, Nov 19, 2012 at 4:29 PM, bryanturley wrote:
    Could manually running the gc more often help in this case? Less dead
    objects to scan perhaps.
    Dead objects are not scanned. They are only sweeped.
    The sweep phase can be a fairly large part of a GC. It is hard to tell what
    the exact numbers are without running the server in question as
    "GOGCTRACE=1 ./server-binary".

    --
  • Bryanturley at Nov 19, 2012 at 5:13 pm

    On Monday, November 19, 2012 9:38:46 AM UTC-6, Han-Wen Nienhuys wrote:
    On Mon, Nov 19, 2012 at 4:29 PM, bryanturley wrote:
    Could manually running the gc more often help in this case? Less dead
    objects to scan perhaps.
    Dead objects are not scanned. They are only sweeped.
    Yeah, I meant less dead objects to scan *for*.
    If a program is making a lot of short-lived allocations, wouldn't
    scanning/reaping more often lead to less scan time per scan?

    Every 2 minutes his code stopped and GC'ed for 10 seconds. What if he
    GC'ed every 5 seconds? Or every 20 seconds?
    A 10-second pause every 2 minutes with tons of short-lived objects
    could lead to (optimistically) ~5 sec every minute, ~2.5 sec every 30
    sec, ~1.25 sec every 15 sec.
    It is more likely to be a curve, though, and it would slow the program
    down overall, but you *might* be able to make it smoother.

    Just a guess, though; he would have to try/measure and see.

    --
  • Jingcheng Zhang at Nov 21, 2012 at 10:49 am
    Hello everyone,

    Thanks for all your help, I updated our Go version to:

    go version devel +852ee39cc8c4 Mon Nov 19 06:53:58 2012 +1100

    and rebuilt our servers; GC duration is now reduced to 1-2 seconds,
    which is a big improvement!
    Thanks to the contributors to the new GC!
    On Tue, Nov 20, 2012 at 1:13 AM, bryanturley wrote:

    On Monday, November 19, 2012 9:38:46 AM UTC-6, Han-Wen Nienhuys wrote:
    On Mon, Nov 19, 2012 at 4:29 PM, bryanturley wrote:
    Could manually running the gc more often help in this case? Less dead
    objects to scan perhaps.
    Dead objects are not scanned. They are only sweeped.
    Yeah, I meant less dead objects to scan *for*.
    If a program is making a lot of allocations that are short lived wouldn't
    scanning/reaping more often lead to less scan time during a scan?

    Every 2 minutes his code stopped and gc'ed for 10 seconds. What if every 5
    seconds he gc'ed? or every 20 seconds?
    10 second pause every 2 minutes with tons of short lived objects could lead
    to (optimistic) ~5sec every min, ~2.5 every 30secs, ~1.25 every 15secs.
    It is more likely to be a curve though, and it would slow the program down
    overall but you *might* be able to make it smoother.

    Just a guess though he would have to try/measure and see.

    --


    --
    Best regards,
    Jingcheng Zhang
    Beijing, P.R.China

    --
  • Anoop K at Nov 21, 2012 at 11:14 am
    How much total memory is consumed with the new Go version?
    On Wednesday, 21 November 2012 16:20:03 UTC+5:30, Jingcheng Zhang wrote:

    Hello everyone,

    Thanks for all your help, I updated our Go version to:

    go version devel +852ee39cc8c4 Mon Nov 19 06:53:58 2012 +1100

    and rebuilt our servers, now GC duration reduced to 1~2 seconds, it's
    a big improvement!
    Thank contributors on the new GC!
    On Tue, Nov 20, 2012 at 1:13 AM, bryanturley wrote:

    On Monday, November 19, 2012 9:38:46 AM UTC-6, Han-Wen Nienhuys wrote:
    On Mon, Nov 19, 2012 at 4:29 PM, bryanturley wrote:
    Could manually running the gc more often help in this case? Less
    dead
    objects to scan perhaps.
    Dead objects are not scanned. They are only sweeped.
    Yeah, I meant less dead objects to scan *for*.
    If a program is making a lot of allocations that are short lived wouldn't
    scanning/reaping more often lead to less scan time during a scan?

    Every 2 minutes his code stopped and gc'ed for 10 seconds. What if every 5
    seconds he gc'ed? or every 20 seconds?
    10 second pause every 2 minutes with tons of short lived objects could lead
    to (optimistic) ~5sec every min, ~2.5 every 30secs, ~1.25 every 15secs.
    It is more likely to be a curve though, and it would slow the program down
    overall but you *might* be able to make it smoother.

    Just a guess though he would have to try/measure and see.

    --


    --
    Best regards,
    Jingcheng Zhang
    Beijing, P.R.China
    --
  • Dave Cheney at Nov 21, 2012 at 11:25 am
    Fantastic news, Dmitry will be proud.
    On 21/11/2012, at 21:49, Jingcheng Zhang wrote:

    Hello everyone,

    Thanks for all your help, I updated our Go version to:

    go version devel +852ee39cc8c4 Mon Nov 19 06:53:58 2012 +1100

    and rebuilt our servers, now GC duration reduced to 1~2 seconds, it's
    a big improvement!
    Thank contributors on the new GC!
    On Tue, Nov 20, 2012 at 1:13 AM, bryanturley wrote:

    On Monday, November 19, 2012 9:38:46 AM UTC-6, Han-Wen Nienhuys wrote:
    On Mon, Nov 19, 2012 at 4:29 PM, bryanturley wrote:
    Could manually running the gc more often help in this case? Less dead
    objects to scan perhaps.
    Dead objects are not scanned. They are only sweeped.
    Yeah, I meant less dead objects to scan *for*.
    If a program is making a lot of allocations that are short lived wouldn't
    scanning/reaping more often lead to less scan time during a scan?

    Every 2 minutes his code stopped and gc'ed for 10 seconds. What if every 5
    seconds he gc'ed? or every 20 seconds?
    10 second pause every 2 minutes with tons of short lived objects could lead
    to (optimistic) ~5sec every min, ~2.5 every 30secs, ~1.25 every 15secs.
    It is more likely to be a curve though, and it would slow the program down
    overall but you *might* be able to make it smoother.

    Just a guess though he would have to try/measure and see.

    --


    --
    Best regards,
    Jingcheng Zhang
    Beijing, P.R.China

    --
    --
  • Steve wang at Nov 21, 2012 at 3:07 pm

    On Wednesday, November 21, 2012 6:50:03 PM UTC+8, Jingcheng Zhang wrote:
    Hello everyone,

    Thanks for all your help, I updated our Go version to:

    go version devel +852ee39cc8c4 Mon Nov 19 06:53:58 2012 +1100

    and rebuilt our servers, now GC duration reduced to 1~2 seconds, it's
    a big improvement!
    Is it possible for the GC to do even better?
    One second is still a noticeable interruption when serving game players.

    Thank contributors on the new GC!
    On Tue, Nov 20, 2012 at 1:13 AM, bryanturley wrote:

    On Monday, November 19, 2012 9:38:46 AM UTC-6, Han-Wen Nienhuys wrote:
    On Mon, Nov 19, 2012 at 4:29 PM, bryanturley wrote:
    Could manually running the gc more often help in this case? Less
    dead
    objects to scan perhaps.
    Dead objects are not scanned. They are only sweeped.
    Yeah, I meant less dead objects to scan *for*.
    If a program is making a lot of allocations that are short lived wouldn't
    scanning/reaping more often lead to less scan time during a scan?

    Every 2 minutes his code stopped and gc'ed for 10 seconds. What if every 5
    seconds he gc'ed? or every 20 seconds?
    10 second pause every 2 minutes with tons of short lived objects could lead
    to (optimistic) ~5sec every min, ~2.5 every 30secs, ~1.25 every 15secs.
    It is more likely to be a curve though, and it would slow the program down
    overall but you *might* be able to make it smoother.

    Just a guess though he would have to try/measure and see.

    --


    --
    Best regards,
    Jingcheng Zhang
    Beijing, P.R.China
    --
  • Ian Lance Taylor at Nov 21, 2012 at 3:08 pm

    On Wed, Nov 21, 2012 at 7:00 AM, steve wang wrote:
    On Wednesday, November 21, 2012 6:50:03 PM UTC+8, Jingcheng Zhang wrote:

    and rebuilt our servers, now GC duration reduced to 1~2 seconds, it's
    a big improvement!
    Is it possible that GC does even better?
    One second is still a noticeable interruption when serving game players.
    Yes, it is possible.

    In fact 1-2 seconds is still surprisingly high.

    Ian

    --
  • Bryanturley at Nov 21, 2012 at 6:11 pm

    On Wednesday, November 21, 2012 9:00:40 AM UTC-6, steve wang wrote:

    On Wednesday, November 21, 2012 6:50:03 PM UTC+8, Jingcheng Zhang wrote:

    Hello everyone,

    Thanks for all your help, I updated our Go version to:

    go version devel +852ee39cc8c4 Mon Nov 19 06:53:58 2012 +1100

    and rebuilt our servers, now GC duration reduced to 1~2 seconds, it's
    a big improvement!
    Is it possible that GC does even better?
    One second is still a noticeable interruption when serving game players.
    Those are GC times for his workload; you would have to measure other
    workloads yourself.

    --
  • Dave Cheney at Nov 21, 2012 at 7:52 pm
    Possibly; the OP has not yet provided the debugging information that
    was requested.
    On 22 Nov 2012 02:00, "steve wang" wrote:


    On Wednesday, November 21, 2012 6:50:03 PM UTC+8, Jingcheng Zhang wrote:

    Hello everyone,

    Thanks for all your help, I updated our Go version to:

    go version devel +852ee39cc8c4 Mon Nov 19 06:53:58 2012 +1100

    and rebuilt our servers, now GC duration reduced to 1~2 seconds, it's
    a big improvement!
    Is it possible that GC does even better?
    One second is still a noticeable interruption when serving game players.

    Thank contributors on the new GC!
    On Tue, Nov 20, 2012 at 1:13 AM, bryanturley wrote:

    On Monday, November 19, 2012 9:38:46 AM UTC-6, Han-Wen Nienhuys wrote:

    On Mon, Nov 19, 2012 at 4:29 PM, bryanturley <bryan...@gmail.com>
    wrote:
    Could manually running the gc more often help in this case? Less
    dead
    objects to scan perhaps.
    Dead objects are not scanned. They are only sweeped.
    Yeah, I meant less dead objects to scan *for*.
    If a program is making a lot of allocations that are short lived wouldn't
    scanning/reaping more often lead to less scan time during a scan?

    Every 2 minutes his code stopped and gc'ed for 10 seconds. What if every 5
    seconds he gc'ed? or every 20 seconds?
    10 second pause every 2 minutes with tons of short lived objects could lead
    to (optimistic) ~5sec every min, ~2.5 every 30secs, ~1.25 every 15secs.
    It is more likely to be a curve though, and it would slow the program down
    overall but you *might* be able to make it smoother.

    Just a guess though he would have to try/measure and see.

    --


    --
    Best regards,
    Jingcheng Zhang
    Beijing, P.R.China
    --

    --
  • Dmitry Vyukov at Nov 23, 2012 at 7:50 am

    On Wed, Nov 21, 2012 at 2:49 PM, Jingcheng Zhang wrote:

    Hello everyone,

    Thanks for all your help, I updated our Go version to:

    go version devel +852ee39cc8c4 Mon Nov 19 06:53:58 2012 +1100

    and rebuilt our servers, now GC duration reduced to 1~2 seconds, it's
    a big improvement!
    Thank contributors on the new GC!


    Hi,

    How many hardware threads do you have? If you have a huge heap and more
    than 8 hardware threads, can you try bumping the maximum number of GC
    worker threads, and check whether it improves the pause further?

    To do this you need to edit src/pkg/runtime/malloc.h
    MaxGcproc = 8,
    \/\/\/\/\/\/\/\/\/\/\/
    MaxGcproc = 16/32/64,
    and then rebuild everything.

    I've limited the maximum number of GC threads to 8, because I was
    testing on a machine with only 8 real cores (16 hardware threads total
    with HT) and on tests that consume ~300MB. If the heap is e.g. > 2GB,
    it may make sense to increase the number of threads further.

    --
  • Jingcheng Zhang at Nov 28, 2012 at 8:48 am
    Hello Dmitry,

    Sorry to reply to your mail so late. I noticed this variable before but
    was not sure what would happen if I increased it to 12 or 24
    (our server has 24 hardware threads: 2 CPUs, 6 cores per CPU, with HT
    support), as those are not exactly 2^N.

    Does "proc" in "MaxGcproc" mean the 24 logical cores with HT support,
    or the 12 real cores, in our server?
    Or is there any difference for "MaxGcproc" between logical cores with
    HT support and real cores?

    Thanks,
    Jingcheng Zhang
    On Fri, Nov 23, 2012 at 3:50 PM, Dmitry Vyukov wrote:
    On Wed, Nov 21, 2012 at 2:49 PM, Jingcheng Zhang wrote:

    Hello everyone,

    Thanks for all your help, I updated our Go version to:

    go version devel +852ee39cc8c4 Mon Nov 19 06:53:58 2012 +1100

    and rebuilt our servers, now GC duration reduced to 1~2 seconds, it's
    a big improvement!
    Thank contributors on the new GC!


    Hi,

    How many hardware threads do you have? If you have a huge heap and more than
    8 hardware threads, can you try to bump maximum number of GC worker threads,
    and check if it improves pause further?

    To do this you need to edit src/pkg/runtime/malloc.h
    MaxGcproc = 8,
    \/\/\/\/\/\/\/\/\/\/\/
    MaxGcproc = 16/32/64,
    and then rebuild everything.

    I've limited maximum number of GC threads to 8, because I was testing on a
    machine with only 8 HT cores (16 hw threads total, but only 8 real cores)
    and on tests that consume ~300MB. If the heap is e.g. > 2GB it may make
    sense to increase number of threads further.


    --
    Best regards,
    Jingcheng Zhang
    Beijing, P.R.China

    --
  • Dmitry Vyukov at Nov 28, 2012 at 8:34 am

    On Wed, Nov 28, 2012 at 12:21 PM, Jingcheng Zhang wrote:
    Hello Dmitry,

    Sorry to reply your mail so late. I noticed this variable before but
    am not sure what will happen if I increase it to 12 or 24
    (our server has 24 hardware threads: 2 CPUs, 6 core per CPU, with HT
    support), as it's not exactly 2^N.

    Does "proc" in "MaxGcproc" mean "24 logic cores with HT support" or
    "12 real cores" in our server?
    Or any difference for "MaxGcproc" between logic core with HT support
    and real core?
    The Go runtime does not know about Hyper-Threading; it just requests N
    threads from the OS and relies on OS thread scheduling and balancing.
    Anyway, I think you just need to try different values, e.g. 12, 16,
    20, 24, and see what works best for you.

    --
  • Jingcheng Zhang at Nov 28, 2012 at 8:52 am
    I'll try it later, thanks very much!
    On Wed, Nov 28, 2012 at 4:34 PM, Dmitry Vyukov wrote:
    On Wed, Nov 28, 2012 at 12:21 PM, Jingcheng Zhang wrote:
    Hello Dmitry,

    Sorry to reply your mail so late. I noticed this variable before but
    am not sure what will happen if I increase it to 12 or 24
    (our server has 24 hardware threads: 2 CPUs, 6 core per CPU, with HT
    support), as it's not exactly 2^N.

    Does "proc" in "MaxGcproc" mean "24 logic cores with HT support" or
    "12 real cores" in our server?
    Or any difference for "MaxGcproc" between logic core with HT support
    and real core?
    Go runtime does not know about HyperThreading, it just requests N
    threads from OS and relies on OS thread scheduling and balancing.
    Anyway, I think you just need to try different values, e.g. 12, 16,
    20, 24 and see what works best for you.


    --
    Best regards,
    Jingcheng Zhang
    Beijing, P.R.China

    --
  • Shkarupin at Nov 28, 2012 at 8:03 pm
    Guys,
    What is the best way to measure garbage collection times in Go?
    Thanks
    On Saturday, November 17, 2012 1:29:12 AM UTC-6, Jingcheng Zhang wrote:

    Hello everyone,

    Our business suffered from an annoying problem. We are developing an
    iMessage-like service in Go; the server can serve hundreds of
    thousands of concurrent TCP connections per process, and it's robust
    (it has been running for about a month), which is awesome. However,
    the process quickly consumes 16GB of memory, since there are so many
    connections and therefore also a lot of goroutines and buffered
    memory in use. I extended the memory limit to 64GB by changing
    runtime/malloc.h and runtime/malloc.goc. It works, but it brings a
    big problem too: the garbage collection process is then extremely
    slow. It stops the world for about 10 seconds every 2 minutes and
    causes problems which are very hard to trace; for example, when the
    world is stopped, messages being delivered may be lost. This is a
    disaster, since ours is a real-time service which requires delivering
    messages as fast as possible; there should be no stops and no message
    loss at all.

    I'm planning to split the "big server process" into many "small
    processes" to avoid this problem (a smaller memory footprint results
    in a smaller stop time), while waiting for Go's new GC implementation.

    Or does anyone have suggestions for improving our service in the
    meantime? I don't know when Go's new latency-free garbage collection
    will arrive.

    Thanks.

    --
    Best regards,
    Jingcheng Zhang
    Beijing, P.R.China
    --
  • ⚛ at Nov 28, 2012 at 8:02 pm
    GOGCTRACE=1 ./executable
    On Nov 28, 2012 8:55 PM, wrote:

    Guys,
    What is the best way to measure garbage collection times in GO?
    Thanks
    On Saturday, November 17, 2012 1:29:12 AM UTC-6, Jingcheng Zhang wrote:

    Hello everyone,

    Our business suffered from an annoying problem. We are developing an
    iMessage-like service in Go, the server can serves hundreds of
    thousands of concurrent TCP connection per process, and it's robust
    (be running for about a month), which is awesome. However, the process
    consumes 16GB memory quickly, since there are so many connections,
    there are also a lot of goroutines and buffered memories used. I
    extend the memory limit to 64GB by changing runtime/malloc.h and
    runtime/malloc.goc. It works, but brings a big problem too - The
    garbage collecting process is then extremely slow, it stops the world
    for about 10 seconds every 2 minutes, and brings me some problems
    which are very hard to trace, for example, when stoping the world,
    messages delivered may be lost. This is a disaster, since our service
    is a real-time service which requires delivering messages as fast as
    possible and there should be no stops and message lost at all.

    I'm planning to split the "big server process" to many "small
    processes" to avoid this problem (smaller memory footprint results to
    smaller time stop), and waiting for Go's new GC implementation.

    Or any suggestions for me to improve our service currently? I don't
    know when Go's new latency-free garbage collection will occur.

    Thanks.

    --
    Best regards,
    Jingcheng Zhang
    Beijing, P.R.China
    --

    --
  • Bryanturley at Nov 28, 2012 at 8:59 pm

    On Wednesday, November 28, 2012 2:02:18 PM UTC-6, ⚛ wrote:

    GOGCTRACE=1 ./executable
    It might help to tell him what the fields mean exactly (from Go 1.0.3;
    maybe less cryptic in tip):

    "gc63(4): 0+0+0 ms 1 -> 0 MB 8257 -> 1073 (92277-91204) objects 127 handoff"

    and from pkg/runtime/mgc0.c

    runtime·printf("gc%d(%d): %D+%D+%D ms %D -> %D MB %D -> %D (%D-%D) objects %D handoff\n",
            mstats.numgc, work.nproc, (t1-t0)/1000000,
            (t2-t1)/1000000, (t3-t2)/1000000,
            heap0>>20, heap1>>20, obj0, obj1,
            mstats.nmalloc, mstats.nfree,
            nhandoff);

    Without reading much of this code, I am assuming obj0/heap0 are the
    before and obj1/heap1 are the after?
    nmalloc and nfree seem obvious enough.
    Not even a guess as to what handoff is, though ;)
    The code I am working on didn't trigger an scvg line.

    --

Discussion Overview
group: golang-nuts
categories: go
posted: Nov 17, '12 at 7:29a
active: Nov 28, '12 at 9:24p
posts: 52
users: 14
website: golang.org
