Hey all,

I've got a service I'm writing in Go that receives a bunch of data, writes it
to disk, and does some calculations on the data while it caches a subset of
the data in memory (obviously the data set is bigger than memory).

I've created an abstraction that basically allows me to start evicting
stuff from cache once the total cache size has reached a certain threshold
(e.g. 2GB). This should really be the majority of what I'm caching.

However, I've noticed that the service is consuming significantly more
memory (note: following was on Darwin):

$ ps -o rss= 42724
5418420 (kilobytes)

So I used pprof to figure out WTF is going on. To do that, I basically just
made my app respond to the USR1 signal by dumping the pprof heap profile.
But, it doesn't look like pprof is giving me a full snapshot.
go tool pprof bin/my-app /tmp/my-app-heap.pprof
Adjusting heap profiles for 1-in-524288 sampling rate
Welcome to pprof! For help, type 'help'.
(pprof) top
Total: 544.0 MB

Note that the ps reading and the pprof reading were taken while the app was
in the same state.

1. Am I missing something about pprof here?
2. Am I missing something about go's GC, or at least how the GC
subsequently releases allocations?

Thanks,
Brad Heller

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

  • Tamás Gulácsi at Jul 6, 2014 at 6:03 am
    Use net/http/pprof, but on OS X profiling is lacking.

  • Carlos Castillo at Jul 7, 2014 at 1:24 am
    Only CPU profiles on OS X are broken (and can be fixed with a kernel
    patch); all other profiles are fine.

    As for the OP's questions:

        1. pprof can track memory a few different ways, but I believe the
        default is "memory in use by go at the time the profile was made". This
        does *NOT* include memory collected by the GC. The --help flag to pprof can
        show the various memory profile display options.
        2. The GC never releases memory; it only re-uses it. From the outside,
        memory use will only grow or stay level. Also, the GC is only run
        automatically on allocation, when the total in-use memory exceeds a
        previously set limit, not when objects become inactive.

    If you write a program which processes data in 512MB chunks, the profiler
    may only list 512MB in use (depends on how the program holds on to memory),
    but when the GC runs it finds all the memory currently in use, and sets a
    limit of double that for running the next GC cycle. So if you happened to
    have 2 chunks active at the time, then at least 1GB is in use, and the GC
    will be next run when a total of 2GB has been allocated. This means that
    unless your app ends before the next GC, or the GC is called manually,
    those 2GB of memory will eventually be used. If only one 512MB chunk is
    active at a later GC, the limit is set to 1GB, but the extra 1GB allocated
    previously is *NOT* returned to the OS by the GC.

    The scavenger runs every 5 minutes and tries to find blocks of memory
    (allocations are made in blocks) which are completely unused by go code.
    This can be difficult, as if any byte in the block is active, the whole
    block remains active. If it finds any inactive blocks, it tells the OS that
    they are "Unused", which is slightly different than "Free". Unfortunately
    the OS still counts that memory against the process, but if it needs
    memory, it will use the "Unused" group first, and the contents don't need
    to be preserved (ie: swapped out to disk), so the operation is nearly as
    cheap as on "Free" memory.

    Also, maps and slices are easy ways to lose track of memory, since you may
    only be using len() elements, but the allocation can be larger. E.g.,
    slicing an array of length 21 to take only the middle 3 elements doesn't
    make the remaining 18 elements disappear in any way.

    A more concrete example: http://play.golang.org/p/-K8wbHTJmH

    The value of Sys is the total number of bytes Go has asked for from the
    system, which in normal (non-cgo) programs should roughly match the
    system's measurement; Alloc is the number of bytes Go believes is active
    at the moment. Even though the example functions only generate slices/maps
    that contain 10*1MB of data, the program stabilizes at a Sys value more
    than 10 times that. Some of the extra space makes sense, as there is more
    memory in use than just the slices/maps: we are storing at least one 1MB
    element on the stack before copying it to the slice/map, and some extra
    allocations are needed due to resizing of the slices and maps. On resize,
    both versions must exist at the same time, so at best, with a doubling
    algorithm for slices, a slice of 0.5n elements and one of n must both be
    active at some point in the function; at worst it can be 2n-1 elements'
    worth. Adjusting the boolean options in main shows the results if you
    pre-allocate the slices and/or call the GC at the end of each operation.
    With both active, the memory use as seen in the Sys value is much better.

    So in short, from the outside, Go programs never shrink in memory use
    unless the OS takes memory back, and may be larger than you expect based on
    how references to data are held in your program.
  • Carlos Castillo at Jul 7, 2014 at 2:24 am
    There's one other important factor I forgot to mention that contributes to
    extra memory use: memory fragmentation.

    If you allocate a 10MB block, free it, and then make some small allocation,
    the allocation could use part of the freed 10MB block. Even if the unused
    portion is made available for future allocations it will be smaller than
    10MB, and so can't be used to satisfy the next 10MB allocation (which must
    be contiguous), so more memory may need to be requested from the OS.

    I.e., in the code I posted, just because Sys - Alloc is ~20MB doesn't mean
    that there are 10MB in a row to store a slice of ten 1MB elements.
  • Carlos Castillo at Jul 7, 2014 at 3:06 am
    Also note, adjusting the program to store slices/maps of pointers to 1MB
    blocks: http://play.golang.org/p/eCRmDQ-vvk

    The total memory use shown by Sys is now only ~2x the largest working set
    size (eg: 10MB), which falls in line with what's expected based on the
    behaviour of the GC (see: http://golang.org/pkg/runtime/#pkg-overview). The
    down side to this is that you are giving the GC more work, as it now needs
    to track each independent 1MB block. Using the prealloc var makes things
    slightly worse (although probably faster), and manually calling GC after
    each operation drops total use down to ~15MB. All these cases are still
    better than the best case previously (~30MB).

    Another way to decrease memory use is to manually re-use the memory
    yourself. This way, the large blocks you allocate don't go back to the GC
    where they could get fragmented, and so no new ones need to be made.
  • Dmitry Vyukov at Jul 7, 2014 at 11:09 am

    On Sun, Jul 6, 2014 at 4:32 AM, Brad Heller wrote:

    What version of Go are you using? There were some fixes to the heap
    profiler in Go 1.3.
    Does the program use cgo/swig? pprof does not see the C heap.
    Print runtime.MemStats; pprof counts only the Go heap, and not all memory
    in a Go program is heap.


Discussion Overview

group: golang-nuts
category: go
posted: Jul 6, 2014 at 12:32 AM
active: Jul 7, 2014 at 11:09 AM
posts: 6
users: 4