Reviewers: golang-dev_googlegroups.com,

Message:
Hello golang-dev@googlegroups.com (cc: msolomon@google.com,
sougou@google.com),

I'd like you to review this change to
https://dvyukov%40google.com@code.google.com/p/go/


Description:
runtime: less aggressive per-thread stack segment caching
Introduce global stack segment cache and limit per-thread cache size.
This greatly reduces StackSys memory on workloads that create lots of
threads.

TestStackMem   old, MB   old, sec   new, MB   new, sec
Run #1             310       3.22         8       1.83
Run #2             296       2.43         8       1.84
Run #3             479       2.50         8       1.88
Run #4             264       2.46         8       1.82
Run #5             296       2.53         8       2.00
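The description above does not include the runtime code itself. The following is a minimal Go sketch of the general idea: a per-thread cache bounded by low/high watermarks in front of a mutex-protected global cache. The type names, the segment size, the watermark values (4 and 8 segments), and the exact flush/refill policy are assumptions made for illustration; they are not taken from the patch, which is C code inside the runtime.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical sizes and watermarks, chosen only for illustration.
const (
	segmentSize  = 4096 // bytes per stack segment (assumed)
	perThreadLWM = 4    // segments kept locally after a flush (assumed)
	perThreadHWM = 8    // local cache size that triggers a flush (assumed)
)

// segment stands in for a stack segment.
type segment struct{ mem []byte }

// globalCache is shared by all threads, so every access takes the mutex.
type globalCache struct {
	mu   sync.Mutex
	free []*segment
}

// threadCache is owned by one thread and is accessed without locking.
type threadCache struct {
	free   []*segment
	global *globalCache
}

// alloc pops a cached segment. An empty local cache is first refilled with
// up to perThreadLWM segments from the global cache; if both are empty,
// a fresh segment is allocated.
func (c *threadCache) alloc() *segment {
	if len(c.free) == 0 {
		g := c.global
		g.mu.Lock()
		for len(c.free) < perThreadLWM && len(g.free) > 0 {
			last := len(g.free) - 1
			c.free = append(c.free, g.free[last])
			g.free = g.free[:last]
		}
		g.mu.Unlock()
	}
	if n := len(c.free); n > 0 {
		s := c.free[n-1]
		c.free = c.free[:n-1]
		return s
	}
	return &segment{mem: make([]byte, segmentSize)}
}

// release caches a segment locally. Once the local cache exceeds
// perThreadHWM, everything above perThreadLWM moves to the global cache in
// one locked batch, so a thread cannot hoard an unbounded number of segments.
func (c *threadCache) release(s *segment) {
	c.free = append(c.free, s)
	if len(c.free) > perThreadHWM {
		g := c.global
		g.mu.Lock()
		g.free = append(g.free, c.free[perThreadLWM:]...)
		g.mu.Unlock()
		c.free = c.free[:perThreadLWM]
	}
}

func main() {
	g := &globalCache{}
	t1 := &threadCache{global: g}
	t2 := &threadCache{global: g}

	// Thread 1 grows its stack by 10 segments, then unwinds completely.
	var segs []*segment
	for i := 0; i < 10; i++ {
		segs = append(segs, t1.alloc())
	}
	for _, s := range segs {
		t1.release(s)
	}
	fmt.Println(len(t1.free), len(g.free)) // 5 5

	// Thread 2 reuses segments that thread 1 handed back to the global cache.
	t2.alloc()
	fmt.Println(len(t2.free), len(g.free)) // 3 1
}
```

In this sketch thread 1 keeps only 5 of its 10 segments; the rest become reusable by other threads through the global cache. The price is that the global cache's mutex now sits on the allocation path whenever a local cache runs dry or overflows.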

Please review this at https://codereview.appspot.com/6997052/

Affected files:
M src/pkg/runtime/malloc.goc
M src/pkg/runtime/malloc.h
M src/pkg/runtime/mfixalloc.c
M src/pkg/runtime/proc.c
M src/pkg/runtime/stack.h
M src/pkg/runtime/stack_test.go


  • Dvyukov at Dec 28, 2012 at 8:34 pm

    On 2012/12/28 19:52:11, sougou wrote:
    > vtocc test run on 10M queries using 100 connections:
    > Old runtime StackSys started at 26MB and kept growing @2MB/min.
    > New runtime StackSys started at 3MB and stayed there.

    Wow! Looks impressive:

    > memstats.Sys 125461816
    > memstats.Sys 43672888

    I am essentially trading speed for memory. I think I need to benchmark
    performance when the global cache is accessed frequently. Do we have a
    parallel benchmark with deep stacks?
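    The memstats.* numbers quoted in this thread correspond to fields of Go's runtime.MemStats, all reported in bytes. The thread does not show how vtocc collects them; the following is a minimal sketch of sampling the same counters from any Go program with runtime.ReadMemStats (the snapshot type and readSnapshot helper are illustrative names, not part of any library).

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// snapshot holds the MemStats counters discussed in this thread, in bytes.
type snapshot struct {
	Sys, HeapSys, StackSys uint64
}

// readSnapshot copies the relevant fields out of runtime.MemStats.
func readSnapshot() snapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return snapshot{Sys: m.Sys, HeapSys: m.HeapSys, StackSys: m.StackSys}
}

func main() {
	// Take a few samples; a long-running server would typically sample
	// periodically and export the values instead of printing them.
	for i := 0; i < 3; i++ {
		s := readSnapshot()
		fmt.Printf("memstats.Sys %d\nmemstats.HeapSys %d\nmemstats.StackSys %d\n",
			s.Sys, s.HeapSys, s.StackSys)
		time.Sleep(100 * time.Millisecond)
	}
}
```

    ReadMemStats briefly stops the world, so a server should sample at a modest interval rather than on every request.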

    https://codereview.appspot.com/6997052/
  • Sougou at Dec 29, 2012 at 12:21 am
    vtocc test run on 10M queries using 100 connections:
    Old runtime StackSys started at 26MB and kept growing @2MB/min.
    New runtime StackSys started at 3MB and stayed there.

    Metric                      Old start      Old end        New start      New end
    Time (Fri Dec 28 2012)      11:11:34       11:39:36       10:38:49       11:06:51
    Queries.TotalCount          1184487        10296835       1142084        10151110
    rate                        5427.36666667  5512.95        5540.26666667  5461.2
    memstats.Sys (bytes)        67135288       125461816      39937848       43672888
    memstats.HeapSys (bytes)    34603008       34603008       31457280       34603008
    memstats.StackSys (bytes)   26738688       84934656       3014656        3145728

    https://codereview.appspot.com/6997052/
  • Sougou at Dec 29, 2012 at 12:28 am
    I saw a 10% drop in throughput compared to a build from October (5.5k vs
    6k+), but the circumstances might not have been the same. I'll rerun
    them using the same query sets, etc. and see what I get.

    https://codereview.appspot.com/6997052/
  • Dmitry Vyukov at Dec 28, 2012 at 9:04 pm
    If you see some performance degradation, try tuning
    StackPerThreadLWM/StackPerThreadHWM (say, to 64/128).

    On Sat, Dec 29, 2012 at 12:59 AM, wrote:
    > I saw a 10% drop in throughput compared to a build from October (5.5k vs
    > 6k+), but the circumstances might not have been the same. I'll rerun
    > them using the same query sets, etc. and see what I get.
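    How the watermarks trade memory for lock traffic can be illustrated with a toy simulation. It assumes, without confirmation from the thread, that StackPerThreadLWM/StackPerThreadHWM count cached segments, that an empty per-thread cache is refilled with up to LWM segments from the global cache, and that a cache growing past HWM is flushed down to LWM. simulate is a hypothetical helper, and the workload (20-segment bursts, 100 rounds) is arbitrary.

```go
package main

import "fmt"

// simulate models one thread that repeatedly grows its stack by burst
// segments and then unwinds, under an assumed watermark policy:
// an empty local cache is refilled with up to lwm segments from the global
// cache (one lock acquisition, even if the global cache is empty), and a
// local cache larger than hwm is flushed down to lwm (one lock acquisition).
// It returns the number of lock acquisitions and the peak local cache size.
func simulate(lwm, hwm, burst, rounds int) (lockOps, peakCached int) {
	local, global := 0, 0
	for r := 0; r < rounds; r++ {
		for i := 0; i < burst; i++ { // stack growth: take segments
			if local == 0 {
				lockOps++
				n := lwm
				if global < n {
					n = global
				}
				local += n
				global -= n
			}
			if local > 0 {
				local-- // reuse a cached segment
			}
			// If local is still 0, a fresh segment is allocated instead.
		}
		for i := 0; i < burst; i++ { // unwinding: return segments
			local++
			if local > hwm {
				lockOps++
				global += local - lwm
				local = lwm
			}
			if local > peakCached {
				peakCached = local
			}
		}
	}
	return lockOps, peakCached
}

func main() {
	for _, w := range [][2]int{{4, 8}, {64, 128}} {
		ops, peak := simulate(w[0], w[1], 20, 100)
		fmt.Printf("LWM=%d HWM=%d: %d lock ops, up to %d segments cached\n",
			w[0], w[1], ops, peak)
	}
}
```

    Under these assumptions, 4/8 needs 716 global-cache lock acquisitions while keeping at most 8 segments per thread, whereas 64/128 needs only 20 but lets the thread keep all 20 segments it uses: larger watermarks mean fewer trips to the shared cache and more memory parked in each thread.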

    https://codereview.appspot.com/6997052/
  • Dvyukov at Dec 29, 2012 at 1:17 pm

    On 2012/12/29 13:00:52, nsf wrote:
    > [...]
    > So, summary:
    > Before: 20 to 100 megs during run.
    > After: 20 to 30 megs during run.
    > In both cases rendering time was 5 minutes 5 seconds.
    > [...]

    Great results! Thanks for testing.

    https://codereview.appspot.com/6997052/
  • No Smile Face at Dec 29, 2012 at 2:41 pm

    On 2012/12/28 17:58:13, dvyukov wrote:
    > Hello golang-dev@googlegroups.com (cc:
    > msolomon@google.com, sougou@google.com),
    > I'd like you to review this change to
    > https://dvyukov%2540google.com%40code.google.com/p/go/

    Hey, thanks for an interesting patch. Just want to give some feedback
    here.

    I have a ray tracer app whose memory consumption grew during the run (800x600
    image rendering, 100 rays per pixel, 10 goroutines, 1 goroutine per image row,
    GOMAXPROCS=4), starting from 20 megs and ending up with 100 megs or so after
    5 minutes of intensive CPU usage. I know there are no allocs in the ray
    tracer itself, so it must be the stack segment cache. So, I tried your
    patch set. It's amazing. Memory consumption is now 20 megs at the
    beginning (same as before) and 30 megs at the end. And I see no performance
    degradation whatsoever.

    So, summary:
    Before: 20 to 100 megs during run.
    After: 20 to 30 megs during run.

    In both cases rendering time was 5 minutes 5 seconds.

    So, many thanks for the patch; I was just trying to figure out what was
    causing this behavior in my program.

    https://codereview.appspot.com/6997052/
  • Dvyukov at Dec 29, 2012 at 8:08 pm
    I've added a benchmark, BenchmarkStackGrowthDeep, that grows and
    shrinks the stack in multiple goroutines. There is a significant slowdown
    on this synthetic benchmark:

    benchmark                      old ns/op   new ns/op     delta
    BenchmarkStackGrowthDeep           94101      109271   +16.12%
    BenchmarkStackGrowthDeep-2         47576       70916   +49.06%
    BenchmarkStackGrowthDeep-4         25687       67188  +161.56%
    BenchmarkStackGrowthDeep-8         13592       77776  +472.22%
    BenchmarkStackGrowthDeep-16         9695       78721  +711.98%
    BenchmarkStackGrowthDeep-32        11679       76796  +557.56%
    BenchmarkStackGrowthDeep-64        12623       88951  +604.67%

    I would prefer to fix the performance in a separate patch.
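    The benchmark's source is not shown in the thread. A rough Go sketch of the same kind of workload, deep recursion repeated across GOMAXPROCS goroutines, follows; recurse, stackGrowthDeep, the 256-byte frame padding, and the recursion depth are illustrative choices, not the CL's code.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
	"time"
)

// recurse calls itself depth times. The padding array enlarges each frame
// so the goroutine's stack has to grow during the descent.
func recurse(depth int) byte {
	var pad [256]byte
	pad[depth%len(pad)] = byte(depth)
	if depth == 0 {
		return pad[0]
	}
	return recurse(depth-1) + pad[depth%len(pad)]
}

// stackGrowthDeep runs `iterations` deep recursions shared among
// GOMAXPROCS goroutines and returns the total wall-clock time.
func stackGrowthDeep(iterations, depth int) time.Duration {
	procs := runtime.GOMAXPROCS(0)
	remaining := int64(iterations)
	var wg sync.WaitGroup
	wg.Add(procs)
	start := time.Now()
	for p := 0; p < procs; p++ {
		go func() {
			defer wg.Done()
			// Each goroutine claims iterations until the shared counter runs out.
			for atomic.AddInt64(&remaining, -1) >= 0 {
				recurse(depth)
			}
		}()
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	const iterations = 10000
	elapsed := stackGrowthDeep(iterations, 1000)
	fmt.Printf("GOMAXPROCS=%d: %d ns/op\n", runtime.GOMAXPROCS(0),
		elapsed.Nanoseconds()/iterations)
}
```

    Running it with GOMAXPROCS set to 1, 2, 4, and so on corresponds to the -2, -4, ... suffixes in the table. Go releases since 1.3 use contiguous rather than segmented stacks, so timings from this sketch are not comparable with the numbers above.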


    https://codereview.appspot.com/6997052/
  • Rémy Oudompheng at Dec 30, 2012 at 10:21 am

    On 2012/12/29 wrote:
    > I've added a benchmark, BenchmarkStackGrowthDeep, that grows and
    > shrinks the stack in multiple goroutines. There is a significant slowdown
    > on this synthetic benchmark:
    > [...]
    > I would prefer to fix the performance in a separate patch.

    Is it related to the watermark values? They seem very small, and I
    would expect the stack segment cache size to be roughly the same
    order of magnitude as the size of an OS thread stack (a few megabytes).

    Rémy.
  • Dmitry Vyukov at Dec 30, 2012 at 9:45 am

    On Sun, Dec 30, 2012 at 12:34 AM, Rémy Oudompheng wrote:
    > [...]
    > Is it related to the watermark values? They seem very small, and I
    > would expect the stack segment cache size to be roughly the same
    > order of magnitude as the size of an OS thread stack (a few megabytes).

    Yes, it is related, but just increasing them won't help.

    AFAIR, by default the Go runtime requests just 64k for a thread (g0) stack.
  • Dvyukov at Dec 30, 2012 at 7:14 pm

    On 2012/12/29 20:40:03, dvyukov wrote:
    > [...]
    > Yes, it is related, but just increasing them won't help.
    > AFAIR, by default the Go runtime requests just 64k for a thread (g0) stack.

    Please hold on. I am working on a similar patch that makes the slowdown
    more reasonable:

    benchmark                      old ns/op   new ns/op    delta
    BenchmarkStackGrowthDeep           97231       94391   -2.92%
    BenchmarkStackGrowthDeep-2         47230       58562  +23.99%
    BenchmarkStackGrowthDeep-4         24993       49356  +97.48%
    BenchmarkStackGrowthDeep-8         15105       30072  +99.09%
    BenchmarkStackGrowthDeep-16        10005       15623  +56.15%
    BenchmarkStackGrowthDeep-32        12517       13069   +4.41%
    https://codereview.appspot.com/7029044/

    The code has been almost completely rewritten, so it makes little sense to
    review this patch.
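    The delta column in these benchmark tables is the relative change in ns/op, (new - old) / old * 100. A short Go check against the rows of the reworked patch's table (delta is an illustrative helper):

```go
package main

import "fmt"

// delta returns the relative change from oldNs to newNs, in percent.
func delta(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

func main() {
	// BenchmarkStackGrowthDeep results for the reworked patch, in ns/op.
	rows := []struct {
		name         string
		oldNs, newNs float64
	}{
		{"BenchmarkStackGrowthDeep", 97231, 94391},
		{"BenchmarkStackGrowthDeep-2", 47230, 58562},
		{"BenchmarkStackGrowthDeep-4", 24993, 49356},
		{"BenchmarkStackGrowthDeep-8", 15105, 30072},
		{"BenchmarkStackGrowthDeep-16", 10005, 15623},
		{"BenchmarkStackGrowthDeep-32", 12517, 13069},
	}
	for _, r := range rows {
		fmt.Printf("%-28s %+7.2f%%\n", r.name, delta(r.oldNs, r.newNs))
	}
}
```

    Each computed value rounds to the delta shown in the table; for example, going from 24993 to 49356 ns/op is +97.48%.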


    https://codereview.appspot.com/6997052/

Discussion Overview
group: golang-dev
categories: go
posted: Dec 28, '12 at 5:58p
active: Dec 30, '12 at 7:14p
posts: 11
users: 4
website: golang.org
