Hi,

I'm having trouble with growing memory usage when processing an HTTP response
body.

pprof --alloc_space reports around 370MB for one function:
http://pastebin.ca/2986357
where r is an io.Reader.

I don't yet know why html.Parse() accounts for so much memory over time. Any idea?
How can I reduce memory usage in this case?

I thought garbage collection should take care of it. But seems not.
Please correct me if I'm wrong here.

Any help would be much appreciated.

thanks a lot,

Ganbold

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


  • Dave Cheney at May 1, 2015 at 6:47 pm
    There have been similar reports in the past. Is the source of your program available?

  • Nigel Tao at May 4, 2015 at 4:08 am

    On Fri, May 1, 2015 at 4:44 PM, wrote:
    > pprof --alloc_space reports around 370MB for one function:
    > http://pastebin.ca/2986357

    This isn't all directly related to your memory question, but a few
    style notes on using the html Parser:



    First, the middle line in:

    var err error
    n := new(html.Node)
    n, err = html.Parse(r)

    is unnecessary, and an unnecessary memory allocation: you're assigning
    a newly allocated Node to n, but then immediately overwriting it on
    the next line. These three lines can be just:

    n, err := html.Parse(r)

    with the colon-equals instead of the bare equals.



    Second,

    s := strings.ToLower(n.Data)
    if s == "script" {
       script = 1
    } etc

    could probably be

    if n.DataAtom == atom.Script {
       script = 1
    } etc

    after you import "golang.org/x/net/html/atom". The atom comparison
    should be faster (comparing ints instead of strings is O(1) instead of
    worst-case O(length of string)), but more importantly, it allocates
    fewer temporary strings. I don't have your full source code, but I
    suspect you are calling strings.ToLower on your text nodes as well as
    your element nodes, which is needless allocation of garbage that needs
    collecting, and I think it will be a false positive when matching the
    text node inside "<b>script</b>".


    Third, the "<!--" check in

    if n.Type == html.TextNode {
       tmp := strings.TrimSpace(n.Data)
       if len(tmp) > 3 && tmp[0:4] == "<!--" {

    should be unnecessary. The HTML parser already recognizes HTML
    comments, and returns those as Comment nodes and not Text nodes.

    > I thought garbage collection should take care of it. But seems not.
    > Please correct me if I'm wrong here.

    It's been a while since I've used pprof, but I believe that
    --alloc_space is the total allocations, and garbage-collecting
    previous allocations doesn't lower this number: if you allocate 100MB
    and then GC 100MB of no-longer-used heap, alloc_space will still
    report 100MB, and the alloc_space number only grows upwards over time.
    Looking at --inuse_space may be a more appropriate metric.

  • Nigel Tao at May 4, 2015 at 4:14 am
    I forgot to add: if you only care about extracting the <script>s,
    <style>s and <meta>s of an HTML document, then the HTML tokenizer
    might be a better match than the HTML parser. The tokenizer is lower
    level: it returns a stream of tokens instead of a complete parse tree,
    but it gives you []byte instead of string, which means it makes far
    fewer allocations. Note that you are responsible for copying the
    tokenizer's []byte somewhere if you want it to last past the life of
    the current token. See https://godoc.org/golang.org/x/net/html

  • Ganbold Tsagaankhuu at May 4, 2015 at 7:31 am
    Nigel,
    On Mon, May 4, 2015 at 12:14 PM, Nigel Tao wrote:

    > I forgot to add: if you only care about extracting the <script>s,
    > <style>s and <meta>s of an HTML document, then the HTML tokenizer
    > might be a better match than the HTML parser. The tokenizer is lower
    > level: it returns a stream of tokens instead of a complete parse tree,

    Thanks a lot. I agree with the style-related notes in your previous email,
    and I'm aware of --alloc_space and --inuse_space.
    The reason for using the HTML parser is that I need to parse the whole HTTP
    response body, not only a few tags.

    Ganbold

    > but it gives you []byte instead of string, which means it makes far
    > fewer allocations. Note that you are responsible for copying the
    > tokenizer's []byte somewhere if you want it to last past the life of
    > the current token. See https://godoc.org/golang.org/x/net/html

Discussion Overview
group: golang-nuts
categories: go
posted: May 1, '15 at 12:37p
active: May 4, '15 at 7:31a
posts: 5
users: 3
website: golang.org
