FAQ

This is a clever trick. I added this mapping using a [256]int (otherwise
a malicious set of keys that uses every byte would fail)
You can make [256]byte work. If all the bytes are used then there is
no need for a value for "other".

Russ

Search Discussions

  • Eric D Eisner at Sep 6, 2012 at 1:48 am
    Hello nigeltao@golang.org, rogpeppe@gmail.com, remyoudompheng@gmail.com,
    rsc@golang.org (cc: golang-dev@googlegroups.com),

    Please take another look.


    http://codereview.appspot.com/6492076/
  • Eric D Eisner at Sep 6, 2012 at 1:48 am

    On 2012/09/05 22:11:43, rsc wrote:
    I'm sorry, but while I am still worried about the memory usage,
    To get the memory usage down, one trick that has worked well for me in
    the past is to group the input bytes into equivalence classes. Here it
    is probably enough to keep a list of all the bytes that occur in the
    match strings. You can use that list to map input bytes into a dense
    encoding. The n bytes that do occur map to a dense encoding 0 .. n-1
    and all the other bytes, if there are any, map to n. You can keep the
    mapping in a [256]byte stored once in the replacer, not per trie node,
    and then each node has a []*trie instead of a [256]*trie. The line
    node = node.table[s[0]] becomes node = node.table[mapping[s[0]]]. This
    is in some sense a generalization of your most recent change.
    I suspect that you'll find it reduces the memory usage considerably.
    Russ
    This is a clever trick. I added this mapping using a [256]int (otherwise
    a malicious set of keys that uses every byte would fail), and it only
    slightly decreased runtime performance, but dropped the memory by
    another factor of 2-3.

    http://codereview.appspot.com/6492076/
  • Nigeltao at Sep 6, 2012 at 3:24 am
    The original post says
    BenchmarkGenericNoMatch 2482 ns/op -96.36%
    BenchmarkGenericMatch 6995 ns/op -90.42%
    but that is against a baseline that is a poor implementation of a simple
    algorithm.

    For comparison, https://codereview.appspot.com/6495094 is a better
    implementation of that same algorithm, unlike this CL which uses a
    different algorithm, with different memory requirements. Compared to
    tip, that CL gets:

    benchmark old ns/op new ns/op delta
    BenchmarkGenericNoMatch 76550 18709 -75.56%
    BenchmarkGenericMatch 88093 29899 -66.06%

    and the number of mallocs per iteration drop from 403 and 343 before to
    3 and 3 after.

    I'm not saying that this CL should be abandoned. I'm just providing
    another reference point on how much this CL improves performance.

    Eric, is it possible to get a fresh set of numbers for this CL?


    https://codereview.appspot.com/6492076/diff/6/src/pkg/strings/replace.go
    File src/pkg/strings/replace.go (right):

    https://codereview.appspot.com/6492076/diff/6/src/pkg/strings/replace.go#newcode232
    src/pkg/strings/replace.go:232: if r.emptyMatch == "" {
    Does this work if I say NewReplacer("", "", "", "foo")?

    https://codereview.appspot.com/6492076/diff/6/src/pkg/strings/replace.go#newcode283
    src/pkg/strings/replace.go:283: var last, wn int
    This function body is a copy/paste of genericReplacer.Replace. Is it
    possible to look for a WriteString method, a la
    https://codereview.appspot.com/6495094, and avoid the duplicated code?

    https://codereview.appspot.com/6492076/
  • Eric D Eisner at Sep 6, 2012 at 8:20 am

    On 2012/09/06 03:23:58, nigeltao wrote:
    The original post says
    BenchmarkGenericNoMatch 2482 ns/op -96.36%
    BenchmarkGenericMatch 6995 ns/op -90.42%
    but that is against a baseline that is a poor implementation of a simple
    algorithm.
    For comparison, https://codereview.appspot.com/6495094 is a better
    implementation of that same algorithm, unlike this CL which uses a different
    algorithm, with different memory requirements. Compared to tip, that CL gets:
    benchmark old ns/op new ns/op delta
    BenchmarkGenericNoMatch 76550 18709 -75.56%
    BenchmarkGenericMatch 88093 29899 -66.06%
    Quick question: what tool/option generates these benchmark deltas?

    and the number of mallocs per iteration drop from 403 and 343 before
    to 3 and 3
    after.
    I'm not saying that this CL should be abandoned. I'm just providing another
    reference point on how much this CL improves performance.
    Eric, is it possible to get a fresh set of numbers for this CL?
    I had updated the numbers in the description (and dropped the part that
    said it was memory inefficient) in the last update or so. They are still
    multiple times faster than your improved simple implementation (but now
    more complex with a few memory workarounds).



    https://codereview.appspot.com/6492076/diff/6/src/pkg/strings/replace.go
    File src/pkg/strings/replace.go (right):

    https://codereview.appspot.com/6492076/diff/6/src/pkg/strings/replace.go#newcode232
    src/pkg/strings/replace.go:232: if r.emptyMatch == "" {
    Does this work if I say NewReplacer("", "", "", "foo")?

    It doesn't, but I am a bit confused as to the semantics of the empty
    match. After thinking more about precedence, I think the following
    should yeild "AXbXbAX" (maybe without the trailing "X")

    NewReplacer("a", "A", "", "X").Replace("abba")

    The original implementation returns "AXbbAX", mine is very confused and
    returns "XAXbXbXAX"

    https://codereview.appspot.com/6492076/diff/6/src/pkg/strings/replace.go#newcode283
    src/pkg/strings/replace.go:283: var last, wn int
    This function body is a copy/paste of genericReplacer.Replace. Is it
    possible to
    look for a WriteString method, a la
    https://codereview.appspot.com/6495094, and
    avoid the duplicated code?
    I tried swapping out the copy-pasted Replace code with a call to
    WriteString on a simple buffer, and it added a 35% runtime overhead to
    the two benchmarks. The NoMatch case also now allocates buffer memory
    where it didn't used to allocate any. I included this version for
    reference as *genericReplacer.BufferedReplace

    https://codereview.appspot.com/6492076/
  • David Symonds at Sep 6, 2012 at 8:21 am

    On Thu, Sep 6, 2012 at 5:53 PM, wrote:

    Quick question: what tool/option generates these benchmark deltas?
    $GOROOT/misc/benchcmp
  • Nigel Tao at Sep 7, 2012 at 12:03 am

    On 6 September 2012 17:53, wrote:
    I had updated the numbers in the description (and dropped the part that
    said it was memory inefficient) in the last update or so. They are still
    multiple times faster than your improved simple implementation (but now
    more complex with a few memory workarounds).
    Sorry for being dumb, but I can't find the new numbers. Can you e-mail
    them to this thread?

    I tried swapping out the copy-pasted Replace code with a call to
    WriteString on a simple buffer, and it added a 35% runtime overhead to
    the two benchmarks. The NoMatch case also now allocates buffer memory
    where it didn't used to allocate any. I included this version for
    reference as *genericReplacer.BufferedReplace
    Your version calls io.WriteString, which has to sniff for the
    interface each time. My version does the interface check only once, at
    the top of genericReplacer.WriteString.
  • Nigeltao at Sep 7, 2012 at 12:10 am
    https://codereview.appspot.com/6492076/diff/13004/src/pkg/strings/replace.go
    File src/pkg/strings/replace.go (right):

    https://codereview.appspot.com/6492076/diff/13004/src/pkg/strings/replace.go#newcode244
    src/pkg/strings/replace.go:244: // Write wrights to the buffer to
    satisfy io.Writer.
    s/wrights/writes/

    https://codereview.appspot.com/6492076/
  • Eric D Eisner at Sep 7, 2012 at 12:42 am

    On 2012/09/07 00:03:17, nigeltao wrote:
    On 6 September 2012 17:53, wrote:
    I had updated the numbers in the description (and dropped the part
    that
    said it was memory inefficient) in the last update or so. They are
    still
    multiple times faster than your improved simple implementation (but
    now
    more complex with a few memory workarounds).
    Sorry for being dumb, but I can't find the new numbers. Can you e-mail
    them to this thread?
    I tried swapping out the copy-pasted Replace code with a call to
    WriteString on a simple buffer, and it added a 35% runtime overhead
    to
    the two benchmarks. The NoMatch case also now allocates buffer
    memory
    where it didn't used to allocate any. I included this version for
    reference as *genericReplacer.BufferedReplace
    Your version calls io.WriteString, which has to sniff for the
    interface each time. My version does the interface check only once, at
    the top of genericReplacer.WriteString.
    I used your WriteString interface sniffing, and here are the updated
    numbers:

    Copy-paste Replace logic:

    benchmark old ns/op new ns/op delta
    BenchmarkGenericNoMatch 72882 2461 -96.62%
    BenchmarkGenericMatch 90858 9313 -89.75%

    benchmark old MB/s new MB/s speedup
    BenchmarkGenericNoMatch 2.74 81.26 29.66x
    BenchmarkGenericMatch 3.74 36.50 9.76x


    Using the buffer and WriteString:

    benchmark old ns/op new ns/op delta
    BenchmarkGenericNoMatch 72882 3058 -95.80%
    BenchmarkGenericMatch 90858 10361 -88.60%

    benchmark old MB/s new MB/s speedup
    BenchmarkGenericNoMatch 2.74 65.39 23.86x
    BenchmarkGenericMatch 3.74 32.81 8.77x

    https://codereview.appspot.com/6492076/
  • Eric D Eisner at Sep 7, 2012 at 12:43 am
    Hello nigeltao@golang.org, rogpeppe@gmail.com, remyoudompheng@gmail.com,
    rsc@golang.org, dsymonds@golang.org (cc: golang-dev@googlegroups.com),

    Please take another look.


    http://codereview.appspot.com/6492076/
  • Nigeltao at Sep 7, 2012 at 8:17 am
    https://codereview.appspot.com/6492076/diff/8005/src/pkg/strings/replace.go
    File src/pkg/strings/replace.go (right):

    https://codereview.appspot.com/6492076/diff/8005/src/pkg/strings/replace.go#newcode93
    src/pkg/strings/replace.go:93: // If this node has no children, the
    previous 3 fields will be nil.
    s/nil/zero/

    https://codereview.appspot.com/6492076/diff/8005/src/pkg/strings/replace.go#newcode106
    src/pkg/strings/replace.go:106: for n = 0; n < len(t.prefix) && n <
    len(key); n++ {
    You can drop the "n=0".

    https://codereview.appspot.com/6492076/diff/8005/src/pkg/strings/replace.go#newcode177
    src/pkg/strings/replace.go:177: break
    Is this correct if every byte in the range [0, 255] is actually part of
    some old string?

    Again, I would like to see many more tests before I'm confident that
    this is all correct.

    https://codereview.appspot.com/6492076/diff/8005/src/pkg/strings/replace.go#newcode278
    src/pkg/strings/replace.go:278: var buf []byte
    Even if it's slightly slower, I would still prefer Replace to just call
    WriteString, for now. If it really matters, we can copy/paste the code
    later, but for now I would prefer to avoid the duplication.

    https://codereview.appspot.com/6492076/
  • Eric D Eisner at Sep 7, 2012 at 6:03 pm
    On 2012/09/07 08:17:35, nigeltao wrote:

    https://codereview.appspot.com/6492076/diff/8005/src/pkg/strings/replace.go
    File src/pkg/strings/replace.go (right):

    https://codereview.appspot.com/6492076/diff/8005/src/pkg/strings/replace.go#newcode93
    src/pkg/strings/replace.go:93: // If this node has no children, the
    previous 3
    fields will be nil.
    s/nil/zero/
    Done.


    https://codereview.appspot.com/6492076/diff/8005/src/pkg/strings/replace.go#newcode106
    src/pkg/strings/replace.go:106: for n = 0; n < len(t.prefix) && n < len(key);
    n++ {
    You can drop the "n=0".
    Done.


    https://codereview.appspot.com/6492076/diff/8005/src/pkg/strings/replace.go#newcode177
    src/pkg/strings/replace.go:177: break
    Is this correct if every byte in the range [0, 255] is actually part
    of some old
    string?
    I cleaned up this logic, so it should be more clear. I also added a test
    that hits this case.
    Again, I would like to see many more tests before I'm confident that
    this is all
    correct.
    I added a few more tests, and changed the handling of the empty match
    (which magically removed a lot of code). The last test in that suite
    {NewReplacer("a", "A", "", "X"), "abba", "AXbXbAX"}, follows the spec to
    my understanding, but it is a different behavior than the original
    implementation.


    https://codereview.appspot.com/6492076/diff/8005/src/pkg/strings/replace.go#newcode278
    src/pkg/strings/replace.go:278: var buf []byte
    Even if it's slightly slower, I would still prefer Replace to just call
    WriteString, for now. If it really matters, we can copy/paste the code later,
    but for now I would prefer to avoid the duplication.
    Sounds good.

    https://codereview.appspot.com/6492076/
  • Eric D Eisner at Sep 7, 2012 at 6:03 pm
    Hello nigeltao@golang.org, rogpeppe@gmail.com, remyoudompheng@gmail.com,
    rsc@golang.org, dsymonds@golang.org (cc: golang-dev@googlegroups.com),

    Please take another look.


    http://codereview.appspot.com/6492076/

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupgolang-dev @
categoriesgo
postedSep 6, '12 at 12:47a
activeSep 7, '12 at 6:03p
posts13
users4
websitegolang.org

People

Translate

site design / logo © 2022 Grokbase