FAQ
Is there an efficient string comparison function similar to std::string
compare or memcmp? Using < and == requires multiple passes over the data
and using bytes.Compare causes multiple copies of the string data. I'd like
to use runtime.cmpstring directly but cannot.

Thanks for the help,
Patrick

--

Search Discussions

  • Ian Lance Taylor at Nov 29, 2012 at 6:23 pm

    On Thu, Nov 29, 2012 at 8:16 AM, Patrick Scott wrote:
    Is there an efficient string comparison function similar to std::string
    compare or memcmp? Using < and == requires multiple passes over the data and
    using bytes.Compare causes multiple copies of the string data. I'd like to
    use runtime.cmpstring directly but cannot.
    You could easily write your own version of bytes.Compare. In fact you
    can just copy the standard one.

    String comparison is somewhat ambiguous: should the comparison be done
    by bytes or by characters? What about different collation orders?
    Still, perhaps we should have a CompareByByte function in the strings
    package, I'm not sure.

    Ian

    --
  • Patrick Scott at Nov 29, 2012 at 6:22 pm
    Right now I am using this:

    func strcmp(a, b string) int {
    var min = len(b)
    if len(a) < len(b) {
    min = len(a)
    }
    var diff int
    for i := 0; i < min && diff == 0; i++ {
    diff = int(a[i]) - int(b[i])
    }
    if diff == 0 {
    diff = len(a) - len(b)
    }
    return diff
    }

    This is very similar to bytes.Compare. I was just hoping there might be a
    built-in version which could use memcmp instead of looping over bytes.
    Ideally memcmp could do word comparisons instead of bytes.

    Patrick
    On Thursday, November 29, 2012 1:15:29 PM UTC-5, Ian Lance Taylor wrote:

    On Thu, Nov 29, 2012 at 8:16 AM, Patrick Scott
    <pat...@springmetrics.com <javascript:>> wrote:
    Is there an efficient string comparison function similar to std::string
    compare or memcmp? Using < and == requires multiple passes over the data and
    using bytes.Compare causes multiple copies of the string data. I'd like to
    use runtime.cmpstring directly but cannot.
    You could easily write your own version of bytes.Compare. In fact you
    can just copy the standard one.

    String comparison is somewhat ambiguous: should the comparison be done
    by bytes or by characters? What about different collation orders?
    Still, perhaps we should have a CompareByByte function in the strings
    package, I'm not sure.

    Ian
    --
  • Steven Blenkinsop at Nov 29, 2012 at 6:33 pm
    If you're just comparing the strings lexically byte-wise, use the the
    normal comparison operators.

    On Thu, Nov 29, 2012 at 1:22 PM, Patrick Scott wrote:

    Right now I am using this:

    func strcmp(a, b string) int {
    var min = len(b)
    if len(a) < len(b) {
    min = len(a)
    }
    var diff int
    for i := 0; i < min && diff == 0; i++ {
    diff = int(a[i]) - int(b[i])
    }
    if diff == 0 {
    diff = len(a) - len(b)
    }
    return diff
    }

    This is very similar to bytes.Compare. I was just hoping there might be a
    built-in version which could use memcmp instead of looping over bytes.
    Ideally memcmp could do word comparisons instead of bytes.

    Patrick
    On Thursday, November 29, 2012 1:15:29 PM UTC-5, Ian Lance Taylor wrote:

    On Thu, Nov 29, 2012 at 8:16 AM, Patrick Scott
    wrote:
    Is there an efficient string comparison function similar to std::string
    compare or memcmp? Using < and == requires multiple passes over the data and
    using bytes.Compare causes multiple copies of the string data. I'd like to
    use runtime.cmpstring directly but cannot.
    You could easily write your own version of bytes.Compare. In fact you
    can just copy the standard one.

    String comparison is somewhat ambiguous: should the comparison be done
    by bytes or by characters? What about different collation orders?
    Still, perhaps we should have a CompareByByte function in the strings
    package, I'm not sure.

    Ian
    --

    --
  • Patrick Scott at Nov 29, 2012 at 7:04 pm
    The problem with that approach is that I need to make multiple comparisons
    to find out if a string is less than, equal to, or greater than another
    string. I need to do something different for each case but I only want to
    compare the strings once.
    On Thursday, November 29, 2012 1:32:34 PM UTC-5, Steven Blenkinsop wrote:

    If you're just comparing the strings lexically byte-wise, use the the
    normal comparison operators.


    On Thu, Nov 29, 2012 at 1:22 PM, Patrick Scott <pat...@springmetrics.com<javascript:>
    wrote:
    Right now I am using this:

    func strcmp(a, b string) int {
    var min = len(b)
    if len(a) < len(b) {
    min = len(a)
    }
    var diff int
    for i := 0; i < min && diff == 0; i++ {
    diff = int(a[i]) - int(b[i])
    }
    if diff == 0 {
    diff = len(a) - len(b)
    }
    return diff
    }

    This is very similar to bytes.Compare. I was just hoping there might be a
    built-in version which could use memcmp instead of looping over bytes.
    Ideally memcmp could do word comparisons instead of bytes.

    Patrick
    On Thursday, November 29, 2012 1:15:29 PM UTC-5, Ian Lance Taylor wrote:

    On Thu, Nov 29, 2012 at 8:16 AM, Patrick Scott
    wrote:
    Is there an efficient string comparison function similar to
    std::string
    compare or memcmp? Using < and == requires multiple passes over the data and
    using bytes.Compare causes multiple copies of the string data. I'd like to
    use runtime.cmpstring directly but cannot.
    You could easily write your own version of bytes.Compare. In fact you
    can just copy the standard one.

    String comparison is somewhat ambiguous: should the comparison be done
    by bytes or by characters? What about different collation orders?
    Still, perhaps we should have a CompareByByte function in the strings
    package, I'm not sure.

    Ian
    --

    --
  • Jan Mercl at Nov 29, 2012 at 6:58 pm

    On Nov 29, 2012 7:40 PM, "Patrick Scott" wrote:
    The problem with that approach is that I need to make multiple
    comparisons to find out if a string is less than, equal to, or greater than
    another string. I need to do something different for each case but I only
    want to compare the strings once.

    Using the built in comparison operators:

    - requires less than two compares in average.
    - is probably faster in many scenarios than the "faster workaround". (Have
    you made some benchmarking?)

    -j

    --
  • Patrick Scott at Nov 29, 2012 at 7:41 pm
    I'll run some quick benchmarks to see which approach is faster.
    On Thu, Nov 29, 2012 at 1:58 PM, Jan Mercl wrote:

    On Nov 29, 2012 7:40 PM, "Patrick Scott" wrote:

    The problem with that approach is that I need to make multiple
    comparisons to find out if a string is less than, equal to, or greater than
    another string. I need to do something different for each case but I only
    want to compare the strings once.

    Using the built in comparison operators:

    - requires less than two compares in average.
    - is probably faster in many scenarios than the "faster workaround". (Have
    you made some benchmarking?)

    -j
    --
  • Kevin Gillette at Nov 29, 2012 at 7:25 pm
    If it existed, a `strings.LeadingPrefix(a, b string) string` function could
    simplify this: you get the leading prefix -- if it's not equal in length to
    both of the strings, then the strings are unequal. Then you can compare the
    substrings after the prefix to each other -- that will be quick, since the
    next character of each string (if there are any) is guaranteed to be
    different between both strings.

    Essentially:

    func ByteCompare(a, b string) int {
    if len(a) != len(b) {
    if a < b {
    return -1
    }
    return 1
    }
    lp := len(LeadingPrefix(a, b))
    a, b = a[lp:], b[lp:]
    if a < b {
    return -1
    } else if a > b {
    return 1
    }
    return 0
    }

    The first if statement is just an optimization.
    On Thursday, November 29, 2012 11:40:27 AM UTC-7, Patrick Scott wrote:

    The problem with that approach is that I need to make multiple comparisons
    to find out if a string is less than, equal to, or greater than another
    string. I need to do something different for each case but I only want to
    compare the strings once.
    On Thursday, November 29, 2012 1:32:34 PM UTC-5, Steven Blenkinsop wrote:

    If you're just comparing the strings lexically byte-wise, use the the
    normal comparison operators.
    --
  • Kevin Gillette at Nov 29, 2012 at 7:29 pm
    Eh, when I said LeadingPrefix, I meant 'SharedPrefix'.

    --
  • Patrick Scott at Nov 29, 2012 at 8:09 pm
    For my benchmark I created 1024 strings and compared the first generated
    string to the other 1023 strings. My comparison function is the first
    number.

    16 byte strings, random content: 17236 ns/op vs 36277 ns/op
    16 byte strings, equal content: 63365 ns/op vs 73647 ns/op
    0 - 32 byte strings, random content: 17454 ns/op vs 26220 ns/op
    0 - 32 byte strings, equal content: 16900 ns/op vs 23426 ns/op

    Most of the strings I'm comparing are very close in length but have
    different content. It looks like for my use case, a custom comparison
    function is better than the builtin operators.

    On Thu, Nov 29, 2012 at 2:21 PM, Kevin Gillette
    wrote:
    Eh, when I said LeadingPrefix, I meant 'SharedPrefix'.
    --
  • Bryanturley at Nov 29, 2012 at 8:37 pm

    On Thursday, November 29, 2012 2:09:51 PM UTC-6, Patrick Scott wrote:
    For my benchmark I created 1024 strings and compared the first generated
    string to the other 1023 strings. My comparison function is the first
    number.

    16 byte strings, random content: 17236 ns/op vs 36277 ns/op
    16 byte strings, equal content: 63365 ns/op vs 73647 ns/op
    0 - 32 byte strings, random content: 17454 ns/op vs 26220 ns/op
    0 - 32 byte strings, equal content: 16900 ns/op vs 23426 ns/op

    Most of the strings I'm comparing are very close in length but have
    different content. It looks like for my use case, a custom comparison
    function is better than the builtin operators.

    On Thu, Nov 29, 2012 at 2:21 PM, Kevin Gillette <extempor...@gmail.com<javascript:>
    wrote:
    Eh, when I said LeadingPrefix, I meant 'SharedPrefix'.
    Source code?

    --
  • Patrick Scott at Nov 29, 2012 at 8:58 pm
    I simplified the test a bit and got slightly different numbers for short
    but equal strings.
    On Thu, Nov 29, 2012 at 3:37 PM, bryanturley wrote:


    On Thursday, November 29, 2012 2:09:51 PM UTC-6, Patrick Scott wrote:

    For my benchmark I created 1024 strings and compared the first generated
    string to the other 1023 strings. My comparison function is the first
    number.

    16 byte strings, random content: 17236 ns/op vs 36277 ns/op
    16 byte strings, equal content: 63365 ns/op vs 73647 ns/op
    0 - 32 byte strings, random content: 17454 ns/op vs 26220 ns/op
    0 - 32 byte strings, equal content: 16900 ns/op vs 23426 ns/op

    Most of the strings I'm comparing are very close in length but have
    different content. It looks like for my use case, a custom comparison
    function is better than the builtin operators.
    On Thu, Nov 29, 2012 at 2:21 PM, Kevin Gillette wrote:

    Eh, when I said LeadingPrefix, I meant 'SharedPrefix'.
    Source code?

    --

    --

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupgolang-nuts @
categoriesgo
postedNov 29, '12 at 5:13p
activeNov 29, '12 at 8:58p
posts12
users6
websitegolang.org

People

Translate

site design / logo © 2017 Grokbase