FAQ
Hi!

github.com/twotwotwo/radixsort.test is a new package providing radix
sorting by a string, []byte, or uint64 key. It has helpers for sorting
other numeric types (floats and signed ints) with the uint64 sort. You'd
use it as a faster alternative to the standard package sort in places where
you're sorting at least thousands of items by a key of one of the supported
types, and the gain in sort speed matters enough to justify using a less
elegant interface than the stdlib's.

You use it by implementing sort.Interface plus one more method, Key(i int),
returning item i's key as string/[]byte/uint64. Then you call
radixsort.ByString, ByBytes, or ByNumber on your data as appropriate.
radixsort exports convenience types and functions like stdlib sort does
(radixsort.Ints(), etc.). See the godoc for details and examples:
godoc.org/github.com/twotwotwo/radixsort.test

Most folks should just use stdlib sort: there's a good chance sorting is
not a bottleneck for you, and thus not worth extra effort to speed up;
also, radixsort doesn't speed up small sorts at all. Speedups vary with how
your data looks--lots of duplicate items, or lots of strings that start
with the same bytes, slow things down a bit.

But where it helps, it helps! Here's a benchcmp with stdlib sort on an AWS
t2.medium box:

benchmark old ns/op new ns/op delta
BenchmarkSortString1K 338583 325094 -3.98%
BenchmarkSortInt1K 147321 81921 -44.39%
BenchmarkSortInt64K 14450286 5920935 -59.03%
BenchmarkSort1e2 67964 68114 +0.22%
BenchmarkSort1e4 15096831 8415723 -44.26%
BenchmarkSort1e6 2413166706 972895798 -59.68%

Sorting a shuffled copy of Ubuntu trusty's English /usr/share/dict/words
went from 239ms to 109ms. Sorting 11m shuffled Wikipedia page titles (with
fun quirks like 180k lines starting "List_of_") went from 19.6s to 11.9s.

There's more about how it works in radixsort.go comments, with links to
some resources that helped me writing it. The string/[]byte sort owes a lot
to McIlroy, Bostic, and McIlroy's American flag sort. All sorts call the
stdlib's qSort for small ranges. The string sorts also bail to quicksort
after radix-sorting by the first 32 bytes, mostly to limit stack blowup.

The import path ends radixsort.test because it seems wrong to freeze the
API *before* I can get any external feedback about it. In not too long I'll
put up whatever the final API is (rename some things? get rid of the need
for Less?) as plain radixsort.

You can also look at ".test" as a reminder that a new library can always
use tire-kicking. There are tests: 100% statement coverage for what that's
worth, thanks to a lot of the tests from stdlib sort plus some tests to hit
tricky cases in the new code. Right now radixsort always checks that your
results are really sorted and panics(!) if not. But if folks play with it
and it works, that's encouraging.

Some usage gotchas: with signed ints or floats, remember to use the helper
functions like Int32Key or Float32Key and Float32Less; just converting
numbers to uint64 will mishandle negative numbers, and not-a-number float
values are tricky. sort.Reverse() won't work with radixsort, but
radixsort.Flip(data) will flip ascending-sorted data to descending. There's
no stable sort. Remember that the string sorts aren't doing any fancy
collation, just handling data as raw bytes, so pairs like é and e won't
sort next to each other. If you want to report a problem with radixsort,
include your Less and Key functions, since mistakes in them (or data races)
can be hard to distinguish from bugs in radixsort.

Of course, there are lots of useful or fun ideas I haven't done here:
faster type-specific sorts! parallel sorts! different interfaces like
bradfitz/slice! filesorts! If you're into those or whatever other fun
ideas, of course I'm all for you forking or taking pieces of radixsort if
it helps!

Between work and life, I can't promise much support, but e-mail or a GitHub
issue are always worth a shot. Love to hear if you find it useful. You can
bug me on Twitter as @rf.

Best,
Randall

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Search Discussions

  • Egon at Mar 20, 2015 at 7:16 am
    Few ideas:

    Use *Numbers*, *Strings*... instead of *NumberInterface* and
    *StringInterface*... it looks cleaner that way.. e.g. radixsort.Numbers,
    radixsort.Strings

    Regarding the usage gotchas, I had several ideas how to eliminate them:

    Maybe using http://play.golang.org/p/3HLnlzt3gm would make it better?

    Alternatively similarly to http://golang.org/pkg/sort/#Float64Slice it
    could have http://play.golang.org/p/60ZC3n8TnO for the common types - i.e.
    making it easier to do the right thing with special cases?
    On Friday, 20 March 2015 06:46:36 UTC+2, Randall Farmer wrote:

    Hi!

    github.com/twotwotwo/radixsort.test
    <http://www.google.com/url?q=http%3A%2F%2Fgithub.com%2Ftwotwotwo%2Fradixsort.test&sa=D&sntz=1&usg=AFQjCNFYxbBGKq3OhjeL3FIhoa0Pz-VUMA>
    is a new package providing radix sorting by a string, []byte, or uint64
    key. It has helpers for sorting other numeric types (floats and signed
    ints) with the uint64 sort. You'd use it as a faster alternative to the
    standard package sort in places where you're sorting at least thousands of
    items by a key of one of the supported types, and the gain in sort speed
    matters enough to justify using a less elegant interface than the stdlib's.

    You use it by implementing sort.Interface plus one more method, Key(i
    int), returning item i's key as string/[]byte/uint64. Then you call
    radixsort.ByString, ByBytes, or ByNumber on your data as appropriate.
    radixsort exports convenience types and functions like stdlib sort does
    (radixsort.Ints(), etc.). See the godoc for details and examples:
    godoc.org/github.com/twotwotwo/radixsort.test

    Most folks should just use stdlib sort: there's a good chance sorting is
    not a bottleneck for you, and thus not worth extra effort to speed up;
    also, radixsort doesn't speed up small sorts at all. Speedups vary with how
    your data looks--lots of duplicate items, or lots of strings that start
    with the same bytes, slow things down a bit.

    But where it helps, it helps! Here's a benchcmp with stdlib sort on an AWS
    t2.medium box:

    benchmark old ns/op new ns/op delta
    BenchmarkSortString1K 338583 325094 -3.98%
    BenchmarkSortInt1K 147321 81921 -44.39%
    BenchmarkSortInt64K 14450286 5920935 -59.03%
    BenchmarkSort1e2 67964 68114 +0.22%
    BenchmarkSort1e4 15096831 8415723 -44.26%
    BenchmarkSort1e6 2413166706 972895798 -59.68%

    Sorting a shuffled copy of Ubuntu trusty's English /usr/share/dict/words
    went from 239ms to 109ms. Sorting 11m shuffled Wikipedia page titles (with
    fun quirks like 180k lines starting "List_of_") went from 19.6s to 11.9s.

    There's more about how it works in radixsort.go comments, with links to
    some resources that helped me writing it. The string/[]byte sort owes a lot
    to McIlroy, Bostic, and McIlroy's American flag sort. All sorts call the
    stdlib's qSort for small ranges. The string sorts also bail to quicksort
    after radix-sorting by the first 32 bytes, mostly to limit stack blowup.

    The import path ends radixsort.test because it seems wrong to freeze the
    API *before* I can get any external feedback about it. In not too long I'll
    put up whatever the final API is (rename some things? get rid of the need
    for Less?) as plain radixsort.

    You can also look at ".test" as a reminder that a new library can always
    use tire-kicking. There are tests: 100% statement coverage for what that's
    worth, thanks to a lot of the tests from stdlib sort plus some tests to hit
    tricky cases in the new code. Right now radixsort always checks that your
    results are really sorted and panics(!) if not. But if folks play with it
    and it works, that's encouraging.
    Some usage gotchas: with signed ints or floats, remember to use the helper
    functions like Int32Key or Float32Key and Float32Less; just converting
    numbers to uint64 will mishandle negative numbers, and not-a-number float
    values are tricky. sort.Reverse() won't work with radixsort, but
    radixsort.Flip(data) will flip ascending-sorted data to descending. There's
    no stable sort. Remember that the string sorts aren't doing any fancy
    collation, just handling data as raw bytes, so pairs like é and e won't
    sort next to each other. If you want to report a problem with radixsort,
    include your Less and Key functions, since mistakes in them (or data races)
    can be hard to distinguish from bugs in radixsort.

    Of course, there are lots of useful or fun ideas I haven't done here:
    faster type-specific sorts! parallel sorts! different interfaces like
    bradfitz/slice! filesorts! If you're into those or whatever other fun
    ideas, of course I'm all for you forking or taking pieces of radixsort if
    it helps!
    Btw. there I've optimized that bradfitz version,
    https://github.com/egonelbre/slice

    Between work and life, I can't promise much support, but e-mail or a
    GitHub issue are always worth a shot. Love to hear if you find it useful.
    You can bug me on Twitter as @rf.

    Best,
    Randall
    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/d/optout.
  • Randall Farmer at Mar 20, 2015 at 8:35 am
    Use *Numbers*, *Strings*... instead of *NumberInterface* and
    *StringInterface*... it looks cleaner that way.. e.g. radixsort.Numbers,
    radixsort.Strings

    Strings and Bytes names are already taken for the shortcut functions to
    sort a []string and a [][]byte. The user doesn't usually need the interface
    names in their code, though, so they aren't as important to me as the other
    names. Thought about renaming them Interface{Number,String,Bytes} just to
    make them sort together in the godoc, though, and was torn between
    NumberInterface+ByNumber() and Uint64Interface+ByUint64(): Number makes
    clear you use it with any numeric data (type's still there in Key's
    signature), uint64 makes clear what the code's really sorting.
    Maybe using http://play.golang.org/p/3HLnlzt3gm would make it better?
    Worries me that each Key call would make an interface call to At, and each
    Less would call At twice, if I follow. It looks like something you can
    implement from outside the package, though, either to check on the speed
    impact or otherwise.
    Alternatively similarly to http://golang.org/pkg/sort/#Float64Slice it
    could have http://play.golang.org/p/60ZC3n8TnO for the common types - i.e.
    making it easier to do the right thing with special cases?

    There are indeed types and shortcut functions like stdlib sort's, including
    for common int/float types ([]float32, []float64, []int, []int32, []int64)
    -- see http://godoc.org/github.com/twotwotwo/radixsort.test#Float32Slice
    and http://godoc.org/github.com/twotwotwo/radixsort.test#Float32s for
    instance.
    On Fri, Mar 20, 2015 at 12:16 AM, Egon wrote:

    Few ideas:

    Use *Numbers*, *Strings*... instead of *NumberInterface* and
    *StringInterface*... it looks cleaner that way.. e.g. radixsort.Numbers,
    radixsort.Strings

    Regarding the usage gotchas, I had several ideas how to eliminate them:

    Maybe using http://play.golang.org/p/3HLnlzt3gm would make it better?

    Alternatively similarly to http://golang.org/pkg/sort/#Float64Slice it
    could have http://play.golang.org/p/60ZC3n8TnO for the common types -
    i.e. making it easier to do the right thing with special cases?
    On Friday, 20 March 2015 06:46:36 UTC+2, Randall Farmer wrote:

    Hi!

    github.com/twotwotwo/radixsort.test
    <http://www.google.com/url?q=http%3A%2F%2Fgithub.com%2Ftwotwotwo%2Fradixsort.test&sa=D&sntz=1&usg=AFQjCNFYxbBGKq3OhjeL3FIhoa0Pz-VUMA>
    is a new package providing radix sorting by a string, []byte, or uint64
    key. It has helpers for sorting other numeric types (floats and signed
    ints) with the uint64 sort. You'd use it as a faster alternative to the
    standard package sort in places where you're sorting at least thousands of
    items by a key of one of the supported types, and the gain in sort speed
    matters enough to justify using a less elegant interface than the stdlib's.

    You use it by implementing sort.Interface plus one more method, Key(i
    int), returning item i's key as string/[]byte/uint64. Then you call
    radixsort.ByString, ByBytes, or ByNumber on your data as appropriate.
    radixsort exports convenience types and functions like stdlib sort does
    (radixsort.Ints(), etc.). See the godoc for details and examples:
    godoc.org/github.com/twotwotwo/radixsort.test

    Most folks should just use stdlib sort: there's a good chance sorting is
    not a bottleneck for you, and thus not worth extra effort to speed up;
    also, radixsort doesn't speed up small sorts at all. Speedups vary with how
    your data looks--lots of duplicate items, or lots of strings that start
    with the same bytes, slow things down a bit.

    But where it helps, it helps! Here's a benchcmp with stdlib sort on an
    AWS t2.medium box:

    benchmark old ns/op new ns/op delta
    BenchmarkSortString1K 338583 325094 -3.98%
    BenchmarkSortInt1K 147321 81921 -44.39%
    BenchmarkSortInt64K 14450286 5920935 -59.03%
    BenchmarkSort1e2 67964 68114 +0.22%
    BenchmarkSort1e4 15096831 8415723 -44.26%
    BenchmarkSort1e6 2413166706 972895798 -59.68%

    Sorting a shuffled copy of Ubuntu trusty's English /usr/share/dict/words
    went from 239ms to 109ms. Sorting 11m shuffled Wikipedia page titles (with
    fun quirks like 180k lines starting "List_of_") went from 19.6s to 11.9s.

    There's more about how it works in radixsort.go comments, with links to
    some resources that helped me writing it. The string/[]byte sort owes a lot
    to McIlroy, Bostic, and McIlroy's American flag sort. All sorts call the
    stdlib's qSort for small ranges. The string sorts also bail to quicksort
    after radix-sorting by the first 32 bytes, mostly to limit stack blowup.

    The import path ends radixsort.test because it seems wrong to freeze the
    API *before* I can get any external feedback about it. In not too long I'll
    put up whatever the final API is (rename some things? get rid of the need
    for Less?) as plain radixsort.

    You can also look at ".test" as a reminder that a new library can always
    use tire-kicking. There are tests: 100% statement coverage for what that's
    worth, thanks to a lot of the tests from stdlib sort plus some tests to hit
    tricky cases in the new code. Right now radixsort always checks that your
    results are really sorted and panics(!) if not. But if folks play with it
    and it works, that's encouraging.
    Some usage gotchas: with signed ints or floats, remember to use the
    helper functions like Int32Key or Float32Key and Float32Less; just
    converting numbers to uint64 will mishandle negative numbers, and
    not-a-number float values are tricky. sort.Reverse() won't work with
    radixsort, but radixsort.Flip(data) will flip ascending-sorted data to
    descending. There's no stable sort. Remember that the string sorts aren't
    doing any fancy collation, just handling data as raw bytes, so pairs like é
    and e won't sort next to each other. If you want to report a problem with
    radixsort, include your Less and Key functions, since mistakes in them (or
    data races) can be hard to distinguish from bugs in radixsort.

    Of course, there are lots of useful or fun ideas I haven't done here:
    faster type-specific sorts! parallel sorts! different interfaces like
    bradfitz/slice! filesorts! If you're into those or whatever other fun
    ideas, of course I'm all for you forking or taking pieces of radixsort if
    it helps!
    Btw. there I've optimized that bradfitz version,
    https://github.com/egonelbre/slice

    Between work and life, I can't promise much support, but e-mail or a
    GitHub issue are always worth a shot. Love to hear if you find it useful.
    You can bug me on Twitter as @rf.

    Best,
    Randall
    --
    You received this message because you are subscribed to the Google Groups "golang-nuts" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
    For more options, visit https://groups.google.com/d/optout.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupgolang-nuts @
categoriesgo
postedMar 20, '15 at 4:46a
activeMar 20, '15 at 8:35a
posts3
users2
websitegolang.org

2 users in discussion

Randall Farmer: 2 posts Egon: 1 post

People

Translate

site design / logo © 2022 Grokbase