FAQ

[scala-internals] specializing ordering, faster sort

Erik Osheim
Oct 11, 2012 at 2:47 pm
Hi folks,

Spire's 2.10.0 branch just got a nice quicksort implementation which is
faster than Scala's and is competitive with java.util.Arrays. I posted
on Twitter about it last night but here are the notes:

http://plastic-idolatry.com/scala/spire-sort.txt

Sean Parsons pointed out that this could be ported to the standard
library. So, my questions are:

1. Is there any chance to specialize scala.math.Ordering? Spire uses a
specialized spire.math.Order which is very similar, without problems.

2. Is there any interest in adapting Scala's existing quickSort to use
this implementation? It seems better across the board.

3. Are there backwards compatibility issues I'm not aware of? I'm
asking now with a view towards 2.11, although I would be willing to try
to port any of it to 2.10.1.

-- Erik

P.S. Would love to get a second pair of eyes on the code, or other
people's results (either confirming or denying my conclusions). The
branch in question is "2.10.0":

https://github.com/non/spire/tree/2.10.0
reply

Search Discussions

23 responses

  • Rex Kerr at Oct 11, 2012 at 4:05 pm
    Quicksorts have all sorts of ugly worst-case scenarios. Do you know that
    your code avoids the worst ones? (E.g. still fine on sorted/inverse sorted
    arrays?)

    --Rex
    On Thu, Oct 11, 2012 at 10:47 AM, Erik Osheim wrote:

    Hi folks,

    Spire's 2.10.0 branch just got a nice quicksort implementation which is
    faster than Scala's and is competitive with java.util.Arrays. I posted
    on Twitter about it last night but here are the notes:

    http://plastic-idolatry.com/scala/spire-sort.txt

    Sean Parsons pointed out that this could be ported to the standard
    library. So, my questions are:

    1. Is there any chance to specialize scala.math.Ordering? Spire uses a
    specialized spire.math.Order which is very similar, without problems.

    2. Is there any interest in adapting Scala's existing quickSort to use
    this implementation? It seems better across the board.

    3. Are there backwards compatibility issues I'm not aware of? I'm
    asking now with a view towards 2.11, although I would be willing to try
    to port any of it to 2.10.1.

    -- Erik

    P.S. Would love to get a second pair of eyes on the code, or other
    people's results (either confirming or denying my conclusions). The
    branch in question is "2.10.0":

    https://github.com/non/spire/tree/2.10.0
  • Erik Osheim at Oct 11, 2012 at 4:34 pm

    On Thu, Oct 11, 2012 at 11:13:05AM -0400, Rex Kerr wrote:
    Quicksorts have all sorts of ugly worst-case scenarios. Do you know that
    your code avoids the worst ones? (E.g. still fine on sorted/inverse sorted
    arrays?)
    Here is some Caliper output from sorting Arrays of 1024 Doubles:

    [info] benchmark layout us linear runtime
    [info] JavaSort random 18.89 =
    [info] JavaSort sorted 2.15 =
    [info] JavaSort reversed 2.58 =
    [info] ScalaQuicksort random 61.46 =====
    [info] ScalaQuicksort sorted 27.07 ==
    [info] ScalaQuicksort reversed 27.75 ==
    [info] SpireQuickSort random 16.96 =
    [info] SpireQuickSort sorted 8.52 =
    [info] SpireQuickSort reversed 12.21 =

    As you can see, Java has some crafty tricks to detect a sorted or
    reverse sorted array and short-circuit (9x faster). Scala's sort is
    about twice as fast on sorted/reversed data. Spire's sort is about
    twice as fast on sorted, but only about 1.3x faster on reversed.

    To me it still looks like a clear win, although if someone has some
    good fast heuristics for detecting this case and doing even better I'd
    accept a pull request! :)

    -- Erik
  • Ivan Todoroski at Oct 11, 2012 at 5:12 pm
    I apologize if I'm missing something blindingly obvious, but why does
    Scala even need to invent its own sorting algorithm? Why not just use
    the battle-tested JDK sort?

    You don't have to call the Java sort() method directly if that's
    undesirable for whatever reason, but you could still just look at the
    source code for that method and rewrite the algorithm in Scala.

    On 11.10.2012 18:06, Erik Osheim wrote:
    On Thu, Oct 11, 2012 at 11:13:05AM -0400, Rex Kerr wrote:
    Quicksorts have all sorts of ugly worst-case scenarios. Do you know that
    your code avoids the worst ones? (E.g. still fine on sorted/inverse sorted
    arrays?)
    Here is some Caliper output from sorting Arrays of 1024 Doubles:

    [info] benchmark layout us linear runtime
    [info] JavaSort random 18.89 =
    [info] JavaSort sorted 2.15 =
    [info] JavaSort reversed 2.58 =
    [info] ScalaQuicksort random 61.46 =====
    [info] ScalaQuicksort sorted 27.07 ==
    [info] ScalaQuicksort reversed 27.75 ==
    [info] SpireQuickSort random 16.96 =
    [info] SpireQuickSort sorted 8.52 =
    [info] SpireQuickSort reversed 12.21 =

    As you can see, Java has some crafty tricks to detect a sorted or
    reverse sorted array and short-circuit (9x faster). Scala's sort is
    about twice as fast on sorted/reversed data. Spire's sort is about
    twice as fast on sorted, but only about 1.3x faster on reversed.

    To me it still looks like a clear win, although if someone has some
    good fast heuristics for detecting this case and doing even better I'd
    accept a pull request! :)

    -- Erik
  • Erik Osheim at Oct 11, 2012 at 4:24 pm

    On Thu, Oct 11, 2012 at 06:20:24PM +0200, Ivan Todoroski wrote:
    I apologize if I'm missing something blindingly obvious, but why
    does Scala even need to invent its own sorting algorithm? Why not
    just use the battle-tested JDK sort?
    The big reason we'd like our own version is so that we can use
    specialized type classes to abstract between primitives and objects,
    which Java is unable to do (and also to plug in custom orderings
    dynamically rather than baking them into the classes).
    You don't have to call the Java sort() method directly if that's
    undesirable for whatever reason, but you could still just look at
    the source code for that method and rewrite the algorithm in Scala.
    My concern is violating the GPL2. Of course, it should be possible to
    translate the algorithm without violating the copyright, but I'd be
    nervous about using the code as a reference.

    I have been trying not to re-invent the wheel and definitely reading
    other implementations and algorithms (including Java's), but have tried
    to avoid basing the source code on anything that wasn't public domain
    or compatible with Spire's MIT license.

    -- Erik
  • Ivan Todoroski at Oct 11, 2012 at 7:40 pm

    On 11.10.2012 18:24, Erik Osheim wrote:
    On Thu, Oct 11, 2012 at 06:20:24PM +0200, Ivan Todoroski wrote:
    I apologize if I'm missing something blindingly obvious, but why
    does Scala even need to invent its own sorting algorithm? Why not
    just use the battle-tested JDK sort?
    The big reason we'd like our own version is so that we can use
    specialized type classes to abstract between primitives and objects,
    which Java is unable to do (and also to plug in custom orderings
    dynamically rather than baking them into the classes).
    But Java already has separate "manually specialized" versions of the
    Arrays.sort() method for each primitive type. Or is the problem that you
    want to use custom orderings for primitive types?
  • Tom Switzer at Oct 11, 2012 at 4:53 pm
    Yes. It is not enough to simply provide fast sort for natural orderings.
    On Thu, Oct 11, 2012 at 10:42 AM, Ivan Todoroski wrote:
    On 11.10.2012 18:24, Erik Osheim wrote:
    On Thu, Oct 11, 2012 at 06:20:24PM +0200, Ivan Todoroski wrote:

    I apologize if I'm missing something blindingly obvious, but why
    does Scala even need to invent its own sorting algorithm? Why not
    just use the battle-tested JDK sort?
    The big reason we'd like our own version is so that we can use
    specialized type classes to abstract between primitives and objects,
    which Java is unable to do (and also to plug in custom orderings
    dynamically rather than baking them into the classes).
    But Java already has separate "manually specialized" versions of the
    Arrays.sort() method for each primitive type. Or is the problem that you
    want to use custom orderings for primitive types?
  • Erik Osheim at Oct 12, 2012 at 12:20 am

    On Thu, Oct 11, 2012 at 06:42:27PM +0200, Ivan Todoroski wrote:
    But Java already has separate "manually specialized" versions of the
    Arrays.sort() method for each primitive type. Or is the problem that
    you want to use custom orderings for primitive types?
    Yes. I want to be able to write scala code parameterized on A, provide
    a custom ordering for A (in spire, an Order[A] instance) and then be
    able to use a fast sort algorithm on Array[A], without having to rely
    on "natural ordering" or having to extend Order[A] or Comparable[A], or
    having to use "default sorting" for primitives.

    You can probably see a lot of benefits to this, but an obvious one is
    that if you want to sort int[] to get the biggest first, you just
    reverse your Order[Int] and use the same sort method you always use.

    Hope this clarifies things!

    -- Erik
  • Ivan Todoroski at Oct 11, 2012 at 5:56 pm
    Thank you Erik and Tom for your patient explanations, I understand now.

    On 11.10.2012 18:49, Erik Osheim wrote:
    On Thu, Oct 11, 2012 at 06:42:27PM +0200, Ivan Todoroski wrote:
    But Java already has separate "manually specialized" versions of the
    Arrays.sort() method for each primitive type. Or is the problem that
    you want to use custom orderings for primitive types?
    Yes. I want to be able to write scala code parameterized on A, provide
    a custom ordering for A (in spire, an Order[A] instance) and then be
    able to use a fast sort algorithm on Array[A], without having to rely
    on "natural ordering" or having to extend Order[A] or Comparable[A], or
    having to use "default sorting" for primitives.

    You can probably see a lot of benefits to this, but an obvious one is
    that if you want to sort int[] to get the biggest first, you just
    reverse your Order[Int] and use the same sort method you always use.

    Hope this clarifies things!

    -- Erik
  • Erik Osheim at Oct 11, 2012 at 4:40 pm

    On Thu, Oct 11, 2012 at 11:13:05AM -0400, Rex Kerr wrote:
    Quicksorts have all sorts of ugly worst-case scenarios. Do you know that
    your code avoids the worst ones? (E.g. still fine on sorted/inverse sorted
    arrays?)
    It's a work-in-progress, although at the low-level it uses insertion
    sort rather than quick sort. I will run on some data that traditionally
    gives qsort O(n^2) behavior and see how it does, but I'm pretty sure
    that it's not worse than the sort that's in Scala library (quicksort
    without any run detection or code to deal with bad cases).

    -- Erik
  • Paolo G. Giarrusso at Oct 11, 2012 at 4:09 pm

    Il giorno giovedì 11 ottobre 2012 17:36:27 UTC+2, Erik Osheim ha scritto:
    On Thu, Oct 11, 2012 at 11:13:05AM -0400, Rex Kerr wrote:
    Quicksorts have all sorts of ugly worst-case scenarios. Do you know that
    your code avoids the worst ones? (E.g. still fine on sorted/inverse sorted
    arrays?)
    It's a work-in-progress, although at the low-level it uses insertion
    sort rather than quick sort. I will run on some data that traditionally
    gives qsort O(n^2) behavior and see how it does, but I'm pretty sure
    that it's not worse than the sort that's in Scala library (quicksort
    without any run detection or code to deal with bad cases).
    Is that a reference to "introspection sort" as implemented by the C++ STL?
    What I know is that it guarantees O(n log n) performance while being
    quicksort-based, and what I heard is that it just detects bad cases and
    switches to another algorithm.
    Disclaimer: I'm no expert on writing highly performant quicksort, so I'd
    like Scala's library authors to do that in my place and offer robust
    performance.

    Paolo
  • Erik at Oct 11, 2012 at 4:13 pm

    On Thu, Oct 11, 2012 at 09:04:41AM -0700, Paolo G. Giarrusso wrote:
    Is that a reference to "introspection sort" as implemented by the C++ STL?
    What I know is that it guarantees O(n log n) performance while being
    quicksort-based, and what I heard is that it just detects bad cases and
    switches to another algorithm.
    Introsort is a bit different, although it's also a hybrid sort. I'll
    definitely try it out and see how it does. Spire's sorting is entirely
    based on type classes and specialization, and I'm learning which
    methods seem to work best with those.
    Disclaimer: I'm no expert on writing highly performant quicksort, so I'd
    like Scala's library authors to do that in my place and offer robust
    performance.
    I agree that if Scala is going to offer sorting methods they should be
    as fast as possible, thus my email. :)

    -- Erik
  • Rex Kerr at Oct 11, 2012 at 4:15 pm
    Size 8 sorting networks are another ~20% faster than insertion sort.

    Here's code to make one:

    def sab(i: Int, j: Int) = {
    val List(a,b) = List(i,j).map(x => if (x==0) " " else "+"+x)
    """if (data(i0%s)>data(i0%s)) { val x = data(i0%s); data(i0%s) =
    data(i0%s); data(i0%s) = x }""".format(a,b,a,a,b,b)
    }

    // Bachter's Merge-Exchange of size 8 sorting network
    // Reordered for non-consecutive swaps across blocks
    val net = """[0,4],[2,6],[1,5],[3,7]
    [0,2],[4,6],[1,3],[5,7]
    [2,4],[0,1],[3,5],[6,7]
    [2,3],[4,5]
    [3,6],[1,4]
    [5,6],[1,2],[3,4]"""
    val xss = net.split('\n').flatMap(_.split("\\],\\[")).
    map(_.map(c => if (c.isDigit) c else ' ').
    split(' ').filter(_.length>0).map(_.toInt))
    xss.map(xs => sab(xs(0),xs(1))).foreach(println)

    --Rex
    On Thu, Oct 11, 2012 at 11:36 AM, Erik Osheim wrote:
    On Thu, Oct 11, 2012 at 11:13:05AM -0400, Rex Kerr wrote:
    Quicksorts have all sorts of ugly worst-case scenarios. Do you know that
    your code avoids the worst ones? (E.g. still fine on sorted/inverse sorted
    arrays?)
    It's a work-in-progress, although at the low-level it uses insertion
    sort rather than quick sort. I will run on some data that traditionally
    gives qsort O(n^2) behavior and see how it does, but I'm pretty sure
    that it's not worse than the sort that's in Scala library (quicksort
    without any run detection or code to deal with bad cases).

    -- Erik
  • Erik Osheim at Oct 11, 2012 at 4:17 pm

    On Thu, Oct 11, 2012 at 12:15:06PM -0400, Rex Kerr wrote:
    Size 8 sorting networks are another ~20% faster than insertion sort.

    Here's code to make one:
    Thanks! I'll definitely try plugging that in and see how it does.

    -- Erik
  • Rex Kerr at Oct 11, 2012 at 5:43 pm

    On Thu, Oct 11, 2012 at 12:16 PM, Erik Osheim wrote:
    On Thu, Oct 11, 2012 at 12:15:06PM -0400, Rex Kerr wrote:
    Size 8 sorting networks are another ~20% faster than insertion sort.

    Here's code to make one:
    Thanks! I'll definitely try plugging that in and see how it does.

    -- Erik
    On a hunch that the JVM was still less than entirely clever about array
    access, I gave this a try:

    final def snet8c(data: Array[Int], i0: Int) {
    var x0 = data(i0)
    var x1 = data(i0+1)
    var x2 = data(i0+2)
    var x3 = data(i0+3)
    var x4 = data(i0+4)
    var x5 = data(i0+5)
    var x6 = data(i0+6)
    var x7 = data(i0+7)
    if (x0>x4) { val x=x0; x0 = x4; x4 = x }
    if (x2>x6) { val x=x2; x2 = x6; x6 = x }
    if (x1>x5) { val x=x1; x1 = x5; x5 = x }
    if (x3>x7) { val x=x3; x3 = x7; x7 = x }
    if (x0>x2) { val x=x0; x0 = x2; x2 = x }
    if (x4>x6) { val x=x4; x4 = x6; x6 = x }
    if (x1>x3) { val x=x1; x1 = x3; x3 = x }
    if (x5>x7) { val x=x5; x5 = x7; x7 = x }
    if (x2>x4) { val x=x2; x2 = x4; x4 = x }
    if (x0>x1) { val x=x0; x0 = x1; x1 = x }
    if (x3>x5) { val x=x3; x3 = x5; x5 = x }
    if (x6>x7) { val x=x6; x6 = x7; x7 = x }
    if (x2>x3) { val x=x2; x2 = x3; x3 = x }
    if (x4>x5) { val x=x4; x4 = x5; x5 = x }
    if (x3>x6) { val x=x3; x3 = x6; x6 = x }
    if (x1>x4) { val x=x1; x1 = x4; x4 = x }
    if (x5>x6) { val x=x5; x5 = x6; x6 = x }
    if (x1>x2) { val x=x1; x1 = x2; x2 = x }
    if (x3>x4) { val x=x3; x3 = x4; x4 = x }
    data(i0) = x0
    data(i0+1) = x1
    data(i0+2) = x2
    data(i0+3) = x3
    data(i0+4) = x4
    data(i0+5) = x5
    data(i0+6) = x6
    data(i0+7) = x7
    }

    and it's another 20% faster again. I feel embarrassed on behalf of the JVM.

    --Rex
  • Tom Switzer at Oct 11, 2012 at 5:53 pm
    Awesome! Looks like using a sorting network for the leafs is a clear win.
    On Thu, Oct 11, 2012 at 10:42 AM, Rex Kerr wrote:
    On Thu, Oct 11, 2012 at 12:16 PM, Erik Osheim wrote:
    On Thu, Oct 11, 2012 at 12:15:06PM -0400, Rex Kerr wrote:
    Size 8 sorting networks are another ~20% faster than insertion sort.

    Here's code to make one:
    Thanks! I'll definitely try plugging that in and see how it does.

    -- Erik
    On a hunch that the JVM was still less than entirely clever about array
    access, I gave this a try:

    final def snet8c(data: Array[Int], i0: Int) {
    var x0 = data(i0)
    var x1 = data(i0+1)
    var x2 = data(i0+2)
    var x3 = data(i0+3)
    var x4 = data(i0+4)
    var x5 = data(i0+5)
    var x6 = data(i0+6)
    var x7 = data(i0+7)
    if (x0>x4) { val x=x0; x0 = x4; x4 = x }
    if (x2>x6) { val x=x2; x2 = x6; x6 = x }
    if (x1>x5) { val x=x1; x1 = x5; x5 = x }
    if (x3>x7) { val x=x3; x3 = x7; x7 = x }
    if (x0>x2) { val x=x0; x0 = x2; x2 = x }
    if (x4>x6) { val x=x4; x4 = x6; x6 = x }
    if (x1>x3) { val x=x1; x1 = x3; x3 = x }
    if (x5>x7) { val x=x5; x5 = x7; x7 = x }
    if (x2>x4) { val x=x2; x2 = x4; x4 = x }
    if (x0>x1) { val x=x0; x0 = x1; x1 = x }
    if (x3>x5) { val x=x3; x3 = x5; x5 = x }
    if (x6>x7) { val x=x6; x6 = x7; x7 = x }
    if (x2>x3) { val x=x2; x2 = x3; x3 = x }
    if (x4>x5) { val x=x4; x4 = x5; x5 = x }
    if (x3>x6) { val x=x3; x3 = x6; x6 = x }
    if (x1>x4) { val x=x1; x1 = x4; x4 = x }
    if (x5>x6) { val x=x5; x5 = x6; x6 = x }
    if (x1>x2) { val x=x1; x1 = x2; x2 = x }
    if (x3>x4) { val x=x3; x3 = x4; x4 = x }
    data(i0) = x0
    data(i0+1) = x1
    data(i0+2) = x2
    data(i0+3) = x3
    data(i0+4) = x4
    data(i0+5) = x5
    data(i0+6) = x6
    data(i0+7) = x7
    }

    and it's another 20% faster again. I feel embarrassed on behalf of the
    JVM.

    --Rex
  • Erik Osheim at Oct 11, 2012 at 6:41 pm

    On Thu, Oct 11, 2012 at 12:42:44PM -0400, Rex Kerr wrote:
    On a hunch that the JVM was still less than entirely clever about array
    access, I gave this a try: ...
    and it's another 20% faster again. I feel embarrassed on behalf of the JVM.
    Heh, yeah, I feel like this is always my experience... it's good at
    things I'm surprised about, but then still needs hand-holding in places
    I would have hoped it didn't.

    Thanks for working on this. When I get a moment I'll try to report some
    benchmarking numbers incorporating this.

    -- Erik
  • Rex Kerr at Oct 11, 2012 at 7:45 pm
    You might want a size-16 sorting network for the quicksort. I didn't
    realize you used 16 instead of 8 there. And I'm not sure what you want to
    do with non-multiple sizes.

    Anyway, I've tested the network below out and it's decent (same ~20%
    improvement). Haven't tested it with the let's-make-a-zillion-local-vars
    approach. It should be relatively straightforward to modify my generator
    to spit out the appropriate code given the network description below.

    val net = """[[0,8],[1,9],[2,10],[3,11],[4,12],[5,13],[6,14],[7,15]]
    [[0,4],[1,5],[2,6],[3,7],[8,12],[9,13],[10,14],[11,15]]
    [[4,8],[5,9],[6,10],[7,11],[0,2],[1,3],[12,14],[13,15]]
    [[4,6],[5,7],[8,10],[9,11],[0,1],[14,15]]
    [[2,8],[3,9],[6,12],[7,13]]
    [[2,4],[3,5],[6,8],[7,9],[10,12],[11,13]]
    [[2,3],[4,5],[6,7],[8,9],[10,11],[12,13]]
    [[1,8],[3,10],[5,12],[7,14]]
    [[1,4],[3,6],[5,8],[7,10],[9,12],[11,14]]
    [[1,2],[3,4],[5,6],[7,8],[9,10],[11,12],[13,14]]"""

    --Rex

    P.S. You can request sorting networks at
    http://pages.ripco.net/~jgamble/nw.html

    P.P.S. Is this really the right mailing list for this?
    On Thu, Oct 11, 2012 at 12:58 PM, Erik Osheim wrote:
    On Thu, Oct 11, 2012 at 12:42:44PM -0400, Rex Kerr wrote:
    On a hunch that the JVM was still less than entirely clever about array
    access, I gave this a try: ...
    and it's another 20% faster again. I feel embarrassed on behalf of the
    JVM.

    Heh, yeah, I feel like this is always my experience... it's good at
    things I'm surprised about, but then still needs hand-holding in places
    I would have hoped it didn't.

    Thanks for working on this. When I get a moment I'll try to report some
    benchmarking numbers incorporating this.

    -- Erik
  • Erik Osheim at Oct 11, 2012 at 7:17 pm

    On Thu, Oct 11, 2012 at 01:56:51PM -0400, Rex Kerr wrote:
    P.P.S. Is this really the right mailing list for this?
    I think it was the right list to ask about specializing Ordering and
    changing scala.util.Sorting. It's probably not the right list for all
    the follow-up.

    I'll send more results to you off-list.

    -- Erik
  • Ismael Juma at Oct 12, 2012 at 8:25 am

    On Thu, Oct 11, 2012 at 5:42 PM, Rex Kerr wrote:

    On a hunch that the JVM was still less than entirely clever about array
    access, I gave this a try:
    For whatever it's worth, there is interesting work going on in HotSpot to
    take advantage of avx/avx2 to optimise array operations.

    and it's another 20% faster again. I feel embarrassed on behalf of the JVM.
    >

    It makes me uncomfortable when statements like these are made without
    specifying the JVM version and the CPU used in the test. Things can vary
    quite a bit when those variables change.

    Best,
    Ismael
  • Rex Kerr at Oct 12, 2012 at 7:59 am

    On Fri, Oct 12, 2012 at 3:30 AM, Ismael Juma wrote:
    On Thu, Oct 11, 2012 at 5:42 PM, Rex Kerr wrote:

    On a hunch that the JVM was still less than entirely clever about array
    access, I gave this a try:
    For whatever it's worth, there is interesting work going on in HotSpot to
    take advantage of avx/avx2 to optimise array operations.
    In this case, not very much since it's small-stride "random" access.
    Sorting is a high-localization-entropy process; not much vectorization to
    take advantage of unless there are explicit sort instructions. It could be
    really useful for other things like matrix math.

    and it's another 20% faster again. I feel embarrassed on behalf of the
    JVM.
    It makes me uncomfortable when statements like these are made without
    specifying the JVM version and the CPU used in the test. Things can vary
    quite a bit when those variables change.
    Oracle JVM 1.6.0_31 and 1.7.0_7 with a Intel Xeon X5680 (3.33 GHz) and an
    Intel i7 920 (2.67 GHz). All pretty much the same result. I don't have an
    AMD machine handy at the moment.

    --Rex
  • Ismael Juma at Oct 12, 2012 at 10:00 am

    On Fri, Oct 12, 2012 at 8:59 AM, Rex Kerr wrote:

    In this case, not very much since it's small-stride "random" access.
    Sorting is a high-localization-entropy process; not much vectorization to
    take advantage of unless there are explicit sort instructions. It could be
    really useful for other things like matrix math.
    Fair enough.

    Oracle JVM 1.6.0_31 and 1.7.0_7 with a Intel Xeon X5680 (3.33 GHz) and an
    Intel i7 920 (2.67 GHz). All pretty much the same result. I don't have an
    AMD machine handy at the moment.
    Sounds good.

    Ismael
  • √iktor Ҡlang at Oct 12, 2012 at 10:44 am

    On Fri, Oct 12, 2012 at 9:59 AM, Rex Kerr wrote:
    On Fri, Oct 12, 2012 at 3:30 AM, Ismael Juma wrote:
    On Thu, Oct 11, 2012 at 5:42 PM, Rex Kerr wrote:

    On a hunch that the JVM was still less than entirely clever about array
    access, I gave this a try:
    For whatever it's worth, there is interesting work going on in HotSpot to
    take advantage of avx/avx2 to optimise array operations.
    In this case, not very much since it's small-stride "random" access.
    Sorting is a high-localization-entropy process; not much vectorization to
    take advantage of unless there are explicit sort instructions. It could be
    really useful for other things like matrix math.

    and it's another 20% faster again. I feel embarrassed on behalf of the
    JVM.
    It makes me uncomfortable when statements like these are made without
    specifying the JVM version and the CPU used in the test. Things can vary
    quite a bit when those variables change.
    Oracle JVM 1.6.0_31 and 1.7.0_7 with a Intel Xeon X5680 (3.33 GHz) and an
    Intel i7 920 (2.67 GHz). All pretty much the same result. I don't have an
    AMD machine handy at the moment.
    They have quite different cache performance (especially for parallel ops
    since AMD usually share L3 instead of L2 between cores, so hopping cores is
    more expensive)

    Cheers,



    --Rex

    --
    Viktor Klang

    Akka Tech Lead
    Typesafe <http://www.typesafe.com/> - The software stack for applications
    that scale

    Twitter: @viktorklang
  • Josh Suereth at Oct 11, 2012 at 8:01 pm
    SO First off - Great work!

    Second - I think specializing has consequences we can't entertain for
    2.10.1, so 2.11 is a better target. And even *better* than that is to
    promote Spire as a library that people bang on and use it in the 2.10.x
    series as a basis for a SIP/Pull Request on improving the standard library
    for 2.11 or later.


    On Thu, Oct 11, 2012 at 10:47 AM, Erik Osheim wrote:

    Hi folks,

    Spire's 2.10.0 branch just got a nice quicksort implementation which is
    faster than Scala's and is competitive with java.util.Arrays. I posted
    on Twitter about it last night but here are the notes:

    http://plastic-idolatry.com/scala/spire-sort.txt

    Sean Parsons pointed out that this could be ported to the standard
    library. So, my questions are:

    1. Is there any chance to specialize scala.math.Ordering? Spire uses a
    specialized spire.math.Order which is very similar, without problems.

    2. Is there any interest in adapting Scala's existing quickSort to use
    this implementation? It seems better across the board.

    3. Are there backwards compatibility issues I'm not aware of? I'm
    asking now with a view towards 2.11, although I would be willing to try
    to port any of it to 2.10.1.

    -- Erik

    P.S. Would love to get a second pair of eyes on the code, or other
    people's results (either confirming or denying my conclusions). The
    branch in question is "2.10.0":

    https://github.com/non/spire/tree/2.10.0

Related Discussions