Guys,

I've noticed many people having trouble with sorting and OOM. Eventually they solve
it by throwing more memory at the problem.

Shouldn't a solution which can sort on disk when necessary be implemented
in core Lucene?
Something like this:
http://www.codeodor.com/index.cfm/2007/5/10/Sorting-really-BIG-files/1194

Since you obviously know the result size, you can calculate how much memory
is needed for the sort, and if the calculated value is higher than a
configurable threshold, an external on-disk sort is performed, perhaps with a
log message at WARN level.

Just a thought, since I'm about to implement something which can sort any
Comparable object, but on disk.
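
For the sake of discussion, here is a rough, untested sketch of the kind of
thing I mean (all class and method names below are made up): buffer items in
memory, spill a sorted run to a temp file once a configurable threshold is
crossed, and merge the runs with a priority queue at the end.

import java.io.*;
import java.util.*;

// Rough, untested sketch (all names invented): an external merge sort for any
// item that is both Comparable and Serializable. Items are buffered in memory;
// once the buffer reaches maxInMemory it is sorted and spilled to a temp file
// ("run"), and at the end all runs are merged with a priority queue.
public class ExternalSorter<T extends Comparable<T> & Serializable> {

    private final int maxInMemory;                 // configurable threshold
    private final List<T> buffer = new ArrayList<T>();
    private final List<File> runs = new ArrayList<File>();

    public ExternalSorter(int maxInMemory) { this.maxInMemory = maxInMemory; }

    public void add(T item) throws IOException {
        buffer.add(item);
        if (buffer.size() >= maxInMemory) spill(); // here one could log a WARN
    }

    // Sort the current buffer and write it out as one sorted run.
    private void spill() throws IOException {
        Collections.sort(buffer);
        File run = File.createTempFile("sort-run", ".bin");
        ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(run)));
        out.writeInt(buffer.size());
        for (T item : buffer) out.writeObject(item);
        out.close();
        runs.add(run);
        buffer.clear();
    }

    // Merge all runs (plus whatever is still buffered) in sorted order.
    public List<T> sorted() throws IOException {
        if (!buffer.isEmpty()) spill();
        PriorityQueue<RunReader> queue = new PriorityQueue<RunReader>();
        for (File run : runs) {
            RunReader reader = new RunReader(run);
            if (reader.advance()) queue.add(reader);
        }
        List<T> result = new ArrayList<T>();       // or stream to a writer instead
        while (!queue.isEmpty()) {
            RunReader reader = queue.poll();
            result.add(reader.current);
            if (reader.advance()) queue.add(reader);
        }
        return result;
    }

    // Reads one sorted run; ordered by its current element for the merge queue.
    private class RunReader implements Comparable<RunReader> {
        private final ObjectInputStream in;
        private int remaining;
        private T current;

        RunReader(File run) throws IOException {
            in = new ObjectInputStream(
                    new BufferedInputStream(new FileInputStream(run)));
            remaining = in.readInt();
        }

        @SuppressWarnings("unchecked")
        boolean advance() throws IOException {
            if (remaining == 0) { in.close(); return false; }
            try {
                current = (T) in.readObject();
            } catch (ClassNotFoundException e) {
                throw new IOException("cannot read run item: " + e);
            }
            remaining--;
            return true;
        }

        public int compareTo(RunReader other) {
            return current.compareTo(other.current);
        }
    }
}

Usage would just be calling add() in a loop and then iterating over sorted().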

I guess the Hadoop project has the perfect tools for this, since all the
mapred input files are sorted, on disk, and huge.

Kindly

//Marcus


--
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.herou@tailsweep.com
http://www.tailsweep.com/
http://blogg.tailsweep.com/


  • Yonik Seeley at Jul 29, 2008 at 7:14 pm
    The problem isn't sorting per se... the problem is quickly retrieving
    the sort value for a document. For that, we currently have the
    FieldCache... that's what takes up the memory. There are more
    memory-efficient ways, but they just haven't been implemented yet.
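
    To make that concrete, here is a hedged sketch against the 2.x-era
    FieldCache API (the index path and field names are made up): sorting by a
    field loads one value per document in the index, not per hit, and that
    per-document array is where the memory goes.

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.FieldCache;

    public class FieldCacheMemoryDemo {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open("/path/to/index"); // made-up path
            // One entry per document in the index stays cached until the reader
            // is closed -- this is what takes up the memory when sorting.
            int[] prices = FieldCache.DEFAULT.getInts(reader, "price");
            String[] titles = FieldCache.DEFAULT.getStrings(reader, "title");
            System.out.println("docs: " + reader.maxDoc()
                    + ", cached ints: " + prices.length
                    + ", cached strings: " + titles.length);
            reader.close();
        }
    }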

    -Yonik

  • Marcus Herou at Jul 30, 2008 at 9:54 am
    Yep, a disk sort is slow as hell compared to an in-memory sort. What I was
    thinking of was something like what a DB does.

    MySQL, for example, does exactly this: if the result set does not fit
    properly in memory, spool it to disk and sort it there.

    The thing is that it would allow you to continue adding docs to the index,
    even though you should still invest in more memory ASAP.
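
    As a sketch of the decision I mean (everything here is invented for
    illustration): estimate the memory the sort would need from the hit count,
    and only spool to disk when it crosses a configured limit, logging a WARN
    so the operator knows it is time to add RAM.

    import java.util.*;

    // Invented names throughout; this only illustrates the spill decision.
    public class SortSpillDecision {
        static <T extends Comparable<T>> void sortHits(List<T> hits,
                int bytesPerSortKey, long maxSortBytes) {
            long estimated = (long) hits.size() * bytesPerSortKey;
            if (estimated > maxSortBytes) {
                System.err.println("WARN: sort needs ~" + estimated
                        + " bytes, above limit " + maxSortBytes
                        + "; spooling to disk, consider adding memory");
                // hand the hits to an external sorter like the one sketched earlier
            } else {
                Collections.sort(hits);               // normal in-memory path
            }
        }
    }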

    Kindly

    //Marcus
    On Tue, Jul 29, 2008 at 9:17 PM, Mark Miller wrote:

    I think you'll find it slow to add disk seeks to the sort on each search.
    Something you might be able to work from, though (though I doubt it still
    applies cleanly), is Hoss' issue
    https://issues.apache.org/jira/browse/LUCENE-831. This allows for a
    pluggable cache implementation for sorting. It also allows for much faster
    reopening in most cases - it hasn't seen any activity, and I think they are
    looking to get the reopen gains elsewhere, but it may be worth playing with.

    - Mark


