I noticed something interesting: if I search for IndexId:x (IndexId is unique) with a sort, the search still takes a very long time, which it doesn't without the sort.

Does anybody know why? The result set contains exactly 1 document.

/Regards
Marcus


  • Karl Wettin at May 17, 2006 at 1:08 pm

    On Wed, 2006-05-17 at 14:23 +0200, Marcus Falck wrote:

    > I noticed something interesting: if I search for IndexId:x
    > (IndexId is unique) with a sort, it still takes a very long time,
    > which it doesn't without the sort.
    >
    > Does anybody know why? I mean, the result set contains exactly 1
    > document.

    Can you post some code with this?


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Yonik Seeley at May 17, 2006 at 2:32 pm

    On 5/17/06, Marcus Falck wrote:

    > I noticed something interesting: if I search for IndexId:x
    > (IndexId is unique) with a sort, it still takes a very long time,
    > which it doesn't without the sort.

    This will only be the case the first time you sort on a field because
    a FieldCache entry is created for that field and then cached for
    subsequent sorts.
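
    A minimal sketch of the caching behaviour described above, written as a plain-Java toy model rather than Lucene's actual FieldCache code. The class name SortCacheSketch, the valuesFor method and the loads counter are hypothetical names chosen for illustration. The point it tries to show is that the first sorted search pays a cost proportional to the number of documents in the index, regardless of how many hits the query has.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a per-field sort cache; NOT Lucene's real FieldCache. */
final class SortCacheSketch {
    // Hypothetical stand-in for the index: one int sort value per document.
    private final int[] storedValues;
    private final Map<String, int[]> cache = new HashMap<>();
    int loads = 0; // number of times the whole field had to be scanned

    SortCacheSketch(int[] storedValues) {
        this.storedValues = storedValues;
    }

    /** Returns the per-document values for a field, building them on first use. */
    int[] valuesFor(String field) {
        int[] cached = cache.get(field);
        if (cached == null) {
            // The cost here depends on the index size, not on the
            // number of documents the query matched.
            cached = new int[storedValues.length];
            for (int doc = 0; doc < storedValues.length; doc++) {
                cached[doc] = storedValues[doc];
            }
            loads++;
            cache.put(field, cached);
        }
        return cached;
    }

    public static void main(String[] args) {
        SortCacheSketch index = new SortCacheSketch(new int[1_000_000]);
        index.valuesFor("IndexId"); // first sorted search: full scan
        index.valuesFor("IndexId"); // later sorted searches: cache hit
        System.out.println("full scans: " + index.loads); // prints "full scans: 1"
    }
}
```

    In this model a query matching a single document still triggers the full scan on its first sorted search, which matches the symptom reported in the original post.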

    -Yonik
    http://incubator.apache.org/solr Solr, the open-source Lucene search server

  • Marcus Falck at May 18, 2006 at 6:22 pm
    I'm well aware of the trade-offs. But if you were aware of the large amounts of data that this system should be able to search, you wouldn't propose using a database.

    Since I have a separate alert service for immediate alerts up and running, I may be able to make trade-offs on data availability timing and hold the IndexSearcher open for a longer period.

    But still, memory is the problem.
    How much memory would the FieldCache take for 500 million newsletter articles? Probably a lot.
    OK, the system is scaled out over different machines, so in reality each machine won't have 500 million docs, but maybe around 100 million.

    So I'm still interested in changing the relevance ranking.
    Any ideas?

    /
    Marcus

    ________________________________

    From: Yonik Seeley
    Sent: Thu 2006-05-18 17:43
    To: java-user@lucene.apache.org
    Subject: Re: Sort problematics


    On 5/18/06, Marcus Falck wrote:
    > But since my "real" index will be around 2TB in size, I don't think sorting is the right way to go. I'm pretty sure I will have to modify the ranking.

    They are both sorts, and they both use a priority queue. The
    differences shouldn't be that great after the FieldCache is populated.
    The biggest downside to the FieldCache is the memory usage, not the
    CPU.

    > And yes, the data must be instantly available.

    For each update? If so, use a database - Lucene made different
    trade-offs in its design.

    -Yonik
    http://incubator.apache.org/solr Solr, the open-source Lucene search server

  • Yonik Seeley at May 18, 2006 at 6:39 pm

    On 5/18/06, Marcus Falck wrote:
    > I'm well aware of the trade-offs. But if you were aware of the large amounts of data that this system should be able to search, you wouldn't propose using a database.

    If you have a hard requirement of instantly seeing any update, you
    can't use Lucene. That's more database-like functionality. That's
    why I asked.

    > Since I have a separate alert service for immediate alerts up and running, I may be able to make trade-offs on data availability timing and hold the IndexSearcher open for a longer period.

    That's pretty much a requirement for using Lucene to support a decent
    query rate.

    > But still, memory is the problem.
    > How much memory would the FieldCache take for 500 million newsletter articles? Probably a lot.
    > OK, the system is scaled out over different machines, so in reality each machine won't have 500 million docs, but maybe around 100 million.

    Depends on what you are sorting by... for an int/float it is
    100M * 4 bytes, or about 400MB. Big, but possible.
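
    The arithmetic behind that estimate can be checked with a short sketch. It assumes one 4-byte int or float per document and nothing else; the class and method names are hypothetical, and string sort fields, JVM overhead and multiple sort fields would all add to the total.

```java
/** Back-of-the-envelope estimate; assumes one primitive value per document. */
final class FieldCacheMemoryEstimate {
    /** Bytes needed to hold one fixed-size value for every document. */
    static long bytesFor(long docCount, int bytesPerValue) {
        return docCount * bytesPerValue;
    }

    public static void main(String[] args) {
        // 100 million docs per machine, 4 bytes per int/float value.
        long perMachine = bytesFor(100_000_000L, Integer.BYTES);
        // 500 million docs if everything lived in one index.
        long wholeCorpus = bytesFor(500_000_000L, Integer.BYTES);
        System.out.println(perMachine / 1_000_000 + " MB per machine");        // 400 MB
        System.out.println(wholeCorpus / 1_000_000 + " MB for all documents"); // 2000 MB
    }
}
```

    Each additional int or float sort field would add roughly the same amount again per machine.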

    > So I'm still interested in changing the relevance ranking.
    > Any ideas?

    Depends on what you are sorting by, and how many different ways you
    want to sort. If it's a single sort criterion, you can use index-time
    boosts. If you need to sort multiple ways, avoiding the FieldCache
    probably won't help you, because retrieving the per-doc sort
    info via term vectors or stored fields will take too long.
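
    As noted earlier in the thread, ranking by relevance and sorting by a field both collect hits with a priority queue; only the comparison key differs. The following is a plain-Java sketch of that idea, not Lucene's collector code. The class TopNSketch, the topN method and the sample arrays are hypothetical, and the per-document key array stands in for either a score or an in-memory sort value.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Toy top-N collector; the key array is an assumption standing in for score or cached field value. */
final class TopNSketch {
    /** Returns the ids of the n matching docs with the largest key, best first. */
    static List<Integer> topN(int[] matchingDocs, double[] keyPerDoc, int n) {
        // Min-heap on the key: the weakest retained hit sits at the head.
        PriorityQueue<Integer> heap =
            new PriorityQueue<>(Comparator.comparingDouble((Integer doc) -> keyPerDoc[doc]));
        for (int doc : matchingDocs) {
            heap.offer(doc);
            if (heap.size() > n) {
                heap.poll(); // drop the current weakest hit
            }
        }
        List<Integer> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(0, heap.poll()); // weakest comes out first, so prepend
        }
        return result;
    }

    public static void main(String[] args) {
        int[] hits = {0, 1, 2, 3};
        double[] relevance = {0.2, 0.9, 0.5, 0.7};                        // key = score
        double[] publishDate = {20060517, 20060101, 20060518, 20051231}; // key = per-doc sort value
        System.out.println(topN(hits, relevance, 2));   // [1, 3]
        System.out.println(topN(hits, publishDate, 2)); // [2, 0]
    }
}
```

    The key lookup happens once per matching document, which is why it needs to be an in-memory array access rather than a read of stored fields or term vectors.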


    -Yonik
    http://incubator.apache.org/solr Solr, the open-source Lucene search server


Discussion Overview
group: java-user @
categories: lucene
posted: May 17, '06 at 12:23p
active: May 18, '06 at 6:39p
posts: 5
users: 3
website: lucene.apache.org
