As per Lucene documentation -
"For good search performance, implementations of this method should not call
Searcher.doc(int) or IndexReader.document(int) on every document number
encountered. Doing so can slow searches by an order of magnitude or more."

My question is: what other way is there to get the Document object that
avoids this performance bottleneck?

--
View this message in context: http://www.nabble.com/How-to-extract-Document-object-after-the-search--tp21788361p21788361.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Ian Lea at Feb 2, 2009 at 12:19 pm
    Hi


    That quote is from the javadoc for
    HitCollector/TopDocCollector.collect(). You missed out the bit saying
    "This is called in an inner search loop".

    If, as your subject implies, you want to get at the Document object
    AFTER the search, those methods are fine. Just don't use them for any
    more documents than you need, and not inside the collect() method.

    If you really need to get at document data inside the inner search
    loop I think you'll have to accept the performance hit or look into
    advanced stuff like payloads.


    --
    Ian.
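Ian's point is that collect() should only record document numbers, and the stored Document should be fetched afterwards for just the hits you need. Here is a plain-Java sketch of that. It is a stdlib-only model, not Lucene code. TopNCollector, Hit and loadHits are hypothetical names, and the IntFunction loader stands in for a call such as Searcher.doc(int). The collector keeps the best N (doc, score) pairs in a min-heap, which is roughly the job TopDocCollector does in Lucene 2.4. Only those N documents are loaded once collection is done.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntFunction;

public class AfterSearchLoading {

    /** A (doc, score) pair recorded during collection. */
    record Hit(int doc, float score) {}

    /** Hypothetical top-N collector: collect() only records numbers, never loads documents. */
    static final class TopNCollector {
        private final int n;
        // Min-heap on score, so the weakest of the current top N is evicted first.
        private final PriorityQueue<Hit> heap =
                new PriorityQueue<>(Comparator.comparingDouble(Hit::score));

        TopNCollector(int n) { this.n = n; }

        /** Called once per matching document inside the search loop: no stored-field access here. */
        void collect(int doc, float score) {
            if (heap.size() < n) {
                heap.add(new Hit(doc, score));
            } else if (score > heap.peek().score()) {
                heap.poll();
                heap.add(new Hit(doc, score));
            }
        }

        /** Returns the retained hits, best score first. */
        List<Hit> topHits() {
            List<Hit> hits = new ArrayList<>(heap);
            hits.sort(Comparator.comparingDouble(Hit::score).reversed());
            return hits;
        }
    }

    /** After the search: load stored data only for the hits actually needed. */
    static <D> List<D> loadHits(List<Hit> hits, IntFunction<D> loader) {
        List<D> docs = new ArrayList<>(hits.size());
        for (Hit h : hits) {
            docs.add(loader.apply(h.doc()));
        }
        return docs;
    }

    public static void main(String[] args) {
        TopNCollector collector = new TopNCollector(3);
        float[] scores = {0.2f, 0.9f, 0.5f, 0.1f, 0.7f, 0.3f};
        for (int doc = 0; doc < scores.length; doc++) {
            collector.collect(doc, scores[doc]);
        }
        int[] loads = {0};
        List<String> docs = loadHits(collector.topHits(), doc -> {
            loads[0]++; // counts simulated document loads
            return "doc-" + doc;
        });
        System.out.println(docs + " loaded=" + loads[0]);
    }
}
```

With six matching documents and N = 3, this prints `[doc-1, doc-4, doc-2] loaded=3`. The loader runs three times instead of six. The saving grows with the size of the result set relative to the page you actually display.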



  • Ganesh at Feb 2, 2009 at 12:38 pm
    Searcher.doc(int) or IndexReader.document(int) will give you the Document
    object, and to my knowledge this is the only way available. However, it is
    not advisable to query all documents (MatchAllDocsQuery) and load every
    Document object. When using Searcher.doc(int) or IndexReader.document(int),
    load only the fields required to display the results, using
    Searcher.doc(int, FieldSelector).

    Regards
    Ganesh
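Ganesh's advice to load only the fields needed for display comes down to a per-field decision. The sketch below models that decision using only the standard library. Decision, Selector, onlyFields and doc are hypothetical names that loosely mirror FieldSelector, FieldSelectorResult.LOAD/NO_LOAD and Searcher.doc(int, FieldSelector). They are not the Lucene API. In real Lucene the saving comes from not reading and decoding the skipped stored fields. This in-memory model can only illustrate that, not measure it.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SelectiveFieldLoading {

    /** Simplified per-field decision (loosely mirrors FieldSelectorResult.LOAD / NO_LOAD). */
    enum Decision { LOAD, NO_LOAD }

    /** Hypothetical selector interface: decides, per field name, whether to load it. */
    interface Selector {
        Decision accept(String fieldName);
    }

    /** Load only the named fields and skip everything else. */
    static Selector onlyFields(Set<String> wanted) {
        return name -> wanted.contains(name) ? Decision.LOAD : Decision.NO_LOAD;
    }

    /** Builds a result "document" holding only the fields the selector accepts. */
    static Map<String, String> doc(List<Map<String, String>> storedDocs, int docId, Selector selector) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : storedDocs.get(docId).entrySet()) {
            if (selector.accept(field.getKey()) == Decision.LOAD) {
                result.put(field.getKey(), field.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("id", "42");
        stored.put("title", "How to extract Document object after the search?");
        stored.put("body", "a large stored body");
        List<Map<String, String>> index = List.of(stored);

        // Only id and title are needed to render a result list.
        Map<String, String> forDisplay = doc(index, 0, onlyFields(Set.of("id", "title")));
        System.out.println(forDisplay.keySet());
    }
}
```

This prints `[id, title]`. The large body field is never copied into the result.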


  • Uwe Schindler at Feb 2, 2009 at 12:47 pm
    Hi,

    You should generally not load all fields of all documents in the
    HitCollector loop. If you really need them (because you want to do some
    analysis on the whole result set after the search), you should do the
    following:

    - Only retrieve the document fields you really need (using a
    FieldSelector such as SetBasedFieldSelector).
    - Do some buffering in the HitCollector: allocate an int array for the
    collected doc IDs with a size of, say, 16,000. In each collect() call, add
    the document ID to the array. When the array is full, and again at the end
    of collecting, call a flush method. This method sorts the array by ID
    (because less seeking is needed if the IDs are in increasing order) and
    then calls document(id) for each entry in bulk. This is faster. In older
    versions of Lucene sorting the array may not be needed, but you really
    should do it (the newer search API may not return documents in doc order).

    Uwe

    -----
    Uwe Schindler
    H.-H.-Meier-Allee 63, D-28213 Bremen
    http://www.thetaphi.de
    eMail: uwe@thetaphi.de
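Uwe's buffering scheme can be sketched as below. This is a stdlib model under stated assumptions, not a Lucene HitCollector subclass. BufferedDocCollector, collect, flush and finish are hypothetical names. The IntFunction loader stands in for something like reader.document(id, fieldSelector). The demo uses a capacity of 4 so that a mid-search flush actually happens. Uwe suggests something like 16,000 in practice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntFunction;

public class BufferedDocCollector<D> {

    private final int[] buffer;          // collected doc ids awaiting a flush
    private int size = 0;                // number of valid entries in buffer
    private final IntFunction<D> loader; // hypothetical stand-in for reader.document(id, selector)
    private final List<D> results = new ArrayList<>();

    public BufferedDocCollector(int capacity, IntFunction<D> loader) {
        this.buffer = new int[capacity];
        this.loader = loader;
    }

    /** Called once per hit in the inner search loop: just remember the id. */
    public void collect(int doc) {
        buffer[size++] = doc;
        if (size == buffer.length) {
            flush();
        }
    }

    /** Sorts the buffered ids so stored fields are read in increasing order, then loads them. */
    public void flush() {
        Arrays.sort(buffer, 0, size);
        for (int i = 0; i < size; i++) {
            results.add(loader.apply(buffer[i]));
        }
        size = 0;
    }

    /** Call after the search finishes to drain whatever is still buffered. */
    public List<D> finish() {
        flush();
        return results;
    }

    public static void main(String[] args) {
        List<Integer> loadOrder = new ArrayList<>();
        BufferedDocCollector<String> c = new BufferedDocCollector<>(4, id -> {
            loadOrder.add(id); // record the order in which documents are loaded
            return "doc-" + id;
        });
        int[] hitsOutOfOrder = {9, 3, 7, 1, 8, 2};
        for (int doc : hitsOutOfOrder) {
            c.collect(doc);
        }
        c.finish();
        System.out.println(loadOrder);
    }
}
```

This prints `[1, 3, 7, 9, 2, 8]`. IDs are sorted within each flushed batch, not globally. That is enough to keep reads moving forward through the stored data within a batch, which is the point of the sort.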

  • Mittals at Feb 3, 2009 at 7:04 am
    Hi,

    I have not seen much difference in time between loading a single field
    and loading all the fields of a document.

    After a search, Lucene caches the documents in memory. Is there any way
    to configure the number of documents cached in memory?

    What is the benefit of using FieldSelectorResult.LOAD and
    FieldSelectorResult.LAZY_LOAD?

    Regards,
    Sourabh


  • Ian Lea at Feb 3, 2009 at 10:07 am

    > I have not seen much time difference between when I load the single field &
    > all the fields of a document.

    That's fine - sometimes it helps, sometimes it doesn't. It depends on
    the structure of your documents, maybe your hardware, maybe more. And
    sometimes a small difference, over many documents, can be worth
    having.

    > After search, lucene cache the documents into the memory. Is there any way
    > to configure the no. of documents to be cached into the memory?

    Umm. No, I don't believe that Lucene does explicit document caching.
    Your OS may well cache the data files, which can make a significant
    difference. See also all the recommendations elsewhere about sharing
    and warming searchers.

    > what could be the benefit in using FieldSelectorResult.LOAD &
    > FieldSelectorResult.LAZY_LOAD?

    If you have a document with, say, 2 small fields and 100 large fields,
    and in some particular circumstance you only want the 2 small ones,
    using a FieldSelector like SetBasedFieldSelector, as Uwe suggested,
    can help by telling Lucene not to load the 100 large fields unless you
    explicitly ask for them, which you won't in this scenario.

    If you google for something like "lucene lazy loading" you'll find
    lots more info.


    --
    Ian.
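Here is a stdlib-only simulation of Ian's example of 2 small fields and 100 large ones, which makes the LOAD versus LAZY_LOAD distinction concrete. LazyFieldDemo, readFromDisk and LazyField are hypothetical names, and readFromDisk only counts simulated reads. Eagerly loaded fields are read up front, which is the LOAD idea. A lazy field is read only the first time its value is requested and then kept, which is the LAZY_LOAD idea. This models the concept only. It is not how Lucene's own lazy fields are implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

public class LazyFieldDemo {

    /** Counts how many times the (simulated) stored data was read. */
    static int reads = 0;

    /** Simulates reading one stored field value from disk. */
    static String readFromDisk(String fieldName) {
        reads++;
        return "value-of-" + fieldName;
    }

    /** A field whose value is read on first access and then cached. */
    static final class LazyField {
        private final Supplier<String> source;
        private String value; // null until first access

        LazyField(Supplier<String> source) { this.source = source; }

        String stringValue() {
            if (value == null) {
                value = source.get();
            }
            return value;
        }
    }

    public static void main(String[] args) {
        Map<String, String> eager = new LinkedHashMap<>();
        Map<String, LazyField> lazy = new LinkedHashMap<>();

        // Two small fields loaded eagerly (LOAD), 100 large ones deferred (LAZY_LOAD).
        eager.put("id", readFromDisk("id"));
        eager.put("title", readFromDisk("title"));
        for (int i = 0; i < 100; i++) {
            String name = "large" + i;
            lazy.put(name, new LazyField(() -> readFromDisk(name)));
        }
        System.out.println("reads after loading: " + reads);

        // Only a field that is actually asked for costs a read, and only once.
        lazy.get("large7").stringValue();
        lazy.get("large7").stringValue();
        System.out.println("reads after access: " + reads);
    }
}
```

This reports 2 reads after loading and 3 after access. If you end up touching most of the large fields anyway, deferring them buys little, because each one still has to be read when it is asked for.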

  • Erick Erickson at Feb 3, 2009 at 1:39 pm
    Here's a writeup I did a couple of years ago that might help...

    http://wiki.apache.org/lucene-java/FieldSelectorPerformance?highlight=(fieldselector)

    Best
    Erick

Discussion Overview
group: java-user @
categories: lucene
posted: Feb 2, '09 at 11:54a
active: Feb 3, '09 at 1:39p
posts: 7
users: 5
website: lucene.apache.org


site design / logo © 2022 Grokbase