FAQ
I'm having a problem with the performance of lazily-loaded fields with
lucene. The basic structure of the code is that I get a set of documents
back from a query, then iterate through them, reading most fields to collect
fragments. This is taking an excessively long amount of time- mostly in my
call to getField on the document. Each such call takes milliseconds,
meaning a query that even returns only a few dozen items takes me hundreds
of milliseconds to process, which is causing problems elsewhere. So the
question I have is if there is some way I can optimize the loading of
fields? I am calling the doc function on the index search with a null
FieldSelector, but this does not seem to reduce the cost of getting fields
(indeed, it seems to slow down the whole query processing by a significant
factor). Is there any help anyone can give me?

Thanks.

Brian

Search Discussions

  • Sanne Grinovero at Mar 21, 2011 at 7:17 pm

    2011/3/21 Brian Hurt <bhurt42@gmail.com>:
    I'm having a problem with the performance of lazily-loaded fields with
    lucene.  The basic structure of the code is that I get a set of documents
    back from a query, then iterate through them, reading most fields to collect
    fragments.  This is taking an excessively long amount of time- mostly in my
    call to getField on the document.  Each such call takes milliseconds,
    meaning a query that even returns only a few dozen items takes me hundreds
    of milliseconds to process, which is causing problems elsewhere.  So the
    question I have is if there is some way I can optimize the loading of
    fields?  I am calling the doc function on the index search with a null
    FieldSelector, but this does not seem to reduce the cost of getting fields
    (indeed, it seems to slow down the whole query processing by a significant
    factor).  Is there any help anyone can give me?
    Hi,
    you should not use a null FieldSelector, but create one instead, for
    example the org.apache.lucene.document.MapFieldSelector,
    so to make sure you're only going to initialize the fields you
    actually need (assuming you need a subset).
    Also you might have a look into FieldCache usage, there's an excellent
    chapter on it in Lucene in Action (second edition);
    this approach assumes you can dedicate some memory to cache extracted
    field values in memory, and you're in a read-mostly
    scenario (as filling the FieldCache might be more costly than a single
    extraction: it affects all elements in the index, not your Hits only).

    I've also seen that the cost of Field extraction is quite cheap on
    solid state disks,
    compared to rotating disks at least.

    Regards,
    Sanne
    Thanks.

    Brian
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Erick Erickson at Mar 22, 2011 at 1:23 pm
    Don't do that <G>.... Let's back up a second and
    ask why in the world you want to do this, what's the
    use-case you're satisfying? Because spinning through
    all the results and getting information from the underlying
    documents is inherently expensive since, as Sanne
    says, you're doing disk seeks. Most Lucene apps
    only do this for a few documents (think pages) of a
    search.

    So understanding why you want to do this may allow
    some alternatives to be suggested.

    Best
    Erick
    On Mon, Mar 21, 2011 at 2:16 PM, Brian Hurt wrote:
    I'm having a problem with the performance of lazily-loaded fields with
    lucene.  The basic structure of the code is that I get a set of documents
    back from a query, then iterate through them, reading most fields to collect
    fragments.  This is taking an excessively long amount of time- mostly in my
    call to getField on the document.  Each such call takes milliseconds,
    meaning a query that even returns only a few dozen items takes me hundreds
    of milliseconds to process, which is causing problems elsewhere.  So the
    question I have is if there is some way I can optimize the loading of
    fields?  I am calling the doc function on the index search with a null
    FieldSelector, but this does not seem to reduce the cost of getting fields
    (indeed, it seems to slow down the whole query processing by a significant
    factor).  Is there any help anyone can give me?

    Thanks.

    Brian
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedMar 21, '11 at 6:16p
activeMar 22, '11 at 1:23p
posts3
users3
websitelucene.apache.org

People

Translate

site design / logo © 2022 Grokbase