Hi,

I have a Lucene index with 70,000 documents. The index is around 300MB. I have
32 fields, but I only retrieve values for 11 of them when I display results on
the page (I use a FieldSelector). Query time is great: most queries execute in
under 50ms. However, when I loop through the results to retrieve the documents
(I only retrieve 10 documents at a time, because that is my page size), the
loop sometimes takes longer than 300ms.

I have applied all the recommendations on this page to optimize query time:
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed

I was really surprised to find that the bottleneck is in the loop below,
especially because the index is not that big (only 300MB) and the hits array
contains only 10 elements.

Is there anything else I can do to optimize document retrieval from a Lucene
index?

Here is the sample code:

Dim FastFieldSelector As New FastFieldSelector(Fields)
For i As Integer = 0 To hits.Length - 1
    Dim DocId As Integer = hits(i).DocId
    Dim doc As Document = reader.Document(DocId, FastFieldSelector) ' this line is taking a long time
    For Each f As Field In Fields
        ' ...
        Dim values() As Fieldable = doc.GetFieldables(f.Name)
        ' ....
    Next
Next

Thanks,
Andrew


  • Noel Lysaght at Jul 4, 2011 at 7:35 pm
    Hi Andrew, we're using Lucene.NET version 2.9.2.

    We found Lucene excellent at indexing and searching, but when we used it
    like a database, retrieving the data stored at index time was very slow.
    So to optimise, we stored the minimum amount of data possible: just the
    identifiers of the data we wanted to retrieve. We then used those IDs to
    fetch the data with a single database lookup, getting all of it in one
    trip to the DB server.

    That worked out very well for us, with searching and data retrieval/
    gathering each taking about the same amount of time. The data comes from
    a SQL Server 2005 database.

    I don't know if that would work in your situation; you may be dependent on
    Lucene for both storage and searching.
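A minimal sketch of the ID-then-database pattern Noel describes, written in the VB.NET of the original question. It assumes each Lucene document stores only an integer key; the module, function, table and column names are hypothetical. The idea is that a page of 10 hits costs one parameterized query instead of one lookup per hit, and the rows are then put back into Lucene's ranking order.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical helper: table/column names must be trusted constants;
' only the ids are passed to the database as parameters.
Public Module IdLookup
    ' Builds one SELECT for a whole page of ids, using placeholders
    ' @p0, @p1, ... so every row comes back in a single round trip.
    Public Function BuildInQuery(table As String, keyColumn As String, ids As IList(Of Integer)) As String
        If ids Is Nothing OrElse ids.Count = 0 Then
            Throw New ArgumentException("At least one id is required.", "ids")
        End If
        Dim names As String() = Enumerable.Range(0, ids.Count).Select(Function(i) "@p" & i.ToString()).ToArray()
        Return String.Format("SELECT * FROM {0} WHERE {1} IN ({2})", table, keyColumn, String.Join(", ", names))
    End Function

    ' IN (...) returns rows in no guaranteed order, so reorder them to match
    ' the Lucene hits; ids with no matching row are skipped.
    Public Function OrderByHits(Of T)(ids As IList(Of Integer), rowsById As IDictionary(Of Integer, T)) As List(Of T)
        Dim ordered As New List(Of T)()
        For Each id As Integer In ids
            Dim row As T = Nothing
            If rowsById.TryGetValue(id, row) Then ordered.Add(row)
        Next
        Return ordered
    End Function
End Module
```

With ADO.NET, the built text would become a `SqlCommand` with one `cmd.Parameters.AddWithValue("@p" & i, ids(i))` per id; the rows read back are collected into a dictionary keyed by id before calling `OrderByHits`.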

    Kind Regards
    Noel


  • Moray McConnachie at Jul 5, 2011 at 8:14 am
    Interesting. Our search and filtering approach calls the Document function on every match in a search result, which frequently means 5-10k results for a search run against a store of 175k documents. This performs well; I definitely don't notice Document being a slow function.

    However, we deliberately store only the contents of key metadata fields (typically 1k-3k per document), and we don't store term position data either. Does the size of each document make a difference?

    I certainly second the approach of storing most of the document outside Lucene, but if you need the stored data during filtering or result generation (as in highlighting, for example), I suspect which approach performs better may be highly case-dependent.

    Our production system runs a considerably older version of Lucene than this one. We're running the newer version in dev and haven't noticed any issues, but we haven't done any load testing yet.

    Is there performance data available for individual functions across the different versions?

    M.
    -------------------------------------
    Moray McConnachie
    Director of IT +44 1865 261 600
    Oxford Analytica http://www.oxan.com



    ---------------------------------------------------------
    Disclaimer

    This message and any attachments are confidential and/or privileged. If this has been sent to you in error, please do not use, retain or disclose them, and contact the sender as soon as possible.

    Oxford Analytica Ltd
    Registered in England: No. 1196703
    5 Alfred Street, Oxford
    United Kingdom, OX1 4EH
    ---------------------------------------------------------
  • Miller, Bill (QuickWire) at Jul 6, 2011 at 12:55 pm
    Well, here's my 2 cents of performance comments...

    My test index has around 70k docs with 50 fields per doc, maybe 4k of text per doc on average, and no term vectors.
    Simple queries take about 50 ms as well, but I can process 1,000 results (pulling at most about 1k of data for each doc) in 125 ms or less.

    However, I store all my stored field data in one non-indexed field and retrieve it in one shot.
    Unfortunately I never tested with multiple stored fields, but I'm guessing there may be big savings there.
    My old indexing engine (AltaVista) also saved data as one 'blob' per doc, and I practically plopped Lucene in its place.
    (We just moved to 2.9.4.2.)
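Bill's single-blob layout can be sketched the same way. This assumes the display fields are packed into one string at index time; the module name and the length-prefixed encoding below are illustrative choices (not Bill's format and not anything Lucene defines), picked so that values may contain any character, including the delimiter.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports System.Text

' Hypothetical "one blob per document" encoding: every name and value is
' written as "<length>:<text>". Values must not be Nothing.
Public Module FieldBlob
    ' Packs name/value pairs into a single string for one stored field.
    Public Function Pack(fields As IDictionary(Of String, String)) As String
        Dim sb As New StringBuilder()
        For Each kv As KeyValuePair(Of String, String) In fields
            AppendToken(sb, kv.Key)
            AppendToken(sb, kv.Value)
        Next
        Return sb.ToString()
    End Function

    ' Reverses Pack: reads name/value token pairs until the blob is exhausted.
    Public Function Unpack(blob As String) As Dictionary(Of String, String)
        Dim result As New Dictionary(Of String, String)()
        Dim pos As Integer = 0
        While pos < blob.Length
            Dim name As String = ReadToken(blob, pos)
            result(name) = ReadToken(blob, pos)
        End While
        Return result
    End Function

    Private Sub AppendToken(sb As StringBuilder, text As String)
        sb.Append(text.Length.ToString(CultureInfo.InvariantCulture)).Append(":"c).Append(text)
    End Sub

    ' Reads one "<length>:<text>" token starting at pos and moves pos past it,
    ' so a ":" inside the text is never mistaken for a delimiter.
    Private Function ReadToken(blob As String, ByRef pos As Integer) As String
        Dim colon As Integer = blob.IndexOf(":"c, pos)
        If colon < 0 Then Throw New FormatException("Missing length prefix.")
        Dim count As Integer = Integer.Parse(blob.Substring(pos, colon - pos), CultureInfo.InvariantCulture)
        Dim text As String = blob.Substring(colon + 1, count)
        pos = colon + 1 + count
        Return text
    End Function
End Module
```

In Lucene.Net 2.9 the packed string would typically go into one stored, non-indexed field, e.g. `New Field("display", FieldBlob.Pack(values), Field.Store.YES, Field.Index.NO)` (the name `display` is hypothetical), and be read back with `doc.Get("display")`. As Bill says, whether this beats several separate stored fields was not measured in this thread.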

    Bill Miller, QuickWire Labs
    www.quickwire.com



Discussion Overview
group: lucene-net-user
categories: lucene
posted: Jul 4, '11 at 4:25p
active: Jul 6, '11 at 12:55p
posts: 4
users: 4
website: lucene.apache.org
