Hi all.

I am using lucene-1.3-final and am having performance problems with fuzzy
queries.

If I understand correctly, to perform a fuzzy query Lucene enumerates all
terms in the index and constructs a BooleanQuery consisting of simple
TermQueries, one for each term that is similar enough to the query term.
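The expansion step described above can be sketched without Lucene as follows. This is a hypothetical, in-memory illustration, not Lucene's FuzzyQuery code: the term dictionary is a plain `List<String>`, and the similarity measure (1 minus the edit distance divided by the length of the shorter term) and the 0.5 threshold are assumptions. Where Lucene would add one TermQuery clause per kept term, the sketch just returns the kept terms. What it shows is that every term in the dictionary is visited, so the cost grows with the size of the index rather than with the number of hits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of fuzzy-query expansion over an in-memory term list.
// Not Lucene code: the similarity formula and threshold are assumptions.
class FuzzyExpansionSketch {

    // Assumed minimum similarity a term needs to be kept.
    static final double MIN_SIMILARITY = 0.5;

    // Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed measure: 1 - distance / length of the shorter string.
    static double similarity(String term, String target) {
        int minLen = Math.min(term.length(), target.length());
        if (minLen == 0) {
            return term.equals(target) ? 1.0 : 0.0;
        }
        return 1.0 - (double) editDistance(term, target) / minLen;
    }

    // Visits EVERY term in the dictionary; this full scan is the expensive part.
    // Lucene would add one TermQuery clause per kept term; here we just collect them.
    static List<String> expand(String target, List<String> allTerms) {
        List<String> kept = new ArrayList<>();
        for (String term : allTerms) {
            if (similarity(term, target) > MIN_SIMILARITY) {
                kept.add(term);
            }
        }
        return kept;
    }
}
```

For the term list `lucene, lucine, lucent, apache, search` and target `lucene`, only the first three terms are kept, but all five are compared.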

The main problem is that this process is performed several times during a
single search, which significantly decreases performance.

Let me explain.

For example, my search returns 1000 documents. In my application I need
to retrieve all of these documents from the index for later processing,
but Lucene rereads all terms in the index every 100 documents. By default
only 100 documents are fetched from the index (see the Hits class), so
when I access the 101st document the search is performed again, including
the completely unnecessary creation of a FilteredTermEnum.

Unfortunately, I can't tell the Hits class to fetch all 1000 documents
up front, because the value 100 (actually 50 in the code) is hard-coded.
I think this value should be configurable in Searcher.
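To make the cost concrete, the following hypothetical model (not Lucene's Hits class) caches a fixed window of results and re-runs the whole query whenever the caller reads past it, which is the behavior described above. `WindowedHits`, `windowSize` and `searchCount` are invented names; `windowSize` plays the role of the configurable value being proposed, and each call to `runSearch` stands for one full query execution, including a fresh term enumeration for a fuzzy query.

```java
// Hypothetical model of a result list that caches a fixed-size window of hits
// and re-executes the whole query when the caller reads past the window.
// Not Lucene's Hits class; names are invented for illustration.
class WindowedHits {
    private final int totalHits;
    private final int windowSize;   // the value the post proposes to make configurable
    private int cachedCount = 0;    // hits currently held in the cache
    private int searchCount = 0;    // how many times the (expensive) query ran

    WindowedHits(int totalHits, int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.totalHits = totalHits;
        this.windowSize = windowSize;
    }

    // Stand-in for executing the query and keeping the top n hits.
    // For a fuzzy query, each call would enumerate the index terms again.
    private void runSearch(int n) {
        searchCount++;
        cachedCount = n;
    }

    // Returns the (stand-in) id of the n-th hit, re-searching if it is not cached.
    int doc(int n) {
        if (n < 0 || n >= totalHits) {
            throw new IndexOutOfBoundsException("no hit " + n);
        }
        if (n >= cachedCount) {
            // Grow the cache to the end of the window containing n.
            runSearch(Math.min(totalHits, (n / windowSize + 1) * windowSize));
        }
        return n;
    }

    int searchCount() {
        return searchCount;
    }
}
```

With `new WindowedHits(1000, 100)`, reading all 1000 hits in order runs the query 10 times; with `new WindowedHits(1000, 1000)` it runs once.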

I ran some experiments and found that if I fetch all 1000 documents in
the first pass, my fuzzy query runs 4 times faster!

Could you also give me some advice on improving the performance of
fuzzy search?

One approach I already use in our application is a custom fuzzy query
that first compares a prefix of each term (e.g. the first 3 characters)
and, only if the prefixes are equal, compares the rest of the term using
the same algorithm that FuzzyQuery uses.
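The prefix check described above can be sketched like this. `PrefixFuzzyMatcher` and its parameters are hypothetical, and the similarity measure on the remaining characters (1 minus the edit distance over the length of the shorter remainder) and the 0.5 threshold used in the example are assumptions. The cheap `regionMatches` comparison rejects most terms before the O(m·n) edit-distance computation runs, at the cost of never matching terms whose typo falls inside the prefix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical prefix-gated fuzzy matcher: terms must share the first
// prefixLength characters exactly; only then is the (costlier) edit distance
// computed on the remaining characters. The similarity formula is an assumption.
class PrefixFuzzyMatcher {

    // Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    static boolean matches(String term, String target, int prefixLength, double minSimilarity) {
        int p = Math.max(0, Math.min(prefixLength, target.length()));
        // Cheap exact comparison first; regionMatches is false if term is shorter than p.
        if (!term.regionMatches(0, target, 0, p)) {
            return false;
        }
        String termRest = term.substring(p);
        String targetRest = target.substring(p);
        int minLen = Math.min(termRest.length(), targetRest.length());
        if (minLen == 0) {
            return termRest.equals(targetRest);
        }
        double sim = 1.0 - (double) editDistance(termRest, targetRest) / minLen;
        return sim > minSimilarity;
    }

    static List<String> expand(String target, List<String> allTerms,
                               int prefixLength, double minSimilarity) {
        List<String> kept = new ArrayList<>();
        for (String term : allTerms) {
            if (matches(term, target, prefixLength, minSimilarity)) {
                kept.add(term);
            }
        }
        return kept;
    }
}
```

For example, with a 3-character prefix `lacene` is no longer a match for `lucene`, although its full-term similarity would pass the same threshold.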

Best regards,
Konstantin

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

  • Erik Hatcher at Feb 25, 2004 at 5:09 pm

    On Feb 25, 2004, at 11:20 AM, Konstantin Shaposhnikov wrote:
    > For example my search returns 1000 documents. In my application I need
    > to get all this documents from index for later processing

    This last sentence is key. If you need all documents from a search,
    then you should be using a HitCollector instead. Look at the variants
    of the search methods and use the HitCollector one to get all documents
    in one shot.

    Erik
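The collector pattern suggested here can be sketched as follows. In Lucene 1.3, `HitCollector` is an abstract class with a single method, `collect(int doc, float score)`, and it is passed to `Searcher.search(Query, HitCollector)`. To keep the example runnable without Lucene on the classpath, it declares a local stand-in, `HitCollectorStandIn`, plus a hypothetical `FakeSearcher`; both names are invented here. The point is that the query, and therefore the fuzzy term enumeration, runs once no matter how many hits there are. Hits are not delivered in score order, so sort afterwards if order matters.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in with the same shape as Lucene 1.3's abstract HitCollector,
// so this sketch runs without Lucene on the classpath.
abstract class HitCollectorStandIn {
    public abstract void collect(int doc, float score);
}

// Gathers every matching document number and score in a single pass.
class AllDocsCollector extends HitCollectorStandIn {
    final List<Integer> docs = new ArrayList<>();
    final List<Float> scores = new ArrayList<>();

    @Override
    public void collect(int doc, float score) {
        // Called once per hit inside the search loop: only record the hit here
        // and load stored documents afterwards.
        docs.add(doc);
        scores.add(score);
    }
}

// Hypothetical searcher stand-in: the query (and, for a fuzzy query, the
// term enumeration) runs exactly once, and every hit is pushed to the collector.
class FakeSearcher {
    int executions = 0;

    void search(int totalHits, HitCollectorStandIn collector) {
        executions++;
        for (int doc = 0; doc < totalHits; doc++) {
            collector.collect(doc, 1.0f / (doc + 1));
        }
    }
}
```

With Lucene itself you would extend `org.apache.lucene.search.HitCollector` instead of the stand-in, call `searcher.search(query, collector)`, and then fetch stored fields with `searcher.doc(n)` for each collected document number.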


  • Doug Cutting at Feb 25, 2004 at 6:12 pm

    Konstantin Shaposhnikov wrote:
    > For example my search returns 1000 documents. In my application I need
    > to get all this documents from index for later processing, but lucene
    > rereads every 100 documents all terms in the index, because by default
    > we get only 100 documents from index (see Hits class) and when I access
    > 101st document the search process is performed again and absolutely
    > unnecessary operation of creating FilteredTermEnum is performed.

    If you need all of the hits, use HitCollector instead of Hits.

    Doug

  • Konstantin Shaposhnikov at Feb 25, 2004 at 11:31 pm

    On 10:12 Wed 25 Feb , Doug Cutting wrote:
    > If you need all of the hits, use HitCollector instead of Hits.

    Thank you for the reply. I really need to read the Lucene
    documentation again :)

    > Doug


Discussion Overview
group: dev @
categories: lucene
posted: Feb 25, '04 at 4:20p
active: Feb 25, '04 at 11:31p
posts: 4
users: 3
website: lucene.apache.org
