We have been using Lucene for one of our projects here, and it has worked very
well for the last 2 years.
The new requirement is to use it for autocomplete. Here, queries like a* or
ab* pose a problem.
I have set BooleanQuery.setMaxClauseCount( Integer.MAX_VALUE ) to get around
the TooManyClauses exception.
The issue now is that performance is not acceptable: an a* query takes about
3 seconds to return results.
I have 250,000 documents, each with 5 - 15 words in the indexed field, and I am
using StandardAnalyzer. I have tried using a filter, since in this case I am
only interested in documents with a boost higher than a certain number. I
stored the boost value as a separate indexed Lucene field so I can filter on it.
However, I realized that the filter is only applied after the boolean query has
been prepared and scored, so that approach brings no performance benefit.
I cannot use a ConstantScoreQuery, as I need the top n matches for the query.
Any suggestions on how I can get around this issue would be highly
appreciated.
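
To illustrate why a short prefix is the expensive case: when a prefix query is rewritten into a BooleanQuery, it gets one clause per matching term, and Lucene's default clause limit is 1024. The following is only a rough sketch of that expansion using a JDK TreeSet as a stand-in for the term dictionary. The class name, the `expand` helper, and the synthetic four-letter terms are assumptions made up for this illustration; none of it is Lucene code or data from the poster's index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class PrefixExpansionSketch {

    /** Collects one "clause" per term starting with the prefix; fails past maxClauses. */
    static List<String> expand(TreeSet<String> terms, String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        // Terms sharing a prefix are contiguous in sorted order, starting at the
        // first term that is >= the prefix itself.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            if (clauses.size() >= maxClauses) {
                // Stand-in for the TooManyClauses failure described in the post.
                throw new IllegalStateException("too many clauses for prefix " + prefix);
            }
            clauses.add(term);
        }
        return clauses;
    }

    /** Synthetic terms "aaaa".."zzzz" (26^4 of them); purely illustrative data. */
    static TreeSet<String> syntheticTerms() {
        TreeSet<String> terms = new TreeSet<>();
        for (char a = 'a'; a <= 'z'; a++)
            for (char b = 'a'; b <= 'z'; b++)
                for (char c = 'a'; c <= 'z'; c++)
                    for (char d = 'a'; d <= 'z'; d++)
                        terms.add("" + a + b + c + d);
        return terms;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = syntheticTerms();
        System.out.println("ab* clauses: " + expand(terms, "ab", 1024).size());
        try {
            expand(terms, "a", 1024);
        } catch (IllegalStateException e) {
            System.out.println("a* failed: " + e.getMessage());
        }
        // Raising the limit removes the failure but not the work per query.
        System.out.println("a* clauses, no limit: " + expand(terms, "a", Integer.MAX_VALUE).size());
    }
}
```

Raising the limit, as in the post, makes the query run, but every matching term still becomes a clause that has to be evaluated and scored, which is consistent with the slowdown reported for a*.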


  • Simon Willnauer at Nov 13, 2009 at 2:12 pm
    Anjana, maybe I don't understand your question correctly, but what you
    want to do is a spell-suggestion kind of thing on terms in the index,
    right? You are trying to use a prefix query to display those terms as an
    auto-completion? So I assume that what you do is run a query and
    then get the possible terms from the stored values?

    If I understand you correctly, wouldn't it be easier to just iterate
    the first n terms starting with your prefix? That should be quite fast
    and easy to implement if that would fit your requirements.

    simon
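
A minimal sketch of the "iterate the first n terms starting with your prefix" idea, assuming the term dictionary can be treated as a sorted set of strings. It uses a JDK TreeSet rather than Lucene itself so that it stays self-contained; the class and method names are hypothetical. In Lucene versions of that era the corresponding entry point should be `IndexReader.terms(new Term(field, prefix))`, which returns a `TermEnum` positioned at the first term at or after the given one, and a real loop would also need to stop when the enumeration leaves the field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class PrefixTermIteration {

    /** Returns up to n terms that start with the prefix, in sorted term order. */
    static List<String> firstTermsWithPrefix(TreeSet<String> sortedTerms, String prefix, int n) {
        List<String> out = new ArrayList<>();
        // Seek to the first term >= prefix, then walk forward until the prefix
        // no longer matches or enough terms have been collected.
        for (String term : sortedTerms.tailSet(prefix, true)) {
            if (out.size() == n || !term.startsWith(prefix)) {
                break;
            }
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative terms only, not taken from a real index.
        TreeSet<String> terms = new TreeSet<>(Arrays.asList(
                "african", "alligator", "ant", "antelope", "anthem", "another"));
        System.out.println(firstTermsWithPrefix(terms, "an", 3)); // prints [another, ant, antelope]
    }
}
```

The cost depends on n rather than on how many terms share the prefix, which is why this tends to be fast. Note that the terms come back in dictionary order, not ranked by score or boost.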
  • Anjana Sarkar at Nov 13, 2009 at 2:51 pm
    Hi Simon,

    Thank you very much for your reply.

    Maybe an example will help clarify my use case.

    Say I have the following two indexed fields with this data:

    data          boostfield
    african ant   10
    alligator     50
    anthem        20
    antelope      30
    another       5

    The query is "an*" and I am interested in the top 3 results.

    I would like "antelope", "anthem" and "african ant" to be returned, in that
    order.

    In this case, I am trying to do something like this in Lucene:

    select * from data where data like "an*" and boost >= 10

    I would like the boost field filtering to happen before looking for data
    like "an*", so that I am left with far fewer terms to iterate over.


    --Anjana
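
The behaviour asked for in this example can be sketched outside Lucene as follows. This is only an illustration of "apply the boost cut-off first, then match the prefix, and stop after n hits", not a description of how Lucene evaluates filters. The class, the `Entry` type, and the `suggest` method are hypothetical names; the five rows are the ones from the example above, and splitting on whitespace is a simplification of what StandardAnalyzer does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class BoostedPrefixSuggest {

    static final class Entry {
        final String text;
        final int boost;

        Entry(String text, int boost) {
            this.text = text;
            this.boost = boost;
        }
    }

    // Entries sorted once by boost, highest first, so each lookup can stop early.
    private final List<Entry> byBoostDesc;

    BoostedPrefixSuggest(List<Entry> entries) {
        byBoostDesc = new ArrayList<>(entries);
        byBoostDesc.sort(Comparator.comparingInt((Entry e) -> e.boost).reversed());
    }

    /** True if any whitespace-separated word of the text starts with the prefix. */
    static boolean anyWordStartsWith(String text, String prefix) {
        for (String word : text.toLowerCase().split("\\s+")) {
            if (word.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** Top n texts matching the prefix with boost >= minBoost, highest boost first. */
    List<String> suggest(String prefix, int minBoost, int n) {
        List<String> out = new ArrayList<>();
        for (Entry e : byBoostDesc) {
            // The list is sorted by boost, so once an entry falls below the
            // threshold every remaining entry does too.
            if (e.boost < minBoost || out.size() == n) {
                break;
            }
            if (anyWordStartsWith(e.text, prefix)) {
                out.add(e.text);
            }
        }
        return out;
    }

    /** The five rows from the example in the post. */
    static BoostedPrefixSuggest sample() {
        return new BoostedPrefixSuggest(Arrays.asList(
                new Entry("african ant", 10),
                new Entry("alligator", 50),
                new Entry("anthem", 20),
                new Entry("antelope", 30),
                new Entry("another", 5)));
    }

    public static void main(String[] args) {
        System.out.println(sample().suggest("an", 10, 3)); // prints [antelope, anthem, african ant]
    }
}
```

Because the entries are visited in descending boost order, the loop never looks at "another" (boost 5), and it would stop as soon as three matches were found even if many more low-boost entries followed.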
  • Otis Gospodnetic at Nov 14, 2009 at 2:53 am
    Hello,

    Also keep in mind that prefix queries are not the cheapest.
    Plug:
    We've seen people use this successfully: http://www.sematext.com/products/autocomplete/index.html
    I believe somebody is trying this out with a set of 1B suggestions. The demo at http://www.sematext.com/demo/ac/index.html searches 6M Wikipedia titles with a *tiny* JVM heap.

    Otis



    (In reply to Anjana Sarkar's message of Fri, November 13, 2009 8:50:38 AM, subject "Prefix Query for autocomplete - TooManyClauses".)

Discussion Overview
group: java-user
category: lucene
posted: Nov 13, '09 at 1:51p
active: Nov 14, '09 at 2:53a
posts: 4
users: 3
website: lucene.apache.org
