Hi,

We have an index of about 9 gigabytes here at work, where a few queries take
a very long time to complete.

What I noticed is that we have a large number of multi-valued fields
(50). How does Lucene scale with queries that span a large number of
fields?
Is it better to use a keyword for each possible value in each field and
append those to the remaining field? (For example, if I have a protocol field
and one field holding the URL, I would add _HTTP_ or _FTP_ to the URL field
and omit the protocol field.)
A better solution would probably be to divide the index into multiple indexes
of disjoint entries (in this case, one index with all the http URLs and one
index with all the ftp URLs, because I know that I never need both protocol
types together).
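
The split described above could be sketched roughly as follows. This is only an illustration of routing entries by protocol before indexing, not Lucene code: the class name `ProtocolPartitioner`, the method `partition`, and the `"unknown"` bucket are hypothetical, and each resulting group is assumed to be written to its own index by whatever indexing code is already in place.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class ProtocolPartitioner {

    /**
     * Groups URLs by their lower-cased scheme ("http", "ftp", ...).
     * Each group would then feed a separate (hypothetical) index.
     * URI.create throws IllegalArgumentException for malformed input.
     */
    static Map<String, List<String>> partition(List<String> urls) {
        Map<String, List<String>> byScheme = new TreeMap<String, List<String>>();
        for (String url : urls) {
            String scheme = URI.create(url).getScheme();
            // URLs without a scheme go to an assumed catch-all bucket.
            String key = (scheme == null) ? "unknown" : scheme.toLowerCase(Locale.ROOT);
            List<String> bucket = byScheme.get(key);
            if (bucket == null) {
                bucket = new ArrayList<String>();
                byScheme.put(key, bucket);
            }
            bucket.add(url);
        }
        return byScheme;
    }
}
```

With this layout a query that only concerns ftp URLs touches only the ftp group, and the protocol no longer has to be stored as a field or as a marker token.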

Another thing I noticed is that we append a lot of queries, so we have a lot
of duplicate sub-expressions like (A and B or C) and ... and (A and B or C)
(more nested than that). Does Lucene do any internal query optimization (like
Karnaugh maps) that removes the last (A and B or C), since it is not needed,
or do I have to do that myself?


Thanks for your help.
Thibaut

--
View this message in context: http://www.nabble.com/Lucene-Performance-tp14952958p14952958.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Paul Elschot at Jan 19, 2008 at 6:10 pm

    On Friday 18 January 2008 17:52:27 Thibaut Britz wrote:

    > Hi, ...
    > Another thing I noticed is that we append a lot of queries, so we have a lot
    > of duplicate phrases like (A and B or C) and ... and (A and B or C) (more
    > nested than that). Is lucene doing any internal query optimization (like
    > karnaugh maps) by removing the last (A and B or C), as it is not needed, or
    > do I have to do that myself?

    Query optimization like Karnaugh maps is not available in Lucene.
    For each level of 'and' and 'or' in the (rewritten) query, as well as for
    all terms in the query, a separate scorer will be used during query search.

    The query rewrite could in principle do this, but it might affect the score values.

    Regards,
    Paul Elschot
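
Since Lucene does not do this itself, one option is to remove identical sub-expressions in application code before the Lucene query is built. The following is a minimal sketch under stated assumptions: `Node` is a hypothetical stand-in for the application's own query tree, not a Lucene class, and "A and B or C" is assumed to mean "(A and B) or C". It only drops exact duplicates among the children of the same 'and' or 'or'; it is not a full Karnaugh-style minimization. As noted above, removing a repeated clause can change score values even though the set of matching documents stays the same.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;

public class ClauseDedup {

    /** Hypothetical query node: a term, or an AND/OR over child nodes. */
    static final class Node {
        final String op;            // "TERM", "AND" or "OR"
        final String term;          // set only for TERM nodes
        final List<Node> children;  // empty for TERM nodes

        private Node(String op, String term, List<Node> children) {
            this.op = op;
            this.term = term;
            this.children = children;
        }

        static Node term(String t) { return new Node("TERM", t, new ArrayList<Node>()); }
        static Node and(Node... c) { return new Node("AND", null, Arrays.asList(c)); }
        static Node or(Node... c)  { return new Node("OR", null, Arrays.asList(c)); }

        // Structural equality, so identical sub-expressions compare equal.
        @Override public boolean equals(Object o) {
            if (!(o instanceof Node)) return false;
            Node n = (Node) o;
            return op.equals(n.op) && Objects.equals(term, n.term)
                    && children.equals(n.children);
        }

        @Override public int hashCode() { return Objects.hash(op, term, children); }

        @Override public String toString() {
            if (op.equals("TERM")) return term;
            StringBuilder sb = new StringBuilder("(");
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) sb.append(' ').append(op).append(' ');
                sb.append(children.get(i));
            }
            return sb.append(')').toString();
        }
    }

    /** Returns a copy in which each AND/OR keeps only the first of any identical children. */
    static Node dedup(Node n) {
        if (n.op.equals("TERM")) return n;
        LinkedHashSet<Node> unique = new LinkedHashSet<Node>();
        for (Node child : n.children) {
            unique.add(dedup(child));  // clean up nested levels first
        }
        return new Node(n.op, null, new ArrayList<Node>(unique));
    }

    public static void main(String[] args) {
        Node first  = Node.or(Node.and(Node.term("A"), Node.term("B")), Node.term("C"));
        Node second = Node.or(Node.and(Node.term("A"), Node.term("B")), Node.term("C"));
        Node query  = Node.and(first, Node.term("D"), second);
        System.out.println(dedup(query));  // prints (((A AND B) OR C) AND D)
    }
}
```

The cleaned tree would then be translated into the actual Lucene query, so each distinct sub-expression gets one scorer rather than one per repetition.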

  • Thibaut Britz at Jan 28, 2008 at 10:13 am
    Thanks for your answer,
    I will look into this in more detail.
Discussion Overview
group: java-user
categories: lucene
posted: Jan 18, '08 at 4:53p
active: Jan 28, '08 at 10:13a
posts: 3
users: 2
website: lucene.apache.org