Hi!

I already asked this on the user mailing list, but it may be more
appropriate here:

As a simple example, imagine that every document in your index has the
fields "language" and "country". A language+country tuple is what I call a
context.

You want to search context-specifically, i.e. language+country is always
part of the query (as a QueryFilter).

FuzzyTermEnum doesn't know about these contexts, so the BooleanQuery ends up
being built from all similar terms in the index. E.g. "hello" is "hallo" in
German, only one character different. But when searching in the context
english+USA I don't care about German terms, so I don't want or need "hallo"
in the BooleanQuery in this case.
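
To make the problem concrete, here is a minimal sketch of context-unaware
fuzzy expansion in plain Java. It is not Lucene code: the class and method
names are made up for illustration, the term dictionary is a plain list, and
a raw edit-distance limit is assumed as the similarity test, which is a
simplification of whatever scoring FuzzyTermEnum applies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model, not part of Lucene.
final class FuzzyExpansionSketch {

    // Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Collects every dictionary term within maxEdits of the query term.
    // Nothing here knows which documents (contexts) a term occurs in.
    static List<String> expand(String queryTerm, List<String> dictionary, int maxEdits) {
        List<String> similar = new ArrayList<>();
        for (String term : dictionary) {
            if (editDistance(queryTerm, term) <= maxEdits) {
                similar.add(term);
            }
        }
        return similar;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("hallo", "hello", "help", "world");
        // "hallo" is picked up even though it only occurs in German documents.
        System.out.println(expand("hello", dictionary, 1)); // prints [hallo, hello]
    }
}
```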

So I came up with the idea of using reader.termDocs() instead of terms() in
FuzzyTermEnum. Using the QueryFilter for each context (or rather its BitSet),
I could determine whether a fuzzy term should be included in the
BooleanQuery or not.
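
Roughly what I have in mind, again as a plain-Java sketch rather than real
Lucene code. The int[] postings stand in for the document ids that
reader.termDocs() would enumerate for a term, and the BitSet stands in for
the one obtained from the QueryFilter of a context. The class, the method
names and the sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, not part of Lucene.
final class ContextFilteredTermsSketch {

    // Returns true as soon as one posting of the term lies inside the context,
    // so the remaining postings of that term are not visited.
    static boolean occursInContext(int[] postings, BitSet contextDocs) {
        for (int docId : postings) {
            if (contextDocs.get(docId)) {
                return true;
            }
        }
        return false;
    }

    // Keeps only those candidate terms that occur in at least one document
    // of the context. Terms without postings are dropped.
    static List<String> keepTermsInContext(List<String> candidates,
                                           Map<String, int[]> postingsByTerm,
                                           BitSet contextDocs) {
        List<String> kept = new ArrayList<>();
        for (String term : candidates) {
            int[] postings = postingsByTerm.get(term);
            if (postings != null && occursInContext(postings, contextDocs)) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, int[]> postingsByTerm = new HashMap<>();
        postingsByTerm.put("hello", new int[] {0, 2}); // english+USA documents
        postingsByTerm.put("hallo", new int[] {1, 3}); // German documents

        BitSet englishUsa = new BitSet();
        englishUsa.set(0);
        englishUsa.set(2);

        List<String> candidates = Arrays.asList("hallo", "hello");
        System.out.println(keepTermsInContext(candidates, postingsByTerm, englishUsa)); // prints [hello]
    }
}
```

Note that this needs one postings lookup per candidate term, in addition to
enumerating the terms themselves.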

This (potentially) results in a smaller BooleanQuery, but I wonder whether
the approach would yield any noticeable performance gain (maybe reduced IO?).

Thanks for feedback
Timo

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


  • Timo Nentwig at Nov 7, 2007 at 4:40 pm

    On Wednesday 07 November 2007 10:51:32 Timo Nentwig wrote:
    So I came up with the idea of using reader.termDocs() instead of terms() in
    FuzzyTermEnum. Using the QueryFilter for each context (or rather its
    BitSet), I could determine whether a fuzzy term should be included in the
    BooleanQuery or not.

    Well... I didn't read too carefully: termDocs(Term) "returns an enumeration
    of all the documents which contain term". So for each term from terms() I
    would have to call termDocs(). This will probably hurt performance more
    than the optimization would gain :-\


