Hello,
Allen Atamer wrote:

The Javadoc spec calls for one or more clauses in a query,
but I had trouble with a NOT query just on its own. For example
Most search engines, including Lucene, don't support this query type.
That's why the query parser treats these queries as invalid (if
it recognizes them).
QueryParser.parse("my_field:-exclude") throws a parsing exception

Same with

QueryParser.parse("my_field:-(exclude)")
QueryParser.parse("my_field:(* AND -exclude")
These queries are defined as invalid.
The query QueryParser.parse("my_field:(-(exclude))") gives a
legitimate query that brings no results.
This is the result returned by many search engines. They first select a
document set by looking up the query terms and fetching the documents that
contain them from the index. Because your query doesn't contain any term that
must be part of a document, your result is empty. Note that they check the
excluded words only against the documents selected during the first step, to
save time and memory.
What I would expect is the following: If I have an index with
100 total entries, and 20 records with the word "exclude" in
them, then the above queries should give 80 hits. There is no
test case for this scenario in TestQueryParser. Please
confirm whether this is a bug or not,
This is not an error. You might think that it should be easy to
get all documents and discard those that contain the
excluded word.
But imagine your index contains about 1 million documents: the search
would take a very long time and the result would be huge (and so
mostly useless). Many engines avoid this trouble and return an empty
result or report an error.

You can work around this limitation by adding a dummy field and
term to each document while creating your index, e.g.
document.add(Field.Text("dummy_field", "dummy_value"));
You then have to add "dummy_field:dummy_value" to your query string, e.g.
"dummy_field:dummy_value my_field:-(exclude)" will return 80 hits
in your scenario (after rebuilding your index).
Warning: performance will be very poor on larger indexes, and
users can run "denial of service" attacks by sending such queries.
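
To make the two-step evaluation and the dummy-term workaround concrete, here is a minimal sketch in plain Java. It is a toy inverted index, not Lucene: the class NegativeQuerySketch and its methods are hypothetical names invented for this illustration, and it only models "collect candidates from positive terms, then drop excluded documents".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index; NOT Lucene. All names here are hypothetical. */
final class NegativeQuerySketch {
    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void add(int docId, String... terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    /** Step 1: collect candidates from the positive terms. Step 2: drop excluded documents. */
    Set<Integer> search(List<String> positive, List<String> excluded) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : positive) {
            hits.addAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        for (String term : excluded) {
            hits.removeAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }

    /** The scenario from the question: 100 documents, 20 of them contain "exclude". */
    static NegativeQuerySketch buildScenario() {
        NegativeQuerySketch index = new NegativeQuerySketch();
        for (int doc = 0; doc < 100; doc++) {
            index.add(doc, "dummy_value"); // the dummy term, present in every document
            if (doc < 20) {
                index.add(doc, "exclude");
            }
        }
        return index;
    }

    public static void main(String[] args) {
        NegativeQuerySketch index = buildScenario();
        // Pure negative query: no positive term, so there are no candidates to filter.
        System.out.println(index.search(Collections.emptyList(), Arrays.asList("exclude")).size()); // prints 0
        // The dummy term supplies the candidates; "exclude" is then filtered out.
        System.out.println(index.search(Arrays.asList("dummy_value"), Arrays.asList("exclude")).size()); // prints 80
    }
}
```

The second query also shows where the cost comes from: the dummy term's posting list contains every document in the index, so the whole index is walked for each such query.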

Note that a single "*" is not allowed either, because performance would be
very poor if an expression starts with a wildcard. In this case Lucene
would also run out of memory, because the internal result contains all
words/terms stored in your index.
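
The cost difference between a trailing and a leading wildcard can be sketched with a sorted term dictionary. This is plain Java, not Lucene's implementation; WildcardExpansionSketch and its methods are hypothetical names, and the sketch assumes only that the terms are kept in sorted order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Toy sorted term dictionary; NOT Lucene. All names here are hypothetical. */
final class WildcardExpansionSketch {
    private final TreeSet<String> terms = new TreeSet<>();
    int visited; // dictionary entries inspected by the most recent expansion

    WildcardExpansionSketch(String... allTerms) {
        terms.addAll(Arrays.asList(allTerms));
    }

    /** "prefix*": the sorted order lets us start at the prefix and stop at the first mismatch. */
    List<String> expandPrefix(String prefix) {
        visited = 0;
        List<String> matches = new ArrayList<>();
        for (String term : terms.tailSet(prefix, true)) {
            visited++;
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    /** "*suffix": there is no usable starting point, so every term must be inspected. */
    List<String> expandSuffix(String suffix) {
        visited = 0;
        List<String> matches = new ArrayList<>();
        for (String term : terms) {
            visited++;
            if (term.endsWith(suffix)) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        WildcardExpansionSketch dict = new WildcardExpansionSketch(
                "apple", "apply", "banana", "cherry", "exclude", "include");
        System.out.println(dict.expandPrefix("app") + " visited=" + dict.visited);   // [apple, apply] visited=3
        System.out.println(dict.expandSuffix("clude") + " visited=" + dict.visited); // [exclude, include] visited=6
        // A bare "*" behaves like an empty suffix: every term in the dictionary matches.
        System.out.println(dict.expandSuffix("").size()); // prints 6
    }
}
```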
Regards,
Wolf-Dietrich Materna



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


  • Sergiu Gordea at Jul 6, 2004 at 8:08 am
    Hi all,

    I have a question.

    I have an index with several fields and I have to create conjunctive
    queries by default.
    What I mean is that we are developing a project in which we provide
    search functionality
    based on the Lucene indexer.

    From what I can see, MultiFieldQueryParser creates disjunctive queries:
    if I search for "best test" in fields {title, description}, then

    MultiFieldQueryParser.parse(string, fields, analyzer)

    will create a query that means "fields contain 'best' OR fields
    contain 'test'" [1],
    but I want to create "fields contain 'best' AND fields contain 'test'" [2].
    I know I can place a + before each of these terms, but we also want to
    let the users create
    custom queries using logical operators and + -, grouping and exact phrases.

    So in this situation we have to parse the query string twice, with the
    only change being that we add an AND operator to
    link the terms in the places where no operator is found.

    This seems to me to be just overhead, and I think that the best way
    would be to overload the parse function as

    MultiFieldQueryParser.parse(String queryString, String[] fields,
    Analyzer analyzer, String/int defaultoperator) [3]
    where the default operator can be "AND" or "OR",
    so that I can choose whether I want to create query [1] or query [2].

    Do we have an alternative, reasonably simple solution for this problem?
    What do you think about my suggestion of implementing method [3]?


    Thanks for understanding,

    Sergiu



  • Sergiu Gordea at Jul 6, 2004 at 9:18 am

    Daniel Naber wrote:
    On Tuesday 06 July 2004 10:09, Sergiu Gordea wrote:


    Do we have an alternative solution, reasonably simple for this problem?
    No, but are you sure that MultiFieldQueryParser does the right thing at all?
    If someone searches for +a +b the parser will (currently) build something
    like this (assuming the fields you want to search are title and body):

    (+title:a +title:b) (+body:a +body:b)

    ... unfortunately you are right, but it is a pity that it is not working
    as we want it to;
    it seemed to be so useful. It seems that this is the flip side of the
    "reverse indexing" coin...

    Does anyone have ideas about how we can avoid this situation?

    Other than ... indexing everything in one column?

    Or ... are there other classes that could serve as an alternative to

    MultiFieldQueryParser?

    Thanks,

    Sergiu
    This is usually not what I'd expect: the user wants both terms to occur, no
    matter in which field ('a' might be in title, 'b' might be in body). So I
    think MultiFieldQueryParser is just broken for many use cases.

    Regards
    Daniel



  • Daniel Naber at Jul 6, 2004 at 9:52 am

    On Tuesday 06 July 2004 11:19, Sergiu Gordea wrote:

    (+title:a +title:b) (+body:a +body:b)
    ... unfortunately you are right, but it is a pity that it is not working
    as we want it to;
    it seemed to be so useful. It seems that this is the flip side of the
    "reverse indexing" coin...

    Does anyone have ideas about how we can avoid this situation?
    You'll have to do real query rewriting, as Nutch does. So "+a +b" will
    become +(title:a body:a) +(title:b body:b)

    Unfortunately you cannot just copy+paste code from Nutch as they added a layer
    on top of the query stuff (or something like that).
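
The difference between the two query shapes discussed in this thread can be sketched as plain string building. This is not Nutch's or Lucene's code: MultiFieldRewriteSketch, perField, and perTerm are hypothetical names, and the sketch handles only a flat list of required terms. A real rewrite would operate on the parsed query tree so that operators, grouping, and phrases survive.

```java
import java.util.StringJoiner;

/** String-level illustration only; NOT Nutch or Lucene code. All names are hypothetical. */
final class MultiFieldRewriteSketch {

    /** Groups by field: every term must occur in the SAME field. */
    static String perField(String[] fields, String[] requiredTerms) {
        StringJoiner groups = new StringJoiner(" ");
        for (String field : fields) {
            StringJoiner clauses = new StringJoiner(" ", "(", ")");
            for (String term : requiredTerms) {
                clauses.add("+" + field + ":" + term);
            }
            groups.add(clauses.toString());
        }
        return groups.toString();
    }

    /** Groups by term: every term must occur, but each may be found in ANY field. */
    static String perTerm(String[] fields, String[] requiredTerms) {
        StringJoiner groups = new StringJoiner(" ");
        for (String term : requiredTerms) {
            StringJoiner alternatives = new StringJoiner(" ", "+(", ")");
            for (String field : fields) {
                alternatives.add(field + ":" + term);
            }
            groups.add(alternatives.toString());
        }
        return groups.toString();
    }

    public static void main(String[] args) {
        String[] fields = {"title", "body"};
        String[] terms = {"a", "b"};
        System.out.println(perField(fields, terms)); // (+title:a +title:b) (+body:a +body:b)
        System.out.println(perTerm(fields, terms));  // +(title:a body:a) +(title:b body:b)
    }
}
```

With the per-term shape, a document with 'a' only in title and 'b' only in body matches; with the per-field shape it does not.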

    Regards
    Daniel


  • Kelvin Tan at Jul 7, 2004 at 2:23 pm
    Hi Sergiu,

    First of all, if your application is web-based, it's not necessary to
    programmatically construct the query based on user input (via
    MultiFieldQueryParser). You can use luceneQueryConstructor.js in the Lucene sandbox.
    You can find the documentation here:
    http://cvs.apache.org/viewcvs.cgi/*checkout*/jakarta-lucene-sandbox/contributions/javascript/queryConstructor/luceneQueryConstructor.html

    Secondly, if it is still necessary to programmatically construct the query, perhaps
    you can consider creating an int[] of MultiFieldQueryParser.REQUIRED_FIELD and
    using
    public static Query parse(String query, String[] fields, int[] flags,
    Analyzer analyzer)
    instead?
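
A rough model of what a per-field flag does may help when weighing this option. It rests on an assumption: that each flag applies to the whole sub-query built for one field, so REQUIRED on every field means "the query must match in every field" rather than "every term must match somewhere". The class FieldFlagsSketch and the enum Flag are hypothetical stand-ins, not the library's constants, and the inner terms are joined with the default OR.

```java
import java.util.StringJoiner;

/** Model of per-field flags under the stated assumption; NOT the Lucene implementation. */
final class FieldFlagsSketch {
    /** Hypothetical stand-in for the parser's per-field flag constants. */
    enum Flag { NORMAL, REQUIRED, PROHIBITED }

    /** Builds one sub-query per field and prefixes it according to that field's flag. */
    static String describe(String query, String[] fields, Flag[] flags) {
        StringJoiner out = new StringJoiner(" ");
        for (int i = 0; i < fields.length; i++) {
            StringJoiner inner = new StringJoiner(" ", "(", ")");
            for (String term : query.trim().split("\\s+")) {
                inner.add(fields[i] + ":" + term); // terms inside a field stay optional (default OR)
            }
            String prefix = flags[i] == Flag.REQUIRED ? "+"
                    : flags[i] == Flag.PROHIBITED ? "-" : "";
            out.add(prefix + inner);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] fields = {"title", "description"};
        Flag[] flags = {Flag.REQUIRED, Flag.REQUIRED};
        // prints +(title:best title:test) +(description:best description:test)
        System.out.println(describe("best test", fields, flags));
    }
}
```

If that assumption holds, the flags make each field mandatory but leave 'best' and 'test' optional within a field, so this may not yield the conjunctive behaviour asked for at the start of the thread.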

    Kelvin



Discussion Overview
group: java-user @
categories: lucene
posted: Jun 21, '04 at 1:11p
active: Jul 7, '04 at 2:23p
posts: 5
users: 4
website: lucene.apache.org
