Hi,

I'm having trouble understanding the query parser's handling of AND and OR
when a query contains more than one operator.

E.g.
a OR b AND c
gives the same number of hits as
b AND c
(only scores are different)

and
a AND b OR c AND d
seems to be equivalent to
a AND b AND c AND d

which doesn't seem logical to me.

I'd expect AND to have higher precedence than OR (as with logical AND / OR in
C or Java), so that a OR b AND c would be equivalent to a OR (b AND c),
and a AND b OR c AND d equivalent to (a AND b) OR (c AND d).


Looking at the query parser's source, I find that, unless parentheses
are used, all these terms are added to one boolean query, and the
AND operator makes the terms to its left and right required (unless
NOT operators make them prohibited).
So
a OR b AND c gives one boolean query where b and c are required, whereas
a is not.
a AND b OR c AND d produces a boolean query where a, b, c and d are required,
which is indeed the same as a AND b AND c AND d.
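To make the behaviour described above concrete without needing a Lucene index, here is a small stand-alone Java sketch. It is a simplified model under stated assumptions, not the query parser's code: FlatQueryModel and its methods are hypothetical names, only bare terms with explicit AND/OR are handled, and matching follows the usual single-boolean-query rule (all required terms present; if nothing is required, at least one optional term present). Over the 15 test documents used later in this thread (every non-empty combination of a, b, c and d) it gives 4 hits for both a OR b AND c and b AND c, and 1 hit for a AND b OR c AND d.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the unpatched behaviour: all terms of an unparenthesized
// query go into ONE flat clause list. AND marks the terms on both sides required;
// with the default OR operator, a term after OR (or with no operator) stays optional.
class FlatQueryModel {
    static final class Clause {
        final String term;
        boolean required;
        Clause(String term, boolean required) { this.term = term; this.required = required; }
        @Override public String toString() { return (required ? "+" : "") + term; }
    }

    // Tokens are bare terms, "AND" or "OR"; NOT, +/- and parentheses are left out.
    static List<Clause> parse(String query) {
        List<Clause> clauses = new ArrayList<Clause>();
        String conj = null;
        for (String tok : query.trim().split("\\s+")) {
            if (tok.equals("AND") || tok.equals("OR")) { conj = tok; continue; }
            boolean required = "AND".equals(conj);
            if (required && !clauses.isEmpty()) {
                clauses.get(clauses.size() - 1).required = true; // AND also binds the left term
            }
            clauses.add(new Clause(tok, required));
            conj = null;
        }
        return clauses;
    }

    // One boolean query: every required term must occur; if nothing is required,
    // at least one optional term must occur. Otherwise optional terms only affect scores.
    static boolean matches(List<Clause> clauses, Set<String> doc) {
        boolean anyRequired = false, anyOptionalHit = false;
        for (Clause c : clauses) {
            if (c.required) {
                anyRequired = true;
                if (!doc.contains(c.term)) return false;
            } else if (doc.contains(c.term)) {
                anyOptionalHit = true;
            }
        }
        return anyRequired || anyOptionalHit;
    }

    // Counts matches over the 15 test documents: every non-empty subset of {a, b, c, d}.
    static int count(String query) {
        String[] terms = {"a", "b", "c", "d"};
        List<Clause> clauses = parse(query);
        int n = 0;
        for (int mask = 1; mask < 16; mask++) {
            Set<String> doc = new HashSet<String>();
            for (int i = 0; i < 4; i++) if ((mask & (1 << i)) != 0) doc.add(terms[i]);
            if (matches(clauses, doc)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        for (String q : Arrays.asList("a OR b AND c", "b AND c", "a AND b OR c AND d")) {
            System.out.println(q + " -> " + parse(q) + " : " + count(q) + " docs");
        }
    }
}
```

The optional a in a OR b AND c never changes which documents match once b and c are required; in this model it could only influence ranking, which is consistent with "only scores are different".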


Should this be considered a bug?

greetings
Morus

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


  • Dror Matalon at Dec 9, 2003 at 6:49 pm

    On Tue, Dec 09, 2003 at 10:57:51AM +0100, Morus Walter wrote:
    Hi,

    I'm having trouble understanding the query parser's handling of AND and OR
    when a query contains more than one operator.

    E.g.
    a OR b AND c
    gives the same number of hits as
    b AND c
    (only scores are different)
    This would make sense if all the documents that have a also have both b
    and c in them.
    and
    a AND b OR c AND d
    seems to be equivalent to
    a AND b AND c AND d
    That's not what I get.
    http://www.fastbuzz.com/search/results.jsp?query=dean+AND+kerry+AND+clark+AND+gephardt&days=
    returns 479 items
    but
    http://www.fastbuzz.com/search/results.jsp?query=dean+AND+kerry+OR+clark+AND+gephardt&days=
    returns 564 items, which indicates that the OR does make a difference.
    As expected, you end up getting more items with the OR.

    Regards,

    Dror
    --
    Dror Matalon
    Zapatec Inc
    1700 MLK Way
    Berkeley, CA 94709
    http://www.fastbuzz.com
    http://www.zapatec.com

  • Morus Walter at Dec 10, 2003 at 9:01 am
    Hi Dror,

    thanks for your answer.
    I'm having trouble understanding the query parser's handling of AND and OR
    when a query contains more than one operator.

    E.g.
    a OR b AND c
    gives the same number of hits as
    b AND c
    (only scores are different)
    This would make sense if all the documents that have a also have both b
    and c in them.
    Then the query should be equivalent to (a OR b) AND c.
    But it isn't. For specific a, b and c I get 766 hits for a OR b AND c
    and 1086 for (a OR b) AND c.
    and
    a AND b OR c AND d
    seems to be equivalent to
    a AND b AND c AND d
    That's not what I get.
    http://www.fastbuzz.com/search/results.jsp?query=dean+AND+kerry+AND+clark+AND+gephardt&days=
    returns 479 items
    but
    http://www.fastbuzz.com/search/results.jsp?query=dean+AND+kerry+OR+clark+AND+gephardt&days=
    returns 564 items, which indicates that the OR does make a difference.
    As expected, you end up getting more items with the OR.
    Hmm. I was sloppy in not specifying the Lucene version.
    My tests were on 1.2.
    But I reindexed part of my documents using 1.3rc3 and found the same.
    What version does fastbuzz use?

    I wrote a small test program that indexes every document consisting of
    zero or one occurrence each of a, b, c and d (ignoring order and leaving
    out the empty document, that's just 15 docs) and runs some queries
    against it.
    The program is below; this is what I get:

    a OR b AND c -> a +b +c
    4 documents found
    a b c
    a b c d
    b c
    b c d
    (a OR b) AND c -> +(a b) +c
    6 documents found
    a b c
    a b c d
    a c
    b c
    a c d
    b c d
    a OR (b AND c) -> a (+b +c)
    10 documents found
    a b c
    a b c d
    b c
    a
    b c d
    a b
    a c
    a d
    a b d
    a c d
    b AND c -> +b +c
    4 documents found
    b c
    a b c
    b c d
    a b c d
    a AND b OR c AND d -> +a +b +c +d
    1 documents found
    a b c d
    (a AND b) OR (c AND d) -> (+a +b) (+c +d)
    7 documents found
    a b c d
    a b
    c d
    a b c
    a b d
    a c d
    b c d
    a AND (b OR c) AND d -> +a +(b c) +d
    3 documents found
    a b c d
    a b d
    a c d
    ((a AND b) OR c) AND d -> +((+a +b) c) +d
    5 documents found
    a b c d
    a b d
    c d
    a c d
    b c d
    a AND (b OR (c AND d)) -> +a +(b (+c +d))
    5 documents found
    a b c d
    a c d
    a b
    a b c
    a b d
    a AND b AND c AND d -> +a +b +c +d
    1 documents found
    a b c d

    With 1.3rc3, 1.3rc2 and 1.3rc1 I get these results; with 1.2 I get the
    same results in a slightly different order.

    So I still get the same results for
    a OR b AND c and b AND c
    and for
    a AND b OR c AND d and a AND b AND c AND d
    (note that the toString output of these last two queries is identical),
    but different results for every explicit grouping of the operators I can think of.
    So to me the question remains: what do AND and OR mean when they are
    combined in one expression?
    I can understand all the query results where AND and OR queries are
    explicitly grouped by parentheses, and those results are what I expect.
    But the rules for combined AND and OR aren't what I would expect.
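    For comparison, a tiny evaluator with the conventional precedence expected here (NOT over AND over OR, as in C or Java) gives the purely logical answers for the same 15 documents. This is plain Java written for illustration, not Lucene code; PrecedenceEvaluator is a hypothetical name and there is no error handling for malformed input. Its counts agree with the explicitly parenthesized queries in the output above (10 for a OR (b AND c), 7 for (a AND b) OR (c AND d)) and, unlike the query parser, it gives the same counts for the unparenthesized forms.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Reference evaluator with conventional precedence: NOT binds tighter than AND,
// AND tighter than OR. Illustration only, not Lucene code; no error handling.
class PrecedenceEvaluator {
    private final List<String> toks = new ArrayList<String>();
    private final Set<String> doc;
    private int pos = 0;

    private PrecedenceEvaluator(String query, Set<String> doc) {
        // Pad parentheses with spaces so a whitespace split yields them as tokens.
        String padded = query.replace("(", " ( ").replace(")", " ) ").trim();
        for (String t : padded.split("\\s+")) toks.add(t);
        this.doc = doc;
    }

    private String peek() { return pos < toks.size() ? toks.get(pos) : null; }

    // orExpr := andExpr ("OR" andExpr)*
    private boolean orExpr() {
        boolean v = andExpr();
        while ("OR".equals(peek())) { pos++; boolean rhs = andExpr(); v = v || rhs; }
        return v;
    }

    // andExpr := notExpr ("AND" notExpr)*
    private boolean andExpr() {
        boolean v = notExpr();
        while ("AND".equals(peek())) { pos++; boolean rhs = notExpr(); v = v && rhs; }
        return v;
    }

    // notExpr := "NOT" notExpr | primary
    private boolean notExpr() {
        if ("NOT".equals(peek())) { pos++; return !notExpr(); }
        return primary();
    }

    // primary := "(" orExpr ")" | term
    private boolean primary() {
        String t = toks.get(pos++);
        if (t.equals("(")) {
            boolean v = orExpr();
            pos++; // skip the closing ")"
            return v;
        }
        return doc.contains(t);
    }

    static boolean matches(String query, Set<String> doc) {
        return new PrecedenceEvaluator(query, doc).orExpr();
    }

    // Counts matches over the 15 non-empty subsets of {a, b, c, d}.
    static int count(String query) {
        String[] terms = {"a", "b", "c", "d"};
        int n = 0;
        for (int mask = 1; mask < 16; mask++) {
            Set<String> doc = new HashSet<String>();
            for (int i = 0; i < 4; i++) if ((mask & (1 << i)) != 0) doc.add(terms[i]);
            if (matches(query, doc)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String[] queries = {"a OR b AND c", "a OR (b AND c)", "a AND b OR c AND d", "(a AND b) OR (c AND d)"};
        for (String q : queries) System.out.println(q + " : " + count(q) + " docs");
    }
}
```

Each precedence level is one method, so the grouping is decided by the call structure rather than by flags on a flat clause list.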

    greetings
    Morus

    PS: the test program:

    import org.apache.lucene.document.*;
    import org.apache.lucene.analysis.*;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.*;
    import org.apache.lucene.store.*;
    import org.apache.lucene.search.*;
    import org.apache.lucene.queryParser.QueryParser;

    class LuceneTest
    {
        // every non-empty combination of the terms a, b, c and d (15 documents)
        static String[] docs = {
            "a", "b", "c", "d",
            "a b", "a c", "a d", "b c", "b d", "c d",
            "a b c", "a b d", "a c d", "b c d",
            "a b c d"
        };

        static String[] queries = {
            "a OR b AND c",
            "(a OR b) AND c",
            "a OR (b AND c)",
            "b AND c",
            "a AND b OR c AND d",
            "(a AND b) OR (c AND d)",
            "a AND (b OR c) AND d",
            "((a AND b) OR c) AND d",
            "a AND (b OR (c AND d))",
            "a AND b AND c AND d"
        };

        public static void main(String argv[]) throws Exception {
            Directory dir = new RAMDirectory();
            // empty stop list, otherwise "a" would be dropped as an English stop word
            String[] stop = {};
            Analyzer analyzer = new StandardAnalyzer(stop);

            IndexWriter writer = new IndexWriter(dir, analyzer, true);

            for ( int i=0; i < docs.length; i++ ) {
                Document doc = new Document();
                doc.add(Field.Text("text", docs[i]));
                writer.addDocument(doc);
            }
            writer.close();

            Searcher searcher = new IndexSearcher(dir);
            for ( int i=0; i < queries.length; i++ ) {
                Query query = QueryParser.parse(queries[i], "text", analyzer);
                System.out.println(queries[i] + " -> " + query.toString("text"));
                Hits hits = searcher.search(query);
                System.out.println(" " + hits.length() + " documents found");
                for ( int j=0; j < hits.length(); j++ ) {
                    Document doc = hits.doc(j);
                    System.out.println("\t"+doc.get("text"));
                }
            }
        }
    }

  • Jamie Stallwood at Dec 10, 2003 at 4:30 pm
    What Morus is saying is right: an expression without parentheses, when
    interpreted, treats the terms on either side of an AND as compulsory
    and any terms on either side of an OR as optional. However,
    if you combine AND and OR in one expression, the optional terms have no
    effect on which documents match, because the others are compulsory.

    What needs to be done is for the query parser to process any query
    string that contains AND and "put brackets" around the AND terms first.
    As it stands it is of no use, as the OR does not work the way you would
    expect. AND should be given implicit priority.
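    The "put brackets round it first" idea can be approximated as a pre-processing step in front of the unchanged parser. The sketch below is hypothetical (BracketAndRuns is not part of Lucene) and only handles flat queries made of single terms joined by explicit AND and OR; existing parentheses, NOT, +/-, quoted phrases and implicit operators are deliberately out of scope.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-processor (not part of Lucene): for a flat query of single terms
// joined by explicit AND/OR, wrap every run of AND-ed terms in parentheses so that
// the unchanged parser sees the grouping implied by AND-over-OR precedence.
class BracketAndRuns {
    static String bracket(String query) {
        List<String> parts = new ArrayList<String>();
        // Each piece between explicit ORs is one AND-run (possibly a single term).
        for (String run : query.trim().split("\\s+OR\\s+")) {
            String r = run.trim();
            boolean hasAnd = r.split("\\s+AND\\s+").length > 1;
            parts.add(hasAnd ? "(" + r + ")" : r);
        }
        return String.join(" OR ", parts);
    }

    public static void main(String[] args) {
        System.out.println(bracket("a OR b AND c"));        // a OR (b AND c)
        System.out.println(bracket("a AND b OR c AND d"));  // (a AND b) OR (c AND d)
    }
}
```

    Because explicitly parenthesized queries already behave as expected, rewriting the string this way would sidestep the flat-clause problem for this restricted query shape; a general solution would still have to live in the parser.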


    -----Original Message-----
    From: Morus Walter
    Sent: 10 December 2003 09:01
    To: Lucene Users List
    Subject: Re: Query Parser AND / OR

  • Morus Walter at Dec 28, 2003 at 4:24 pm

    Jamie Stallwood wrote:

    What Morus is saying is right: an expression without parentheses, when
    interpreted, treats the terms on either side of an AND as compulsory
    and any terms on either side of an OR as optional. However,
    if you combine AND and OR in one expression, the optional terms have no
    effect on which documents match, because the others are compulsory.

    What needs to be done is for the query parser to process any query
    string that contains AND and "put brackets" around the AND terms first.
    As it stands it is of no use, as the OR does not work the way you would
    expect. AND should be given implicit priority.
    I had a closer look at this and wrote a patch that implements this by
    changing the vector of boolean clauses into a vector of vectors of boolean
    clauses in the addClause method of the query parser. A new sub-vector is
    created whenever an explicit OR operator is used.

    Queries using explicit AND/OR are grouped by precedence of AND over OR.
    That is, a OR b AND c becomes a OR (b AND c).

    Queries using implicit AND/OR (depending on the default operator) are handled
    as before (so one can still use a +b -c to create one boolean query, where
    b is required, c forbidden and a optional).

    It's less clear how a query using both explicit AND/OR and implicit operators
    should be handled.
    Since the patch groups on explicit OR operators, the query
    a OR b c is read as a (b c),
    whereas
    a AND b c is read as +a +b c
    (assuming the default operator is OR).
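    The grouping rule described above can be sketched outside Lucene as follows. This is a hypothetical, heavily simplified re-implementation for illustration only, not the patch itself: OrGroupSketch is an invented name, and it handles only bare terms, explicit AND/OR and implicit operators under a default OR operator (no +, -, NOT, fields or parentheses). It keeps a list of or-groups, starts a new group at each explicit OR, and renders the result the way the patched getBooleanQuery() nests sub-queries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the patch's idea: the flat clause list becomes a
// list of "or-groups", and each explicit OR starts a new group. Default operator OR.
class OrGroupSketch {
    static final class Clause {
        final String term;
        boolean required;
        Clause(String term, boolean required) { this.term = term; this.required = required; }
        @Override public String toString() { return (required ? "+" : "") + term; }
    }

    static List<List<Clause>> addClauses(String query) {
        List<List<Clause>> groups = new ArrayList<List<Clause>>();
        groups.add(new ArrayList<Clause>());               // the initial (first) or-group
        String conj = null;
        for (String tok : query.trim().split("\\s+")) {
            if (tok.equals("AND") || tok.equals("OR")) { conj = tok; continue; }
            List<Clause> current = groups.get(groups.size() - 1);
            if ("AND".equals(conj)) {
                current.get(current.size() - 1).required = true; // AND binds the left neighbour
            } else if ("OR".equals(conj)) {
                current = new ArrayList<Clause>();          // explicit OR: start a new group
                groups.add(current);
            }
            current.add(new Clause(tok, "AND".equals(conj)));
            conj = null;
        }
        return groups;
    }

    // One group stays a flat query; otherwise every group becomes an optional clause:
    // single-term groups are added as a plain optional term, larger ones as "( ... )".
    static String toQueryString(List<List<Clause>> groups) {
        StringBuilder sb = new StringBuilder();
        if (groups.size() == 1) {
            for (Clause c : groups.get(0)) append(sb, c.toString());
        } else {
            for (List<Clause> g : groups) {
                if (g.size() == 1) {
                    append(sb, g.get(0).term);
                } else {
                    StringBuilder inner = new StringBuilder();
                    for (Clause c : g) append(inner, c.toString());
                    append(sb, "(" + inner + ")");
                }
            }
        }
        return sb.toString();
    }

    private static void append(StringBuilder sb, String s) {
        if (sb.length() > 0) sb.append(' ');
        sb.append(s);
    }

    public static void main(String[] args) {
        for (String q : new String[] {"a OR b AND c", "a AND b OR c AND d", "a OR b c", "a AND b c"}) {
            System.out.println(q + " -> " + toQueryString(addClauses(q)));
        }
    }
}
```

    Under these assumptions it yields a (+b +c) for a OR b AND c and (+a +b) (+c +d) for a AND b OR c AND d, and it reproduces the a (b c) and +a +b c readings for the mixed explicit/implicit cases.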

    There's one issue left:
    The old query parser reads the query
    `a OR NOT b' as `a -b', which is the same as `a AND NOT b'.
    The modified query parser reads it as `a (-b)'.
    While this looks better (at least to me), it does not produce the result
    of a OR NOT b. Instead, the (-b) part seems to be silently dropped.
    While I understand that this part is illegal (it searches for just one
    negative term), I don't think that silently dropping it is an appropriate
    way to deal with it. But I don't think that's a query parser issue.
    The only question is whether the query parser should take care of it.
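    Why the (-b) group disappears can be illustrated with a simplified matching model. The assumption here (a model, not Lucene's scorer code) is that a boolean query matches a document when every required clause matches, no prohibited clause matches, and at least one non-prohibited clause matches. Under that rule a group containing only prohibited clauses never matches, so a (-b) selects exactly the documents containing a. NegativeGroupDemo and its helper methods are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified matching model (an assumption, not Lucene's scorer): all required clauses
// must match, no prohibited clause may match, and at least one non-prohibited clause
// must match. A group with only prohibited clauses therefore never matches.
class NegativeGroupDemo {
    interface Node { boolean matches(Set<String> doc); }

    static final class Clause {
        final Node node;
        final boolean required, prohibited;
        Clause(Node node, boolean required, boolean prohibited) {
            this.node = node;
            this.required = required;
            this.prohibited = prohibited;
        }
    }

    static Node term(String t) { return doc -> doc.contains(t); }

    static Node bool(Clause... clauses) {
        return doc -> {
            boolean positiveHit = false;
            for (Clause c : clauses) {
                boolean hit = c.node.matches(doc);
                if (c.prohibited) {
                    if (hit) return false;           // a prohibited clause matched
                } else if (c.required && !hit) {
                    return false;                    // a required clause is missing
                } else if (hit) {
                    positiveHit = true;              // some non-prohibited clause matched
                }
            }
            return positiveHit;
        };
    }

    static Clause optional(Node n)   { return new Clause(n, false, false); }
    static Clause prohibited(Node n) { return new Clause(n, false, true); }

    // "a -b": the unpatched parser's reading of a OR NOT b
    static Node oldReading() { return bool(optional(term("a")), prohibited(term("b"))); }

    // "a (-b)": the patched parser's reading; the inner group is purely negative
    static Node patchedReading() {
        return bool(optional(term("a")), optional(bool(prohibited(term("b")))));
    }

    // Counts matches over the 15 non-empty subsets of {a, b, c, d}.
    static int count(Node q) {
        String[] terms = {"a", "b", "c", "d"};
        int n = 0;
        for (int mask = 1; mask < 16; mask++) {
            Set<String> doc = new HashSet<String>();
            for (int i = 0; i < 4; i++) if ((mask & (1 << i)) != 0) doc.add(terms[i]);
            if (q.matches(doc)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("a -b   : " + count(oldReading()));
        System.out.println("a (-b) : " + count(patchedReading()));
        System.out.println("a      : " + count(term("a")));
    }
}
```

    Over the 15 test documents this model gives 4 hits for a -b and 8 for both a (-b) and plain a, whereas the purely logical reading of a OR NOT b would match 11.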

    I attached the patch (made against 1.3rc3 but working for 1.3final as well)
    and a test program.
    The test program parses a number of queries with the default OR and the
    default AND operator and reparses the result of the toString method of
    each created query.
    It outputs the initial query, the query parsed with default OR, its
    reparsed query, the query parsed with default AND, and its reparsed query.
    If called with the -q option, it also runs the queries against an index
    consisting of all documents containing zero or one occurrence each of
    a, b, c and d.
    By putting an unpatched or a patched version of Lucene in the classpath,
    one can look at the effect of the patch in detail.

    I'm interested in your comments. If no one objects to the patch, I'll enter
    a bug report so it doesn't get lost.

    Morus
  • Morus Walter at Dec 28, 2003 at 4:46 pm

    Morus Walter writes:

    I attached the patch (made against 1.3rc3 but working for 1.3final as well)
    and a test program.
    Seems the attachments got stripped...

    So once again:

    The patch:

    ===File lucene/QueryParser.jj.patch===============
    *** QueryParser.jj.org Mon Dec 22 11:47:30 2003
    --- QueryParser.jj Mon Dec 22 13:20:57 2003
    ***************
    *** 233,255 ****

    protected void addClause(Vector clauses, int conj, int mods, Query q) {
    boolean required, prohibited;
    !
    ! // If this term is introduced by AND, make the preceding term required,
    // unless it's already prohibited
    if (conj == CONJ_AND) {
    ! BooleanClause c = (BooleanClause) clauses.elementAt(clauses.size()-1);
    ! if (!c.prohibited)
    ! c.required = true;
    ! }
    !
    ! if (operator == DEFAULT_OPERATOR_AND && conj == CONJ_OR) {
    ! // If this term is introduced by OR, make the preceding term optional,
    ! // unless it's prohibited (that means we leave -a OR b but +a OR b-->a OR b)
    ! // notice if the input is a OR b, first term is parsed as required; without
    ! // this modification a OR b would parsed as +a OR b
    ! BooleanClause c = (BooleanClause) clauses.elementAt(clauses.size()-1);
    ! if (!c.prohibited)
    ! c.required = false;
    }

    // We might have been passed a null query; the term might have been
    --- 233,249 ----

    protected void addClause(Vector clauses, int conj, int mods, Query q) {
    boolean required, prohibited;
    ! // System.out.println(conj+ " " + mods + " " + q.toString("text"));
    ! // If this term is introduced by AND, check if the previous term is the
    ! // first term in this or-group and make that term required,
    // unless it's already prohibited
    if (conj == CONJ_AND) {
    ! Vector clauses2 = (Vector)clauses.elementAt(clauses.size()-1);
    ! //if ( clauses2.size() == 1 ) {
    ! BooleanClause c = (BooleanClause) clauses2.elementAt(clauses2.size()-1);
    ! if (!c.prohibited)
    ! c.required = true;
    ! //}
    }

    // We might have been passed a null query; the term might have been
    ***************
    *** 257,277 ****
    if (q == null)
    return;

    if (operator == DEFAULT_OPERATOR_OR) {
    - // We set REQUIRED if we're introduced by AND or +; PROHIBITED if
    // introduced by NOT or -; make sure not to set both.
    prohibited = (mods == MOD_NOT);
    ! required = (mods == MOD_REQ);
    ! if (conj == CONJ_AND && !prohibited) {
    ! required = true;
    ! }
    ! } else {
    ! // We set PROHIBITED if we're introduced by NOT or -; We set REQUIRED
    ! // if not PROHIBITED and not introduced by OR
    prohibited = (mods == MOD_NOT);
    ! required = (!prohibited && conj != CONJ_OR);
    }
    ! clauses.addElement(new BooleanClause(q, required, prohibited));
    }

    /**
    --- 251,279 ----
    if (q == null)
    return;

    + // start new or-group if there's an explit or
    + if ( conj == CONJ_OR ) {
    + clauses.addElement(new Vector());
    + }
    +
    if (operator == DEFAULT_OPERATOR_OR) {
    // introduced by NOT or -; make sure not to set both.
    prohibited = (mods == MOD_NOT);
    ! // for explizit conjunctions: set required to true
    ! if ( conj == CONJ_AND ) {
    ! required = true;
    ! }
    ! else {
    ! // default OR -> required only when requested
    ! required = (mods == MOD_REQ);
    ! }
    ! } else { // operator == DEFAULT_OPERATOR_AND
    ! // We set PROHIBITED if we're introduced by NOT or -
    prohibited = (mods == MOD_NOT);
    ! // always REQUIRED unless PROHIBITED
    ! required = (!prohibited);
    }
    ! ((Vector)clauses.elementAt(clauses.size()-1)).addElement(new BooleanClause(q, required, prohibited));
    }

    /**
    ***************
    *** 359,369 ****
    */
    protected Query getBooleanQuery(Vector clauses) throws ParseException
    {
    ! BooleanQuery query = new BooleanQuery();
    ! for (int i = 0; i < clauses.size(); i++) {
    ! query.add((BooleanClause)clauses.elementAt(i));
    ! }
    ! return query;
    }

    /**
    --- 361,389 ----
    */
    protected Query getBooleanQuery(Vector clauses) throws ParseException
    {
    ! BooleanQuery query = new BooleanQuery();
    ! if ( clauses.size() == 1 ) {
    ! clauses = (Vector)clauses.elementAt(0);
    ! for (int i = 0; i < clauses.size(); i++) {
    ! query.add((BooleanClause)clauses.elementAt(i));
    ! }
    ! }
    ! else {
    ! for ( int i = 0; i < clauses.size(); i++ ) {
    ! Vector clauses2 = (Vector)clauses.elementAt(i);
    ! if ( clauses2.size() == 1 && ((BooleanClause)clauses2.elementAt(0)).prohibited == false ) {
    ! query.add(new BooleanClause(((BooleanClause)clauses2.elementAt(0)).query, false, false));
    ! }
    ! else if ( clauses2.size() >= 1 ) {
    ! BooleanQuery query2 = new BooleanQuery();
    ! for ( int j = 0; j < clauses2.size(); j++ ) {
    ! query2.add((BooleanClause)clauses2.elementAt(j));
    ! }
    ! query.add(new BooleanClause(query2, false, false));
    ! }
    ! }
    ! }
    ! return query;
    }

    /**
    ***************
    *** 551,556 ****
    --- 571,577 ----
    Query Query(String field) :
    {
    Vector clauses = new Vector();
    + clauses.addElement(new Vector());
    Query q, firstQuery=null;
    int conj, mods;
    }
    ***************
    *** 566,572 ****
    { addClause(clauses, conj, mods, q); }
    )*
    {
    ! if (clauses.size() == 1 && firstQuery != null)
    return firstQuery;
    else {
    return getBooleanQuery(clauses);
    --- 587,593 ----
    { addClause(clauses, conj, mods, q); }
    )*
    {
    ! if (clauses.size() == 1 && ((Vector)clauses.elementAt(0)).size() == 1 && firstQuery != null)
    return firstQuery;
    else {
    return getBooleanQuery(clauses);
    ============================================================

    and the test program:

    ===File lucene/LuceneTest.java===============
    import org.apache.lucene.document.*;
    import org.apache.lucene.analysis.*;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.*;
    import org.apache.lucene.store.*;
    import org.apache.lucene.search.*;
    import org.apache.lucene.queryParser.QueryParser;

    class LuceneTest
    {
        // every non-empty combination of the terms a, b, c and d (15 documents)
        static String[] docs = {
            "a", "b", "c", "d",
            "a b", "a c", "a d", "b c", "b d", "c d",
            "a b c", "a b d", "a c d", "b c d",
            "a b c d"
        };

        static String[] queries = {
            "a OR b AND c",
            "(a OR b) AND c",
            "a OR (b AND c)",
            "a AND b",
            "a AND b OR c AND d",
            "(a AND b) OR (c AND d)",
            "a AND (b OR c) AND d",
            "((a AND b) OR c) AND d",
            "a AND (b OR (c AND d))",
            "a AND b AND c AND d",

            "a OR b AND NOT c",
            "(a OR b) AND NOT c",
            "a OR (b AND NOT c)",
            "a AND NOT d",
            "a AND NOT b OR c AND NOT d",
            "(a AND NOT b) OR (c AND NOT d)",
            "a AND NOT (b OR c) AND NOT d",
            "((a AND NOT b) OR c) AND NOT d",
            "a AND NOT (b OR (c AND NOT d))",
            "a AND NOT b AND NOT c AND NOT d",

            "a OR NOT b",
            "a OR NOT a",

            "a b",
            "a b c",
            "a b (c d e)",
            "+a +b",
            "a -b",
            "a +b -c",
            "+a b -c",
            "+a -b c",
            "a -b -c",
            "-a b c",

            "a OR b c AND d",
            "a OR b c",
            "a AND b c",
            "a OR b c OR d",
            "a OR b c d OR e",
            "a AND b c AND d",
            "a AND b c d AND e"
        };

        public static void main(String argv[]) throws Exception {
            Directory dir = new RAMDirectory();
            // empty stop list, otherwise "a" would be dropped as an English stop word
            String[] stop = {};
            Analyzer analyzer = new StandardAnalyzer(stop);

            IndexWriter writer = new IndexWriter(dir, analyzer, true);

            for ( int i=0; i < docs.length; i++ ) {
                Document doc = new Document();
                doc.add(Field.Text("text", docs[i]));
                writer.addDocument(doc);
            }
            writer.close();

            Searcher searcher = new IndexSearcher(dir);
            for ( int i=0; i < queries.length; i++ ) {
                QueryParser parser = new QueryParser("text", analyzer);
                parser.setOperator(QueryParser.DEFAULT_OPERATOR_AND);

                Query [] query = new Query[4];

                // 0: default OR, 1: reparse of 0, 2: default AND, 3: reparse of 2
                query[0] = QueryParser.parse(queries[i], "text", analyzer);
                query[1] = QueryParser.parse(query[0].toString("text"), "text", analyzer);
                query[2] = parser.parse(queries[i]);
                query[3] = QueryParser.parse(query[2].toString("text"), "text", analyzer);

                System.out.println(i + ": " + queries[i] + " ==> " + query[0].toString("text") + " -> " + query[1].toString("text") + " / " + query[2].toString("text") + " -> " + query[3].toString("text"));
                if ( argv.length > 0 && argv[0].equals("-q") ) {
                    for ( int k=0; k<4; k++ ) {
                        Hits hits = searcher.search(query[k]);
                        System.out.println(k + " " + query[k].toString("text") + "\t" + hits.length() + " documents found");
                        for ( int j=0; j < hits.length(); j++ ) {
                            Document doc = hits.doc(j);
                            System.out.println("\t"+doc.get("text"));
                        }
                    }
                }
            }
        }
    }
    ============================================================

  • Erik Hatcher at Dec 29, 2003 at 12:11 am
    Morus,

    I haven't had time to think through all of the issues and the patch you
    submitted, but I suggest that you go ahead and attach this to a
    Bugzilla issue so that it can be addressed more formally and avoid
    being lost in the mounds of e-mail we all get.

    Thanks,
    Erik

    On Dec 28, 2003, at 11:46 AM, Morus Walter wrote:

    Morus Walter writes:
    I attached the patch (made against 1.3rc3 but working for 1.3final as
    well)
    and a test program.
    Seems the attachments got stripped...

    So once again:

    The patch:

    ===File lucene/QueryParser.jj.patch===============
    *** QueryParser.jj.org Mon Dec 22 11:47:30 2003
    --- QueryParser.jj Mon Dec 22 13:20:57 2003
    ***************
    *** 233,255 ****

    protected void addClause(Vector clauses, int conj, int mods, Query
    q) {
    boolean required, prohibited;
    !
    ! // If this term is introduced by AND, make the preceding term
    required,
    // unless it's already prohibited
    if (conj == CONJ_AND) {
    ! BooleanClause c = (BooleanClause)
    clauses.elementAt(clauses.size()-1);
    ! if (!c.prohibited)
    ! c.required = true;
    ! }
    !
    ! if (operator == DEFAULT_OPERATOR_AND && conj == CONJ_OR) {
    ! // If this term is introduced by OR, make the preceding term
    optional,
    ! // unless it's prohibited (that means we leave -a OR b but +a
    OR b-->a OR b)
    ! // notice if the input is a OR b, first term is parsed as
    required; without
    ! // this modification a OR b would parsed as +a OR b
    ! BooleanClause c = (BooleanClause)
    clauses.elementAt(clauses.size()-1);
    ! if (!c.prohibited)
    ! c.required = false;
    }

    // We might have been passed a null query; the term might have
    been
    --- 233,249 ----

    protected void addClause(Vector clauses, int conj, int mods, Query
    q) {
    boolean required, prohibited;
    ! // System.out.println(conj+ " " + mods + " " +
    q.toString("text"));
    ! // If this term is introduced by AND, check if the previous term
    is the
    ! // first term in this or-group and make that term required,
    // unless it's already prohibited
    if (conj == CONJ_AND) {
    ! Vector clauses2 = (Vector)clauses.elementAt(clauses.size()-1);
    ! //if ( clauses2.size() == 1 ) {
    ! BooleanClause c = (BooleanClause)
    clauses2.elementAt(clauses2.size()-1);
    ! if (!c.prohibited)
    ! c.required = true;
    ! //}
    }

    // We might have been passed a null query; the term might have
    been
    ***************
    *** 257,277 ****
    if (q == null)
    return;

    if (operator == DEFAULT_OPERATOR_OR) {
    - // We set REQUIRED if we're introduced by AND or +; PROHIBITED
    if
    // introduced by NOT or -; make sure not to set both.
    prohibited = (mods == MOD_NOT);
    ! required = (mods == MOD_REQ);
    ! if (conj == CONJ_AND && !prohibited) {
    ! required = true;
    ! }
    ! } else {
    ! // We set PROHIBITED if we're introduced by NOT or -; We set
    REQUIRED
    ! // if not PROHIBITED and not introduced by OR
    prohibited = (mods == MOD_NOT);
    ! required = (!prohibited && conj != CONJ_OR);
    }
    ! clauses.addElement(new BooleanClause(q, required, prohibited));
    }

    /**
    --- 251,279 ----
    if (q == null)
    return;

    + // start new or-group if there's an explit or
    + if ( conj == CONJ_OR ) {
    + clauses.addElement(new Vector());
    + }
    +
    if (operator == DEFAULT_OPERATOR_OR) {
    // introduced by NOT or -; make sure not to set both.
    prohibited = (mods == MOD_NOT);
    ! // for explizit conjunctions: set required to true
    ! if ( conj == CONJ_AND ) {
    ! required = true;
    ! }
    ! else {
    ! // default OR -> required only when requested
    ! required = (mods == MOD_REQ);
    ! }
    ! } else { // operator == DEFAULT_OPERATOR_AND
    ! // We set PROHIBITED if we're introduced by NOT or -
    prohibited = (mods == MOD_NOT);
    ! // always REQUIRED unless PROHIBITED
    ! required = (!prohibited);
    }
    ! ((Vector)clauses.elementAt(clauses.size()-1)).addElement(new
    BooleanClause(q, required, prohibited));
    }

    /**
    ***************
    *** 359,369 ****
    */
    protected Query getBooleanQuery(Vector clauses) throws
    ParseException
    {
    ! BooleanQuery query = new BooleanQuery();
    ! for (int i = 0; i < clauses.size(); i++) {
    ! query.add((BooleanClause)clauses.elementAt(i));
    ! }
    ! return query;
    }

    /**
    --- 361,389 ----
    */
    protected Query getBooleanQuery(Vector clauses) throws
    ParseException
    {
    ! BooleanQuery query = new BooleanQuery();
    ! if ( clauses.size() == 1 ) {
    ! clauses = (Vector)clauses.elementAt(0);
    ! for (int i = 0; i < clauses.size(); i++) {
    ! query.add((BooleanClause)clauses.elementAt(i));
    ! }
    ! }
    ! else {
    ! for ( int i = 0; i < clauses.size(); i++ ) {
    ! Vector clauses2 = (Vector)clauses.elementAt(i);
    ! if ( clauses2.size() == 1 &&
    ((BooleanClause)clauses2.elementAt(0)).prohibited == false ) {
    ! query.add(new
    BooleanClause(((BooleanClause)clauses2.elementAt(0)).query, false,
    false));
    ! }
    ! else if ( clauses2.size() >= 1 ) {
    ! BooleanQuery query2 = new BooleanQuery();
    ! for ( int j = 0; j < clauses2.size(); j++ ) {
    ! query2.add((BooleanClause)clauses2.elementAt(j));
    ! }
    ! query.add(new BooleanClause(query2, false, false));
    ! }
    ! }
    ! }
    ! return query;
    }

    /**
    ***************
    *** 551,556 ****
    --- 571,577 ----
    Query Query(String field) :
    {
    Vector clauses = new Vector();
    + clauses.addElement(new Vector());
    Query q, firstQuery=null;
    int conj, mods;
    }
    ***************
    *** 566,572 ****
    { addClause(clauses, conj, mods, q); }
    )*
    {
    ! if (clauses.size() == 1 && firstQuery != null)
    return firstQuery;
    else {
    return getBooleanQuery(clauses);
    --- 587,593 ----
    { addClause(clauses, conj, mods, q); }
    )*
    {
    ! if (clauses.size() == 1 &&
    ((Vector)clauses.elementAt(0)).size() == 1 && firstQuery != null)
    return firstQuery;
    else {
    return getBooleanQuery(clauses);
    ============================================================

    and the test program:

    ===File lucene/LuceneTest.java===============
    import org.apache.lucene.document.*;
    import org.apache.lucene.analysis.*;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.*;
    import org.apache.lucene.store.*;
    import org.apache.lucene.search.*;
    import org.apache.lucene.queryParser.QueryParser;

    class LuceneTest
    {
    static String[] docs = {
    "a", "b", "c", "d",
    "a b", "a c", "a d", "b c", "b d", "c d",
    "a b c", "a b d", "a c d", "b c d",
    "a b c d"
    };

    static String[] queries = {
    "a OR b AND c",
    "(a OR b) AND c",
    "a OR (b AND c)",
    "a AND b",
    "a AND b OR c AND d",
    "(a AND b) OR (c AND d)",
    "a AND (b OR c) AND d",
    "((a AND b) OR c) AND d",
    "a AND (b OR (c AND d))",
    "a AND b AND c AND d",

    "a OR b AND NOT c",
    "(a OR b) AND NOT c",
    "a OR (b AND NOT c)",
    "a AND NOT d",
    "a AND NOT b OR c AND NOT d",
    "(a AND NOT b) OR (c AND NOT d)",
    "a AND NOT (b OR c) AND NOT d",
    "((a AND NOT b) OR c) AND NOT d",
    "a AND NOT (b OR (c AND NOT d))",
    "a AND NOT b AND NOT c AND NOT d",

    "a OR NOT b",
    "a OR NOT a",

    "a b",
    "a b c",
    "a b (c d e)",
    "+a +b",
    "a -b",
    "a +b -c",
    "+a b -c",
    "+a -b c",
    "a -b -c",
    "-a b c",

    "a OR b c AND d",
    "a OR b c",
    "a AND b c",
    "a OR b c OR d",
    "a OR b c d OR e",
    "a AND b c AND d",
    "a AND b c d AND e"
    };

    public static void main(String argv[]) throws Exception {
        Directory dir = new RAMDirectory();
        String[] stop = {};
        Analyzer analyzer = new StandardAnalyzer(stop);

        IndexWriter writer = new IndexWriter(dir, analyzer, true);

        for ( int i=0; i < docs.length; i++ ) {
            Document doc = new Document();
            doc.add(Field.Text("text", docs[i]));
            writer.addDocument(doc);
        }
        writer.close();

        Searcher searcher = new IndexSearcher(dir);
        for ( int i=0; i < queries.length; i++ ) {
            QueryParser parser = new QueryParser("text", analyzer);
            parser.setOperator(QueryParser.DEFAULT_OPERATOR_AND);

            Query [] query = new Query[4];

            query[0] = QueryParser.parse(queries[i], "text", analyzer);
            query[1] = QueryParser.parse(query[0].toString("text"), "text",
                    analyzer);
            query[2] = parser.parse(queries[i]);
            query[3] = QueryParser.parse(query[2].toString("text"), "text",
                    analyzer);

            System.out.println(i + ": " + queries[i] + " ==> " +
                    query[0].toString("text") + " -> " + query[1].toString("text") + " / "
                    + query[2].toString("text") + " -> " + query[3].toString("text"));
            if ( argv.length > 0 && argv[0].equals("-q") ) {
                for ( int k=0; k<4; k++ ) {
                    Hits hits = searcher.search(query[k]);
                    System.out.println(k + " " + query[k].toString("text") + "\t" +
                            hits.length() + " documents found");
                    for ( int j=0; j < hits.length(); j++ ) {
                        Document doc = hits.doc(j);
                        System.out.println("\t"+doc.get("text"));
                    }
                }
            }
        }
    }
    }
    ============================================================

  • Dror Matalon at Dec 29, 2003 at 8:07 pm
    my $.02.

    Before having patches, I think it's a good idea to agree on what the
    "right" solution is. Most of it is obvious using boolean logic, but we
    have some additional requirements like not having a query that only has
    a NOT clause. Is this the only exception?


    As far as the actual patch, I would suspect that a better approach than
    using java would be to use precedence operations in the actual parser.
    I've never used JavaCC, and it's been years since I've used yacc/bison,
    but one of the basic capabilities in parsers is to define precedence. It
    should be quite easy to fix it this way, and it should be more "bullet
    proof." I looked a bit at the javacc code, but I don't really have the
    time right now to analyze it. It certainly seems like the strategy of
    having all the operators together is problematic:

    <DEFAULT> TOKEN : {
    <AND: ("AND" | "&&") >
    <OR: ("OR" | "||") >
    <NOT: ("NOT" | "!") >
    <PLUS: "+" >
    <MINUS: "-" >
    <LPAREN: "(" >
    <RPAREN: ")" >
    <COLON: ":" >
    <CARAT: "^" > : Boost
    <QUOTED: "\"" (~["\""])+ "\"">
    <TERM: <_TERM_START_CHAR> (<_TERM_CHAR>)* >
    <FUZZY: "~" >
    <SLOP: "~" (<_NUM_CHAR>)+ >
    <PREFIXTERM: <_TERM_START_CHAR> (<_TERM_CHAR>)* "*" >
    <WILDTERM: <_TERM_START_CHAR>
    (<_TERM_CHAR> | ( [ "*", "?" ] ))* >
    <RANGEIN_START: "[" > : RangeIn
    <RANGEEX_START: "{" > : RangeEx
    }

    Something like http://www.lysator.liu.se/c/ANSI-C-grammar-y.html where
    different operators are grouped differently according to precedence
    would work better.

    As is often the case, trying to *correctly* parse a string is not
    trivial.
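Dror's point about precedence can be illustrated with a small hand-written recursive-descent parser in which each precedence level gets its own rule, so OR binds loosest, then AND, then the prefix NOT. This is a hypothetical sketch, not JavaCC output and not Lucene's QueryParser: the class name PrecedenceParser is made up, it understands only bare terms, AND, OR, NOT and parentheses (no +/-, fields, phrases or boosts), and it returns a fully parenthesized string instead of building Query objects.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical recursive-descent parser, one rule per precedence level:
 *   orExpr  := andExpr ("OR" andExpr)*
 *   andExpr := notExpr ("AND" notExpr)*
 *   notExpr := "NOT" notExpr | primary
 *   primary := TERM | "(" orExpr ")"
 * Returns a fully parenthesized string so the grouping is visible.
 */
class PrecedenceParser {
    private final List<String> tokens = new ArrayList<String>();
    private int pos = 0;

    PrecedenceParser(String query) {
        // Pad parentheses with spaces so a whitespace split yields them as tokens.
        String padded = query.replace("(", " ( ").replace(")", " ) ");
        for (String t : padded.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
    }

    static String parse(String query) {
        PrecedenceParser p = new PrecedenceParser(query);
        String result = p.orExpr();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + p.tokens.get(p.pos));
        }
        return result;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    private String orExpr() {                 // lowest precedence
        String left = andExpr();
        while ("OR".equals(peek())) {
            pos++;
            left = "(" + left + " OR " + andExpr() + ")";
        }
        return left;
    }

    private String andExpr() {                // binds tighter than OR
        String left = notExpr();
        while ("AND".equals(peek())) {
            pos++;
            left = "(" + left + " AND " + notExpr() + ")";
        }
        return left;
    }

    private String notExpr() {                // prefix NOT binds tightest
        if ("NOT".equals(peek())) {
            pos++;
            return "(NOT " + notExpr() + ")";
        }
        return primary();
    }

    private String primary() {
        String t = peek();
        if (t == null) throw new IllegalArgumentException("Unexpected end of query");
        pos++;
        if (t.equals("(")) {
            String inner = orExpr();
            if (!")".equals(peek())) throw new IllegalArgumentException("Missing )");
            pos++;
            return inner;
        }
        if (t.equals(")") || t.equals("AND") || t.equals("OR")) {
            throw new IllegalArgumentException("Unexpected token: " + t);
        }
        return t;                             // a bare term
    }

    public static void main(String[] args) {
        System.out.println(parse("a OR b AND c"));        // (a OR (b AND c))
        System.out.println(parse("a AND b OR c AND d"));  // ((a AND b) OR (c AND d))
        System.out.println(parse("a AND NOT b OR c"));    // ((a AND (NOT b)) OR c)
    }
}
```

A JavaCC grammar could express the same layering with one production per precedence level. Note that the sketch says nothing about implicit operators such as "a b", which is where most of the ambiguity in this thread comes from.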

    Regards,

    Dror


    On Sun, Dec 28, 2003 at 07:11:22PM -0500, Erik Hatcher wrote:
    Morus,

    I haven't had time to think through all of the issues and the patch you
    submitted, but I suggest that you go ahead and attach this to a
    Bugzilla issue so that it can be addressed more formally and avoid
    being lost in the mounds of e-mail we all get.

    Thanks,
    Erik

    On Dec 28, 2003, at 11:46 AM, Morus Walter wrote:

    Morus Walter writes:
    I attached the patch (made against 1.3rc3 but working for 1.3final as
    well)
    and a test program.
    Seems the attachments got stripped...

    So once again:

    The patch:

    ===File lucene/QueryParser.jj.patch===============
    *** QueryParser.jj.org Mon Dec 22 11:47:30 2003
    --- QueryParser.jj Mon Dec 22 13:20:57 2003
    ***************
    *** 233,255 ****

    protected void addClause(Vector clauses, int conj, int mods, Query q) {
    boolean required, prohibited;
    !
    ! // If this term is introduced by AND, make the preceding term required,
    // unless it's already prohibited
    if (conj == CONJ_AND) {
    ! BooleanClause c = (BooleanClause) clauses.elementAt(clauses.size()-1);
    ! if (!c.prohibited)
    ! c.required = true;
    ! }
    !
    ! if (operator == DEFAULT_OPERATOR_AND && conj == CONJ_OR) {
    ! // If this term is introduced by OR, make the preceding term optional,
    ! // unless it's prohibited (that means we leave -a OR b but +a OR b-->a OR b)
    ! // notice if the input is a OR b, first term is parsed as required; without
    ! // this modification a OR b would parsed as +a OR b
    ! BooleanClause c = (BooleanClause) clauses.elementAt(clauses.size()-1);
    ! if (!c.prohibited)
    ! c.required = false;
    }

    // We might have been passed a null query; the term might have been
    --- 233,249 ----

    protected void addClause(Vector clauses, int conj, int mods, Query q) {
    boolean required, prohibited;
    ! // System.out.println(conj+ " " + mods + " " + q.toString("text"));
    ! // If this term is introduced by AND, check if the previous term is the
    ! // first term in this or-group and make that term required,
    // unless it's already prohibited
    if (conj == CONJ_AND) {
    ! Vector clauses2 = (Vector)clauses.elementAt(clauses.size()-1);
    ! //if ( clauses2.size() == 1 ) {
    ! BooleanClause c = (BooleanClause) clauses2.elementAt(clauses2.size()-1);
    ! if (!c.prohibited)
    ! c.required = true;
    ! //}
    }

    // We might have been passed a null query; the term might have been
    ***************
    *** 257,277 ****
    if (q == null)
    return;

    if (operator == DEFAULT_OPERATOR_OR) {
    - // We set REQUIRED if we're introduced by AND or +; PROHIBITED if
    // introduced by NOT or -; make sure not to set both.
    prohibited = (mods == MOD_NOT);
    ! required = (mods == MOD_REQ);
    ! if (conj == CONJ_AND && !prohibited) {
    ! required = true;
    ! }
    ! } else {
    ! // We set PROHIBITED if we're introduced by NOT or -; We set REQUIRED
    ! // if not PROHIBITED and not introduced by OR
    prohibited = (mods == MOD_NOT);
    ! required = (!prohibited && conj != CONJ_OR);
    }
    ! clauses.addElement(new BooleanClause(q, required, prohibited));
    }

    /**
    --- 251,279 ----
    if (q == null)
    return;

    + // start new or-group if there's an explicit OR
    + if ( conj == CONJ_OR ) {
    + clauses.addElement(new Vector());
    + }
    +
    if (operator == DEFAULT_OPERATOR_OR) {
    // introduced by NOT or -; make sure not to set both.
    prohibited = (mods == MOD_NOT);
    ! // for explicit conjunctions: set required to true
    ! if ( conj == CONJ_AND ) {
    ! required = true;
    ! }
    ! else {
    ! // default OR -> required only when requested
    ! required = (mods == MOD_REQ);
    ! }
    ! } else { // operator == DEFAULT_OPERATOR_AND
    ! // We set PROHIBITED if we're introduced by NOT or -
    prohibited = (mods == MOD_NOT);
    ! // always REQUIRED unless PROHIBITED
    ! required = (!prohibited);
    }
    ! ((Vector)clauses.elementAt(clauses.size()-1)).addElement(new BooleanClause(q, required, prohibited));
    }

    /**
    ***************
    *** 359,369 ****
    */
    protected Query getBooleanQuery(Vector clauses) throws ParseException
    {
    ! BooleanQuery query = new BooleanQuery();
    ! for (int i = 0; i < clauses.size(); i++) {
    ! query.add((BooleanClause)clauses.elementAt(i));
    ! }
    ! return query;
    }

    /**
    --- 361,389 ----
    */
    protected Query getBooleanQuery(Vector clauses) throws ParseException
    {
    ! BooleanQuery query = new BooleanQuery();
    ! if ( clauses.size() == 1 ) {
    ! clauses = (Vector)clauses.elementAt(0);
    ! for (int i = 0; i < clauses.size(); i++) {
    ! query.add((BooleanClause)clauses.elementAt(i));
    ! }
    ! }
    ! else {
    ! for ( int i = 0; i < clauses.size(); i++ ) {
    ! Vector clauses2 = (Vector)clauses.elementAt(i);
    ! if ( clauses2.size() == 1 && ((BooleanClause)clauses2.elementAt(0)).prohibited == false ) {
    ! query.add(new BooleanClause(((BooleanClause)clauses2.elementAt(0)).query, false, false));
    ! }
    ! else if ( clauses2.size() >= 1 ) {
    ! BooleanQuery query2 = new BooleanQuery();
    ! for ( int j = 0; j < clauses2.size(); j++ ) {
    ! query2.add((BooleanClause)clauses2.elementAt(j));
    ! }
    ! query.add(new BooleanClause(query2, false, false));
    ! }
    ! }
    ! }
    ! return query;
    }

    /**
    ***************
    *** 551,556 ****
    --- 571,577 ----
    Query Query(String field) :
    {
    Vector clauses = new Vector();
    + clauses.addElement(new Vector());
    Query q, firstQuery=null;
    int conj, mods;
    }
    ***************
    *** 566,572 ****
    { addClause(clauses, conj, mods, q); }
    )*
    {
    ! if (clauses.size() == 1 && firstQuery != null)
    return firstQuery;
    else {
    return getBooleanQuery(clauses);
    --- 587,593 ----
    { addClause(clauses, conj, mods, q); }
    )*
    {
    ! if (clauses.size() == 1 && ((Vector)clauses.elementAt(0)).size() == 1 && firstQuery != null)
    return firstQuery;
    else {
    return getBooleanQuery(clauses);
    ============================================================
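To see what the or-group idea in this patch does, here is a simplified, hypothetical model of it in plain Java (not Lucene code, and not the patch itself): an explicit OR opens a new group, AND marks the terms on both sides as required inside the current group, and when there is more than one group each group becomes an optional clause. It assumes the default operator is OR, handles only bare terms with AND and OR, and prints results in the "+term" notation that Query.toString() produces in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the patch's or-groups (default operator OR,
 * bare terms, AND and OR only). "+" marks required terms; parentheses
 * mark a nested BooleanQuery, mirroring Query.toString().
 */
class OrGroups {

    static String rewrite(String query) {
        List<List<String>> groups = new ArrayList<List<String>>();
        groups.add(new ArrayList<String>());
        String conj = null;                       // conjunction seen before the next term
        for (String tok : query.trim().split("\\s+")) {
            if (tok.equals("AND") || tok.equals("OR")) {
                conj = tok;
                continue;
            }
            if ("OR".equals(conj)) {
                groups.add(new ArrayList<String>());  // explicit OR starts a new or-group
            }
            List<String> current = groups.get(groups.size() - 1);
            if ("AND".equals(conj)) {
                // AND makes the preceding term in this group required as well
                int last = current.size() - 1;
                if (last >= 0 && !current.get(last).startsWith("+")) {
                    current.set(last, "+" + current.get(last));
                }
                current.add("+" + tok);
            } else {
                current.add(tok);                 // implicit (default OR) or first term
            }
            conj = null;
        }

        if (groups.size() == 1) {
            return String.join(" ", groups.get(0));   // no explicit OR: one flat query
        }
        List<String> parts = new ArrayList<String>();
        for (List<String> g : groups) {
            if (g.size() == 1) {
                // a single term becomes a plain optional clause
                parts.add(g.get(0).replaceFirst("^\\+", ""));
            } else if (g.size() > 1) {
                parts.add("(" + String.join(" ", g) + ")");  // nested, optional
            }
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        System.out.println(rewrite("a OR b AND c"));        // a (+b +c)
        System.out.println(rewrite("a AND b OR c AND d"));  // (+a +b) (+c +d)
        System.out.println(rewrite("a OR b c AND d"));      // a (b +c +d)
    }
}
```

The first two outputs are the groupings asked for at the start of the thread; the third shows an implicit operator staying inside the current AND-group, i.e. on the same level as AND.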

    and the test program:

    ===File lucene/LuceneTest.java===============
    import org.apache.lucene.document.*;
    import org.apache.lucene.analysis.*;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.*;
    import org.apache.lucene.store.*;
    import org.apache.lucene.search.*;
    import org.apache.lucene.queryParser.QueryParser;

    class LuceneTest
    {
    static String[] docs = {
    "a", "b", "c", "d",
    "a b", "a c", "a d", "b c", "b d", "c d",
    "a b c", "a b d", "a c d", "b c d",
    "a b c d"
    };

    static String[] queries = {
    "a OR b AND c",
    "(a OR b) AND c",
    "a OR (b AND c)",
    "a AND b",
    "a AND b OR c AND d",
    "(a AND b) OR (c AND d)",
    "a AND (b OR c) AND d",
    "((a AND b) OR c) AND d",
    "a AND (b OR (c AND d))",
    "a AND b AND c AND d",

    "a OR b AND NOT c",
    "(a OR b) AND NOT c",
    "a OR (b AND NOT c)",
    "a AND NOT d",
    "a AND NOT b OR c AND NOT d",
    "(a AND NOT b) OR (c AND NOT d)",
    "a AND NOT (b OR c) AND NOT d",
    "((a AND NOT b) OR c) AND NOT d",
    "a AND NOT (b OR (c AND NOT d))",
    "a AND NOT b AND NOT c AND NOT d",

    "a OR NOT b",
    "a OR NOT a",

    "a b",
    "a b c",
    "a b (c d e)",
    "+a +b",
    "a -b",
    "a +b -c",
    "+a b -c",
    "+a -b c",
    "a -b -c",
    "-a b c",

    "a OR b c AND d",
    "a OR b c",
    "a AND b c",
    "a OR b c OR d",
    "a OR b c d OR e",
    "a AND b c AND d",
    "a AND b c d AND e"
    };

    public static void main(String argv[]) throws Exception {
        Directory dir = new RAMDirectory();
        String[] stop = {};
        Analyzer analyzer = new StandardAnalyzer(stop);

        IndexWriter writer = new IndexWriter(dir, analyzer, true);

        for ( int i=0; i < docs.length; i++ ) {
            Document doc = new Document();
            doc.add(Field.Text("text", docs[i]));
            writer.addDocument(doc);
        }
        writer.close();

        Searcher searcher = new IndexSearcher(dir);
        for ( int i=0; i < queries.length; i++ ) {
            QueryParser parser = new QueryParser("text", analyzer);
            parser.setOperator(QueryParser.DEFAULT_OPERATOR_AND);

            Query [] query = new Query[4];

            query[0] = QueryParser.parse(queries[i], "text", analyzer);
            query[1] = QueryParser.parse(query[0].toString("text"), "text",
                    analyzer);
            query[2] = parser.parse(queries[i]);
            query[3] = QueryParser.parse(query[2].toString("text"), "text",
                    analyzer);

            System.out.println(i + ": " + queries[i] + " ==> " +
                    query[0].toString("text") + " -> " + query[1].toString("text") + " / "
                    + query[2].toString("text") + " -> " + query[3].toString("text"));
            if ( argv.length > 0 && argv[0].equals("-q") ) {
                for ( int k=0; k<4; k++ ) {
                    Hits hits = searcher.search(query[k]);
                    System.out.println(k + " " + query[k].toString("text") +
                            "\t" + hits.length() + " documents found");
                    for ( int j=0; j < hits.length(); j++ ) {
                        Document doc = hits.doc(j);
                        System.out.println("\t"+doc.get("text"));
                    }
                }
            }
        }
    }
    }
    ============================================================

    --
    Dror Matalon
    Zapatec Inc
    1700 MLK Way
    Berkeley, CA 94709
    http://www.fastbuzz.com
    http://www.zapatec.com

  • Morus Walter at Dec 30, 2003 at 10:40 am

    Dror Matalon writes:
    my $.02.

    Before having patches, I think it's a good idea to agree on what the
    "right" solution is.
    I tried to raise that question in the first place, but there wasn't much
    response.
    So I decided to make a concrete suggestion of how to change things.
    Most of it is obvious using boolean logic, but we
    have some additional requirements like not having a query that only has
    a NOT clause. Is this the only exception?
    To me the problem is that there are two forms of queries:
    - boolean queries (a OR b AND c...)
    - list of terms where some are flagged required and some are flagged forbidden
    (a +b -c ...) (in two forms: with default OR and with default AND)

    For each of these it seems pretty clear what they mean, but if you start
    to combine the two in one query, I don't know what that should mean.

    What's the meaning of a OR b c +d ?
    (Actually there must be two meanings: one for default OR, one for default AND).
    Maybe it's obvious, but I fail to see it.
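For the second form, the matching rule can be sketched as a small predicate over a document's terms. This is a hypothetical model, not Lucene code: it assumes the usual BooleanQuery rule that every +term must occur, no -term may occur, and plain terms only decide matching when there is no +term (otherwise they would only affect scoring, which the sketch ignores). It runs over the 15 documents indexed by the LuceneTest program posted in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model of a flat "+/-" term list (scoring ignored).
 * A list made only of -terms matches nothing.
 */
class FlagListMatcher {

    /** The 15 documents indexed by LuceneTest in this thread. */
    static final String[] DOCS = {
        "a", "b", "c", "d",
        "a b", "a c", "a d", "b c", "b d", "c d",
        "a b c", "a b d", "a c d", "b c d",
        "a b c d"
    };

    static boolean matches(String query, Set<String> doc) {
        boolean hasRequired = false;
        boolean optionalHit = false;
        for (String tok : query.trim().split("\\s+")) {
            if (tok.startsWith("+")) {
                hasRequired = true;
                if (!doc.contains(tok.substring(1))) return false;   // required term missing
            } else if (tok.startsWith("-")) {
                if (doc.contains(tok.substring(1))) return false;    // prohibited term present
            } else if (doc.contains(tok)) {
                optionalHit = true;
            }
        }
        // with required terms, plain terms would only influence scoring
        return hasRequired || optionalHit;
    }

    static List<String> search(String query) {
        List<String> hits = new ArrayList<String>();
        for (String d : DOCS) {
            Set<String> terms = new HashSet<String>(Arrays.asList(d.split(" ")));
            if (matches(query, terms)) hits.add(d);
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search("a +b -c"));  // [b, a b, b d, a b d]
        System.out.println(search("a b"));      // every document containing a or b
        System.out.println(search("-a"));       // []
    }
}
```

Each form is easy to evaluate on its own like this; the open question is what a query mixing the two should mean.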
    As far as the actual patch, I would suspect that a better approach than
    using java would be to use precedence operations in the actual parser.
    That means deciding on a complete rewrite of the query parser.
    That's something I wanted to avoid.

    I don't think that it matters how you implement a grammar, though.
    The problem here is that you have to define the grammar first.

    But I agree that doing it by JavaCC means is less error prone.
    Something like http://www.lysator.liu.se/c/ANSI-C-grammar-y.html where
    different operators are grouped differently according to precedence
    would work better.

    As is often the case, trying to *correctly* parse a string is not
    trivial.
    Right. Especially if there's no definition of correct...

    Morus

  • Dror Matalon at Dec 30, 2003 at 6:36 pm
    Hi,

    First, let me say something that wasn't obvious from my first mail.
    While I had opinions about the implementation, I have a lot of respect
    for your finding a problem, and going ahead and coding a solution.
    On Tue, Dec 30, 2003 at 11:40:17AM +0100, Morus Walter wrote:
    Dror Matalon writes:
    my $.02.

    Before having patches, I think it's a good idea to agree on what the
    "right" solution is.
    I tried to raise that question in the first place, but there wasn't much
    response.
    Might be the time of the year when many people are busy with other
    stuff.
    So I decided to make a concrete suggestion of how to change things.
    Most of it is obvious using boolean logic, but we
    have some additional requirements like not having a query that only has
    a NOT clause. Is this the only exception?
    To me the problem is that there are two forms of queries:
    - boolean queries (a OR b AND c...)
    - list of terms where some are flagged required and some are flagged forbidden
    (a +b -c ...) (in two forms: with default OR and with default AND)

    For each of these it seems pretty clear what they mean, but if you start
    to combine the two in one query, I don't know what that should mean.

    What's the meaning of a OR b c +d ?
    (Actually there must be two meanings: one for default OR, one for default AND).
    Maybe it's obvious, but I fail to see it.
    You're right, it is confusing. Assuming default OR I would guess that the
    above means
    b c +d
    and assuming default AND it would mean
    +b +c +d
    Is there another interpretation?
    As far as the actual patch, I would suspect that a better approach than
    using java would be to use precedence operations in the actual parser.
    That means deciding on a complete rewrite of the query parser.
    That's something I wanted to avoid.
    Ouch. I think you might be right. It might be a good idea to move this
    discussion to lucene-dev where we'd get more attention from the
    developers. This seems more like a developer issue than a user issue.
    I don't think that it matters how you implement a grammar, though.
    The problem here is that you have to define the grammar first.

    But I agree that doing it by JavaCC means is less error prone.
    Something like http://www.lysator.liu.se/c/ANSI-C-grammar-y.html where
    different operators are grouped differently according to precedence
    would work better.

    As is often the case, trying to *correctly* parse a string is not
    trivial.
    Right. Especially if there's no definition of correct...
    Morus

  • Morus Walter at Dec 30, 2003 at 8:13 pm
    Hi Dror,

    thanks for your answer. I really appreciate your comments.
    Before having patches, I think it's a good idea to agree on what the
    "right" solution is.
    I tried to raise that question in the first place, but there wasn't much
    response.
    Might be the time of the year when many people are busy with other
    stuff.
    Probably.
    My impression was that many people don't have a problem with this issue.
    Otherwise I'd expect that the issue would have been raised earlier.
    So I decided to make a concrete suggestion of how to change things.
    Most of it is obvious using boolean logic, but we
    have some additional requirements like not having a query that only has
    a NOT clause. Is this the only exception?
    To me the problem is that there are two forms of queries:
    - boolean queries (a OR b AND c...)
    - list of terms where some are flagged required and some are flagged forbidden
    (a +b -c ...) (in two forms: with default OR and with default AND)

    For each of these it seems pretty clear what they mean, but if you start
    to combine the two in one query, I don't know what that should mean.

    What's the meaning of a OR b c +d ?
    (Actually there must be two meanings: one for default OR, one for default AND).
    Maybe it's obvious, but I fail to see it.
    You're right, it is confusing. Assuming default OR I would guess that the
    above means
    b c +d
    and assuming default AND it would mean
    +b +c +d
    Is there another interpretation?
    You left out the 'a' which I intended to be part of the query (sorry if this
    was unclear).

    I was thinking about this issue, and currently I think that the only way to
    define this type of query formally is to give the default operator its own
    precedence relative to the precedence of 'OR' and 'AND'.
    So there are two possibilities:
    either the default operator has higher precedence than 'AND' or lower than
    'OR'.
    For default OR in the first case
    `a OR b c +d' would be equal to `(a OR b) c +d' == (a b) c +d
    in the second to `a OR (b c +d)' == a (b c +d)
    For default AND one has `+(a b) +c +d' and `a (+b +c +d)'

    (a b) c +d searches all documents containing d; occurrences of a, b and c
    influence scoring
    a (b c +d) searches documents containing `a' joined with documents
    containing `d' (where b and c influence scoring)
    Now, what's closer to what one might have meant by `a OR b c +d'?

    +(a b) +c +d searches documents containing c, d and either a or b.
    a (+b +c +d) searches documents containing a or each of b, c and d.
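The four readings above can be checked mechanically. The sketch below is hypothetical plain Java, not Lucene: it builds each reading as nested BooleanQuery-like predicates, assuming the usual rule that required clauses must match and optional clauses only matter when nothing is required, and runs them over the 15 documents from LuceneTest. Scoring is ignored, so only the matching sets are compared.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical check of the two groupings of "a OR b c +d" (matching only). */
class ImplicitPrecedence {

    /** A clause: a sub-query plus whether it is required ("+") or optional. */
    static final class Clause {
        final boolean required;
        final Predicate<Set<String>> query;
        Clause(boolean required, Predicate<Set<String>> query) {
            this.required = required;
            this.query = query;
        }
    }

    static Predicate<Set<String>> term(String t) { return doc -> doc.contains(t); }
    static Clause opt(Predicate<Set<String>> q) { return new Clause(false, q); }
    static Clause req(Predicate<Set<String>> q) { return new Clause(true, q); }

    /** All required clauses must match; if none is required, some optional one must. */
    static Predicate<Set<String>> bool(Clause... clauses) {
        return doc -> {
            boolean hasRequired = false, optionalHit = false;
            for (Clause c : clauses) {
                boolean m = c.query.test(doc);
                if (c.required) {
                    hasRequired = true;
                    if (!m) return false;
                } else if (m) {
                    optionalHit = true;
                }
            }
            return hasRequired || optionalHit;
        };
    }

    static final Predicate<Set<String>> A = term("a"), B = term("b"), C = term("c"), D = term("d");

    // default OR
    static final Predicate<Set<String>> OR_1 = bool(opt(bool(opt(A), opt(B))), opt(C), req(D));  // (a b) c +d
    static final Predicate<Set<String>> OR_2 = bool(opt(A), opt(bool(opt(B), opt(C), req(D))));  // a (b c +d)
    // default AND
    static final Predicate<Set<String>> AND_1 = bool(req(bool(opt(A), opt(B))), req(C), req(D)); // +(a b) +c +d
    static final Predicate<Set<String>> AND_2 = bool(opt(A), opt(bool(req(B), req(C), req(D)))); // a (+b +c +d)

    static final String[] DOCS = {
        "a", "b", "c", "d",
        "a b", "a c", "a d", "b c", "b d", "c d",
        "a b c", "a b d", "a c d", "b c d",
        "a b c d"
    };

    static List<String> search(Predicate<Set<String>> q) {
        List<String> hits = new ArrayList<String>();
        for (String d : DOCS) {
            if (q.test(new HashSet<String>(Arrays.asList(d.split(" "))))) hits.add(d);
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println("(a b) c +d    " + search(OR_1));   // 8 documents, all containing d
        System.out.println("a (b c +d)    " + search(OR_2));   // 12 documents, containing a or d
        System.out.println("+(a b) +c +d  " + search(AND_1));  // [a c d, b c d, a b c d]
        System.out.println("a (+b +c +d)  " + search(AND_2));  // 9 documents
    }
}
```

Under either default operator the two groupings select quite different document sets, which is the ambiguity described above.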

    The other alternative would be to forbid queries mixing default operators and
    explicit and/or. This is what I'd probably vote for at the moment.

    The patch doesn't implement any of these, as it handles the default operator
    on the same level as AND.
    As far as the actual patch, I would suspect that a better approach than
    using java would be to use precedence operations in the actual parser.
    That means deciding on a complete rewrite of the query parser.
    That's something I wanted to avoid.
    Ouch. I think you might be right. It might be a good idea to move this
    discussion to lucene-dev where we'd get more attention from the
    developers. This seems more like a developer issue than a user issue.
    Hmm. That'd be up to the developers.
    Don't know how many of them are reading lucene-user.

    I'd prefer to keep this on the user list since the query parser is only
    loosely coupled to Lucene's core, while it is strongly coupled to the users'
    needs. So I think the users should be included in the discussion and I think
    the user list is the best place for that.

    Morus

  • Erik Hatcher at Dec 30, 2003 at 8:25 pm

    On Dec 30, 2003, at 3:13 PM, Morus Walter wrote:
    Hmm. That'd be up to the developers.
    Don't know how many of them are reading lucene-user.
    I suspect we're all here!

    QueryParser is Lucene's red-headed step-child. It works "well enough",
    but it has more than its share of issues. It is almost a shame it is
    part of Lucene's core because of its loose coupling, but it does make
    Lucene quite approachable for simple applications at least.

    A complete rewrite of QueryParser would certainly be welcomed by most,
    I think.

    Erik


  • Dror Matalon at Dec 30, 2003 at 8:42 pm

    On Tue, Dec 30, 2003 at 03:25:08PM -0500, Erik Hatcher wrote:
    On Dec 30, 2003, at 3:13 PM, Morus Walter wrote:
    Hmm. That'd be up to the developers.
    Don't know how many of them are reading lucene-user.
    I suspect we're all here!
    Great.
    QueryParser is Lucene's red-headed step-child. It works "well enough",
    but it has more than its share of issues. It is almost a shame it is
    part of Lucene's core because of its loose coupling, but it does make
    Lucene quite approachable for simple applications at least.
    And to make things worse, I suspect that it works well enough for most
    users so that there's not enough motivation to fix it.

    I'll confess that I seldom use anything but the defaults, not only with
    Lucene but also with Google.
    A complete rewrite of QueryParser would certainly be welcomed by most,
    I think.

    Erik


  • Dror Matalon at Dec 30, 2003 at 9:10 pm
    On Tue, Dec 30, 2003 at 09:13:30PM +0100, Morus Walter wrote:
    ...
    What's the meaning of a OR b c +d ?
    (Actually there must be two meanings: one for default OR, one for default AND).
    Maybe it's obvious, but I fail to see it.
    You're right, it is confusing. Assuming default OR I would guess that the
    above means
    b c +d
    and assuming default AND it would mean
    +b +c +d
    Is there another interpretation?
    You left out the 'a' which I intended to be part of the query (sorry if this
    was unclear).
    Oops, my mistake.
    I was thinking about this issue, and currently I think that the only way to
    define this type of query formally is to give the default operator its own
    precedence relative to the precedence of 'OR' and 'AND'.
    So there are two possibilities:
    either the default operator has higher precedence than 'AND' or lower than
    'OR'.
    For default OR in the first case
    `a OR b c +d' would be equal to `(a OR b) c +d' == (a b) c +d
    in the second to `a OR (b c +d)' == a (b c +d)
    For default AND one has `+(a b) +c +d' and `a (+b +c +d)'

    (a b) c +d searches all documents containing d; occurrences of a, b and c
    influence scoring
    a (b c +d) searches documents containing `a' joined with documents
    containing `d' (where b and c influence scoring)
    Now, what's closer to what one might have meant by `a OR b c +d'?

    +(a b) +c +d searches documents containing c, d and either a or b.
    a (+b +c +d) searches documents containing a or each of b, c and d.
    I don't think this is a good idea. Mostly because it would be hard to
    explain/document, and you don't want end users to have to think and read
    a lot of documentation when doing a search.

    For one thing, I would advocate for using the '+' notation as the
    underlying syntax and migrating to boolean operators, since many
    more people are used to that syntax and I believe it's better
    understood.
    The other alternative would be to forbid queries mixing default operators and
    explicit and/or. This is what I'd probably vote for at the moment.
    At first I was inclined to agree but as a rule I think we should adopt
    the WWGD (What Would Google Do) philosophy, since that's the syntax and
    behavior that most people are used to.

    It looks like it basically adds an "AND" between any two terms that
    don't have an operator between them. We could do the same for both the
    default AND and the default OR. Once you've done that, you just use the
    standard boolean logic precedence rules.
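Dror's suggestion amounts to a token rewrite applied before ordinary precedence parsing. A minimal sketch, assuming whitespace-separated tokens and treating any token other than AND, OR, NOT and the parentheses as a term (+/- prefixes are not handled); the class and method names are hypothetical. The operator to insert is a parameter, so the same rewrite covers default AND and default OR.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical pre-pass: wherever two operands are adjacent with no
 * operator between them, insert the default operator explicitly.
 */
class ImplicitOperator {

    static boolean isKeyword(String t) {
        return t.equals("AND") || t.equals("OR") || t.equals("NOT");
    }

    static String insertDefaultOperator(String query, String defaultOp) {
        // pad parentheses so a whitespace split yields them as separate tokens
        String padded = query.replace("(", " ( ").replace(")", " ) ");
        List<String> out = new ArrayList<String>();
        String prev = null;
        for (String tok : padded.trim().split("\\s+")) {
            if (tok.isEmpty()) continue;
            // an operand ends with a term or ")"
            boolean prevEndsOperand = prev != null && !isKeyword(prev) && !prev.equals("(");
            // an operand starts with a term, "(" or NOT
            boolean tokStartsOperand = !tok.equals(")") && !tok.equals("AND") && !tok.equals("OR");
            if (prevEndsOperand && tokStartsOperand) {
                out.add(defaultOp);
            }
            out.add(tok);
            prev = tok;
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(insertDefaultOperator("a OR b c", "AND"));  // a OR b AND c
        System.out.println(insertDefaultOperator("a b c", "OR"));      // a OR b OR c
        System.out.println(insertDefaultOperator("a (b c)", "AND"));   // a AND ( b AND c )
    }
}
```

Fed to a parser with the usual precedence, the rewritten "a OR b AND c" would group as a OR (b AND c). The rewrite leaves no place for +/- prefixes, so on its own it would not cover the current flag syntax.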

    Now the good news on all of this is that it seems (I did a small test),
    that if you use parentheses the parser does the right thing. In my mind,
    it's a good idea to use parentheses whenever you're creating complex
    expressions.
    The patch doesn't implement any of these, as it handles the default operator
    on the same level as AND.
    As far as the actual patch, I would suspect that a better approach than
    using java would be to use precedence operations in the actual parser.
    That means deciding on a complete rewrite of the query parser.
    That's something I wanted to avoid.
    Ouch. I think you might be right. It might be a good idea to move this
    discussion to lucene-dev where we'd get more attention from the
    developers. This seems more like a developer issue than a user issue.
    Hmm. That'd be up to the developers.
    Don't know how many of them are reading lucene-user.

    I'd prefer to keep this on the user list since the query parser is only
    loosely coupled to Lucene's core, while it is strongly coupled to the users'
    needs. So I think the users should be included in the discussion and I think
    the user list is the best place for that.
    And Erik indicated that they're here anyway, so it's fine.

    Regards,

    Dror
    Morus

  • Morus Walter at Dec 30, 2003 at 10:19 pm
    Hi Dror,
    I was thinking about this issue, and currently I think that the only way to
    define this type of query formally is to give the default operator its own
    precedence relative to the precedence of 'OR' and 'AND'.
    So there are two possibilities:
    either the default operator has higher precedence than 'AND' or lower than
    'OR'.
    For default OR in the first case
    `a OR b c +d' would be equal to `(a OR b) c +d' == (a b) c +d
    in the second to `a OR (b c +d)' == a (b c +d)
    For default AND one has `+(a b) +c +d' and `a (+b +c +d)'

    (a b) c +d searches all documents containing d; occurrences of a, b and c
    influence scoring
    a (b c +d) searches documents containing `a' joined with documents
    containing `d' (where b and c influence scoring)
    Now, what's closer to what one might have meant by `a OR b c +d'?

    +(a b) +c +d searches documents containing c, d and either a or b.
    a (+b +c +d) searches documents containing a or each of b, c and d.
    I don't think this is a good idea. Mostly because it would be hard to
    explain/document, and you don't want end users to have to think and read
    a lot of documentation when doing a search.

    For one thing, I would advocate for using the '+' notation as the
    underlying syntax and migrating to boolean operators, since many
    more people are used to that syntax and I believe it's better
    understood.
    I'm not sure if I understand what you mean here.
    The other alternative would be to forbid queries mixing default operators and
    explicit and/or. This is what I'd probably vote for at the moment.
    At first I was inclined to agree but as a rule I think we should adopt
    the WWGD (What Would Google Do) philosophy, since that's the syntax and
    behavior that most people are used to.

    It looks like it basically adds an "AND" between any two terms that
    don't have an operator between them. We could do the same for both the
    default AND and the default OR. Once you've done that, you just use the
    standard boolean logic precedence rules.
    Hmm. Then you lose the possibility to create BooleanQuery objects where
    some of the terms are required some forbidden and some have neither flag.
    To have this possibility is the reason why I say that implicit AND/OR and
    explicit AND/OR need to be different things.
    If an implicit OR equals an explicit OR, you would have '+a +b' = '+a OR +b'
    = '(+a) OR (+b)' = 'a OR b' which is probably not, what was intended.
    So either the '+' operator is removed or it is used as an alternative to AND
    in which case it could not be a prefix. So instead of '+a +b' one would use
    'a + b'.

    A consequence of pure boolean operators is that there won't be a way of
    serializing an arbitrary query to a parsable string in standard query parser
    syntax.

    So for completeness and compatibility with the current query parser, I would
    keep the current behaviour of queries without explicit boolean operators.
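To make the serialization point concrete: in the flag notation, "a +b" and "b" select exactly the same documents, yet the optional "a" still changes the ranking, and a purely boolean syntax has no way to write such an optional clause. The sketch below is hypothetical and uses a toy score (the number of non-prohibited query terms a document contains) in place of Lucene's actual similarity, which it does not reproduce.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical demo: optional terms change a toy score but not the match set. */
class OptionalClauseScore {

    static final String[] DOCS = {
        "a", "b", "c", "d",
        "a b", "a c", "a d", "b c", "b d", "c d",
        "a b c", "a b d", "a c d", "b c d",
        "a b c d"
    };

    /** Matching documents (in index order) mapped to a toy score. */
    static Map<String, Integer> search(String query) {
        Map<String, Integer> hits = new LinkedHashMap<String, Integer>();
        for (String d : DOCS) {
            Set<String> doc = new HashSet<String>(Arrays.asList(d.split(" ")));
            boolean ok = true, hasRequired = false, optionalHit = false;
            int score = 0;
            for (String tok : query.trim().split("\\s+")) {
                char flag = tok.charAt(0);
                String term = (flag == '+' || flag == '-') ? tok.substring(1) : tok;
                boolean m = doc.contains(term);
                if (flag == '+') {
                    hasRequired = true;
                    if (!m) ok = false;
                } else if (flag == '-') {
                    if (m) ok = false;
                } else if (m) {
                    optionalHit = true;
                }
                if (m && flag != '-') score++;   // toy score: matched, non-prohibited terms
            }
            if (ok && (hasRequired || optionalHit)) hits.put(d, score);
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search("a +b"));  // documents containing b; those with a score 2
        System.out.println(search("b"));     // the same documents, all scoring 1
    }
}
```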

    The problem for users isn't that big IMHO.
    Unless a user decides to make use of the '+' operator, things are pretty clear:
    a b c searches for documents containing any or all of these terms (depending
    on the default operator). Using terms with the '-' operator also does what
    one expects. Only if the user starts to use the '+' operator explicitly do
    things get more complicated. So he just shouldn't do that unless
    he knows what he is doing.
    The same thing applies to queries using AND/OR, as long as you don't mix them
    with implicit operators. IMO whoever does the latter gets what he deserves
    if he has to deal with the difficulties of such queries. One just should
    not do that, and it should be pretty clear that the meaning of such a query
    is unclear (unless parentheses are used, in which case there is no mixing
    any longer).
    That is why I think my patch is good enough, even if it leaves the evaluation
    of such queries without a clear definition.
    Now the good news on all of this is that it seems (I did a small test)
    that if you use parentheses the parser does the right thing. In my mind,
    it's a good idea to use parentheses whenever you're creating complex
    expressions.
    Sure. All we are talking about is what happens if there are no explicit
    parentheses. If you use parentheses you break the query into simple parts
    (e.g. (a AND b) OR (c AND d) consists of two queries of type 'x AND y' and one
    query of type 'x OR y', where x and y are queries, not just terms), which
    are handled correctly even by the current query parser.
    That's one of the reasons why this hasn't been a big problem in the past.
    If you use (a AND b) OR (c AND d) you will get what you expect.
    It's just that I think the query parser should also create a reasonable
    query if the parentheses are removed.
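    For comparison, the flat behaviour reported at the start of this thread
    (without parentheses every term goes into one BooleanQuery, and AND makes
    the terms on both of its sides required) can be sketched as follows. It
    assumes the default operator is OR and ignores NOT, '+', '-' and
    parentheses; the class and method names are hypothetical, and this models
    the described behaviour rather than reproducing the parser's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the flat parse described in the original question:
// one clause list, and each AND marks its left and right terms as required.
public class FlatParseSketch {

    /** Returns the flat clause list in '+' notation, e.g. "a +b +c". */
    static String flatten(String query) {
        List<String> terms = new ArrayList<>();
        List<Boolean> required = new ArrayList<>();
        String pendingOp = null; // operator seen since the previous term
        for (String tok : query.trim().split("\\s+")) {
            if (tok.equals("AND") || tok.equals("OR")) {
                pendingOp = tok;
                continue;
            }
            boolean introducedByAnd = "AND".equals(pendingOp);
            if (introducedByAnd && !terms.isEmpty()) {
                required.set(required.size() - 1, true); // left side of AND
            }
            terms.add(tok);
            required.add(introducedByAnd); // right side of AND; else optional
            pendingOp = null;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) sb.append(' ');
            if (required.get(i)) sb.append('+');
            sb.append(terms.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(flatten("a OR b AND c"));       // a +b +c
        System.out.println(flatten("a AND b OR c AND d")); // +a +b +c +d
    }
}
```

    This reproduces both observations from the original question:
    'a OR b AND c' yields the same hits as 'b AND c' because a is only
    optional, and 'a AND b OR c AND d' ends up equivalent to
    'a AND b AND c AND d'.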

    Morus

  • Dror Matalon at Dec 30, 2003 at 11:37 pm

    On Tue, Dec 30, 2003 at 11:19:38PM +0100, Morus Walter wrote:
    Hi Dror,
    For one thing, I would advocate for using the '+' notation as the
    underlying syntax and migrating to boolean operators, since that's the
    syntax many more people are used to, and I believe it's better
    understood.
    I'm not sure if I understand what you mean here.
    I meant that the query parser would accept AND and OR, which get translated
    into '+' and '-', but would not accept '+' and '-' directly.
    The other alternative would be to forbid queries mixing default operators and
    explicit and/or. This is what I'd probably vote for at the moment.
    At first I was inclined to agree but as a rule I think we should adopt
    the WWGD (What Would Google Do) philosophy, since that's the syntax and
    behavior that most people are used to.

    It looks like it basically adds an "AND" between any two terms that
    don't have an operator between them. We could do the same for both the
    default AND and the default OR. Once you've done that, you just use the
    standard boolean precedence rules.
    Hmm. Then you lose the possibility to create BooleanQuery objects where
    some of the terms are required, some forbidden and some have neither flag.
    Having this possibility is the reason why I say that implicit AND/OR and
    explicit AND/OR need to be different things.
    If an implicit OR equals an explicit OR, you would have '+a +b' = '+a OR +b'
    = '(+a) OR (+b)' = 'a OR b', which is probably not what was intended.
    So either the '+' operator is removed, or it is used as an alternative to AND,
    in which case it could not be a prefix. So instead of '+a +b' one would use
    'a + b'.
    Which is my point above. It's too confusing to have:
    1. '+' and '-'
    2. Explicit AND and OR
    3. Implicit AND or OR

    There's some redundancy between all three, and it's quite easy to get
    confused.
    A consequence of pure boolean operators is that there would be no way of
    serializing an arbitrary query to a parsable string in standard query parser
    syntax.

    So for completeness and compatibility with the current query parser, I would
    keep the current behaviour of queries without explicit boolean operators.

    The problem for users isn't that big IMHO.
    Unless a user decides to make use of the '+' operator, things are pretty clear:
    a b c searches for documents containing any or all of these terms (depending
    on the default operator). Using terms with the '-' operator also does what
    one expects. Only if the user starts to use the '+' operator explicitly do
    things get more complicated. So he just shouldn't do that unless
    he knows what he is doing.
    Fair enough.
    The same thing applies to queries using AND/OR, as long as you don't mix them
    with implicit operators. IMO whoever does the latter gets what he deserves
    if he has to deal with the difficulties of such queries. One just should
    not do that, and it should be pretty clear that the meaning of such a query
    is unclear (unless parentheses are used, in which case there is no mixing
    any longer).
    That is why I think my patch is good enough, even if it leaves the evaluation
    of such queries without a clear definition.
    I guess I can be convinced. Clearly things are broken, and clearly if
    your patch works as advertised, it should make things better rather than
    worse. And a partial solution is better than no solution. So, if the
    developers bless the patch, run it through the test suite and it comes
    out looking good, I'm for it.


    Again, thanks for spending the time on this.

    Regards,

    Dror
    --
    Dror Matalon
    Zapatec Inc
    1700 MLK Way
    Berkeley, CA 94709
    http://www.fastbuzz.com
    http://www.zapatec.com

  • Colin McGuigan at Jan 2, 2004 at 4:39 pm
    Hello all,

    I am new to Lucene and working through the Lucene examples on the Jakarta
    site.
    In the IndexHTML example,
    when I type in (from my Tomcat webapps directory)
    java org.apache.lucene.demo.IndexHTML -create -index{index}..

    It creates an index, but when I search using
    http://localhost:8000/luceneweb/
    The page works but I do not get any replies.


    Could someone please help me:

    1. How do you specify which directory is to be searched?
    (I assumed it was the current directory, i.e. tomcat\webapps, but when I put in
    more searchable content nothing comes up in the search.
    I have also tried typing java
    org.apache.lucene.demo.IndexHTML -create -index{content}.. where content is
    the directory with the content, but this still doesn't work.)

    2. What is the easiest way to specify fields (such as title, etc.) to be
    searched?
    (i.e. what file needs to be changed to allow me to search for specific fields)

    3. Is there a very simple step-by-step guide for someone new on how to use
    Lucene?
    (I have looked at Jakarta's site but still do not know the answers to the above.)

    Thanking you in anticipation,

    Colin.


  • Erik Hatcher at Jan 2, 2004 at 6:45 pm

    On Jan 2, 2004, at 11:49 AM, Colin McGuigan wrote:
    1. How do you specify which directory is to be searched?
    (I assumed it was the current directory, i.e. tomcat\webapps, but when I put in
    more searchable content nothing comes up in the search.
    I have also tried typing java
    org.apache.lucene.demo.IndexHTML -create -index{content}.. where content is
    the directory with the content, but this still doesn't work.)
    Quite sadly, the demo application that ships with Lucene is inadequate
    for a nice sales pitch or starter demo to lure folks in. It is my plan
    (eventually - more later than sooner at this point, but you can
    definitely count on it from me) to enhance the demo application to be
    quite nice and easy to use.
    2. What is the easiest way to specify fields (such as title, etc.) to be
    searched?
    (i.e. what file needs to be changed to allow me to search for specific fields)
    The source code to HTMLDocument shows what fields are indexed. To
    search on a specific field, use the syntax you see here:
    <http://jakarta.apache.org/lucene/docs/queryparsersyntax.html>
    3. Is there a very simple step-by-step guide for someone new on how to use
    Lucene?
    (I have looked at Jakarta's site but still do not know the answers to the above.)
    There are articles available on the resources page:
    <http://jakarta.apache.org/lucene/docs/resources.html>, and a new one
    of mine that isn't listed there (yet) at
    <http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html>

    My recommendation is for you to do your own experimenting and not try
    to tinker with the demo application. What you need to know to use
    Lucene effectively is actually quite simple and you can glean all of
    that from the articles in a cleaner way than the demo app.

    Erik


  • Colin McGuigan at Jan 3, 2004 at 5:18 pm
    Erik, Leo, Daniel,

    Just a short note to thank you for your help with the above.
    I realise I have a lot of work ahead of me, but I am keen to continue with
    Lucene as I have been impressed with what I have got working.

    best regards,

    Colin.
    ----- Original Message -----
    From: "Erik Hatcher" <erik@ehatchersolutions.com>
    To: "Lucene Users List" <lucene-user@jakarta.apache.org>
    Sent: Friday, January 02, 2004 6:44 PM
    Subject: Re: IndexHTML example on Jakarta Site

  • Leo Galambos at Jan 2, 2004 at 7:51 pm

    Colin McGuigan wrote:
    It creates an index, but when I search using
    http://localhost:8000/luceneweb/
    The page works but I do not get any replies.


    Can it read your index? See indexLocation in configuration.jsp
    1. How do you specify which directory is to be searched
    <snip>
    I agree with Erik that you would be better off with an application that is
    ready to use in a minute. IMHO Lucene is a library/API, and unless you are
    a Java developer, it does not fit your needs. Some applications are
    listed here:
    http://dmoz.org/Computers/Programming/Languages/Java/Server-Side/Search_Engines/
    Omit the Lucene link, else you will end up in an endless loop... ;-)

    If you must use Lucene, try to find something for you here:
    http://jakarta.apache.org/lucene/docs/powered.html
    You may be interested in i2a, but their demo (@24.9.177.111) is dead
    right now.

    Cheers,
    Leo


  • Daniel Naber at Jan 2, 2004 at 8:52 pm

    On Friday 02 January 2004 20:50, Leo Galambos wrote:

    IMHO Lucene is a library/API, and unless you are
    a Java developer, it does not fit your needs.
    One reason for the confusion might be that the homepage states that Lucene
    is a "full-featured text search engine". IMHO this should be replaced by
    "a powerful Java library for full-text indexing" or something like that.

    Regards
    Daniel

    --
    http://www.danielnaber.de

  • Morus Walter at Dec 30, 2003 at 10:39 am
    Hi Erik,
    I haven't had time to think through all of the issues and the patch you
    submitted, but I suggest that you go ahead and attach this to a
    Bugzilla issue so that it can be addressed more formally and avoid
    being lost in the mounds of e-mail we all get.
    Well, I'd have made sure that it doesn't get lost.
    But if you think that it's better to have the issue as a bug report, no
    problem.
    See:
    http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25820

    Morus


Discussion Overview
group: java-user @
categories: lucene
posted: Dec 9, '03 at 9:58a
active: Jan 3, '04 at 5:18p
posts: 22
users: 8
website: lucene.apache.org

site design / logo © 2022 Grokbase