I'm experiencing some very puzzling search behavior. I am using the CVS head I pulled about a week ago. I use the StandardAnalyzer and QueryParser. I have a collection of XML documents indexed. One field is "subhead", and here's what I find with different queries:
subhead:(missile defense) - works fine
subhead("missile" "defense") - works fine
subhead("missile defense") - fails
subhead(missile defense "missile defense") - fails
subhead(missile defense "missile dork") - works fine
subhead(missile defense "missile defens") - works fine (note misspelling)

At the moment, I can't find any other field or phrase that does this. However, according to my notes (as I'm no longer trusting my mind on this), about a week ago (about the time I started using the new CVS version) I noticed similar behavior with the query subhead:"al qaeda", but that now works perfectly fine! Same thing with the query summary:"heart disease": it failed to work and then, a day or so later, it worked. (I merge new documents into the master index each day.)

Any ideas on what might possibly be going on would be very much appreciated.

Regards,

Terry


  • Erik Hatcher at Mar 31, 2004 at 2:55 pm
    On Mar 31, 2004, at 9:49 AM, Terry Steichen wrote:
    I'm experiencing some very puzzling search behavior. I am using the
    CVS head I pulled about a week ago. I use the StandardAnalyzer and
    QueryParser. I have a collection of XML documents indexed. One field
    is "subhead", and here's what I find with different queries:
    subhead:(missile defense) - works fine
    subhead("missile" "defense") - works fine
    subhead("missile defense") - fails
    subhead(missile defense "missile defense") - fails
    subhead(missile defense "missile dork") - works fine
    subhead(missile defense "missile defens") - works fine (note
    misspelling)
    I presume the missing colons on all but the first example are just a
    typo in your e-mail? If not, might that be the problem?

    Erik


    ---------------------------------------------------------------------
    To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
    For additional commands, e-mail: lucene-user-help@jakarta.apache.org
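
Editor's note: one way to rule out the missing-colon possibility for a whole batch of canned queries is a quick textual check before they reach QueryParser. The sketch below is only an illustration and makes no Lucene calls; the class name, method name, and regular expressions are hypothetical, and it assumes a field name made of word characters that runs directly into "(" or a quote with no ":" in between. It uses newer Java than was available in 2004.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: flags query strings in which a
// word is followed directly by '(' or '"' with no ':' separating them.
class MissingColonCheck {
    // A quoted phrase such as "missile defense".
    private static final Pattern PHRASE = Pattern.compile("\"[^\"]*\"");
    // A word that runs straight into '(' or '"'.
    private static final Pattern BARE_FIELD =
        Pattern.compile("(?<![:\\w])\\w+(?=[(\"])");

    static boolean looksLikeMissingColon(String query) {
        // Collapse each phrase to "" so words inside quotes are not inspected.
        String stripped = PHRASE.matcher(query).replaceAll("\"\"");
        return BARE_FIELD.matcher(stripped).find();
    }

    public static void main(String[] args) {
        List<String> queries = Arrays.asList(
            "subhead:(missile defense)",
            "subhead(\"missile defense\")",
            "subhead:\"al qaeda\"");
        for (String q : queries) {
            System.out.println(q + " -> suspicious: " + looksLikeMissingColon(q));
        }
    }
}
```

A query that passes this check can still be wrong in other ways; it only catches the specific slip asked about above.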
  • Terry Steichen at Mar 31, 2004 at 4:48 pm
    No, they're typos in the e-mail. In the application, all the colons are
    properly placed. (Guess I was/am so frustrated I can't write right any
    more).

    Terry

  • Terry Steichen at Apr 1, 2004 at 12:51 pm
    I did some more checking and uncovered what appears to be a serious Lucene
    problem. (Either that or my merge code - below - is wrong.) Appreciate any
    help in figuring out what's wrong. Here are the facts as I see them:

    1) I put together a large number of canned queries (some rather complex) for
    routine testing purposes.
    2) I created a new compound file index and tested the queries. All worked
    fine.
    3) I then indexed some new documents and merged the new index with the
    original index.
    4) I then tried the queries again. Each time I did this, about 1-3% of the
    queries no longer worked - the actual number appears to vary with each
    merge.
    5) The specific queries that fail change with each merge. Ones that failed
    after the previous merge almost always appear to work again with the next
    merge (which produces a new batch of failures).
    6) In all cases I've so far examined, the offending part of the affected
    queries is a single quoted phrase (even though there may be several such
    phrases in the query) - remove it, and the (now modified) query works fine.
    7) I tried the same thing using the original multi-file index format, with
    the same results.
    8) About a week and a half ago, I migrated from 1.3final to the latest CVS
    head.
    9) I've only just started checking this, so I don't know how long this
    behavior has been going on. The small percentage of errors and (apparent)
    randomness of which query is affected make it hard to detect.
    10) I have about 32 fields per document, most of which are tokenized,
    indexed and stored.
    11) My merge code (for the multi-file index format) is this:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;

    class MergeIndices {
        public static void main(String[] args) {

            //args[0]: relative path to main index
            //args[1]: relative path to new index (to be merged with main)

            try {
                IndexWriter writer = new IndexWriter(args[0], new StandardAnalyzer(),
                                                     false);
                // writer.setUseCompoundFile(true); //used for compound format
                FSDirectory dir = FSDirectory.getDirectory(args[1], false);
                FSDirectory[] dirs = new FSDirectory[1];
                dirs[0] = dir;
                writer.addIndexes(dirs);
                writer.optimize();
                writer.close();
            } catch (Exception e) {
                System.out.println(" caught a " + e.getClass() +
                                   "\n with message: " + e.getMessage());
            }
        }
    }



  • Doug Cutting at Apr 1, 2004 at 5:03 pm
    Terry,

    Can you please try to develop a reproducible test case? Otherwise it's
    impossible to verify and debug this.

    For something like this it would suffice to provide:

    1. The initial index, which satisfies the test queries;

    2. The new index you add;

    3. Your merge and test code, as a single class that illustrates the
    problem.

    The smaller the indexes the better: not only will it be easier to
    transfer them, but debugging will be faster.

    Also, you should add a bug to track this, at:

    http://issues.apache.org/bugzilla/enter_bug.cgi?product=Lucene

    Doug
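
Editor's note: the test half of such a single-class test case could be sketched along the following lines. This is a minimal illustration, not code from the thread, and it makes no Lucene calls: the class and method names are hypothetical, and the hit counter is a stand-in for whatever code runs a query against the index and returns the number of hits. It assumes the merge only adds documents, so no query's hit count should drop afterwards. It also uses newer Java than was available in 2004.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

// Hypothetical harness: compares per-query hit counts taken before and
// after a merge and reports the queries whose counts went down.
class MergeRegressionCheck {

    // Runs every canned query through the supplied counter and records the result.
    static Map<String, Integer> snapshot(List<String> queries,
                                         ToIntFunction<String> hitCounter) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String q : queries) {
            counts.put(q, hitCounter.applyAsInt(q));
        }
        return counts;
    }

    // Queries that return fewer hits after the merge than before it.
    static List<String> regressions(Map<String, Integer> before,
                                    Map<String, Integer> after) {
        List<String> lost = new ArrayList<>();
        for (Map.Entry<String, Integer> e : before.entrySet()) {
            int now = after.getOrDefault(e.getKey(), 0);
            if (now < e.getValue()) {
                lost.add(e.getKey());
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        List<String> queries = Arrays.asList(
            "subhead:(missile defense)", "subhead:\"missile defense\"");
        // Stand-in counters; a real test would search the index here.
        Map<String, Integer> before = snapshot(queries, q -> 5);
        Map<String, Integer> after =
            snapshot(queries, q -> q.contains("\"") ? 0 : 7);
        System.out.println("Regressed queries: " + regressions(before, after));
    }
}
```

Taking one snapshot before the merge and one after, then attaching the list of regressed queries to the bug report, would show exactly which phrases stop matching for a given pair of indexes.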

