My goal is to be able to get meaningful results from search queries that
include some words that are on the default stop words list, especially
"not". I am using StandardAnalyzer and have tried passing both null and an
empty set as the stop word set in the constructor, hoping that no words
would be stripped, but I am getting strange results.

If I enter a query of just the word "not" I get no matches. If I run a
query with just the word "included" I get lots of matches. If I run the
query "not included" (without surrounding quotation marks) I get lots of
matches and the highlighter indicates that "not" is one of the matching
fragments. But if I run the query ""not included"" (with surrounding
quotation marks) I get no matches even though there are many occurrences in
the content of that exact phrase which were matched when I entered the same
query without the quotation marks.

What's going on here? Why can't I search for the word "not" by itself or in
a quote? Similar behaviour happens for other words like "the" but I am
explicitly telling the analyzer not to remove any words (or so I believe).
How can I achieve a StandardAnalyzer where every word in the query is
significant?

Thanks,

-sbs

--
View this message in context: http://lucene.472066.n3.nabble.com/Strange-StopFilter-and-stop-words-behaviour-tp3199367p3199367.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Ian Lea at Jul 26, 2011 at 10:58 am
    I think that passing an empty set or null to StandardAnalyzer should
    do what you want. There are useful tips at
    http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F.

    My guess would be that you aren't using a no-stop-words version of
    StandardAnalyzer at both index and query time.


    --
    Ian.
    On Tue, Jul 26, 2011 at 4:25 AM, SBS wrote:
    [snip]
  • Dawn Zoë Raison at Jul 26, 2011 at 11:05 am
    Are you using QueryParser...?
    If so, remember that NOT is a reserved word (an operator in the query syntax).

    Dawn
    On 26/07/2011 04:25, SBS wrote:
    If I enter a query of just the word "not" I get no matches. If I run a
    query with just the word "included" I get lots of matches. If I run the
    query "not included" (without surrounding quotation marks) I get lots of
    matches and the highlighter indicates that "not" is one of the matching
    fragments. But if I run the query ""not included"" (with surrounding
    quotation marks) I get no matches even though there are many occurrences in
    the content of that exact phrase which were matched when I entered the same
    query without the quotation marks.
    --

    Rgds.
    *Dawn Raison*
    Technical Director, Digitorial Ltd.
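
Ian's guess (different stop word handling at index time and at query time) can be illustrated without Lucene. The sketch below is a toy model in plain Java, not Lucene code: `analyze`, `matchesAnyTerm`, `matchesPhrase` and the four-word stop list are hypothetical stand-ins for an analyzer with a StopFilter, an unquoted multi-term query and a quoted phrase query. It assumes the index was built with a stop list containing "not" while the query is analyzed with an empty one. Under that assumption it reproduces the symptoms described in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordMismatchDemo {
    // Small illustrative stop list (an assumption; not Lucene's actual default set).
    static final Set<String> SOME_STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "is", "not", "the"));

    // Toy analyzer: lowercase, split on non-letters, drop stop words.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty() && !stopWords.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Unquoted query: a hit if any query term occurs in the indexed tokens.
    static boolean matchesAnyTerm(List<String> indexed, List<String> query) {
        for (String q : query) {
            if (indexed.contains(q)) {
                return true;
            }
        }
        return false;
    }

    // Quoted query: a hit only if the query terms occur consecutively, in order.
    static boolean matchesPhrase(List<String> indexed, List<String> query) {
        return !query.isEmpty() && Collections.indexOfSubList(indexed, query) >= 0;
    }

    public static void main(String[] args) {
        String doc = "Batteries are not included.";
        Set<String> noStopWords = Collections.emptySet();

        // Index built WITH stop words, queries analyzed WITHOUT them.
        List<String> indexed = analyze(doc, SOME_STOP_WORDS);
        List<String> not = analyze("not", noStopWords);
        List<String> notIncluded = analyze("not included", noStopWords);

        System.out.println("not            -> " + matchesAnyTerm(indexed, not));
        System.out.println("not included   -> " + matchesAnyTerm(indexed, notIncluded));
        System.out.println("\"not included\" -> " + matchesPhrase(indexed, notIncluded));

        // Same empty stop set on both sides: the phrase is found.
        List<String> reindexed = analyze(doc, noStopWords);
        System.out.println("after reindex  -> " + matchesPhrase(reindexed, notIncluded));
    }
}
```

In this model the single term "not" and the quoted phrase both fail because "not" never reached the index, while the unquoted query still hits on "included". If that is what is happening here, the remedy Ian describes is to use the same no-stop-words analyzer for both indexing and query parsing, and to rebuild the index, since terms dropped at index time are not stored in it.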
