FAQ
Hi,

I'm building my own BooleanQuery (rather than using Query Parser). That's because I need different defaults from my users:
If a user types:  java program
I need to run the query: +java* +program* (namely AND search, with Prefix so as to hit "programS", "programMER").

So naively I split the user's input into words, and build my query term-by-term:
String[] words= userInput.split(" ");
BooleanQuery query=new BooleanQuery();
for(String word: words)
query.add(new PrefixQuery(new Term("topic", word)), Occur.MUST);

But since my index is built with StandardAnalyzer, I'm having trouble with Stop Words.
If the user types: program of java
Then of course my query (+program* +of* +java+) returns zero results.

Is there an easy solution?
I'd like to keep using StandardAnalyzer (I don't want to index by stupid keywords such as "of"). I would just like stop-words to be removed from my query (as QueryParser does).
Is there some utility method for this? Direct access to the list of stop-words?

Thanks




____________________________________________________________________________________
Bored stiff? Loosen up...
Download and play hundreds of games for free on Yahoo! Games.
http://games.yahoo.com/games/front

Search Discussions

  • Ian Lea at Feb 9, 2011 at 9:59 am
    Have you considered using stemming instead? Sounds like that might
    make most of your problems go away and achieve the same result.

    I'm not aware of a utility method to remove stop words from a string
    but there are ways of passing data through analyzers/tokenizers and
    grabbing the output. StandardAnalyzer stores the stop words in
    STOP_WORDS_SET and you might be able to use that directly, or pass in
    your own set of stop works which you would obviously have access to.

    If you stick with your split/word by word approach, watch out for
    punctuation.and Mixed Case and other complications.


    --
    Ian.
    On Wed, Feb 9, 2011 at 8:43 AM, sol myr wrote:
    Hi,

    I'm building my own BooleanQuery (rather than using Query Parser). That's because I need different defaults from my users:
    If a user types:  java program
    I need to run the query: +java* +program* (namely AND search, with Prefix so as to hit "programS", "programMER").

    So naively I split the user's input into words, and build my query term-by-term:
    String[] words= userInput.split(" ");
    BooleanQuery query=new BooleanQuery();
    for(String word: words)
    query.add(new PrefixQuery(new Term("topic", word)), Occur.MUST);

    But since my index is built with StandardAnalyzer, I'm having trouble with Stop Words.
    If the user types: program of java
    Then of course my query (+program* +of* +java+) returns zero results.

    Is there an easy solution?
    I'd like to keep using StandardAnalyzer (I don't want to index by stupid keywords such as "of"). I would just like stop-words to be removed from my query (as QueryParser does).
    Is there some utility method for this? Direct access to the list of stop-words?

    Thanks




    ____________________________________________________________________________________
    Bored stiff? Loosen up...
    Download and play hundreds of games for free on Yahoo! Games.
    http://games.yahoo.com/games/front
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Sol myr at Feb 10, 2011 at 8:35 am
    Thanks so much - I used STOP_WORDS_SET and it works fine (luckily, punctuation and case are not a problem in our case).
    Thanks !

    --- On Wed, 2/9/11, Ian Lea wrote:

    From: Ian Lea <ian.lea@gmail.com>
    Subject: Re: [Lucene] custom Query, and Stop Words
    To: java-user@lucene.apache.org
    Date: Wednesday, February 9, 2011, 1:58 AM

    Have you considered using stemming instead?  Sounds like that might
    make most of your problems go away and achieve the same result.

    I'm not aware of a utility method to remove stop words from a string
    but there are ways of passing data through analyzers/tokenizers and
    grabbing the output. StandardAnalyzer stores the stop words in
    STOP_WORDS_SET and you might be able to use that directly, or pass in
    your own set of stop works which you would obviously have access to.

    If you stick with your split/word by word approach, watch out for
    punctuation.and Mixed Case and other complications.


    --
    Ian.
    On Wed, Feb 9, 2011 at 8:43 AM, sol myr wrote:
    Hi,

    I'm building my own BooleanQuery (rather than using Query Parser). That's because I need different defaults from my users:
    If a user types:  java program
    I need to run the query: +java* +program* (namely AND search, with Prefix so as to hit "programS", "programMER").

    So naively I split the user's input into words, and build my query term-by-term:
    String[] words= userInput.split(" ");
    BooleanQuery query=new BooleanQuery();
    for(String word: words)
    query.add(new PrefixQuery(new Term("topic", word)), Occur.MUST);

    But since my index is built with StandardAnalyzer, I'm having trouble with Stop Words.
    If the user types: program of java
    Then of course my query (+program* +of* +java+) returns zero results.

    Is there an easy solution?
    I'd like to keep using StandardAnalyzer (I don't want to index by stupid keywords such as "of"). I would just like stop-words to be removed from my query (as QueryParser does).
    Is there some utility method for this? Direct access to the list of stop-words?

    Thanks




    ____________________________________________________________________________________
    Bored stiff? Loosen up...
    Download and play hundreds of games for free on Yahoo! Games.
    http://games.yahoo.com/games/front
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedFeb 9, '11 at 8:44a
activeFeb 10, '11 at 8:35a
posts3
users2
websitelucene.apache.org

2 users in discussion

Sol myr: 2 posts Ian Lea: 1 post

People

Translate

site design / logo © 2022 Grokbase