Dear all,

I am using the following code to search indexed data. However, when
the searchKeyword contains special characters such as "//", ":",
"+", "-", or ".", or even digits, the query removes some
required characters or splits the keyword. Sometimes this returns no
results even though I am sure matching documents exist. Can I turn this
behavior off so that the query does not change my original searchKeyword?

......
IndexSearcher searcher = new IndexSearcher(fsDirectory);
Analyzer chineseAnalyzer = new ChineseAnalyzer();
QueryParser queryParser = new QueryParser(searchField, chineseAnalyzer);
Query query = queryParser.Parse(DBTools.FilterKeyFieldValue(searchKeyword));
Hits results = searcher.Search(query);
......

Thanks so much!
LB


  • Granroth, Neal V. at Aug 20, 2009 at 1:55 pm
    Constructing the query with QueryParser causes this; the specifics of the word breaking are determined by the analyzer that you selected.

    To avoid word breaking and symbol replacement, you could use a different analyzer, but it would be best to construct the query directly using the BooleanQuery, TermQuery and related classes. The latter is preferred because some symbols (for example "+" and "-") are an essential part of the query syntax that QueryParser recognizes. For example, when run through QueryParser, the search [ +red +blue -green ] is identical to the search [ red AND blue NOT green ].
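If QueryParser has to stay in the picture, one partial workaround is to backslash-escape the syntax characters before parsing. The following is only a minimal sketch: the class name `QueryText`, the method name `EscapeSpecialChars`, and the exact character list are assumptions for illustration (the list reflects the classic 2.x-era query syntax as I understand it, where "/" is not special). If I recall correctly, Lucene.Net's QueryParser also ships a static Escape method, which would be preferable where it is available.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of Lucene.Net.
public static class QueryText
{
    // Characters assumed to be special in the classic QueryParser syntax.
    private const string SpecialChars = "\\+-!():^[]\"{}~*?|&";

    // Prefixes every special character with a backslash so that the
    // parser treats it as literal text instead of query syntax.
    public static string EscapeSpecialChars(string text)
    {
        if (text == null) throw new ArgumentNullException("text");

        StringBuilder sb = new StringBuilder(text.Length * 2);
        foreach (char c in text)
        {
            if (SpecialChars.IndexOf(c) >= 0)
            {
                sb.Append('\\');
            }
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Escaping only stops the parser from interpreting the symbols as operators. The escaped text is still passed through the analyzer, so an analyzer that drops or splits on punctuation will continue to do so.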

    To directly construct a search that does not strip out the "+" symbol, you could do something like this to search for the string "red+green" in a given field:
    Query query = new TermQuery(new Term(searchField,"red+green"));

    The [ red AND blue NOT green ] search from above would be constructed like this:

    BooleanQuery query = new BooleanQuery();
    query.Add(new TermQuery(new Term(searchField,"red")), BooleanClause.Occur.MUST);
    query.Add(new TermQuery(new Term(searchField,"blue")), BooleanClause.Occur.MUST);
    query.Add(new TermQuery(new Term(searchField,"green")), BooleanClause.Occur.MUST_NOT);
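The same pattern can be wrapped in a small helper so that arbitrary keywords are never interpreted as query syntax. This is a sketch under assumptions: the class `ExactQueries` and the method `Build` are hypothetical names, and the Lucene.Net calls are the 2.x-style ones already shown in this thread.

```csharp
using System.Collections.Generic;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper, not part of Lucene.Net.
public static class ExactQueries
{
    // Builds a BooleanQuery in which every required term must match and
    // every prohibited term must not match. Term text is used verbatim:
    // no parsing and no analysis is applied to it.
    public static Query Build(string field,
                              IEnumerable<string> required,
                              IEnumerable<string> prohibited)
    {
        BooleanQuery query = new BooleanQuery();
        foreach (string text in required)
        {
            query.Add(new TermQuery(new Term(field, text)), BooleanClause.Occur.MUST);
        }
        foreach (string text in prohibited)
        {
            query.Add(new TermQuery(new Term(field, text)), BooleanClause.Occur.MUST_NOT);
        }
        return query;
    }
}
```

A call such as ExactQueries.Build(searchField, new string[] { "red", "blue" }, new string[] { "green" }) would produce the same query as the four lines above. Note that a query containing only MUST_NOT clauses matches nothing, so at least one required term is needed.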

    One other consideration: the analyzer used to add documents to the Lucene index also determines how the original content is broken into searchable terms. If I recall correctly, the StandardAnalyzer will keep the special symbols that comprise a phone number together as a searchable unit; this may not be true for other analyzers.
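A consequence of this is that a TermQuery for "red+green" can only match if the index actually contains the term "red+green". One way to guarantee that is to index the field without analysis. The sketch below assumes Lucene.Net 2.4 or later, where the option is named Field.Index.NOT_ANALYZED (earlier releases called it UN_TOKENIZED, as far as I remember); the class `ExactFields` and the method `BuildDocument` are hypothetical names.

```csharp
using Lucene.Net.Documents;

// Hypothetical helper, not part of Lucene.Net.
public static class ExactFields
{
    // Creates a document whose field value is stored and indexed as a
    // single term, bypassing the analyzer, so that an exact TermQuery
    // on the same text can match it.
    public static Document BuildDocument(string exactField, string exactValue)
    {
        Document doc = new Document();
        doc.Add(new Field(exactField, exactValue,
                          Field.Store.YES, Field.Index.NOT_ANALYZED));
        return doc;
    }
}
```

The trade-off is that an un-analyzed field only matches the whole value exactly, so content that also needs word-level search is usually indexed a second time in a separate, analyzed field.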

    There is a very useful tool called Luke that can be used to inspect an index and run trial searches using different analyzers.

    Hope this helps.

    -- Neal

    -----Original Message-----
    From: Li Bing
    Sent: Thursday, August 20, 2009 12:33 AM
    To: lucene-net-user@incubator.apache.org
    Subject: Lucene Query Questions
