I found a discrepancy in results for an identical search ("processing")
done with Lucene and MySQL. It seems that Lucene does not return results
where the search word is attached to a "-" (hyphen) or "." (period). For
example, it didn't return a result for text that contained
"processing-7-bit" and "straighforwerd.processing", but MySQL did. Is this
a settings issue, or is it something unavoidable?

Thanks
Tareque
ControlDOCS

PS: In contrast, I previously found Lucene returning some results that
MySQL didn't, for example search phrases attached to "'" (apostrophe) and
"_" (underscore). I am not complaining about this; rather, I find it
preferable for my purpose.




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Erik Hatcher at Jun 21, 2005 at 7:20 pm

    On Jun 21, 2005, at 2:59 PM, tareque@controldocs.com wrote:

    > I found a discrepancy in results for an identical search ("processing")
    > done with Lucene and MySQL. It seems that Lucene does not return results
    > where the search word is attached to a "-" (hyphen) or "." (period).
    > [...]
    > PS: In contrast, I previously found Lucene returning some results that
    > MySQL didn't, for example search phrases attached to "'" (apostrophe)
    > and "_" (underscore). [...]

    These all boil down to your choice of analyzer. What analyzer are you
    using?

    As you can see below, "processing-7-bit" is tokenized quite
    differently depending on the analyzer:

    $ ant AnalyzerDemo
    Buildfile: build.xml

    [input] String to analyze: [This string will be analyzed.]
    processing-7-bit
    [echo] Running lia.analysis.AnalyzerDemo...
    [java] Analyzing "processing-7-bit"
    [java] WhitespaceAnalyzer:
    [java] [processing-7-bit]

    [java] SimpleAnalyzer:
    [java] [processing] [bit]

    [java] StopAnalyzer:
    [java] [processing] [bit]

    [java] StandardAnalyzer:
    [java] [processing-7-bit]

    If you're using the StandardAnalyzer, you are not indexing the word
    "processing" at all. Grab the source code from Lucene in Action at
    lucenebook.com and type "ant AnalyzerDemo" to try out the basic
    analyzers.

    Erik
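To see why the analyzer choice decides whether a query on "processing" can match, here is a minimal plain-Java sketch that approximates two behaviors from the demo output above: splitting on whitespace only (like WhitespaceAnalyzer) and splitting on anything that is not a letter, then lowercasing (like SimpleAnalyzer). It is an illustration, not Lucene's API: the class and method names are hypothetical, and StandardAnalyzer's grammar is not modeled. A term query for "processing" can only match a document if "processing" is among that document's indexed tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; approximates analyzer behavior, does not use Lucene.
class TokenizeDemo {
    // Approximates WhitespaceAnalyzer: split on whitespace only, no lowercasing.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Approximates SimpleAnalyzer: maximal runs of letters, lowercased.
    // Digits and punctuation such as '-' and '.' act as token separators.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] samples = {"processing-7-bit", "straighforwerd.processing"};
        for (String s : samples) {
            List<String> ws = whitespaceTokens(s);
            List<String> letters = letterTokens(s);
            System.out.println(s);
            System.out.println("  whitespace: " + ws
                + "  contains 'processing'? " + ws.contains("processing"));
            System.out.println("  letters:    " + letters
                + "  contains 'processing'? " + letters.contains("processing"));
        }
    }
}
```

On the two examples from the question, the whitespace version keeps each string as a single token, so "processing" is never indexed on its own, while the letter-based version yields "processing" as a separate token in both cases. That is consistent with the fix Tareque reports after switching to StopAnalyzer, which tokenizes on letters in the same way.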

  • Tareque at Jun 22, 2005 at 3:35 pm

    Thanks! Switching to StopAnalyzer solved the problem. Is there any
    detailed documentation of what each of these analyzers does?

    Tareque

  • Erik Hatcher at Jun 22, 2005 at 4:29 pm

    On Jun 22, 2005, at 11:35 AM, tareque@controldocs.com wrote:

    > Thanks! Switching to StopAnalyzer solved the problem. Is there any
    > detailed documentation of what each of these analyzers does?

    Here are some pointers:

    - Lucene's javadocs give a brief description, such as
    <http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/StopAnalyzer.html>

    - The source code is the ultimate documentation; look at the
    tokenStream method in
    <http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/src/java/org/apache/lucene/analysis/StopAnalyzer.java?rev=168970&view=markup>

    - Several Lucene articles are listed at
    <http://wiki.apache.org/jakarta-lucene/Resources>. The most relevant is
    my java.net article,
    <http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html>, which
    provides the AnalysisDemo code.

    - And last but certainly not least, "Lucene in Action" :) You can
    search for details of analyzers at the lucenebook.com site, like this:
    <http://www.lucenebook.com/search?query=StopAnalyzer>. The Analysis
    chapter in LIA covers each of the built-in analyzers in depth.

    Erik
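As we understand the Lucene 1.x sources pointed to above, SimpleAnalyzer's tokenStream returns a lowercasing letter tokenizer, and StopAnalyzer wraps that same tokenizer in a StopFilter. That would explain why both print [processing] [bit] in the demo output earlier in the thread. The following is a minimal plain-Java sketch of that "tokenizer, then filters" chaining. It does not use Lucene classes; AnalyzerChainSketch, STOP_WORDS and the small stop-word set are hypothetical (Lucene ships its own, longer English list).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical sketch of analyzer chaining; not Lucene's implementation.
class AnalyzerChainSketch {
    // Small illustrative stop-word set (assumption, not Lucene's full list).
    static final Set<String> STOP_WORDS =
        Set.of("a", "an", "and", "be", "is", "of", "the", "this", "to", "will");

    // Stage 1 (tokenizer): maximal runs of letters; anything else separates tokens.
    static List<String> letterTokenizer(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Stage 2 (filter): lowercase every token.
    static final UnaryOperator<List<String>> LOWER_CASE = tokens -> tokens.stream()
        .map(t -> t.toLowerCase(Locale.ROOT))
        .collect(Collectors.toList());

    // Stage 3 (filter): drop tokens that appear in the stop-word set.
    static final UnaryOperator<List<String>> STOP = tokens -> tokens.stream()
        .filter(t -> !STOP_WORDS.contains(t))
        .collect(Collectors.toList());

    // "Simple" chain: letter tokenizer followed by lowercasing.
    static List<String> simple(String text) {
        return LOWER_CASE.apply(letterTokenizer(text));
    }

    // "Stop" chain: the simple chain followed by stop-word removal.
    static List<String> stop(String text) {
        return STOP.apply(simple(text));
    }

    public static void main(String[] args) {
        String text = "This string will be analyzed.";
        System.out.println("simple: " + simple(text));
        System.out.println("stop:   " + stop(text));
    }
}
```

On the demo's default input, "This string will be analyzed.", the simple chain keeps all five words in lowercase, while the stop chain drops "this", "will" and "be". Keeping each stage as a separate function loosely mirrors how Lucene's TokenFilter classes each wrap the stream produced by the stage before them.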


Discussion Overview
group: java-user
categories: lucene
posted: Jun 21, '05 at 6:59p
active: Jun 22, '05 at 4:29p
posts: 4
users: 2
website: lucene.apache.org

2 users in discussion: Erik Hatcher (2 posts), Tareque (2 posts)