Hi,

I have been happily using Lucene for several years to offer French lexical
analysis tools to university researchers. Today, one of them decided
to analyze the use of the French word "or" (meaning "gold" in French) in
one of my Lucene-powered corpora... And, as you have probably already
guessed, there were no results...

Instead of the default QueryParser implementation, I tried programmatically
building a simple BooleanQuery with the "or" term (with or without
surrounding double quotes): no results. I also experimented a lot with Luke
to make sure that my code was not responsible for this behavior. By the way,
my corpus contains many occurrences of "or", and everything else has been
working perfectly well for many years.

I first thought that modifying the QueryParser's JavaCC lexical grammar
would help (deactivating the OR operator and keeping only ||), but the
problem seems wider, since even without the QueryParser I am unable
to find the word "or" in my indexes...
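In Lucene's classic QueryParser syntax, boolean operators are recognized only when written in capitals (AND, OR, NOT) or as the symbols &&, || and !; a lowercase "or" is treated as an ordinary term and handed to the analyzer like any other word. The stdlib-only toy below mimics just that one rule (it is a sketch, not the real JavaCC grammar, and the class and method names are hypothetical). It suggests why editing the grammar would not have helped: the lowercase query word never reaches the operator branch, so if it still cannot be found, the analysis chain is the more likely place to look.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class OperatorSketch {
    // Toy model (not Lucene code): the keyword operators of the classic
    // QueryParser syntax. Lookup is case-sensitive, like the real grammar.
    static final Set<String> OPERATORS = Set.of("AND", "OR", "NOT", "&&", "||", "!");

    // Labels each whitespace-separated word as OPERATOR or TERM.
    // Deliberately naive: no +/- prefixes, parentheses, quotes or fields.
    static List<String> classify(String query) {
        List<String> labels = new ArrayList<>();
        for (String word : query.trim().split("\\s+")) {
            labels.add(OPERATORS.contains(word) ? "OPERATOR" : "TERM");
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(classify("or"));           // [TERM]
        System.out.println(classify("argent OR or")); // [TERM, OPERATOR, TERM]
    }
}
```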

Do you have any clue?

Thank you very much in advance!

Best regards,

Benoit (mercibe)

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

  • Robert Muir at Jan 13, 2011 at 4:06 am

    What analyzer are you using?

    By default, StandardAnalyzer and StopAnalyzer use a set of English
    stopwords. For French, this list is probably not appropriate.
    If you look at the javadocs, you will see that you can pass in your own
    set of stopwords... for lexical analysis, maybe this should be an empty set.
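Since the same analyzer normally runs at index time, a word in its stop set is never written to the index for that field, so no query, parsed or built by hand as a TermQuery, can match it. The stdlib-only sketch below models that effect with a toy analysis chain. The stop set is a small illustrative subset (Lucene's default English list does contain "or", but this is not Lucene's constant), the tokenizer is a naive stand-in for a real Tokenizer/TokenFilter chain, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopwordSketch {
    // Illustrative subset of English stopwords; "or" is among them.
    static final Set<String> ENGLISH_SUBSET = Set.of("a", "and", "not", "of", "or", "the");

    // Toy analysis chain: split on anything that is not a letter,
    // lowercase each token, then drop tokens found in the stop set.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}]+")) {
            if (raw.isEmpty()) continue;
            String token = raw.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(token)) tokens.add(token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Un lingot en or";
        System.out.println(analyze(text, ENGLISH_SUBSET)); // [un, lingot, en]
        System.out.println(analyze(text, Set.of()));       // [un, lingot, en, or]
    }
}
```

With an empty stop set, as suggested above for lexical analysis, every token survives and "or" reaches the index.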

  • Benoit Mercier at Jan 13, 2011 at 5:24 am
    Thank you for your reply.

    I am using my own FrenchAnalyzer for lexical analysis. It extends
    org.apache.lucene.analysis.Analyzer, and its stopword set is empty.

    Benoit
  • Hongyinjie at Jan 13, 2011 at 5:51 am
    Use a TokenStream to print out the tokens produced by your FrenchAnalyzer: does it work well, and is the result what you expect?

    You can also use another tool, Luke, to look at the Lucene index files: is there any token for the French "or"?
    And you can run queries with Luke as well...

    Good Luck

    2011-01-13

    ---------------------------------
    Yinjie Hong
    Ph. D. Student
    College of Computer Science, Zhejiang University

    Tel: 86-571-87952026
    E-Mail: hongyj(at)zju(dot)edu(dot)cn
    Office: Room 400, Teaching Building #11, Yuquan Campus, Zhejiang University
  • Benoit Mercier at Jan 14, 2011 at 3:21 am
    Thank you, Robert and Hongyinjie, for your support.

    I managed to solve my problem. It was simply a wrong application
    configuration. I am using PerFieldAnalyzerWrapper, with analyzers
    injected via Spring. A stupid line inversion in a Spring application
    context file that hadn't caused any harm for several years!
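A misrouted analyzer produces exactly the symptom from the first message. PerFieldAnalyzerWrapper chooses an analyzer by field name and falls back to a default one; the stdlib-only sketch below imitates that routing idea with a plain Map rather than the Lucene class. The field names text_fr and title_en, both toy analyzers, and the swapped map entries (standing in for the inverted lines in the Spring context file) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class PerFieldSketch {
    // Toy analyzer factory: split on non-letters, lowercase, drop stopwords.
    static Function<String, List<String>> analyzer(Set<String> stopWords) {
        return text -> {
            List<String> tokens = new ArrayList<>();
            for (String raw : text.split("[^\\p{L}]+")) {
                String token = raw.toLowerCase(Locale.ROOT);
                if (!token.isEmpty() && !stopWords.contains(token)) tokens.add(token);
            }
            return tokens;
        };
    }

    // Hypothetical analyzers: French with no stopwords, English with a few.
    static final Function<String, List<String>> FRENCH = analyzer(Set.of());
    static final Function<String, List<String>> ENGLISH = analyzer(Set.of("and", "not", "or", "the"));

    // Per-field routing: use the field's analyzer if mapped, else the default.
    static List<String> analyze(Map<String, Function<String, List<String>>> perField,
                                Function<String, List<String>> defaultAnalyzer,
                                String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    public static void main(String[] args) {
        Map<String, Function<String, List<String>>> intended = new HashMap<>();
        intended.put("text_fr", FRENCH);
        intended.put("title_en", ENGLISH);

        // The same two entries with their analyzers swapped.
        Map<String, Function<String, List<String>>> swapped = new HashMap<>();
        swapped.put("text_fr", ENGLISH);
        swapped.put("title_en", FRENCH);

        String text = "Un lingot en or";
        System.out.println(analyze(intended, FRENCH, "text_fr", text)); // [un, lingot, en, or]
        System.out.println(analyze(swapped, FRENCH, "text_fr", text));  // [un, lingot, en]
    }
}
```

Nothing fails loudly in the swapped case: indexing succeeds, and the French field simply never receives the "or" term.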

    Conclusion: Lucene 3.0.3 can index and query reserved words like OR,
    AND or NOT without any problem, as long as they are not in the
    analyzer's stopword set.

    Best regards,

    Benoit

Discussion Overview
group: java-user
categories: lucene
posted: Jan 13, '11 at 3:38a
active: Jan 14, '11 at 3:21a
posts: 5
users: 3
website: lucene.apache.org
