Hi all,

I am using the Lucene German Stemmer/Analyzer. There seems to be a bug
in the GermanStemmer class. As far as I understand the algorithm, the
counter variable "substCount" should be set to 0 before processing the
next token. In the current implementation, the stemmed result for the
same term will differ after a while, because the counter keeps growing
from token to token.
The easiest solution would be to reset that counter variable in the method
"private StringBuffer substitute( StringBuffer buffer )".

best regards
Bernhard
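
To make the report above concrete, here is a minimal sketch of how a counter that is never reset can change the result for the same term. This is a toy class, not the actual GermanStemmer source: the name ToyStemmer, the "sch" substitution, the "en" suffix rule and the length threshold of 5 are all assumptions chosen for illustration. Only the field name substCount and the substitute( StringBuffer ) signature come from the message.

```java
public class SubstCountDemo {

    /** Toy stemmer (hypothetical, not Lucene's code) that mimics the shape of the bug. */
    static class ToyStemmer {
        private final boolean resetPerToken;
        // Number of characters removed by substitution; meant to be per token.
        private int substCount = 0;

        ToyStemmer(boolean resetPerToken) {
            this.resetPerToken = resetPerToken;
        }

        String stem(String term) {
            StringBuffer buffer = new StringBuffer(term);
            substitute(buffer);
            strip(buffer);
            return buffer.toString();
        }

        private StringBuffer substitute(StringBuffer buffer) {
            if (resetPerToken) {
                substCount = 0; // the proposed fix: start from zero for every token
            }
            int i;
            // Assumed rule: collapse "sch" into one placeholder character.
            while ((i = buffer.indexOf("sch")) >= 0) {
                buffer.replace(i, i + 3, "$");
                substCount += 2; // three characters became one
            }
            return buffer;
        }

        private void strip(StringBuffer buffer) {
            // Assumed rule: strip a trailing "en" only if the original term
            // (current length plus substituted characters) is longer than 5.
            if (buffer.length() + substCount > 5 && buffer.toString().endsWith("en")) {
                buffer.setLength(buffer.length() - 2);
            }
        }
    }

    public static void main(String[] args) {
        ToyStemmer buggy = new ToyStemmer(false);
        ToyStemmer fixed = new ToyStemmer(true);

        String buggyFirst = buggy.stem("ofen");
        buggy.stem("schule"); // leaves substCount at 2 in the buggy variant
        String buggySecond = buggy.stem("ofen");

        String fixedFirst = fixed.stem("ofen");
        fixed.stem("schule");
        String fixedSecond = fixed.stem("ofen");

        System.out.println("without reset: " + buggyFirst + " / " + buggySecond); // ofen / of
        System.out.println("with reset:    " + fixedFirst + " / " + fixedSecond); // ofen / ofen
    }
}
```

Without the reset, the second call for "ofen" sees the count left over from "schule" and applies a rule that should not fire for such a short term. With the reset, both calls return the same stem.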


--
To unsubscribe, e-mail:
For additional commands, e-mail:


  • Otis Gospodnetic at Feb 14, 2002 at 1:18 am
    This email sounds right. The substCount variable always increases and
    is never reset to zero. It seems that it should be reset before
    every substitution, so that its value reflects the number of characters
    substituted in each token.

    I will commit the fix now.
    Gerhard, please correct me if I'm wrong.

    Thanks,
    Otis



  • Gerhard Schwarz at Feb 14, 2002 at 9:40 am
    Hi,

    Otis Gospodnetic wrote:
    > This email sounds right. substCount variable always increases and
    > never gets reset to zero and it seems that it should be reset before
    > every substitution, so that its value reflects the number of characters
    > substituted in each token.
    >
    > I will commit the fix now.
    > Gerhard, please correct me if I'm wrong.

    The fix is correct. Sorry that I did not fix it myself; I wanted to commit
    it together with other changes to the Stemmer and Filter.
    Unfortunately, I have some serious problems (CeBIT is coming closer)
    and therefore do not have much spare time.


    Ciao,
    Gerhard


Discussion Overview
group: dev @
categories: lucene
posted: Jan 9, '02 at 9:27a
active: Feb 14, '02 at 9:40a
posts: 3
users: 3
website: lucene.apache.org
