Hi all,

Snowball stemmers are part of Lucene, but only for a few languages. We
have documents in various languages and so need stemmers for many of
them (in particular Polish). One idea is to use ispell dictionaries.
There are ispell dictionaries for many languages, so this approach
suits a multilingual environment. Maybe this is not the perfect place
to ask, but does anyone know of a Java stemmer that uses ispell
dictionaries?
There is an aspell-like Java spell checker (Jazzy), but I could not see
how to use it for stemming. We are considering porting part of the
PostgreSQL tsearch module to Java, because tsearch uses ispell
dictionaries for stemming.
But maybe there is a better way, or there are people already working on
something like that?

Thanks and regards,
wojtek

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Karl Wettin at Apr 1, 2008 at 6:08 pm

    Wojtek H wrote:
    > Snowball stemmers are part of Lucene, but for few languages only. We

    org.apache.lucene.analysis contains a few more stemmers.

    > have documents in various languages and so need stemmers for many
    > languages (in particular polish).

    Have you seen Stempel?

    http://www.getopt.org/stempel/



    karl

  • Mathieu Lecarme at Apr 2, 2008 at 11:57 am

    Wojtek H wrote:
    > Hi all,
    >
    > Snowball stemmers are part of Lucene, but for few languages only. We
    > have documents in various languages and so need stemmers for many
    > languages (in particular polish). One of the ideas is to use ispell
    > dictionaries. There are ispell dicts for many languages and so this
    > solution is good for multilingual environment. Maybe this is not
    > perfect place to ask, but does anyone know about java stemmer using
    > ispell dicts?
    > There is aspell-like java spell-checker (Jazzy) but I could not see
    > how to use it for stemming. We are considering porting part of
    > postgres tsearch module to java, because tsearch uses ispell dicts for
    > stemming.
    > But maybe there is a better way or there are people working on
    > something like that?

    ispell data is good for phonetic matching and for enumerating a huge
    list of words. An ispell dictionary works in one direction only:
    pseudo-root => word. The inverse function looks hard to build, because
    a lemma is split across multiple affixes.
    But the data can be used to derive rules, just as
    http://www.getopt.org/stempel/ does.
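
    To make the "one way" point concrete, here is a minimal sketch of the
    brute-force workaround: expand every root with its suffix rules, then
    record each generated form in a reverse map (form => roots). The class
    name, the SuffixRule type, and the flags S and D are hypothetical and
    purely illustrative. Real ispell affix files also carry prefix rules
    and character-class conditions, which this sketch ignores.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class IspellInverseSketch {

    /** Simplified suffix rule (hypothetical): strip a root ending, append another. */
    static final class SuffixRule {
        final char flag;
        final String strip;
        final String add;

        SuffixRule(char flag, String strip, String add) {
            this.flag = flag;
            this.strip = strip;
            this.add = add;
        }
    }

    /**
     * Expands "root/FLAGS" entries in the forward direction (root => word)
     * and stores every generated form in a reverse map (word => roots).
     */
    static Map<String, Set<String>> buildInverse(List<String> dictLines,
                                                 List<SuffixRule> rules) {
        Map<String, Set<String>> formToRoots = new HashMap<>();
        for (String line : dictLines) {
            String[] parts = line.split("/", 2);
            String root = parts[0];
            String flags = parts.length > 1 ? parts[1] : "";

            // The root is always a valid form of itself.
            formToRoots.computeIfAbsent(root, k -> new TreeSet<>()).add(root);

            for (SuffixRule rule : rules) {
                // Apply a rule only if the entry carries its flag and the
                // root ends with the text the rule strips.
                if (flags.indexOf(rule.flag) < 0 || !root.endsWith(rule.strip)) {
                    continue;
                }
                String stem = root.substring(0, root.length() - rule.strip.length());
                String form = stem + rule.add;
                // A set, because one surface form may come from several roots.
                formToRoots.computeIfAbsent(form, k -> new TreeSet<>()).add(root);
            }
        }
        return formToRoots;
    }

    public static void main(String[] args) {
        List<SuffixRule> rules = java.util.Arrays.asList(
                new SuffixRule('S', "y", "ies"),   // try  -> tries
                new SuffixRule('D', "", "ed"));    // walk -> walked
        List<String> dict = java.util.Arrays.asList("try/S", "walk/D", "cat");

        Map<String, Set<String>> inverse = buildInverse(dict, rules);
        System.out.println(inverse.get("tries"));
        System.out.println(inverse.get("walked"));
    }
}
```

    The reverse map grows with the number of generated forms, which is the
    cost of avoiding a true inverse function. For a heavily inflected
    language that table can get large, which is one reason to derive
    compact rules from the data instead.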

    M.

  • Hannu Väisänen at Apr 15, 2008 at 9:51 am

    Wojtek H wrote:
    > Snowball stemmers are part of Lucene, but for few languages only
    > [...]
    > But maybe there is a better way or there are people working on
    > something like that?

    I use Malaga (http://home.arcor.de/bjoern-beutel/malaga/)
    for lemmatization and index the result.

    http://joyds1.joensuu.fi/programs/index.html


Discussion Overview
group: java-user @
categories: lucene
posted: Apr 1, '08 at 9:59a
active: Apr 15, '08 at 9:51a
posts: 4
users: 4
website: lucene.apache.org
