Hi everybody,

I have a pretty generic question about token filters, and I am not really sure whether it is a developer or a configuration question:

How exactly do I make Lucene map letters to each other, e.g. make it treat both 'a' and 'á' as one and the same letter, or both '写' and '寫' as one and the same character? I am sure this question has come up before and that there are sample implementations or sample configuration files out there, but I could not find them on my own.

I only need to map single letters (i.e. no 'oe' <=> 'ö'), but in a multi-byte charset. I have some modest experience programming in Java, but am far from being a guru.

Any help is appreciated.

Thanks in advance,

Jan

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

  • Michael Sokolov at Jul 3, 2011 at 4:06 pm
    You should take a look at org.apache.solr.analysis.MappingCharFilterFactory,
    which wraps Lucene's MappingCharFilter and provides a generic table-based
    approach for use with Solr. There are also a lot of other interesting
    CharFilter factories in the same package. For Lucene-only use, there's
    org.apache.lucene.analysis.icu.ICUFoldingFilter, which looks very
    promising, and I'm sure there are others I don't know about.

    -Mike
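
To make the table-based idea from the reply concrete, here is a minimal sketch in plain Java. It does not use Lucene or Solr at all: the class name CharFoldingSketch, the fold method and the two table entries are assumptions made up for illustration, and the code only shows what a single-character mapping table does to a piece of text before it would be tokenized.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a hypothetical stand-in for a table-based character
// mapping step. This is NOT the Lucene or Solr API.
final class CharFoldingSketch {

    // Assumed mapping table, keyed by code point. The two entries mirror
    // the examples in the question: a-with-acute folds to plain 'a', and
    // the traditional character U+5BEB folds to the simplified U+5199.
    private static final Map<Integer, Integer> TABLE = new HashMap<>();
    static {
        TABLE.put(0x00E1, (int) 'a');
        TABLE.put(0x5BEB, 0x5199);
    }

    // Replace every code point that has a table entry; copy all others.
    static String fold(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints()
             .map(cp -> TABLE.getOrDefault(cp, cp))
             .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only.
        System.out.println(fold("\u00E1").equals("a")); // prints true
    }
}
```

The same folding has to be applied both when indexing and when parsing queries, otherwise the folded index terms and the unfolded query terms will not match. In Lucene itself the equivalent pairs would go into the mapping table consumed by MappingCharFilter (or into the mapping file named in the Solr factory's configuration); the exact calls differ between versions, so check the Javadoc of the release in use.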

Discussion Overview
group: java-user
categories: lucene
posted: Jul 3, '11 at 12:39p
active: Jul 3, '11 at 4:06p
posts: 2
users: 2
website: lucene.apache.org
