Grokbase Groups Lucene dev March 2011

2011/3/31 Robert Muir <rcmuir@gmail.com>:
> On Thu, Mar 31, 2011 at 9:51 AM, Patrick ALLAERT wrote:
>> Hello,
>>
>> Facing a Solr issue, I have been told that queries with a term like:
>> Kiinteistösih*
>> will not match the Finnish word "Kiinteistösihteeri" and that it's a
>> known limitation of Lucene.
>> Instead, using the word directly, without wildcard, works.
>>
>> Can you confirm this is a known limitation/bug?
>> If so, do you have any registered issue about that?
>
> this isn't the case, there's no unicode limitation here.
>
> more likely, your analyzer is configured to lowercase text, so in the
> index Kiinteistösihteeri is really kiinteistösihteeri
> in other words, try kiinteistösih* and see how that works.

Following your suggestion, I tested with:
kiinteistösih*

but it doesn't show me the intended result.

I have found the reason: the ISOLatin1AccentFilterFactory filter,
which is present in both the "index" and "query" analyzers, strips
the accents at index time, so the indexed term contains "o", not "ö".
Searching with:
kiinteistosih*
did the trick.
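
For anyone hitting the same problem, the manual conversion can be done
on the client before the query is sent. The following is only a minimal
sketch, assuming that wildcard terms bypass the query analyzer as the
behaviour above suggests. The function name normalize_wildcard_term is
hypothetical, and Unicode NFD decomposition only approximates what
ISOLatin1AccentFilterFactory does, so it may not match that filter's
mapping for every character.

```python
import unicodedata


def normalize_wildcard_term(term: str) -> str:
    """Lowercase a term and strip its accents, keeping any '*' wildcard.

    Hypothetical helper: approximates LowerCaseFilterFactory followed by
    ISOLatin1AccentFilterFactory for simple accented Latin letters.
    """
    # Decompose accented characters, e.g. 'ö' becomes 'o' + combining diaeresis.
    decomposed = unicodedata.normalize("NFD", term.lower())
    # Drop the combining marks; base letters and '*' are kept unchanged.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(normalize_wildcard_term("Kiinteistösih*"))  # kiinteistosih*
```

The normalized string is what would then be passed as the query term.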

One question remains: why should I have to lowercase terms containing
a wildcard and do the ISO Latin1 accent conversion myself when I
have:
<analyzer type="query">
...
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ISOLatin1AccentFilterFactory"/>
...
for the corresponding fieldType?
I would have expected the query analyzer to do it for me.

Your reply helped me a lot in understanding what's going on.
Thank you very much for your participation!

Patrick
