Grokbase Groups Lucene dev March 2011
Hello,

Facing a Solr issue, I have been told that queries with a term like:
Kiinteistösih*
will not match the Finnish word "Kiinteistösihteeri" and that it's a
known limitation of Lucene.
Instead, using the word directly, without wildcard, works.

Can you confirm this is a known limitation/bug?
If so, do you have any registered issue about that?

Searching the ML archive and the issue tracker in both SOLR and LUCENE
projects didn't provide me a pointer to this problem.

One of the references I found on the web talking about this problem is:
http://forum.compass-project.org/message.jspa?messageID=227709
But again, no pointer to a discussion or issue.

Thanks in advance for your help,
Patrick

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


  • Robert Muir at Mar 31, 2011 at 1:56 pm

    On Thu, Mar 31, 2011 at 9:51 AM, Patrick ALLAERT wrote:
    > Hello,
    >
    > Facing a Solr issue, I have been told that queries with a term like:
    > Kiinteistösih*
    > will not match the Finnish word "Kiinteistösihteeri" and that it's a
    > known limitation of Lucene.
    > Instead, using the word directly, without wildcard, works.
    >
    > Can you confirm this is a known limitation/bug?
    > If so, do you have any registered issue about that?

    This isn't the case; there's no Unicode limitation here.

    More likely, your analyzer is configured to lowercase text, so in the
    index Kiinteistösihteeri is really kiinteistösihteeri.
    In other words, try kiinteistösih* and see how that works.

  • Patrick ALLAERT at Mar 31, 2011 at 4:21 pm

    2011/3/31 Robert Muir <rcmuir@gmail.com>:
    > On Thu, Mar 31, 2011 at 9:51 AM, Patrick ALLAERT wrote:
    >> Hello,
    >>
    >> Facing a Solr issue, I have been told that queries with a term like:
    >> Kiinteistösih*
    >> will not match the Finnish word "Kiinteistösihteeri" and that it's a
    >> known limitation of Lucene.
    >> Instead, using the word directly, without wildcard, works.
    >>
    >> Can you confirm this is a known limitation/bug?
    >> If so, do you have any registered issue about that?
    >
    > This isn't the case; there's no Unicode limitation here.
    >
    > More likely, your analyzer is configured to lowercase text, so in the
    > index Kiinteistösihteeri is really kiinteistösihteeri.
    > In other words, try kiinteistösih* and see how that works.

    Following your suggestion, I tested with:
    kiinteistösih*

    but it doesn't show me the intended result.

    I have found the reason: the ISOLatin1AccentFilterFactory filter is
    present in both the "index" and "query" analyzers, so the indexed
    term has its accents removed as well.
    Searching with:
    kiinteistosih*
    did the trick.
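
The behaviour described above can be reproduced with a small sketch. It assumes an index-time chain of lowercasing followed by accent folding, and a prefix query whose text is compared against the indexed terms exactly as typed. The names `fold_accents`, `analyze` and `prefix_search` are hypothetical, and the Unicode-based folding only approximates what ISOLatin1AccentFilterFactory does; this is not Lucene or Solr code.

```python
import unicodedata


def fold_accents(text):
    """Approximate accent folding: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(token):
    """Hypothetical index-time chain: lowercase, then fold accents."""
    return fold_accents(token.lower())


# The "index" holds only the analyzed form of the word.
indexed_terms = {analyze("Kiinteistösihteeri")}


def prefix_search(raw_prefix):
    """Compare the prefix exactly as typed, with no analysis applied."""
    return sorted(t for t in indexed_terms if t.startswith(raw_prefix))


for prefix in ("Kiinteistösih", "kiinteistösih", "kiinteistosih"):
    print(prefix, "->", prefix_search(prefix))
```

Only the last prefix, already lowercased and stripped of its accent, finds the indexed term.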

    One question remains: why do I have to lowercase terms containing a
    wildcard and do the ISO Latin1 accent conversion myself when I
    have:
    <analyzer type="query">
    ...
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    ...
    for the corresponding fieldType?
    I would have guessed it would do that for me.
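
The thread as archived here does not include an answer to this question. Until the query side handles it, the manual conversion described above could be done in client code before the query is sent. The helper below is a minimal sketch under that assumption: `normalize_wildcard_term` is a hypothetical name, and its Unicode-based folding may map some characters differently from ISOLatin1AccentFilterFactory.

```python
import unicodedata


def normalize_wildcard_term(term):
    """Lowercase and accent-fold a term; '*' and '?' pass through unchanged."""
    decomposed = unicodedata.normalize("NFD", term.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(normalize_wildcard_term("Kiinteistösih*"))
```

The wildcard characters survive because neither lowercasing nor dropping combining marks affects them.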

    Your reply helped me a lot in understanding what's going on.
    Thank you very much for your participation!

    Patrick

  • Chris Hostetter at Apr 20, 2011 at 8:26 pm
