[ https://issues.apache.org/jira/browse/LUCENE-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784421#action_12784421 ]

DM Smith commented on LUCENE-2102:

bq. but non-NFC text doesn't work correctly throughout most of lucene's analysis components as it is now anyway, so I don't think we should worry about it right now. Maybe we could add a comment for the future though.

It might be good to note the NFC (NFKC?) requirement in the JavaDoc.

Maybe it's just me, but I think it is critical to normalize the input to Lucene for both indexing and searching. Unless an NFCNormalizingFilter is added to Lucene, I think it is the responsibility of the caller.
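
As a rough illustration of what caller-side normalization could look like, here is a minimal sketch that uses only the JDK's java.text.Normalizer. The class and method names (NfcInput, toNfc) are hypothetical and not part of Lucene or the attached patch; the assumption is that the caller runs the same step on document text before indexing and on query strings before searching.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Lucene: normalizes caller-supplied text to NFC.
public final class NfcInput {
    private NfcInput() {}

    /** Returns the text in NFC form; use the same call for indexed text and for queries. */
    public static String toNfc(String text) {
        // Skip the copy when the text is already normalized.
        return Normalizer.isNormalized(text, Normalizer.Form.NFC)
                ? text
                : Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // 'I' followed by COMBINING DOT ABOVE composes to U+0130 under NFC.
        String decomposed = "I\u0307";
        String composed = toNfc(decomposed);
        System.out.println(composed.length());          // code units after composition
        System.out.println(composed.equals("\u0130"));  // compares against the precomposed letter
    }
}
```

Whether NFC or NFKC is the right form depends on the analysis chain, which is the open question raised above; swapping Normalizer.Form.NFC for Normalizer.Form.NFKC is the only change needed in this sketch.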

LowerCaseFilter for Turkish language

Key: LUCENE-2102
URL: https://issues.apache.org/jira/browse/LUCENE-2102
Project: Lucene - Java
Issue Type: Improvement
Components: Analysis
Affects Versions: 3.0
Reporter: Ahmet Arslan
Assignee: Robert Muir
Priority: Minor
Fix For: 3.1

Attachments: LUCENE-2102.patch

java.lang.Character.toLowerCase() converts 'I' to 'i'; however, in the Turkish alphabet the lowercase of 'I' is not 'i'. It is LATIN SMALL LETTER DOTLESS I (U+0131).
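
The difference can be reproduced with the JDK alone. The following is only a sketch of the behavior described above, not the approach taken in the attached patch; the class and method names (TurkishLowerCaseDemo, lowerRoot, lowerTurkish) are made up for illustration.

```java
import java.util.Locale;

// Hypothetical demo class: contrasts locale-insensitive and Turkish lowercasing.
public final class TurkishLowerCaseDemo {
    // Turkish locale, built from a language tag.
    static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    private TurkishLowerCaseDemo() {}

    /** Locale-insensitive lowercasing: 'I' becomes 'i'. */
    public static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    /** Turkish lowercasing: 'I' becomes U+0131 (dotless i). */
    public static String lowerTurkish(String s) {
        return s.toLowerCase(TURKISH);
    }

    public static void main(String[] args) {
        // Character.toLowerCase has no locale parameter, so it always maps 'I' to 'i'.
        System.out.println(Character.toLowerCase('I') == 'i');
        System.out.println(lowerRoot("ISPARTA"));
        System.out.println(lowerTurkish("ISPARTA"));
    }
}
```

Because Character.toLowerCase() works per character and ignores locale, a filter built on it cannot produce the dotless form on its own, which is the gap this issue describes.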

Discussion Overview
group: java-dev
posted: Dec 1, '09 at 8:23p
active: Dec 5, '09 at 12:47p
