[ https://issues.apache.org/jira/browse/LUCENE-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-2070.

Assignee: Robert Muir
Fix Version/s: 3.1
Resolution: Fixed

Committed revision 1000675, 1000678 (3x)

document LengthFilter wrt Unicode 4.0

Key: LUCENE-2070
URL: https://issues.apache.org/jira/browse/LUCENE-2070
Project: Lucene - Java
Issue Type: Improvement
Components: contrib/analyzers
Reporter: Robert Muir
Assignee: Robert Muir
Priority: Trivial
Fix For: 3.1, 4.0

Attachments: LUCENE-2070.patch

LengthFilter calculates its min/max length from TermAttribute.termLength().
This counts UTF-16 code units, not characters (code points).
In my opinion this should not be changed, merely documented.
Changing it would hurt performance, because we would have to actually compute Character.codePointCount() on the text.
If you feel strongly otherwise, fixing it to count code points would be a trivial patch, but I'd rather not hurt performance.
I admit I don't fully understand all the use cases for this filter.
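
To illustrate the distinction, here is a minimal sketch in plain Java (no Lucene classes involved). The class name LengthDemo and its two helper methods are hypothetical and exist only for this example; the only library call is the standard Character.codePointCount(char[], int, int). It shows how a term containing a supplementary character reports a different length depending on whether code units or code points are counted.

```java
public class LengthDemo {
    // Length in UTF-16 code units: this is what a char[]-backed term length
    // reports, and it costs nothing to compute.
    static int codeUnitLength(char[] buffer, int length) {
        return length;
    }

    // Length in Unicode code points: requires scanning the buffer so that
    // each surrogate pair is counted once.
    static int codePointLength(char[] buffer, int length) {
        return Character.codePointCount(buffer, 0, length);
    }

    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP, so it is
        // encoded as the surrogate pair \uD834\uDD1E.
        char[] term = "a\uD834\uDD1Eb".toCharArray();
        System.out.println(codeUnitLength(term, term.length));  // prints 4
        System.out.println(codePointLength(term, term.length)); // prints 3
    }
}
```

With a maximum length of 3, a filter counting code units would reject this term while one counting code points would keep it. For text made up only of BMP characters the two counts are identical.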

Posted Sep 24, '10 at 12:49a