[ https://issues.apache.org/jira/browse/LUCENE-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978130#action_12978130 ]
Steven Rowe edited comment on LUCENE-2847 at 1/6/11 12:16 AM:
--------------------------------------------------------------
New patch, with the following changes:
# Added a new target {{gen-uax29-supp-macros}} to {{modules/analysis/icu/build.xml}}, and a {{<subant>}} call to it from the {{jflex}} task in {{modules/analysis/common/build.xml}}.
# Included {{SUPPLEMENTARY.jflex-macro}} in {{UAX29URLEmailTokenizer.jflex}} in the same way as it is included in {{StandardTokenizer.jflex}}.
# Copied the simple supplementary characters test from {{TestStandardAnalyzer.java}} to {{TestUAX29URLEmailTokenizer.java}}.
# Modified the CHANGES.txt entry for the UAX#29 issues to include a reference to this issue.
All tests pass.
was (Author: steve_rowe):
New patch, with the following changes:
# Added a new target {{gen-uax29-supp-macros}} to {{modules/analysis/icu/build.xml}}, and a {{<subant>}} call to it from the {{jflex}} task in {{modules/analysis/common/build.xml}}.
# Included SUPPLEMENTARY.jflex-macro}} in {{UAX29URLEmailTokenizer.jflex}} in the same way as it is included in {{StandardTokenizer.jflex}}
# Copied the simple supplementary characters test from {{TestStandardAnalyzer.java}} to {{TestUAX29URLEmailTokenizer.java}}.
# Modified the CHANGES.txt entry for the UAX#29 issues to include a reference to this issue.
All tests pass.
Support all of unicode in StandardTokenizer
-------------------------------------------
Key: LUCENE-2847
URL: https://issues.apache.org/jira/browse/LUCENE-2847
Project: Lucene - Java
Issue Type: Bug
Components: Analysis
Reporter: Robert Muir
Fix For: 3.1, 4.0
Attachments: LUCENE-2847.patch, LUCENE-2847.patch, LUCENE-2847.patch
StandardTokenizer currently supports only the BMP (Basic Multilingual Plane).
If it encounters characters outside of the BMP, it just discards them.
It should instead fully implement UAX#29 across all of Unicode.
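
For readers unfamiliar with the BMP distinction, here is a minimal sketch in plain Java (no Lucene classes involved) of why characters outside the BMP need special handling: each one occupies two UTF-16 {{char}} values (a surrogate pair), so code that works char by char can miss or drop them. The class name {{SupplementaryDemo}}, the method {{countSupplementary}}, and the sample code point U+20000 are illustrative assumptions and are not taken from the patch.

```java
public class SupplementaryDemo {

    // Counts the code points above U+FFFF (outside the BMP) in the text.
    // Iterates by code point, not by char, so surrogate pairs stay whole.
    public static int countSupplementary(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isSupplementaryCodePoint(cp)) {
                count++;
            }
            // Advance by 1 char for BMP code points, 2 for supplementary ones.
            i += Character.charCount(cp);
        }
        return count;
    }

    public static void main(String[] args) {
        // U+20000 is a CJK ideograph outside the BMP (illustrative choice).
        String text = "a" + new String(Character.toChars(0x20000)) + "b";

        System.out.println(text.length());                          // 4 UTF-16 chars
        System.out.println(text.codePointCount(0, text.length()));  // 3 code points
        System.out.println(countSupplementary(text));               // 1
    }
}
```

The gap between the char count and the code point count is the kind of mismatch a BMP-only tokenizer runs into.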