Grokbase Groups Lucene dev April 2011

Robert Muir commented on LUCENE-3038:

This is a duplicate of LUCENE-3022 (as you are using this onlyLongestMatch=true option).

Can we discuss over there please so this discussion is all in one place? Thanks for creating the issue.

DictionaryCompoundWordTokenFilter fails to create some tokens for final parts of words

Key: LUCENE-3038
Project: Lucene - Java
Issue Type: Bug
Components: Analysis
Affects Versions: 3.1, 4.0
Reporter: Filip Svendsen
Fix For: 3.1, 4.0

Attachments: LUCENE-3038.patch

DictionaryCompoundWordTokenFilter: Due to an off-by-one error, a word component placed last in a compound word does not get a token if its length equals the minimum sub-word length. Example:
Min sub-word length: 4
Dictionary: {"alfa", "beta"}
Word: "alfabeta"
Created tokens: {"alfabeta", "alfa"}
Expected tokens: {"alfabeta", "alfa", "beta"}
I have a patch with a test case that fails on versions 3.1 and 4.0 (and probably on everything in between, as well as on earlier versions), along with a bug fix.
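
The off-by-one described above can be reproduced with a small standalone sketch. This is not Lucene's code and not the attached patch; the class DecompoundSketch, the method decompound, the inclusiveBound flag, and the maximum sub-word length of 15 are all assumptions made for illustration. It only shows how an exclusive bound on the last start offset skips a final component whose length equals the minimum sub-word length.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DecompoundSketch {
    // Hypothetical helper (not a Lucene API): returns the original word followed by
    // every dictionary entry found inside it, scanning start offsets left to right.
    static List<String> decompound(String word, Set<String> dict,
                                   int minSubword, int maxSubword,
                                   boolean inclusiveBound) {
        List<String> tokens = new ArrayList<>();
        tokens.add(word);
        // Last offset at which a sub-word of minimum length still fits.
        int lastStart = word.length() - minSubword;
        // An exclusive bound stops one offset too early (the assumed bug);
        // an inclusive bound also visits lastStart.
        int limit = inclusiveBound ? lastStart : lastStart - 1;
        for (int start = 0; start <= limit; start++) {
            for (int len = minSubword;
                 len <= maxSubword && start + len <= word.length();
                 len++) {
                String candidate = word.substring(start, start + len);
                if (dict.contains(candidate)) {
                    tokens.add(candidate);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("alfa", "beta"));
        // Exclusive bound: the final component "beta" is missed.
        System.out.println(decompound("alfabeta", dict, 4, 15, false));
        // Inclusive bound: "beta" is found at offset 4.
        System.out.println(decompound("alfabeta", dict, 4, 15, true));
    }
}
```

With the dictionary and word from the report, the exclusive bound yields [alfabeta, alfa] and the inclusive bound yields [alfabeta, alfa, beta], matching the created and expected tokens listed above.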

Discussion Overview
group: dev @
posted: Apr 19, '11 at 9:48p
active: Apr 19, '11 at 11:04p