Hi all.

I created a test using Lucene 2.3. When run, this generates a single token:

public static void main(String[] args) throws Exception {
    String string = "\u0412\u0430\u0441\u0438\u0301\u043B\u044C\u0435\u0432";
    StandardAnalyzer analyser = new StandardAnalyzer();
    TokenStream stream = analyser.tokenStream("text", new StringReader(string));
    Token token;
    while ((token = stream.next()) != null)
    {
        System.out.println(new String(token.termBuffer(), 0, token.termLength()));
    }
}

I then wrote a very similar test against Lucene 3.0, this time specifying
which version of StandardAnalyzer's behaviour to use:

public static void main(String[] args) throws Exception {
    String string = "\u0412\u0430\u0441\u0438\u0301\u043B\u044C\u0435\u0432";
    StandardAnalyzer analyser = new StandardAnalyzer(Version.LUCENE_23);
    TokenStream stream = analyser.tokenStream("text", new StringReader(string));
    TermAttribute termAttribute = stream.getAttribute(TermAttribute.class);
    while (stream.incrementToken())
    {
        System.out.println(termAttribute.term());
    }
}

But this generates two tokens, splitting at the accent. (I assume
that this accent issue itself has already been fixed since v3.1.)
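
For illustration only, the kind of split described above can be reproduced
without Lucene. The sketch below is not StandardAnalyzer's grammar; it is a
deliberately naive splitter that assumes only letters and digits belong to a
token. Because U+0301 (combining acute accent) is a non-spacing mark rather
than a letter, such a splitter breaks the word at the accent. The class and
the helpers naiveTokens and stripCombiningMarks are hypothetical names
introduced here, and stripping marks before analysis is just one possible
workaround, not something the post or Lucene prescribes.

```java
import java.util.ArrayList;
import java.util.List;

class AccentSplitSketch {

    // Hypothetical helper: treats any char that is not a letter or digit as
    // a token boundary, so a combining mark ends the current token.
    static List<String> naiveTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Hypothetical workaround: drop all non-spacing marks (Unicode category
    // Mn) before the text reaches the splitter.
    static String stripCombiningMarks(String text) {
        return text.replaceAll("\\p{Mn}", "");
    }

    public static void main(String[] args) {
        String string = "\u0412\u0430\u0441\u0438\u0301\u043B\u044C\u0435\u0432";

        // U+0301 is a non-spacing mark, not a letter.
        System.out.println(Character.isLetter('\u0301'));
        System.out.println(Character.getType('\u0301') == Character.NON_SPACING_MARK);

        // Tokens from the raw string, then from the string with marks removed.
        System.out.println(naiveTokens(string));
        System.out.println(naiveTokens(stripCombiningMarks(string)));
    }
}
```

Note that removing the accent changes the term itself, so an index built
from accented text would need the same treatment at query time for searches
to keep matching.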

I was under the impression that the Version parameter was for
supporting this sort of backwards compatibility, so that indexes
created in the past could still be searched meaningfully using an
updated version of Lucene, but have I found a gap in the backwards
compatibility support here?

TX

Posted to the java-user list (lucene.apache.org) by Trejkaz, Jul 11, '11 at 2:44a.
