For your field configuration, the TokenStream you get with getAnyTokenStream is built from TermVectors.

Which tokenizer do you use to populate your field? Have you checked with Luke that your term vectors are OK?

And which version of Lucene are you using? A change was made to this code recently for another issue (apparently unrelated, but who knows?). See https://issues.apache.org/jira/browse/LUCENE-2874


From: Cescy
Sent: Friday, March 18, 2011 07:32
To: java-user; Pierre GOSSE
Subject: Re:RE: About highlighter

Yes, I only search the "contents" field. I can print the whole contents with doc.get("contents") if it contains any of the keywords. But if the number of words is too large, the highlighter cannot highlight keywords near the end of the contents, as if highlighting had a word limit.

document.add( new Field( "contens", value, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS ) );


------------------ Original ------------------
From: "Pierre GOSSE"<[email protected]>;
Date: Thu, Mar 17, 2011 04:25 PM
To: "java-[email protected]"<[email protected]>;
Subject: RE: About highlighter

500 is the maximum size of the text fragments returned by the highlighter. It shouldn't be the problem here, as far as I understand highlighting.
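
To illustrate why a fragment size should not hide late matches, here is a toy sketch in plain Java. It is not Lucene's implementation: the class FragmentSizeDemo and its methods are hypothetical, and the "fragmenter" simply cuts the text every N characters. It only shows that a size of 500 bounds how long each returned fragment is, not how much of the text gets scanned.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, NOT Lucene code: fragment size limits the length
// of each fragment, not the portion of the document that is examined.
public class FragmentSizeDemo {

    // Cut the text into consecutive chunks of at most fragmentSize characters.
    static List<String> fragments(String text, int fragmentSize) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < text.length(); start += fragmentSize) {
            int end = Math.min(text.length(), start + fragmentSize);
            out.add(text.substring(start, end));
        }
        return out;
    }

    // Return every fragment that contains the keyword.
    // (A keyword straddling a chunk boundary would be missed by this toy.)
    static List<String> matching(String text, String keyword, int fragmentSize) {
        List<String> hits = new ArrayList<>();
        for (String fragment : fragments(text, fragmentSize)) {
            if (fragment.contains(keyword)) {
                hits.add(fragment);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Build a long text (2000 words) with the keyword at the very end.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            sb.append("word ");
        }
        sb.append("needle");

        List<String> hits = matching(sb.toString(), "needle", 500);
        System.out.println("fragments containing keyword: " + hits.size());
    }
}
```

In this model the keyword at the end of the text is still found, in the last fragment. If the real highlighter returns nothing, the cause is more likely in the token stream or term vectors than in the fragment size.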

Gong Li, how is the field "contents" defined? Is it the only field on which the search is made?


-----Original Message-----
From: Ian Lea
Sent: Wednesday, March 16, 2011 22:29
To: [email protected]
Subject: Re: About highlighter

I know nothing about highlighting but that 500 looks like a good place
to start investigating.


On Tue, Mar 15, 2011 at 8:47 PM, Cescy wrote:

My highlighting code is as follows:

QueryScorer scorer = new QueryScorer(query);
Highlighter highlighter = new Highlighter(simpleHTMLFormatter, scorer);
highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer, 500));
String contents = doc.get("contents");
TokenStream tokenStream = TokenSources.getAnyTokenStream(searcher.getIndexReader(), topDocs.scoreDocs[i].doc, "contents", doc, analyzer);
String[] snippet = highlighter.getBestFragments(tokenStream, contents, 10);

snippet holds the resulting contexts, which I then print on the screen.
But if I search for a keyword in the last few paragraphs and the essay is too long (1000-2000 words), it returns "document found" and snippet.length == 0 (i.e. the document is found but the context is NOT found). Why?

How could I fix the problem?
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Group: java-user
Posted: Mar 18, '11 at 8:28a
Pierre GOSSE: 1 post


