Hello list,

I'm struggling with the highlighter again. I don't understand why I sporadically get an InvalidTokenOffsetsException.

The goal: given a query, detect which field was matched among the names of a concept. A concept can have several names, even within one language. Concepts are documents, and names are stored in fields called name-xx, where xx is the two-letter language code.
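A toy model of this layout may help. Each concept is one document holding repeated name-xx entries, so the same field name (for example name-en) can occur more than once. The sketch below is plain Java with no Lucene: the NameField record and the term-overlap score are hypothetical stand-ins for Lucene's Field and QueryScorer, used only to show the "pick the best-matching name field" logic the method below is meant to implement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class MatchedNameDemo {

    // Hypothetical stand-in for a stored Lucene field: one name, one value.
    record NameField(String name, String value) {}

    // Toy score: number of distinct query terms present in the value.
    // This is NOT Lucene's scoring, just a placeholder for the fragment score.
    static int score(Set<String> queryTerms, String value) {
        Set<String> tokens = new HashSet<>(Arrays.asList(value.toLowerCase(Locale.ROOT).split("\\s+")));
        int hits = 0;
        for (String t : queryTerms) {
            if (tokens.contains(t)) hits++;
        }
        return hits;
    }

    // Returns "field: value" for the best-scoring name-xx entry, or null if none matches.
    static String computeMatchedName(List<NameField> fields, Set<String> queryTerms) {
        String found = null;
        int maxScore = 0;
        for (NameField f : fields) {
            if (!f.name().startsWith("name-")) continue; // skip uri and other fields
            int s = score(queryTerms, f.value());
            if (s > maxScore) {
                maxScore = s;
                found = f.name() + ": " + f.value();
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<NameField> doc = new ArrayList<>();
        doc.add(new NameField("uri", "http://example.org/circle"));
        doc.add(new NameField("name-en", "circle"));
        doc.add(new NameField("name-en", "area of a circle")); // second English name, same field name
        doc.add(new NameField("name-fr", "aire du cercle"));
        Set<String> query = new HashSet<>(Arrays.asList("area", "circle"));
        System.out.println(computeMatchedName(doc, query));
    }
}
```

Note that the two name-en entries are visited separately; that detail matters when a per-field token stream is built from the field name alone.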

Here's the method I'm using:

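import java.io.IOException;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.search.highlight.TextFragment;
import org.apache.lucene.search.highlight.TokenSources;

// Returns the best-scoring highlighted fragment among the document's name-xx fields,
// or null if none matches. `reader` is an IndexReader field of the enclosing class.
public String computeMatchedField(int docNum, Document doc, Analyzer analyzer, Query query) throws IOException {
    //System.out.println("----- computing matched field for query " + query + " on document " + doc.get("uri"));
    query = query.rewrite(this.reader); // rewrite into primitive queries (e.g. expand wildcards)
    String found = null;
    float maxScore = 0;
    try {
        for (Field f : (List<Field>) doc.getFields()) {
            if (!f.name().startsWith("name-")) continue; // only look at concept names
            //System.out.println("Measuring field " + f.name() + ": " + f.stringValue());
            QueryScorer scorer = new QueryScorer(query, reader, f.name());
            String text = f.stringValue();
            // Uses the stored term vector if there is one, otherwise re-analyzes the stored field
            TokenStream tokenStream = TokenSources.getAnyTokenStream(reader, docNum, f.name(), doc, analyzer);
            SimpleHTMLFormatter htmlFormatter = new SimpleHTMLFormatter();
            Highlighter highlighter = new Highlighter(htmlFormatter, scorer);
            TextFragment[] frags = highlighter.getBestTextFragments(tokenStream, text, false, 1);
            if (frags == null || frags.length == 0) continue;
            float score = frags[0].getScore();
            //System.out.println("Score: " + score);
            if (score > maxScore) { // keep the best-scoring field so far
                maxScore = score;
                found = frags[0].toString();
            }
        }
    } catch (Exception ex) { // InvalidTokenOffsetsException lands here and aborts the whole loop
        ex.printStackTrace();
    }
    return found;
}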

Unfortunately, I have to catch InvalidTokenOffsetsException, which is thrown sometimes but not always.
When it occurs, highlighting stops (the detected field is null), and it also takes quite some time.
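A possible cause, offered as a guess and not confirmed anywhere in this thread: when a concept has two names in the same language, doc.getFields() returns two fields with the same name, say name-en. If no term vector is stored, TokenSources.getAnyTokenStream re-analyzes doc.get(field), which returns only the first value, while text is the current field's value. If term vectors with offsets are stored, the stream covers all values of that field name, with the offsets of later values shifted past the earlier ones. Either way, token offsets can point past the end of text, which the Highlighter rejects with InvalidTokenOffsetsException. The plain-Java sketch below (no Lucene; Token, tokenize and checkOffsets are hypothetical helpers, and IllegalStateException stands in for InvalidTokenOffsetsException) mimics that offset check to show why only some documents fail.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetCheckDemo {

    // Hypothetical minimal token: term text plus character offsets into the source.
    record Token(String term, int start, int end) {}

    // Whitespace tokenizer that records start/end offsets into 'text'.
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) tokens.add(new Token(text.substring(start, i), start, i));
        }
        return tokens;
    }

    // Mimics the highlighter's sanity check: every token must fit inside 'text'.
    static void checkOffsets(List<Token> tokens, String text) {
        for (Token t : tokens) {
            if (t.start() > text.length() || t.end() > text.length()) {
                throw new IllegalStateException("Token '" + t.term() + "' ends at " + t.end()
                        + ", past the end of a " + text.length() + "-char text");
            }
        }
    }

    public static void main(String[] args) {
        // Two values stored under the same field name, e.g. two "name-en" fields.
        String[] values = {"area of a circle", "circle"};
        String firstValue = values[0]; // what a doc.get("name-en")-style lookup returns

        for (String text : values) {
            // Stream built from the first value, but highlighting the current value:
            try {
                checkOffsets(tokenize(firstValue), text);
                System.out.println("stale stream OK for '" + text + "'");
            } catch (IllegalStateException e) {
                System.out.println("stale stream FAILS for '" + text + "': " + e.getMessage());
            }
            // A stream built from the very text being highlighted always fits.
            checkOffsets(tokenize(text), text);
        }
    }
}
```

Running main, the stale stream passes for the first, longer value and fails for the shorter "circle", while streams built from the highlighted text always pass; a document whose first name is the shortest one would never fail, which would explain the sporadic behaviour. If this is the cause, building the stream from text itself (for example analyzer.tokenStream(f.name(), new StringReader(text))) and catching InvalidTokenOffsetsException inside the loop rather than around it would let the remaining name-xx fields still be scored.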

What am I doing wrong?
I tried building my own TokenStream, with no difference.

thanks in advance

paul


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Paul Libbrecht at Sep 6, 2010 at 6:21 am
    ping!
    Any hope for help here?
    I'm a bit stuck before deploying a release.

    thanks in advance

    paul

    On 3 sept. 2010, at 14:05, Paul Libbrecht wrote:



Discussion Overview
group: java-user
categories: lucene
posted: Sep 3, '10 at 12:06p
active: Sep 6, '10 at 6:21a
posts: 2
users: 1
website: lucene.apache.org

1 user in discussion

Paul Libbrecht: 2 posts
