Hi,

I'm using Lucene 3.0.3 and extracting snippets with
FastVectorHighlighter. For some snippets (I think always when searching
for exact, quoted matches) the fragment is null.

Code looks like:


// Escape Lucene query syntax, then quote the input for an exact (phrase) match.
query = QueryParser.escape(query);
if (exact) {
    query = "\"" + query + "\"";
}
BooleanQuery allQ = new BooleanQuery();
BooleanQuery termQ = new BooleanQuery(); // declaration missing from the original snippet (assumed)
Query bodyQ = new QueryParser(Version.LUCENE_30, BODY, analyser).parse(query);
termQ.add(new BooleanClause(bodyQ, Occur.SHOULD));
// add more queries
allQ.add(new BooleanClause(termQ, Occur.MUST));

TopDocs res = is.search(allQ, null, upperRange);
FastVectorHighlighter highlighter = new FastVectorHighlighter(true, true);

for (int i = in.getLowerRange(); i < Math.min(res.totalHits, upperRange); i++) {
    String[] bodyFrags = highlighter.getBestFragments(
            highlighter.getFieldQuery(bodyQ),
            is.getIndexReader(), res.scoreDocs[i].doc, BODY, 120, 2);

    // bodyFrags is null
}


I do get a hit, and the content with the exact match is coming from the
BODY field, but I can't seem to get the fragment out.

Any clues?

Thanks

- Joel


  • Pierre GOSSE at May 27, 2011 at 11:57 am
    Hi,

    Maybe it is related to:
    https://issues.apache.org/jira/browse/LUCENE-3087

    Pierre

    -----Original Message-----
    From: Joel Halbert
    Sent: Friday, May 27, 2011 12:57
    To: lucene users
    Subject: FastVectorHighlighter.getBestFragments returning null

  • Joel Halbert at May 27, 2011 at 12:05 pm
    Hi Pierre,

    Thanks for the pointer. So if I understand correctly, this bug definitely
    applies to fields with TermVector.WITH_OFFSETS.

    My field uses TermVector.WITH_POSITIONS_OFFSETS.

    I wasn't sure from the bug report whether it applies to
    WITH_POSITIONS_OFFSETS as well. It looks like it might?

    - Joel
  • Pierre GOSSE at May 27, 2011 at 1:38 pm
    Actually, this second issue was opened because Highlighter seems to ignore positions and treats WITH_POSITIONS_OFFSETS as if it were WITH_OFFSETS.

    https://issues.apache.org/jira/browse/LUCENE-3091

    As far as I remember, the trouble is that to trust the positions in the token stream built from the term vector, you have to know the field properties, and those aren't accessible at the level of the code where the decision is made to use offsets or positions. So some modifications have to be made, either to pass this information along with the token stream or to give the highlighter access to the field properties. Neither of those seemed straightforward. But I really only took a very short look, so I'm sure of nothing there.

    I hope that someone of greater vision will find an elegant solution to this :). Otherwise, I hope to find some time to take a look in a couple of weeks, while I still have part of the context in mind.

    Pierre
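
The distinction Pierre describes between trusting positions and falling back to offsets can be illustrated with a small standalone sketch. It does not use Lucene at all: `Token`, `phraseMatches`, `positionsFromOffsets` and `sample` are hypothetical names invented for this example, and the stop-word scenario is an assumption chosen to make the difference visible. The idea is that a phrase match needs consecutive positions, and positions rebuilt from offset order alone lose the gaps an analyzer may have left.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PositionsVsOffsets {

    /** Hypothetical stand-in for one term vector entry; not a Lucene class. */
    static final class Token {
        final String term;
        final int position;
        final int startOffset;
        final int endOffset;

        Token(String term, int position, int startOffset, int endOffset) {
            this.term = term;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** True if the phrase terms occur at consecutive positions (no slop). */
    static boolean phraseMatches(List<Token> tokens, String... phrase) {
        for (Token first : tokens) {
            if (!first.term.equals(phrase[0])) {
                continue;
            }
            boolean all = true;
            for (int k = 1; k < phrase.length && all; k++) {
                all = hasTermAt(tokens, phrase[k], first.position + k);
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    static boolean hasTermAt(List<Token> tokens, String term, int position) {
        for (Token t : tokens) {
            if (t.position == position && t.term.equals(term)) {
                return true;
            }
        }
        return false;
    }

    /** Rebuilds positions from offset order only; gaps between tokens are lost. */
    static List<Token> positionsFromOffsets(List<Token> tokens) {
        List<Token> sorted = new ArrayList<>(tokens);
        sorted.sort(Comparator.comparingInt(t -> t.startOffset));
        List<Token> rebuilt = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            Token t = sorted.get(i);
            rebuilt.add(new Token(t.term, i, t.startOffset, t.endOffset));
        }
        return rebuilt;
    }

    /** "fox and hound", assuming the analyzer dropped "and" but kept its position gap. */
    static List<Token> sample() {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token("fox", 0, 0, 3));
        tokens.add(new Token("hound", 2, 8, 13));
        return tokens;
    }

    public static void main(String[] args) {
        List<Token> tokens = sample();
        // With stored positions, "fox hound" is not a phrase in this text.
        System.out.println(phraseMatches(tokens, "fox", "hound"));
        // With positions inferred from offsets, it wrongly looks like one.
        System.out.println(phraseMatches(positionsFromOffsets(tokens), "fox", "hound"));
    }
}
```

With stored positions the phrase check fails, because the two terms are two positions apart. With positions inferred from offsets it succeeds, which is the kind of mismatch that can arise when a highlighter does not know whether a field's positions can be trusted.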

  • Koji Sekiguchi at May 27, 2011 at 1:55 pm

    (11/05/27 20:56), Pierre GOSSE wrote:
    > Hi,
    >
    > Maybe is it related to :
    > https://issues.apache.org/jira/browse/LUCENE-3087

    No, because Joel's problem is with FastVectorHighlighter, but LUCENE-3087
    is for Highlighter.

    koji
    --
    http://www.rondhuit.com/en/

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Koji Sekiguchi at May 27, 2011 at 1:59 pm

    (11/05/27 19:57), Joel Halbert wrote:
    > Query bodyQ = new QueryParser(Version.LUCENE_30, BODY, analyser).parse(query);

    What analyzer do you use? And are you sure bodyQ is composed only of
    TermQuery, PhraseQuery, BooleanQuery and DisjunctionMaxQuery?

    koji
    --
    http://www.rondhuit.com/en/

