Hello,
I'm trying to retrieve payloads from the terms highlighted by the Highlighter
class. In my tests, every term returned by Highlighter has a null payload.
Example:

Highlighter h = new Highlighter(new Formatter() {
    public String highlightTerm(String originalText, TokenGroup tokenGroup) {
        Token token = tokenGroup.getToken(0);
        Payload payload = token.getPayload();
        assertNotNull(payload); // <---------------- payload is always null
        return originalText;
    }
}, scorer);

It seems that Highlighter removes the payload attribute from every term before
creating its TokenGroup.
How can I preserve it?

Thanks.


  • Fabiano Nunes at Nov 30, 2010 at 4:07 pm
    I've found the PayloadSpanUtil class. It does exactly what I need.
    But I'm concerned about the warning message in the API docs (in fact, I
    don't think I understand it). Is there any other approach? Can I get the
    same results by retrieving the termPositions, without performance issues?

    Thanks.
  • Erick Erickson at Nov 30, 2010 at 4:32 pm
    That's the Lucene developers' way of saying "we don't guarantee backwards
    compatibility".

    The devs go to great lengths to honor the contract of not changing public
    APIs without going through a deprecation process, which causes quite a lot
    of work. But that conflicts with the desirable process of having real-live
    users get a look at new code. So this warning merely says "the next
    compilation of Lucene may have a different API, so don't blame us if you
    have to recompile against a new jar". OK, I've played pretty loose with
    the words, but that's the intent.

    By and large, though, the functionality will be there in new versions, but
    you may have to change the calls you use if you get new jars.

    I'd go ahead and use the class/method with confidence that something similar
    will be there in the future, but you may have to recompile if you get new
    jars.

    Best
    Erick
  • Fabiano Nunes at Nov 30, 2010 at 7:17 pm
    Ok. I'll go ahead.
    Just one more thing: the apidocs warning says "(...) IndexReader should only
    contain doc of interest, best to use MemoryIndex (..)".
    How can I build a reader with a subset of docs?

    Thanks!
  • Erick Erickson at Nov 30, 2010 at 8:50 pm
    Warning, ignorance alert. I'm not all that up on the guts of this one.

    But take a look at MemoryIndex, there's an example there. The gist
    is that you create a MemoryIndex on the fly and index the doc in question
    into it, then you can get the IndexReader from the IndexSearcher associated
    with the MemoryIndex. All this is similar to how Highlighter works, and it
    looks like you have access to the original input?

    Now, this is largely a guess, so don't waste time if I'm really off base
    with this.

    Best
    Erick
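
The thread does not show the code Fabiano ended up with, so the following is only a plain-Java model of the idea described above: build a throwaway index that contains nothing but the document of interest, then look up the payloads stored at each term's positions. `SingleDocPayloadIndex`, `Posting`, `addToken`, and `postingsFor` are hypothetical names invented for this sketch; none of them is a Lucene class or method, and real code would go through Lucene's own index and reader classes instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a throwaway one-document index. NOT a Lucene class. */
public class SingleDocPayloadIndex {

    /** One occurrence of a term: its position and the bytes stored with it. */
    public static final class Posting {
        public final int position;
        public final byte[] payload;

        Posting(int position, byte[] payload) {
            this.position = position;
            this.payload = payload;
        }
    }

    // term -> postings, kept in order of occurrence
    private final Map<String, List<Posting>> postings = new HashMap<>();
    private int nextPosition = 0;

    /** Indexes one token together with its payload (hypothetical helper). */
    public void addToken(String term, byte[] payload) {
        postings.computeIfAbsent(term, t -> new ArrayList<>())
                .add(new Posting(nextPosition++, payload));
    }

    /** Returns every posting for the term, or an empty list if it never occurred. */
    public List<Posting> postingsFor(String term) {
        return postings.getOrDefault(term, new ArrayList<>());
    }

    public static void main(String[] args) {
        // Index only the document of interest; the payloads here are made-up tags.
        SingleDocPayloadIndex index = new SingleDocPayloadIndex();
        String[] tokens = {"quick", "brown", "fox", "quick"};
        String[] tags = {"adj", "adj", "noun", "adj"};
        for (int i = 0; i < tokens.length; i++) {
            index.addToken(tokens[i], tags[i].getBytes(StandardCharsets.UTF_8));
        }

        // Look up the payloads for a matched term by position.
        for (Posting p : index.postingsFor("quick")) {
            System.out.println(p.position + " " + new String(p.payload, StandardCharsets.UTF_8));
        }
    }
}
```

Because the index holds a single document, every posting it returns belongs to that document, which is the point of the "IndexReader should only contain doc of interest" advice quoted in this thread.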
  • Fabiano Nunes at Dec 1, 2010 at 12:10 pm
    Thanks, Erick. This solved my problem. Now I can retrieve payloads using
    on-the-fly readers.



Discussion Overview
group: java-user @
categories: lucene
posted: Nov 30, '10 at 3:20p
active: Dec 1, '10 at 12:10p
posts: 6
users: 2
website: lucene.apache.org

2 users in discussion

Fabiano Nunes: 4 posts Erick Erickson: 2 posts
