I'm curious about embedding extra information in an index (and being able to search the extra information as well). In this case certain tokens correspond to recognized entities with ids. I'd like to get the ids into the index so that searching for the id of the entity will also return that document. I can think of three ways and I was curious if there's a preferred way:
1) Add the id as another token during filtering
2) Add the id as a payload
3) Add the id as an attribute (although I don't know how to search on the attribute value)

Thanks,
-Chris

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


  • Erick Erickson at Sep 21, 2010 at 11:25 pm
    Off the top of my head...
    1) is certainly the easiest. This looks suspiciously like synonyms: at index time you inject the ID as a synonym in the token stream, so it gets indexed at the same position as the original token. That helps because phrase queries continue to work. Lucene in Action has an example of creating a synonym analyzer.
    2) I don't see how payloads really help you here. I confess I'm not intimately familiar with payloads, but from what I've seen they're useful when you match the *term* and want to do something special with that match. One use I've seen is parts of speech: you can alter the score of, say, nouns to boost matches on nouns. But I don't recall seeing anything that allows the payload data itself to be the match.
    3) I have no idea what an attribute is in this context <G>... Although you could simply create another field that contains all of the IDs for the document and add a SHOULD clause on that field to all your queries.
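
As a rough illustration of option 1, here is a toy positional index in plain Java. It is not the Lucene API: the class `EntityIdIndexSketch`, its methods, and the ID `E42` are all made up for this sketch, and the entity lookup table is assumed to come from whatever recognizer tags the tokens. In Lucene itself the corresponding step would be a token filter that emits the ID with a position increment of 0.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy positional index, NOT the Lucene API. All names here are hypothetical.
class EntityIdIndexSketch {
    // term -> docId -> positions at which the term occurs
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();
    // lowercase surface token -> entity ID (assumed to come from an entity recognizer)
    private final Map<String, String> entityIds;

    EntityIdIndexSketch(Map<String, String> entityIds) {
        this.entityIds = entityIds;
    }

    private void post(String term, int docId, int position) {
        postings.computeIfAbsent(term, t -> new HashMap<>())
                .computeIfAbsent(docId, d -> new ArrayList<>())
                .add(position);
    }

    // Lowercase and whitespace-tokenize the text, then index each token.
    // A recognized entity also gets its ID indexed at the SAME position.
    void add(int docId, String text) {
        String[] tokens = text.toLowerCase().trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            post(tokens[pos], docId, pos);
            String id = entityIds.get(tokens[pos]);
            if (id != null) {
                post(id, docId, pos); // the "synonym": no position advance
            }
        }
    }

    // Documents containing the term (a surface token or an injected ID).
    Set<Integer> termQuery(String term) {
        return new TreeSet<>(postings.getOrDefault(term, Map.of()).keySet());
    }

    // Documents in which the terms occur at consecutive positions.
    Set<Integer> phraseQuery(String... terms) {
        Set<Integer> hits = new TreeSet<>();
        if (terms.length == 0) {
            return hits;
        }
        Map<Integer, List<Integer>> first = postings.getOrDefault(terms[0], Map.of());
        for (Map.Entry<Integer, List<Integer>> entry : first.entrySet()) {
            int docId = entry.getKey();
            for (int start : entry.getValue()) {
                boolean match = true;
                for (int i = 1; i < terms.length && match; i++) {
                    List<Integer> positions =
                            postings.getOrDefault(terms[i], Map.of()).get(docId);
                    match = positions != null && positions.contains(start + i);
                }
                if (match) {
                    hits.add(docId);
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        EntityIdIndexSketch index = new EntityIdIndexSketch(Map.of("aspirin", "E42"));
        index.add(1, "Patients took aspirin daily");
        index.add(2, "Patients took ibuprofen daily");
        System.out.println(index.termQuery("E42"));                    // [1]
        System.out.println(index.phraseQuery("took", "aspirin"));      // [1]
        System.out.println(index.phraseQuery("took", "E42", "daily")); // [1]
    }
}
```

Because the ID shares a position with the token it annotates, a search for the ID finds the document, and phrase queries work with either the original word or the ID in that slot. Had the ID been appended as an ordinary next token instead, every later position would shift by one and a phrase such as "aspirin daily" would stop matching.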

    HTH
    Erick
    In reply to Christopher Condit's message of Tue, Sep 21, 2010 at 3:11 PM (quoted in full above).
