FAQ
Hello,

since I moved on with my offset-info problem in HTML files, I got a new one
trying to bring the tokens positions information together with tokens/term
offset information. Can someone tell me, how can I get a token, if I know
its position? It would be nice to get the tokens position from the Token
class, but I could only get the positionIncrement, which is not really
helpful..

What I'm actually trying to do, is to find the offset information of a
span/phrase query. I know, that the contrib highligter can highlight phrase
queries, but I want/need to do it one my own (or rather give the information
to another application, that does the highlighting of my documents). I also
couldn't really understand, how does the highlighter recognize, that the
individual tokens/terms belong to the phrase (i.e. if I search for "peter
pan" at the moment I also get the tokens 'peter' and 'pan' as weighted
terms, also if they occur individually).

Thanks so much in advance!
Karolina

Search Discussions

  • Karolina Bernat at Feb 1, 2011 at 10:00 am
    Hi,

    perhaps there is someone, who's trying to do the same thing so I just write
    down, how I got along with this problem. It is NOT the most elegant
    solution, but it works for me. I don't really know yet, how the performance
    of my search will be, but the tests look so far ok.

    For my phrase search I actually used the SpnQuery - I read that the
    QueryParser cant't handle this kind of queries so I do it manually by
    checking, if the user entered the search text within the " ".
    Handling a SpanQuery one have an access to the query spans - and the spans
    give you the start and the end position of the searched words.
    Furthermore I could find the positions and the offset information of the
    WeightedTerms by using :
    QueryTermExtractor.getIdfWeightedTerms(...) and TermPositionVector

    Because there is no possibility (or none that I know of) of getting the
    offset information if you know the terms positions, I thought of saving all
    the term positions and the term offset informations.. and since I get the
    span start- and end-position from a SpanQuery I can look up in the terms
    positions-array, at which index/place in the array I find the position I got
    from the SpanQuery and then go to my array with terms offset information and
    get the one at the same index/position in this array...
    With those informations I can get the start offset of the first term in the
    SpanQuery and the end offset of the last term - and I can highlight those
    continuous.

    That is really not the best way to process, but I couldn't find any better.

    Please let me know, if there is any other (better) way to do it.


    On Fri, Jan 28, 2011 at 4:41 PM, Karolina Bernat wrote:

    Hello,

    since I moved on with my offset-info problem in HTML files, I got a new one
    trying to bring the tokens positions information together with tokens/term
    offset information. Can someone tell me, how can I get a token, if I know
    its position? It would be nice to get the tokens position from the Token
    class, but I could only get the positionIncrement, which is not really
    helpful..

    What I'm actually trying to do, is to find the offset information of a
    span/phrase query. I know, that the contrib highligter can highlight phrase
    queries, but I want/need to do it one my own (or rather give the information
    to another application, that does the highlighting of my documents). I also
    couldn't really understand, how does the highlighter recognize, that the
    individual tokens/terms belong to the phrase (i.e. if I search for "peter
    pan" at the moment I also get the tokens 'peter' and 'pan' as weighted
    terms, also if they occur individually).

    Thanks so much in advance!
    Karolina

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedJan 28, '11 at 3:42p
activeFeb 1, '11 at 10:00a
posts2
users1
websitelucene.apache.org

1 user in discussion

Karolina Bernat: 2 posts

People

Translate

site design / logo © 2022 Grokbase