FAQ
Hello, everyone,



I´m using lucene to index some PDF documents an it’s working great. But I’m
wondering if it’s possible to know some extra information of returned
matches by lucene.



I need to know the exact page where lucene has found every match, and it
will be get if I could get also the paragraph or some information to go
directly to the word(s) mathing.



I really appreciate any advice you could give me.



Best regards



Vicente Guardiola

Search Discussions

  • Erick Erickson at Nov 2, 2009 at 12:55 pm
    Well, you have to do some extra work for page matching. If you search the
    user list
    you'll find significant discussions of paging. The short form is that you
    have to
    either index the entire document and record the start and/or end offset for
    each page
    (I'd put the results in a separate field in the document). There are various
    forms of queries that give you match positions and you can look at the
    stored
    field with page offsets to find the page.

    The other possibility is to index each page as a separate document, but this
    solution suffers from the problem of matching, say, phrases that start on
    one
    page and end on the next.

    HTH
    Erick
    On Mon, Nov 2, 2009 at 6:03 AM, Vicente David Guardiola Buitrago wrote:

    Hello, everyone,



    I´m using lucene to index some PDF documents an it’s working great. But I’m
    wondering if it’s possible to know some extra information of returned
    matches by lucene.



    I need to know the exact page where lucene has found every match, and it
    will be get if I could get also the paragraph or some information to go
    directly to the word(s) mathing.



    I really appreciate any advice you could give me.



    Best regards



    Vicente Guardiola


Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedNov 2, '09 at 11:01a
activeNov 2, '09 at 12:55p
posts2
users2
websitelucene.apache.org

People

Translate

site design / logo © 2022 Grokbase