Well, you have to do some extra work for page matching. If you search the
mailing list archive you'll find significant discussions of paging. The short form is that you
either index the entire document and record the start and/or end offset for
each page (I'd put the results in a separate field in the document). There are various
forms of queries that give you match positions and you can look at the
field with page offsets to find the page.
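As a rough sketch of that last lookup step only (plain Java, not Lucene API): assume you have already read the stored page-offset field back into a sorted int array, where pageStarts[i] is the character offset at which page i+1 begins, and that your query gave you the character offset of a match. The class name PageLocator and the method pageForOffset are made up for illustration.

```java
import java.util.Arrays;

public final class PageLocator {
    private PageLocator() {}

    /**
     * Returns the 1-based page number containing matchOffset.
     * Assumption: pageStarts is strictly increasing and pageStarts[i]
     * is the character offset where page i+1 begins.
     */
    public static int pageForOffset(int[] pageStarts, int matchOffset) {
        if (pageStarts.length == 0 || matchOffset < pageStarts[0]) {
            throw new IllegalArgumentException("offset lies before the first page");
        }
        int idx = Arrays.binarySearch(pageStarts, matchOffset);
        if (idx < 0) {
            // Not an exact page start: binarySearch returns -(insertionPoint) - 1,
            // and the containing page is the one just before the insertion point.
            idx = -idx - 2;
        }
        return idx + 1;
    }
}
```

With page starts of 0, 1200 and 2550, an offset of 1199 falls on page 1 and an offset of 1200 on page 2. If you record end offsets as well, the same lookup applied to the end of a match tells you whether it runs over onto the following page.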
The other possibility is to index each page as a separate document, but this
solution suffers from the problem of matching, say, phrases that start on
one page and end on the next.
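To make the page-boundary problem concrete, here is a toy illustration in plain Java. Simple substring matching stands in for a phrase query, so this is not how Lucene itself matches; the class PageBoundaryDemo and its methods are hypothetical.

```java
import java.util.List;

public final class PageBoundaryDemo {
    private PageBoundaryDemo() {}

    /** Page-per-document view: the phrase must lie entirely within one page. */
    public static boolean foundOnSomePage(List<String> pages, String phrase) {
        for (String page : pages) {
            if (page.contains(phrase)) {
                return true;
            }
        }
        return false;
    }

    /** Whole-document view: pages are joined with a single space before matching. */
    public static boolean foundInWholeDocument(List<String> pages, String phrase) {
        return String.join(" ", pages).contains(phrase);
    }
}
```

For two pages "the quick brown" and "fox jumps over", the phrase "brown fox" is found in the whole-document view but on neither page individually.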
On Mon, Nov 2, 2009 at 6:03 AM, Vicente David Guardiola Buitrago wrote:
I’m using Lucene to index some PDF documents and it’s working great. But I’m
wondering if it’s possible to get some extra information about the
matches returned by Lucene.
I need to know the exact page where Lucene has found every match, and it
would be great if I could also get the paragraph or some information to go
directly to the matching word(s).
I really appreciate any advice you could give me.