I think you misunderstood Otis. You will need to use some sort of parser
(i.e. pdfbox, xpdf) to get the title text from the PDF (I assume you are
indexing the documents, so you have this already). Then you create a
"title" field in your index and store the text of the title in there (so
that you can return it with the results). That is independent of what you
do with the rest of the PDF document.
-Mike
At 11:17 AM 6/28/2004, Don Vaillancourt wrote:
It can't be that simple because the field will contain the whole PDF and
not just the title. And for PDFs that are 3 or 4 megs, it is really not
reasonable to store the who PDF in the collection just to get the title.
________________________________________________________________________It can't be that simple because the field will contain the whole PDF and
not just the title. And for PDFs that are 3 or 4 megs, it is really not
reasonable to store the who PDF in the collection just to get the title.
Save and share anything you find online - Furl @ http://www.furl.net
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org