FAQ
Hello, I use Lucene with Tomcat and I can now index and search all HTML documents. But I would like to index other documents such as PDF or Word (.doc). I hope that someone can help me!

_______________________________________________
Join Excite! - http://www.excite.com
The most personalized portal on the Web!

  • Michael Wechner at Dec 19, 2002 at 8:27 pm

    Friaa Nafaa wrote:
    Hello, I use Lucene with Tomcat and I can now index and search all HTML documents. But I would like to index other documents such as PDF or Word (.doc). I hope that someone can help me!

    Concerning PDF:

    Before indexing, you should extract the text from the PDF and save it
    as .txt (you can then index the .txt, but reference the PDF URI). To do
    this, have a look at


    http://www.foolabs.com/xpdf/download.html

    or

    http://www.pdfbox.org/

    These links are listed at

    http://jakarta.apache.org/lucene/docs/contributions.html

    Also take a look at the FAQ

    HTH

    Michael
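
The workflow described above (extract the text, index the .txt, but keep a reference to the PDF) could be sketched roughly as follows. This is only a minimal illustration, assuming Xpdf's pdftotext tool is installed and on the PATH; the class name, the helper names, and the base URL are hypothetical and not part of Lucene, Xpdf, or PDFBox. Handing the resulting .txt file and the PDF URI to the indexer is left to the indexing code that already works for HTML.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class: names are illustrative only.
class PdfTextExtraction {

    // Derive the sibling .txt path for a PDF, e.g. docs/manual.pdf -> docs/manual.txt.
    static Path textPathFor(Path pdf) {
        String name = pdf.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = (dot > 0) ? name.substring(0, dot) : name;
        return pdf.resolveSibling(base + ".txt");
    }

    // Command line for Xpdf's pdftotext: pdftotext <input.pdf> <output.txt>.
    static List<String> pdftotextCommand(Path pdf, Path txt) {
        return Arrays.asList("pdftotext", pdf.toString(), txt.toString());
    }

    // Run the extraction (assumes pdftotext is on the PATH) and return the .txt path.
    static Path extract(Path pdf) throws IOException, InterruptedException {
        Path txt = textPathFor(pdf);
        Process process = new ProcessBuilder(pdftotextCommand(pdf, txt))
                .inheritIO()
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("pdftotext failed with exit code " + exitCode + " for " + pdf);
        }
        return txt;
    }

    // Build the URI to keep alongside the indexed text, so that a search hit
    // links to the original PDF rather than to the intermediate .txt file.
    static String pdfUriFor(String baseUrl, Path docRoot, Path pdf) {
        String relative = docRoot.relativize(pdf).toString().replace('\\', '/');
        return baseUrl.endsWith("/") ? baseUrl + relative : baseUrl + "/" + relative;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical layout: PDFs under ./webapp, served from http://localhost:8080/docs/
        Path docRoot = Paths.get("webapp");
        Path pdf = Paths.get(args.length > 0 ? args[0] : "webapp/manual.pdf");
        Path txt = extract(pdf);
        System.out.println("index contents of: " + txt);
        System.out.println("store as reference: "
                + pdfUriFor("http://localhost:8080/docs/", docRoot, pdf));
    }
}
```

The text file is what gets tokenized and searched, while the PDF URI is only stored with the entry so that results can point back to the original document.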

Discussion Overview
group: java-user @
categories: lucene
posted: Dec 19, '02 at 4:11p
active: Dec 19, '02 at 8:27p
posts: 2
users: 2
website: lucene.apache.org

2 users in discussion

Friaa Nafaa: 1 post Michael Wechner: 1 post
