Hello,
Can the Lucene search engine index and search through PDF documents?
What are the file format limits for the Lucene search engine?

Thanks in Advance,

Andre'


  • Mark Helmstetter at Oct 17, 2003 at 10:19 pm
    Lucene FAQ Sect. 2 #12:
    http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.indexing&toc=faq#q12
    How do I index other document types such as PDF and Word?

    jGuru FAQ - 4th bullet
    http://www.jguru.com/faq/view.jsp?EID=107423
    How can I index PDF documents?

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
    For additional commands, e-mail: lucene-user-help@jakarta.apache.org
  • Andre Hughes at Oct 17, 2003 at 11:50 pm
    Thanks for the info.

    It seems, though, that the jGuru link is dead.

    Thanks Again.

    Andre'

    -----Original Message-----
    From: Mark Helmstetter
    Sent: Friday, October 17, 2003 3:19 PM
    To: Lucene Users List
    Subject: Re: Does the Lucene search engine work with PDF's?


  • Ben Litchfield at Oct 18, 2003 at 12:28 am
    You need to extract the text from the PDF files and feed that text to Lucene.
    PDFBox (http://www.pdfbox.org) can extract text from PDF documents.

    Ben
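
The two-step approach described above (extract the text first, then hand only that text to the indexer) can be sketched roughly as follows. This is a minimal illustration that uses only the Java standard library: `TextExtractor` and `ToyIndex` are hypothetical names invented for this sketch, the toy inverted index merely stands in for Lucene, and the stand-in extractor does not parse real PDFs. In practice a PDF library would sit behind the extractor interface.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ExtractThenIndexSketch {

    /** Hypothetical hook: a real implementation would call a PDF library here. */
    interface TextExtractor {
        String extract(byte[] rawDocument);
    }

    /** Toy inverted index standing in for Lucene: term -> ids of documents containing it. */
    static final class ToyIndex {
        private final Map<String, Set<String>> postings = new HashMap<>();

        void add(String docId, String text) {
            // Lower-case the text and split on anything that is not a letter or digit.
            for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!token.isEmpty()) {
                    postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
                }
            }
        }

        Set<String> search(String term) {
            return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
        }
    }

    /** Step 1: extract plain text. Step 2: give only that text to the index. */
    static void indexDocument(ToyIndex index, TextExtractor extractor, String docId, byte[] raw) {
        index.add(docId, extractor.extract(raw));
    }

    public static void main(String[] args) {
        // Stand-in extractor that treats the bytes as UTF-8 text; it does NOT parse real PDFs.
        TextExtractor fake = raw -> new String(raw, StandardCharsets.UTF_8);
        ToyIndex index = new ToyIndex();
        byte[] raw = "Quarterly sales report".getBytes(StandardCharsets.UTF_8);
        indexDocument(index, fake, "report.pdf", raw);
        System.out.println(index.search("sales"));
    }
}
```

The point of the split is that the index never sees the binary file format, so the same indexing code can serve any document type for which a text extractor exists.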

  • MOYSE Gilles (Cetelem) at Oct 20, 2003 at 7:34 am
    You can also use the TextMining.org toolbox, which provides classes to
    extract text from PDF and DOC files, using the Jakarta POI project. They are
    all free, under the Apache License.

    The URL (tested today):
    http://www.textmining.org/modules.php?op=modload&name=News&file=article&sid=6&mode=thread&order=0&thold=0

    You can also try the jGuru page: http://www.jguru.com/faq/view.jsp?EID=1074237

    Gilles Moyse
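
When several formats such as PDF and DOC must be handled, one common arrangement is to pick a text extractor by file extension before indexing. The sketch below only illustrates that idea and is not the TextMining.org or POI API: `ExtractorRegistrySketch` and its methods are hypothetical names, and the registered extractors are placeholders where real parsing libraries would be plugged in.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

final class ExtractorRegistrySketch {

    // Maps a lower-case file extension to a function turning raw bytes into plain text.
    private final Map<String, Function<byte[], String>> byExtension = new HashMap<>();

    void register(String extension, Function<byte[], String> extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Returns the extracted text, or empty if no extractor is registered for the file type. */
    Optional<String> extract(String fileName, byte[] raw) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return Optional.empty();
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        Function<byte[], String> extractor = byExtension.get(ext);
        return extractor == null ? Optional.empty() : Optional.of(extractor.apply(raw));
    }

    public static void main(String[] args) {
        ExtractorRegistrySketch registry = new ExtractorRegistrySketch();
        // Placeholder extractors: real ones would delegate to a PDF or Word parsing library.
        registry.register("txt", raw -> new String(raw, StandardCharsets.UTF_8));
        registry.register("pdf", raw -> "(text extracted from the PDF would go here)");
        byte[] data = "plain text".getBytes(StandardCharsets.UTF_8);
        System.out.println(registry.extract("notes.TXT", data).orElse("unsupported"));
        System.out.println(registry.extract("image.png", data).orElse("unsupported"));
    }
}
```

Files with no registered extractor come back empty, which makes the practical format limit explicit: a document type is searchable only if something can turn it into text.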


    -----Original Message-----
    From: Andre Hughes
    Sent: Saturday, October 18, 2003 00:05
    To: lucene-user@jakarta.apache.org
    Subject: Does the Lucene search engine work with PDF's?



Discussion Overview
group: java-user @
categories: lucene
posted: Oct 17, '03 at 10:03p
active: Oct 20, '03 at 4:15p
posts: 7
users: 5
website: lucene.apache.org
