Hi, guys.
I am trying to extract the content of a PDF file (to get some specific
information) using Python. I already tried pyPdf with no success.
Does anyone have suggestions?
Thanks in advance.

Aonlazio

  • William Purcell at Aug 21, 2008 at 1:47 pm
    Sorry, this last email was meant to be to the list.

    On Thu, Aug 21, 2008 at 8:41 AM, William Purcell
    wrote:
    I have been trying to do the same thing. Here is something I came up with,
    although it is not pure Python: it requires pdftotext to be installed. If
    you're on a Linux box, I think it comes in xpdf-utils, but I'm not
    completely sure. Anyway, install pdftotext and then you could use this
    function:

    ----------------------------------------------------------------------------
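    import subprocess

    def readpdf(filepath):
        # Passing the arguments as a list avoids quoting problems when
        # filepath contains spaces. "-" makes pdftotext write to stdout.
        cmd = ['pdftotext', '-layout', filepath, '-']
        text = subprocess.check_output(cmd, universal_newlines=True)
        return text.splitlines(True)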

    ----------------------------------------------------------------------------
    I would like to find something totally Python, but this has worked for me
    in a pinch.
    -Bill
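
A possible follow-up to the pdftotext approach above, since the original question was about pulling specific information out of the PDF: once the text is available as lines, a regular expression can pick out a field. This is only a minimal sketch. The helper names `pdf_to_lines` and `find_field`, the "Label: value" layout, and the file name and label in the usage comment are all assumptions made for illustration; nothing in the thread says what the PDF actually looks like. It also assumes pdftotext is on the PATH and that its output decodes as UTF-8 (undecodable bytes are replaced).

```python
import re
import shutil
import subprocess


def pdf_to_lines(filepath):
    """Return the text of a PDF as a list of lines, using the pdftotext CLI."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    # "-layout" keeps the page layout; the trailing "-" sends text to stdout.
    result = subprocess.run(
        ["pdftotext", "-layout", filepath, "-"],
        capture_output=True,
        check=True,
    )
    # Assumption: the output is UTF-8; bad bytes are replaced, not raised.
    return result.stdout.decode("utf-8", errors="replace").splitlines()


def find_field(lines, label):
    """Return the text after 'label:' on the first matching line, else None."""
    # Hypothetical convention: a field appears as "Label: value" on one line.
    pattern = re.compile(r"^\s*" + re.escape(label) + r"\s*:\s*(.+?)\s*$")
    for line in lines:
        match = pattern.match(line)
        if match:
            return match.group(1)
    return None


# Example usage (file name and label are made up):
#     lines = pdf_to_lines("invoice.pdf")
#     print(find_field(lines, "Invoice No"))
```

Keeping the extraction and the searching in separate functions means the searching part can be tried on plain lists of strings, without a PDF or pdftotext at hand.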

  • James Matthews at Aug 21, 2008 at 5:14 pm
    You can also use PDFlib:
    http://www.pdflib.com/download/pdflib-family/pdflib-7/


Discussion Overview
group: python-list @
categories: python
posted: Aug 21, '08 at 10:00a
active: Aug 21, '08 at 5:14p
posts: 3
users: 3
website: python.org
