Hi all,

I've looked around with Google quite a bit, but haven't found anything
like what I'm looking for. Is there a Python library that will extract
images from PDF files? My ultimate goal is to pull the images out, use
the PIL library to reduce the size of the images and rebuild another
PDF file that's essentially a "thumbnail" version of the original PDF
file, smaller in size.

We've been using imagick to extract the images, but it's difficult to
script and slow to process the input PDF. Can someone suggest
something better?

Thanks in advance,
Doug
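
For the shrink-and-rebuild half of the goal above, here is a minimal sketch. It assumes Pillow (the maintained fork of PIL) and assumes the images have already been pulled out of the PDF as ordinary image files. The names shrink, build_thumbnail_pdf and max_side are made up for illustration. It writes one reduced image per page, so it will not reproduce the original page layout or any text.

```python
from PIL import Image


def shrink(image, max_side=200):
    """Return a reduced RGB copy whose longest side is at most max_side pixels."""
    reduced = image.convert("RGB")            # new image; drops any alpha channel
    reduced.thumbnail((max_side, max_side))   # in place, keeps the aspect ratio
    return reduced


def build_thumbnail_pdf(image_paths, pdf_path, max_side=200):
    """Write one reduced image per page into a new PDF at pdf_path."""
    pages = []
    for path in image_paths:
        with Image.open(path) as img:
            pages.append(shrink(img, max_side))
    if not pages:
        raise ValueError("no images to write")
    # Pillow's PDF writer takes the remaining pages through append_images.
    pages[0].save(pdf_path, "PDF", save_all=True, append_images=pages[1:])
```

Something like build_thumbnail_pdf(["img-000.jpg", "img-001.jpg"], "thumbs.pdf") would then give a two-page PDF (the file names here are placeholders). Note that thumbnail() only ever shrinks, so images already smaller than max_side are left at their original size.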


  • David Lyon at Jul 28, 2009 at 3:37 am
    pdftohtml on sourceforge may help...
  • Superpollo at Jul 28, 2009 at 10:07 am

    David Lyon wrote:
    pdftohtml on sourceforge may help...
    also see: http://linuxcommand.org/man_pages/pdfimages1.html

    bye
  • Writeson at Jul 28, 2009 at 8:21 pm
    David,

    Thanks for your reply, I'll take a look at pdftohtml and see if it
    suits my needs.

    Thanks!
    Doug
  • Xavier Ho at Jul 28, 2009 at 11:15 pm
    I've got a non-Python solution if you have Acrobat 6 or up.
    From the menu, Advanced -> Document Processing -> Extract All Images...
    If you need multiple PDFs done, Batch Processing would be a great start.

    Then you can run another script to make the thumbnails, or use Photoshop.
    Either way works!

    Best regards,

    Ching-Yun "Xavier" Ho, Technical Artist

    Contact Information
    Mobile: (+61) 04 3335 4748
    Skype ID: SpaXe85
    Email: contact at xavierho.com
    Website: http://xavierho.com/

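
If the pdfimages tool mentioned in the thread is installed, one way to script it from Python is through subprocess. This is only a sketch: the function names, the "img" prefix and the output directory are assumptions for illustration, and it relies on the documented command form "pdfimages [options] PDF-file image-root" with the -j option, which writes JPEG-encoded images out as .jpg files instead of converting them.

```python
import subprocess
from pathlib import Path


def pdfimages_command(pdf_path, image_root, keep_jpeg=True):
    """Build the argument list for: pdfimages [-j] PDF-file image-root."""
    command = ["pdfimages"]
    if keep_jpeg:
        command.append("-j")  # keep JPEG-encoded images as .jpg files
    command += [str(pdf_path), str(image_root)]
    return command


def extract_images(pdf_path, out_dir, prefix="img"):
    """Run pdfimages and return the files it wrote, sorted by name."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if pdfimages exits with an error.
    subprocess.run(pdfimages_command(pdf_path, out_dir / prefix), check=True)
    # pdfimages names its output image-root-NNN.ext, e.g. img-000.jpg
    return sorted(out_dir.glob(prefix + "-*"))
```

The list returned by extract_images("input.pdf", "extracted") could then be handed to a separate thumbnail script, as suggested above. Images that are not JPEG-encoded come out as PPM or PBM files, which PIL can also open.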

Discussion Overview
group: python-list @
categories: python
posted: Jul 28, '09 at 2:52a
active: Jul 28, '09 at 11:15p
posts: 5
users: 4
website: python.org
