Has anyone ever tried to find the pixel (or point) location of text in
a PDF using Python? I've been using the pyPdf libraries for other
things, and it seems to me that if I can find the bounding box for
text, I should be able to calculate the location.

What I want to do is take a PDF of one of our vendor invoices and blur
everything in it except the block that's related to a single
customer. So if I have an invoice that looks like:

Alfred Annoying
123 Elm St
Somewhere, NJ

Barbie Bonehead
456 Pine St
Elsewhere, NJ

Charlie Clueless
789 Beech St.
Everywhere, NJ

I want to show Barbie just her section of the invoice (with the header
intact, so that she can tell it's a real invoice) but with Alfred and
Charlie's information blurred out. I was going to convert the PDF to
a JPG or PNG and do the blurring with ImageMagick/PythonMagick. But
that requires me to know the pixel location of the regions that I want
blurred and left alone.

I'm also open to other ideas if I'm going about this the hard way....

Search Discussions

  • Brown wrap at Mar 9, 2010 at 10:53 pm
    The md5 module is used in build xulrunner and Firefox from source. When I do run the install its says md5 no such module. I compiled Python 2.6.4 after installing openssl and BerkleyDB, yet I get this error when compilying Python:

    Failed to find the necessary bits to build these modules:
    _tkinter bsddb185 dl
    imageop sunaudiodev
    To find the necessary bits, look in setup.py in detect_modules() for the module's name.

    Failed to build these modules:
    _hashlib _ssl

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppython-list @
postedMar 9, '10 at 10:05p
activeMar 9, '10 at 10:53p

2 users in discussion

Brown wrap: 1 post Chris Curvey: 1 post



site design / logo © 2022 Grokbase