I want to use Python to find all "\n"-terminated
strings in a PDF file, ideally returning string
starting addresses. Anyone willing to help?


--
--------------------------------- --- -- -
Posted with NewsLeecher v4.0 Final
Web @ http://www.newsleecher.com/?usenet
------------------- ----- ---- -- -

  • Emile van Sebille at Jan 5, 2011 at 11:45 pm
    On 1/5/2011 3:12 PM kanthony at woh.rr.com said...
    I want to use Python to find all "\n"-terminated
    strings in a PDF file, ideally returning string
    starting addresses. Anyone willing to help?
    pdflines = open(r'c:\shared\python_book_01.pdf').readlines()
    sps = [0]
    for ii in pdflines:
        sps.append(sps[-1] + len(ii))

    Emile
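Emile's loop keeps a running total of line lengths, so sps[i] is the offset where line i starts; the last entry is the total length of the file rather than a line start. A minimal sketch of the same idea that scans the raw bytes for b"\n" directly, assuming the whole file fits in memory (the function name newline_terminated_starts is hypothetical):

```python
def newline_terminated_starts(data: bytes) -> list:
    """Return the start offset of every b"\n"-terminated run in data.

    A trailing fragment with no final b"\n" is not counted.
    """
    starts = []
    start = 0
    while True:
        end = data.find(b"\n", start)
        if end == -1:  # no more terminators
            break
        starts.append(start)
        start = end + 1  # the next string begins right after the newline
    return starts
```

Unlike sps, this skips an unterminated final fragment, which matches the "\n"-terminated wording of the question. Pass it open(path, 'rb').read() so the offsets are byte positions.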
  • Justin Peel at Jan 6, 2011 at 1:14 am

    Bear in mind that PDF files often contain compressed objects. If that
    is the case, I would recommend opening the PDF in binary mode and
    figuring out how to decompress (inflate) the relevant streams before
    doing any searching. PyPDF is a package that might help with this,
    though it could use some updating.
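Justin's suggestion can be sketched with the standard zlib module. The sketch assumes the common case of streams compressed with /FlateDecode and locates them with a naive regular expression between the stream and endstream keywords. It ignores each stream's /Length entry, other filters, and predictor parameters, so treat it as illustrative only (inflate_streams is a hypothetical name, not part of PyPDF or any other library):

```python
import re
import zlib

# Naive match: "stream" + EOL, then everything up to the next "endstream".
# \b keeps the "stream" inside "endstream" from starting a new match.
# Assumption: stream data never contains the bytes b"endstream".
STREAM_RE = re.compile(rb"\bstream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(data: bytes) -> list:
    """Return the decompressed body of every stream zlib can inflate."""
    results = []
    for match in STREAM_RE.finditer(data):
        inflater = zlib.decompressobj()
        try:
            # Bytes after the end of the zlib data (such as the EOL before
            # "endstream") are left in inflater.unused_data.
            body = inflater.decompress(match.group(1))
        except zlib.error:
            continue  # not zlib/Flate data (e.g. another filter): skip it
        if inflater.eof:  # keep only streams that decompressed completely
            results.append(body)
    return results
```

The decompressed bodies can then be searched with the same newline-scanning approach as the raw file.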
  • Bubba at Jan 6, 2011 at 1:55 am
    Does this work for binary files (like PDFs)?


  • Emile van Sebille at Jan 6, 2011 at 4:01 am
    On 1/5/2011 5:55 PM Bubba said...
    Does this work for binary files (like PDFs)?
    I don't know what you want -- PDFs are not line-oriented, so searching
    for \n's is sketchy from the get-go.

    I figured this was homework to test something...

    Emile
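Emile's caution about line orientation is easy to illustrate: the PDF format accepts a carriage return (CR), a line feed (LF), or CR followed by LF as an end-of-line marker, so a search for b"\n" alone treats a CR-only file as one long line. A minimal sketch that accepts all three markers, again assuming the data is in memory (eol_terminated_starts is a hypothetical name). Compressed stream data can contain these bytes by chance, so offsets inside streams remain meaningless:

```python
import re

# PDF allows CR, LF, or CR LF as an end-of-line marker. CR LF is listed
# first so the alternation treats it as one terminator, not two.
EOL_RE = re.compile(rb"\r\n|\r|\n")


def eol_terminated_starts(data: bytes) -> list:
    """Return the start offset of every EOL-terminated line in data."""
    starts = []
    start = 0
    for match in EOL_RE.finditer(data):
        starts.append(start)
        start = match.end()  # next line begins after the whole marker
    return starts
```

For example, b"a\rb\rc\r" contains three lines by this rule but no b"\n" at all.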
  • Bubba at Jan 6, 2011 at 2:24 am
    Your code only shows the first 488 bytes of the file?


  • Emile van Sebille at Jan 6, 2011 at 4:02 am
    On 1/5/2011 6:24 PM Bubba said...
    Your code only shows the first 488 bytes of the file?

    add 'rb' to the open statement...

    with open(r'c:\shared\python_book_01.pdf', 'rb') as f:
        pdflines = f.readlines()
    sps = [0]
    for ii in pdflines:
        sps.append(sps[-1] + len(ii))

    Emile
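Binary mode matters because text mode decodes the bytes and translates line endings (in Python 3, a PDF's binary content often fails to decode at all), so line lengths no longer match byte offsets. A minimal sketch of the corrected approach as a generator that pairs each offset with its line without building the whole list first (iter_line_offsets is a hypothetical name):

```python
def iter_line_offsets(path):
    """Yield (byte_offset, line) for each line of the file at path.

    Lines are split on b"\n"; the last line is yielded even if it has
    no trailing b"\n".
    """
    offset = 0
    # 'rb': no decoding and no newline translation, so len(line) is the
    # exact number of bytes the line occupies in the file.
    with open(path, "rb") as f:
        for line in f:  # binary files split on b"\n" only
            yield offset, line
            offset += len(line)
```

Usage: for offset, line in iter_line_offsets(r'c:\shared\python_book_01.pdf'): print(offset, line[:40])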

Discussion Overview
group: python-list
categories: python
posted: Jan 5, '11 at 11:12p
active: Jan 6, '11 at 4:02a
posts: 7
users: 4
website: python.org
