Hi everyone, I am trying to generate a printable PDF file from
an HTML page. Is there a way to do this using Python? If so,
which library and functions are required? If not, why
can't it be done?

Thank you all


  • Zentrader at Dec 16, 2007 at 5:21 pm
    I'm sure it can be done but there is no reason to reinvent the wheel
    unless it's for a programming exercise. You can use pdftohtml and run
    it from a Python program if you want.
    http://pdftohtml.sourceforge.net/
  • Abhishek at Dec 17, 2007 at 8:38 am

    Hi Zentrader, thanks for your help.
  • Zentrader at Dec 16, 2007 at 5:26 pm
    Sorry, I read that backwards. I do it the opposite of you. Anyway a
    google for "html to pdf python" turns up a lot of hits. Again, no
    reason to reinvent the wheel.
  • Ramsey Nasser at Dec 16, 2007 at 6:46 pm

    Like Zentrader said, there's no reason to reinvent the wheel. Writing an
    HTML to PDF converter is no trivial task. You would essentially have to
    implement an HTML layout engine that outputs PDF files. Not only does
    that mean you would have to programmatically produce a PDF file, but it
    means you would have to parse and correctly render HTML and CSS
    according to accepted web standards, the W3C's specifications. This
    has proved difficult to get right in practice, as is evident
    from the browser compatibility issues that continue to plague the web.

    There's a package called Prince that's supposed to do an excellent job.
    Check it out:

    http://www.princexml.com/

    Its layout engine surpasses some browsers in terms of compatibility
    with web standards. I don't think it's free for commercial use, though,
    so this might depend on what exactly you're trying to do.

    An alternative idea is to wait for Firefox 3 to come out. If I'm not
    mistaken, it will feature a new version of the Gecko layout engine
    which will use Cairo for all its rendering. Coincidentally, Cairo can be
    made to output PDF files. So, you may be able to hack something
    together.

    --
    nasser
  • Shane Geiger at Dec 16, 2007 at 7:32 pm
    Just some thoughts to get you started:

    You may not get any responses because you weren't specific enough about
    what you want to do. Since you are asking about doing this via Python,
    it seems you want to automate something which can be done from a menu
    option in various Web browsers (use the print feature and print to
    PDF). You could, of course, download the files (for example with the
    command-line Web client wget) and then convert the HTML to PDF using
    various tools. This gives you a different result, though, because you
    would be using a different HTML rendering engine.
    So you have to ask yourself: Is your goal to have a page that looks
    exactly like it looks in Firefox? or in IE? or Safari? Or are you only
    concerned that you have the words of the document?


    --
    Shane Geiger
    IT Director
    National Council on Economic Education
    sgeiger at ncee.net | 402-438-8958 | http://www.ncee.net

    Leading the Campaign for Economic and Financial Literacy
  • Waldemar Osuch at Dec 17, 2007 at 5:41 am

    You may want to investigate pisa:
    http://pisa.spirito.de/
    It worked for me in some simple conversions.
  • Grant Edwards at Dec 17, 2007 at 3:42 pm

    Here's one way:

    ------------------------------html2pdf.py----------------------------------------
    #!/usr/bin/python
    import os,sys

    inputFilename,outputFilename = sys.argv[1:3]

    os.system("w3m -dump %s | a2ps -B --borders=no | ps2pdf - %s" % (inputFilename,outputFilename))
    ---------------------------------------------------------------------------------

    --
    Grant Edwards, grante at visi.com
    Yow! Someone in DAYTON, Ohio is selling USED CARPETS to a SERBO-CROATIAN
  • Abhishek at Dec 19, 2007 at 6:03 am

    Hi Grant, I tried the command and it resulted in the following errors:

    sh: a2ps: not found
    ESP Ghostscript 815.04: **** Could not open the file /home/samba/users/Abhishek/newTemplate.pdf .
    **** Unable to open the initial device, quitting.
    256
  • Stefan Behnel at Dec 19, 2007 at 7:50 am

    abhishek wrote:
    sh: a2ps: not found
    This should make you think. Sounds like a good reason to install a2ps...

    Stefan
  • Grant Edwards at Dec 19, 2007 at 3:31 pm

    On 2007-12-19, abhishek wrote:

    Hi Grant, I tried the command and it resulted in the following errors:

    sh: a2ps: not found

    You'll need to install a2ps. It's available as a standard
    package for all the distros I've ever used.

    ESP Ghostscript 815.04: **** Could not open the file /home/samba/users/Abhishek/newTemplate.pdf .
    **** Unable to open the initial device, quitting.
    256

    Either your Ghostscript installation is broken, or you've tried
    to use an output path/file that's not writable. I suspect the
    latter.

    --
    Grant Edwards, grante at visi.com
    Yow! Is it 1974? What's for SUPPER? Can I spend my COLLEGE FUND in one
    wild afternoon??
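
Both failures in the error output above (a missing a2ps and an output file that Ghostscript could not open) can be checked for before the pipeline runs. The following is a minimal sketch and not part of the original thread: the function name `preflight` and the message wording are made up for illustration, and it relies only on `shutil.which` and `os.access` from the standard library.

```python
import os
import shutil


def preflight(tools, output_filename):
    """Return a list of problems that would make the pipeline fail.

    Hypothetical helper: `tools` is a list of program names expected on
    PATH, `output_filename` is where the PDF should be written.
    """
    problems = []
    for tool in tools:
        # shutil.which returns None when the program is not on PATH.
        if shutil.which(tool) is None:
            problems.append("%s: not found on PATH" % tool)

    # The PDF writer needs an existing, writable output directory.
    out_dir = os.path.dirname(os.path.abspath(output_filename))
    if not os.path.isdir(out_dir):
        problems.append("%s: directory does not exist" % out_dir)
    elif not os.access(out_dir, os.W_OK):
        problems.append("%s: directory is not writable" % out_dir)
    return problems
```

For the script in this thread the call would look like `preflight(["w3m", "a2ps", "ps2pdf"], outputFilename)`; an empty list means neither check found a problem.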
  • Terry Jones at Dec 19, 2007 at 3:40 pm
    "Grant" == Grant Edwards <grante at visi.com> writes:
    Here's one way:

    ------------------------------html2pdf.py-----------------------------------------
    #!/usr/bin/python
    import os,sys

    inputFilename,outputFilename = sys.argv[1:3]

    os.system("w3m -dump %s | a2ps -B --borders=no | ps2pdf - %s" % (inputFilename,outputFilename))
    Note that this is highly insecure. outputFilename could be passed e.g., as

    /tmp/file.pdf; rm -fr /home/abhishek

    Terry
  • Ismail Dönmez at Dec 19, 2007 at 3:55 pm

    On Wednesday 19 December 2007 at 17:40:17, Terry Jones wrote:
    Note that this is highly insecure. outputFilename could be passed e.g., as

    /tmp/file.pdf; rm -fr /home/abhishek
    And the solution is to use subprocess [0] instead of os.system()

    [0] http://docs.python.org/lib/module-subprocess.html

    Regards,
    ismail

    --
    Never learn by your mistakes, if you do you may never dare to try again.
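
The subprocess suggestion can be sketched roughly as follows. This is an illustration under stated assumptions, not code from the thread: `run_pipeline` and `html_to_pdf` are hypothetical names, and the three command lines are simply the ones from the os.system() version, passed as argument lists so that no shell ever parses the filenames.

```python
import subprocess


def run_pipeline(commands):
    """Run argument lists as a pipeline; return (exit codes, final stdout).

    No shell is involved, so characters such as ';' or '|' inside an
    argument are passed to the program literally.
    """
    procs = []
    upstream = None
    for argv in commands:
        proc = subprocess.Popen(
            argv,
            stdin=upstream if upstream is not None else subprocess.DEVNULL,
            stdout=subprocess.PIPE,
        )
        if upstream is not None:
            # Close our copy so the upstream process sees a broken pipe
            # if this stage exits early.
            upstream.close()
        upstream = proc.stdout
        procs.append(proc)

    output, _ = procs[-1].communicate()
    for proc in procs[:-1]:
        proc.wait()
    return [proc.returncode for proc in procs], output


def html_to_pdf(input_filename, output_filename):
    """Same three commands as the os.system() version in the thread."""
    return run_pipeline([
        ["w3m", "-dump", input_filename],
        ["a2ps", "-B", "--borders=no"],
        ["ps2pdf", "-", output_filename],
    ])
```

With this approach an output name such as `/tmp/file.pdf; rm -fr /home/abhishek` reaches ps2pdf as one odd filename instead of being run as a second command. A filename that begins with `-` may still be read as an option by the tools themselves, which this sketch does not address.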
  • Grant Edwards at Dec 19, 2007 at 4:17 pm

    On 2007-12-19, Terry Jones wrote:
    Note that this is highly insecure. outputFilename could be passed e.g., as

    /tmp/file.pdf; rm -fr /home/abhishek
    Here's a half-assed solution:

    inputFilename = inputFilename.replace("'","")
    outputFilename = outputFilename.replace("'","")

    os.system("w3m -dump '%s' | a2ps -B --borders=no | ps2pdf - '%s'" % (inputFilename,outputFilename))

    As somebody else suggested, building the pipeline "by hand"
    using the subprocess module is the most bullet-proof method.

    --
    Grant Edwards, grante at visi.com
    Yow! I brought my BOWLING BALL -- and some DRUGS!!
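
If the command has to stay a single shell string, the standard library's `shlex.quote` (Python 3.3 and later) escapes a filename instead of stripping quote characters from it. A minimal sketch, assuming a POSIX shell; `build_command` is a hypothetical name used only for illustration:

```python
import shlex


def build_command(input_filename, output_filename):
    """Build the shell pipeline with both filenames quoted for a POSIX shell."""
    return "w3m -dump %s | a2ps -B --borders=no | ps2pdf - %s" % (
        shlex.quote(input_filename),
        shlex.quote(output_filename),
    )
```

The resulting string could then be handed to os.system() as before. Unlike the replace("'", "") approach, a filename that legitimately contains a single quote is preserved rather than silently changed.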
  • MonkeeSage at Dec 20, 2007 at 12:52 pm

    This looks a little better for me:

    ... | a2ps -B --borders=0 --columns=1 -f 10.0 | ...

    Regards,
    Jordan
  • Grant Edwards at Dec 20, 2007 at 3:54 pm

    On 2007-12-20, MonkeeSage wrote:

    This looks a little better for me:

    ... | a2ps -B --borders=0 --columns=1 -f 10.0 | ...

    Right. I forgot that I've adjusted my a2ps defaults to using a
    single column and a readable font size instead of the standard
    2-up tiny-font mode.

    --
    Grant Edwards, grante at visi.com
    Yow! When you get your PH.D. will you get able to work at BURGER KING?

Discussion Overview
group: python-list
categories: python
posted: Dec 16, '07 at 10:51a
active: Dec 20, '07 at 3:54p
posts: 16
users: 10
website: python.org
