Hello everybody,

I'm looking for a pure Python solution for converting word documents
to text. App Engine doesn't allow external programs, which means that
external programs like catdoc and antiword can't be used. Anyone know
of any?

Thanks in advance.


--
Regards, Björn


  • Nitebirdz at Sep 1, 2009 at 11:42 am

    On Tue, Sep 01, 2009 at 11:38:30AM +0200, BJörn Lindqvist wrote:
    Hello everybody,

    I'm looking for a pure Python solution for converting word documents
    to text. App Engine doesn't allow external programs, which means that
    external programs like catdoc and antiword can't be used. Anyone know
    of any?
    A quick search returned this:

    http://code.activestate.com/recipes/279003/


    Did you give it a try?
  • Tino Wildenhain at Sep 1, 2009 at 1:20 pm

    On 01.09.2009 13:42, Nitebirdz wrote:
    On Tue, Sep 01, 2009 at 11:38:30AM +0200, BJörn Lindqvist wrote:
    Hello everybody,

    I'm looking for a pure Python solution for converting word documents
    to text. App Engine doesn't allow external programs, which means that
    external programs like catdoc and antiword can't be used. Anyone know
    of any?
    A quick search returned this:

    http://code.activestate.com/recipes/279003/


    Did you give it a try?
    That's funny advice. Did you read that recipe? ;-)
    "Requires the Python for Windows extensions, and MS Word."
    How does this match with "App Engine doesn't allow external programs"? :-)

    For Excel this would be easy, but Word is another matter. Björn, did you
    check the Google API to see whether you could use Google Docs for this?

    Regards
    Tino

  • Nitebirdz at Sep 1, 2009 at 3:24 pm

    On Tue, Sep 01, 2009 at 03:20:29PM +0200, Tino Wildenhain wrote:

    A quick search returned this:

    http://code.activestate.com/recipes/279003/


    Did you give it a try?
    That's funny advice. Did you read that recipe? ;-)
    "Requires the Python for Windows extensions, and MS Word."
    How does this match with "App Engine doesn't allow external programs"? :-)
    Sorry, you're absolutely right. I did notice it required Windows, but
    didn't see anything in the original message saying that this wasn't to be
    run on Windows. As for the issue regarding external programs, I assumed
    it only referred to the ones explicitly mentioned or similar (catdoc,
    antiword, etc.).

    My apologies.
  • BJörn Lindqvist at Sep 1, 2009 at 5:42 pm

    2009/9/1 Tino Wildenhain <tino at wildenhain.de>:
    On 01.09.2009 13:42, Nitebirdz wrote:
    On Tue, Sep 01, 2009 at 11:38:30AM +0200, BJörn Lindqvist wrote:

    Hello everybody,

    I'm looking for a pure Python solution for converting word documents
    to text. App Engine doesn't allow external programs, which means that
    external programs like catdoc and antiword can't be used. Anyone know
    of any?
    A quick search returned this:

    http://code.activestate.com/recipes/279003/


    Did you give it a try?
    That's funny advice. Did you read that recipe? ;-)
    "Requires the Python for Windows extensions, and MS Word."
    How does this match with "App Engine doesn't allow external programs"? :-)

    For Excel this would be easy, but Word is another matter. Björn, did you
    check the Google API to see whether you could use Google Docs for this?
    I did not, thanks for the tip! The system I managed to hack together
    uploads the .doc to a Google Docs account and then retrieves it again
    as plain text. It works, but it sure feels kind of silly. It's not very
    reliable: if Google has any kind of problem with the Docs
    application, it doesn't work at all. Plus, the method is dirt slow due
    to the latency of all the HTTP calls. But it's better than nothing.


    --
    Regards, Björn
  • BJörn Lindqvist at Sep 1, 2009 at 1:56 pm

    2009/9/1 Nitebirdz <nitebirdz at sacredchaos.com>:
    On Tue, Sep 01, 2009 at 11:38:30AM +0200, BJörn Lindqvist wrote:
    Hello everybody,

    I'm looking for a pure Python solution for converting word documents
    to text. App Engine doesn't allow external programs, which means that
    external programs like catdoc and antiword can't be used. Anyone know
    of any?
    A quick search returned this:

    http://code.activestate.com/recipes/279003/
    It requires Windows.


    --
    Regards, Björn
  • Tim Golden at Sep 1, 2009 at 2:05 pm

    BJörn Lindqvist wrote:
    2009/9/1 Nitebirdz <nitebirdz at sacredchaos.com>:
    On Tue, Sep 01, 2009 at 11:38:30AM +0200, BJörn Lindqvist wrote:
    Hello everybody,

    I'm looking for a pure Python solution for converting word documents
    to text. App Engine doesn't allow external programs, which means that
    external programs like catdoc and antiword can't be used. Anyone know
    of any?
    A quick search returned this:

    http://code.activestate.com/recipes/279003/
    It requires Windows.
    I'm moderately confident that no (published) solution exists
    for this without relying on an installed Word or an external
    program of the kind you mentioned. Obviously, there's nothing
    to stop someone creating a Python module which does the
    equivalent, possibly by wrapping the core of the catdoc/antiword
    code in a Python module or by recoding its functionality in
    Python. But I imagine you knew that :)

    If you were talking Excel, you'd be in luck thanks to the
    sterling work done by John Machin and others. But I imagine
    that the market for Word document interchange and conversion is
    considerably smaller, especially within restricted environments.

    Depending on the source of your docs, it might be possible to
    save them as, e.g., XML or some other format for which a converter is
    available in Python. Even text-only, I suppose. But I suppose
    that you're asking because that's not a possibility?

    TJG
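
To illustrate the "save them in a format Python can read" suggestion above: if the documents can be saved as .docx (Office Open XML) rather than the legacy binary .doc format, the text can be pulled out with the standard library alone, since a .docx file is a zip archive whose body lives in word/document.xml. This is only a rough sketch under that assumption; it does nothing for binary .doc files. The names docx_to_text and make_sample_docx are made up for this example, and the sample builder produces just enough of an archive to exercise the extractor, not a file Word would necessarily open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(data):
    """Return the plain text of a .docx file, given its raw bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Each w:t element holds one run of text inside the paragraph.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)


def make_sample_docx(lines):
    """Build a minimal in-memory archive for demonstration purposes only."""
    ns = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    body = "".join("<w:p><w:r><w:t>%s</w:t></w:r></w:p>" % line for line in lines)
    xml = '<w:document xmlns:w="%s"><w:body>%s</w:body></w:document>' % (ns, body)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("word/document.xml", xml)
    return buf.getvalue()


if __name__ == "__main__":
    print(docx_to_text(make_sample_docx(["Hello", "world"])))
```

Working on bytes rather than a file path keeps the sketch usable where uploads arrive in memory and the filesystem is read-only. Headers, footnotes, and paragraphs nested inside text boxes are not handled properly here.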
  • Gabriel at Sep 1, 2009 at 2:09 pm

    2009/9/1 BJörn Lindqvist <bjourne at gmail.com>:
    Hello everybody,

    I'm looking for a pure Python solution for converting word documents
    to text. App Engine doesn't allow external programs, which means that
    external programs like catdoc and antiword can't be used. Anyone know
    of any?
    You could use the Google Docs API
    (http://code.google.com/apis/documents/docs/3.0/developers_guide_protocol.html#DownloadingDocsAndPresentations).

    --
    Kind Regards

Discussion Overview
group: python-list
category: python
posted: Sep 1, '09 at 9:38a
active: Sep 1, '09 at 5:42p
posts: 8
users: 5
website: python.org
