I am working on a program that needs to stat files (gif, swf, xml, dirs,
etc.) from the web. I know how to stat a local file...

import os
tplStat = os.stat(path)

but I can't figure out how to stat a file that resides on a web server.
I am not sure if it makes a difference, but most (maybe all) of the
files that I need to stat reside within the same domain that will
generate the request. I am able to open the file by using

import urllib

f = urllib.urlopen(url)

but for some reason I cannot stat the files. Any help will be greatly
appreciated. Thanks!

  • Carsten Haese at Jul 24, 2007 at 1:47 pm

    On Tue, 2007-07-24 at 09:07 -0400, DB Daniel Brown wrote:
    > I am working on a program that needs to stat files (gif, swf, xml,
    > dirs, etc) from the web. I know how to stat a local file...
    >
    > import os
    > tplStat = os.stat(path)
    >
    > but I can't figure out how to stat a file that resides on a web
    > server.

    You can't stat a file on a web server.

    > I am not sure if it makes a difference, but most (maybe all) of the
    > files that I need to stat reside within the same domain that will
    > generate the request.

    As long as you use HTTP to get to the file, that makes no difference. If
    you can get to the file via NFS or SMB, that would help.
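Carsten's NFS/SMB suggestion can be made concrete with a small Python 3 sketch. It assumes (hypothetically) that the server's document root is mounted locally and that URL paths map one-to-one onto files under it, which will not hold for servers that use rewrites, aliases, or generated content. `stat_via_mount` and the example.com URLs are illustrative only; a temporary directory stands in for the mount.

```python
import os
import posixpath
import stat
import tempfile
from urllib.parse import unquote, urlsplit


def stat_via_mount(url, mount_root):
    """Hypothetical helper: stat the file behind `url` via a mounted docroot.

    Assumes the web server's document root is mounted (NFS/SMB) at
    `mount_root`, so the URL path maps directly onto that directory tree.
    """
    url_path = unquote(urlsplit(url).path)
    # Normalise the path and drop leading slashes so ".." cannot climb
    # above mount_root (e.g. "/../etc/passwd" becomes "etc/passwd").
    rel = posixpath.normpath("/" + url_path.lstrip("/")).lstrip("/")
    local_path = os.path.join(mount_root, *rel.split("/")) if rel else mount_root
    return os.stat(local_path)


# Demo: a temporary directory stands in for the mounted document root.
with tempfile.TemporaryDirectory() as docroot:
    os.mkdir(os.path.join(docroot, "images"))
    with open(os.path.join(docroot, "images", "logo.gif"), "wb") as fh:
        fh.write(b"\0" * 2549)
    file_info = stat_via_mount("http://www.example.com/images/logo.gif", docroot)
    dir_info = stat_via_mount("http://www.example.com/images/", docroot)

print(file_info.st_size)               # size in bytes, as with a local os.stat
print(stat.S_ISDIR(dir_info.st_mode))  # directories can be stat'ed too
```

Because this goes through the filesystem rather than HTTP, every os.stat field (mode, size, all three timestamps) is available, including for directories.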
    > I am able to open the file by using
    >
    > import urllib
    >
    > f = urllib.urlopen(url)
    >
    > but for some reason I cannot stat the files.

    That's because urlopen returns a file-like object, not a file. The best
    you can hope for is to inspect the headers that the web server returns:

    >>> import urllib
    >>> f = urllib.urlopen("http://www.python.org")
    >>> f.headers['last-modified']
    'Mon, 23 Jul 2007 20:35:52 GMT'
    >>> f.headers.items()
    [('content-length', '14053'), ('accept-ranges', 'bytes'), ('server',
    'Apache/2.2.3 (Debian) DAV/2 SVN/1.4.2 mod_ssl/2.2.3 OpenSSL/0.9.8c'),
    ('last-modified', 'Mon, 23 Jul 2007 20:35:52 GMT'), ('connection',
    'close'), ('etag', '"60193-36e5-39089a00"'), ('date', 'Tue, 24 Jul 2007
    13:42:57 GMT'), ('content-type', 'text/html')]

    Maybe that's good enough for your needs.

    HTH,
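To get from those headers to something resembling os.stat, Content-Length can stand in for st_size and Last-Modified for st_mtime (the Date header is the server's clock, not the file's). The Python 3 sketch below assumes the headers arrive as (name, value) pairs like the f.headers.items() output above; in Python 3, urllib.urlopen became urllib.request.urlopen, and its .headers object has the same items() method. `stat_fields_from_headers` is a hypothetical helper, and either header may be missing from a real response.

```python
from email.utils import parsedate_to_datetime


def stat_fields_from_headers(headers):
    """Hypothetical helper: map HTTP response headers to os.stat-like fields.

    `headers` may be a list of (name, value) pairs, as printed above, or any
    object with an items() method (a dict or an email.message.Message).
    """
    items = headers.items() if hasattr(headers, "items") else headers
    # Header names are case-insensitive, so index them in lower case.
    lookup = {name.lower(): value for name, value in items}
    length = lookup.get("content-length")
    modified = lookup.get("last-modified")
    return {
        "st_size": int(length) if length is not None else None,
        # HTTP dates are RFC 1123 strings in GMT; convert to a Unix timestamp.
        "st_mtime": parsedate_to_datetime(modified).timestamp() if modified else None,
        "content_type": lookup.get("content-type"),
        "etag": lookup.get("etag"),
    }


# The header pairs from the python.org response quoted above.
sample = [
    ('content-length', '14053'), ('accept-ranges', 'bytes'),
    ('server', 'Apache/2.2.3 (Debian) DAV/2 SVN/1.4.2 mod_ssl/2.2.3 OpenSSL/0.9.8c'),
    ('last-modified', 'Mon, 23 Jul 2007 20:35:52 GMT'), ('connection', 'close'),
    ('etag', '"60193-36e5-39089a00"'), ('date', 'Tue, 24 Jul 2007 13:42:57 GMT'),
    ('content-type', 'text/html'),
]
fields = stat_fields_from_headers(sample)
print(fields)
```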
  • Gabriel Genellina at Jul 25, 2007 at 1:23 am
    On Tue, 24 Jul 2007 10:47:16 -0300, Carsten Haese <carsten at uniqsys.com>
    wrote:
    > On Tue, 2007-07-24 at 09:07 -0400, DB Daniel Brown wrote:
    >> I am working on a program that needs to stat files (gif, swf, xml,
    >> dirs, etc) from the web. I know how to stat a local file...
    >> but I can't figure out how to stat a file that resides on a web
    >> server.
    > That's because urlopen returns a file-like object, not a file. The best
    > you can hope for is to inspect the headers that the web server returns:
    > >>> import urllib
    > >>> f = urllib.urlopen("http://www.python.org")
    > >>> f.headers['last-modified']
    > 'Mon, 23 Jul 2007 20:35:52 GMT'
    >
    > Maybe that's good enough for your needs.

    This generates an HTTP GET request, transferring the contents too,
    unnecessarily. Using an HTTP HEAD request would be better, as only the
    headers are transferred. Since urllib can't generate a HEAD request, one
    has to use httplib instead (it's just a bit more "low level"):

    py> import httplib
    py> conn = httplib.HTTPConnection("www.python.org")
    py> conn.request("HEAD", "/images/python-logo.gif")
    py> resp = conn.getresponse()
    py> resp.getheaders()
    [('content-length', '2549'), ('accept-ranges', 'bytes'), ('server',
    'Apache/2.2.3 (Debian) DAV/2 SVN/1.4.2 mod_ssl/2.2.3 OpenSSL/0.9.8c'),
    ('last-modified', 'Tue, 24 Jul 2007 23:41:20 GMT'), ('etag',
    '"6015b-9f5-ee27c800"'), ('date', 'Wed, 25 Jul 2007 01:12:43 GMT'),
    ('content-type', 'image/gif')]
    py> conn.close()

    --
    Gabriel Genellina
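In Python 3, httplib is called http.client, and Gabriel's HEAD session translates almost line for line. The sketch below wraps it in a hypothetical `remote_stat` helper that returns stat-like fields. So that it runs offline, it queries a throwaway http.server instance on localhost (Python 3.7+ for ThreadingHTTPServer and the `directory` argument) rather than www.python.org; the demo file's size and timestamp are chosen to match the logo response above.

```python
import http.client
import os
import shutil
import tempfile
import threading
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def remote_stat(host, port, path):
    """Hypothetical helper: HEAD `path` and return stat-like header values."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        resp.read()  # a HEAD response has no body; this just finishes the response
        if resp.status != 200:
            raise OSError("HEAD %s failed: %d %s" % (path, resp.status, resp.reason))
        length = resp.getheader("Content-Length")
        modified = resp.getheader("Last-Modified")
        return {
            "st_size": int(length) if length is not None else None,
            "st_mtime": parsedate_to_datetime(modified).timestamp() if modified else None,
            "content_type": resp.getheader("Content-Type"),
        }
    finally:
        conn.close()


class QuietHandler(SimpleHTTPRequestHandler):
    def log_message(self, format, *args):
        pass  # silence the per-request log lines http.server writes to stderr


# Demo: serve a temporary directory on an ephemeral localhost port.
logo_mtime = datetime(2007, 7, 24, 23, 41, 20, tzinfo=timezone.utc).timestamp()
root = tempfile.mkdtemp()
logo = os.path.join(root, "python-logo.gif")
with open(logo, "wb") as fh:
    fh.write(b"\0" * 2549)
os.utime(logo, (logo_mtime, logo_mtime))
server = ThreadingHTTPServer(("127.0.0.1", 0), partial(QuietHandler, directory=root))
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    info = remote_stat("127.0.0.1", server.server_address[1], "/python-logo.gif")
finally:
    server.shutdown()
    server.server_close()
    shutil.rmtree(root, ignore_errors=True)

print(info)
```

Any status other than 200 (a 404 for a missing file, or the redirect http.server sends for a directory URL without a trailing slash) is turned into OSError, loosely mirroring how os.stat fails.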
  • Carsten Haese at Jul 25, 2007 at 6:11 pm

    On Tue, 2007-07-24 at 22:23 -0300, Gabriel Genellina wrote:
    > On Tue, 24 Jul 2007 10:47:16 -0300, Carsten Haese <carsten at uniqsys.com>
    > wrote:
    >> That's because urlopen returns a file-like object, not a file. The best
    >> you can hope for is to inspect the headers that the web server returns:
    >> >>> import urllib
    >> >>> f = urllib.urlopen("http://www.python.org")
    >>
    >> Maybe that's good enough for your needs.
    > This generates an HTTP GET request, transferring the contents too,
    > unnecessarily.

    Yes, but how much of that content will actually be transferred if I
    don't call f.read?

    Consider this little test:

    # urltest.py
    import time, urllib

    t1 = time.time()
    f = urllib.urlopen("http://data.phishtank.com/data/online-valid/")
    print f.headers.items()
    t2 = time.time()
    f.close()
    print t2-t1
    # eof

    $ python urltest.py
    [('content-length', '4390510'), ('accept-ranges', 'bytes'), ('server',
    'Apache/2.2.4 (FreeBSD) mod_ssl/2.2.4 OpenSSL/0.9.7e-p1 DAV/2 PHP/5.2.0
    with Suhosin-Patch'), ('last-modified', 'Wed, 25 Jul 2007 17:58:04
    GMT'), ('connection', 'close'), ('etag', '"5705e1-42fe6e-40612300"'),
    ('date', 'Wed, 25 Jul 2007 18:07:46 GMT'), ('content-type',
    'application/xml')]
    0.303626060486

    I doubt that my computer just downloaded 4 MB of stuff in 0.3 seconds.
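A Python 3 port of urltest.py follows, for readers who want to repeat the experiment. Since the phishtank URL is not something to rely on, a hypothetical local handler (`BigFileHandler`) answers with a filler body of the same Content-Length; `headers_only` mirrors the original script. The timing shows how long the headers took to arrive; it does not show how much of the body the server had already pushed into the connection before `close()`.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

BODY_SIZE = 4390510  # same Content-Length as the phishtank file above


class BigFileHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server: answers every GET with BODY_SIZE filler bytes."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(BODY_SIZE))
        self.end_headers()
        try:
            self.wfile.write(b"\0" * BODY_SIZE)
        except ConnectionError:
            pass  # the client hangs up after the headers, as in urltest.py

    def log_message(self, format, *args):
        pass  # no per-request log lines


def headers_only(url):
    """Port of urltest.py: fetch the headers, then close without reading the body."""
    t1 = time.perf_counter()  # high-resolution timer for short intervals
    f = urllib.request.urlopen(url, timeout=10)
    headers = f.headers.items()
    t2 = time.perf_counter()
    f.close()
    return headers, t2 - t1


server = ThreadingHTTPServer(("127.0.0.1", 0), BigFileHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    headers, elapsed = headers_only("http://127.0.0.1:%d/" % server.server_address[1])
finally:
    server.shutdown()
    server.server_close()

print(headers)
print(elapsed)
```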
  • Gabriel Genellina at Jul 26, 2007 at 1:05 am
    On Wed, 25 Jul 2007 15:11:19 -0300, Carsten Haese <carsten at uniqsys.com>
    wrote:
    > On Tue, 2007-07-24 at 22:23 -0300, Gabriel Genellina wrote:
    >> This generates an HTTP GET request, transferring the contents too,
    >> unnecessarily.
    > Yes, but how much of that content will actually be transferred if I
    > don't call f.read?

    Transferred? The server may already have sent all of it, depending on
    its configuration and the available bandwidth. The first packets will
    be in your TCP receive buffers even if you never call f.read(). So be
    nice to the origin server, the whole Internet, and our planet, and
    don't waste bandwidth and energy requesting things that you're going to
    throw away anyway.

    > I doubt that my computer just downloaded 4 MB of stuff in 0.3 seconds.

    Probably not, but I'd use netstat or ntop to find out how much has
    actually been downloaded.

    --
    Gabriel Genellina
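Gabriel's remark that urllib cannot send a HEAD request applied to the Python 2 urllib module used in this thread. In Python 3.3 and later, urllib.request.Request takes a `method` argument, so a HEAD request, which avoids the wasted body transfer debated above, no longer requires dropping down to http.client. A minimal sketch; `head` is a hypothetical helper name, and the demo again uses a temporary local http.server instead of a real site.

```python
import os
import shutil
import tempfile
import threading
import urllib.request
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def head(url, timeout=10):
    """Hypothetical helper: send HEAD and return (status, headers)."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Only the status line and headers come back; there is no body to read.
        # resp.headers is an email.message.Message with case-insensitive lookup.
        return resp.status, resp.headers


class QuietHandler(SimpleHTTPRequestHandler):
    def log_message(self, format, *args):
        pass  # keep the demo output clean


root = tempfile.mkdtemp()
with open(os.path.join(root, "data.xml"), "wb") as fh:
    fh.write(b"<data/>" * 1000)
server = ThreadingHTTPServer(("127.0.0.1", 0), partial(QuietHandler, directory=root))
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    status, headers = head("http://127.0.0.1:%d/data.xml" % server.server_address[1])
finally:
    server.shutdown()
    server.server_close()
    shutil.rmtree(root, ignore_errors=True)

print(status, headers["Content-Length"], headers["Last-Modified"])
```

urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a missing file surfaces as an exception rather than as a status code.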

Discussion Overview
group: python-list
categories: python
posted: Jul 24, '07 at 1:07p
active: Jul 26, '07 at 1:05a
posts: 5
users: 3
website: python.org
