I have a question about zipped list archives; the question arose from
a subscriber to one of our lists. I am running Mailman 2.1.11 on
Ubuntu from a package I built from the SourceForge source.

mailman# pwd
/var/lib/mailman/archives/private/LISTNAME
mailman# ls -ald 2009-August*
drwxrwsr-x 2 list list 4096 2009-08-31 11:34 2009-August
-rw-rw-r-- 1 list list 91577 2009-08-31 11:34 2009-August.txt
-rw-rw-r-- 1 list list 20708 2009-09-01 03:27 2009-August.txt.gz
mailman#

The .txt file looks fine, as does the .gz file.
When I go to the list admin web interface and look at the archives,
I see

August 2009: [Thread] [Subject] [Author] [Date] [GZip'd text 20KB]

That value (20KB) seems to be correct. When I click on the "[Gzip...]"
link, Firefox/Solaris gives me a text file, not a .gz file. Maybe
Firefox knows how to unzip the file, as vim does. When I click on
the same link using IE8/XP, IE8 sees the .gz suffix and asks me what
to do with the file. I save it on my desktop, and when I look at the
file, I see that it is a plain text file. It is not a gzip'd file.
Why? Thanks.

----------------------------------------------------------------------
Barry S. Finkel
Computing and Information Systems Division
Argonne National Laboratory Phone: +1 (630) 252-7277
9700 South Cass Avenue Facsimile:+1 (630) 252-4601
Building 222, Room D209 Internet: BSFinkel at anl.gov
Argonne, IL 60439-4828 IBMMAIL: I1004994

  • Grant Taylor at Sep 3, 2009 at 9:36 pm

    On 09/03/09 15:45, Barry Finkel wrote:
    I save it on my desktop, and when I look at the file, I see that it
    is a plain text file. It is not a gzip'd file. Why? Thanks.

    I'm betting that Apache is automatically decompressing the file and
    sending it to you.

    Apache (like a few other web servers) knows how to serve content that
    has been compressed on disk to save space. It can be configured to send
    either the compressed or the decompressed content.

    The thing that I'm not sure about is how Apache will behave (if it's
    working with compression) if it has two files, 2009-August.txt and
    2009-August.txt.gz.

    Another thing that may be messing with you is that Firefox may be
    reporting (via the Accept-Encoding HTTP header) that it can accept and
    deal with compressed content, while IE may not be doing so.

    Unless someone else comes up with anything else, I think I would go to
    an Apache group, re-pose your scenario (with both the compressed and
    uncompressed files), and see if they can shed some more light on the
    subject. Of course, you can also do some digging in Apache's
    documentation and will probably find more information there.



    Grant. . . .
  • Mark Sapiro at Sep 3, 2009 at 10:03 pm

    Grant Taylor wrote:
    On 09/03/09 15:45, Barry Finkel wrote:
    I save it on my desktop, and when I look at the file, I see that it
    is a plain text file. It is not a gzip'd file. Why? Thanks.
    I'm betting that Apache is automatically decompressing the file and
    sending it to you.

    I agree.

    Apache (and a few other web servers) know how to serve up content that
    has been compressed on disk to save space. It can be configured to send
    either the compressed or decompressed content.

    The thing that I'm not sure about is how Apache will behave (if it's
    working with compression) if it has two files, 2009-August.txt and
    2009-August.txt.gz.

    The same as it behaves if you ask for 2009-August.txt.gz and there is
    no file named 2009-August.txt. You ask for 2009-August.txt.gz and,
    assuming Apache finds it, it serves the file according to how it's
    configured to serve that file. It won't be aware of any similarly named
    file without the .gz extension.

    Another thing that may be messing with you is that Firefox may be
    reporting (via HTTP header) that it can accept and deal with compressed
    content and IE not doing so.

    Or vice versa?

    This is not likely to be the full explanation because the OP reported
    that IE saved the file and the content was uncompressed text.
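
As a rough illustration of the header question discussed above, the sketch below classifies a response by its Content-Encoding and Content-Type headers. The function name and the three labels are made up for this example, and browsers differ in what they do when saving such a response, so treat the labels as the usual behavior rather than a guarantee.

```python
GZIP_ENCODINGS = {"gzip", "x-gzip"}
GZIP_MEDIA_TYPES = {"application/gzip", "application/x-gzip"}


def classify_gz_response(headers):
    """Classify a response for a .gz URL from its headers.

    `headers` maps header names to values; names are matched
    case-insensitively. The returned labels are illustrative only.
    """
    lowered = {name.lower(): value.strip().lower()
               for name, value in headers.items()}
    encoding = lowered.get("content-encoding", "")
    media_type = lowered.get("content-type", "").split(";")[0].strip()

    if encoding in GZIP_ENCODINGS:
        # gzip is declared as a coding applied to the body, so a browser
        # normally decodes it and ends up with the plain text.
        return "decoded-by-client"
    if media_type in GZIP_MEDIA_TYPES:
        # gzip is the media type itself, so the bytes are normally kept as-is.
        return "kept-compressed"
    return "no-gzip-label"
```

The headers could be taken from any HTTP client, for example `urllib.request.urlopen(url).headers.items()` for whichever archive URL applies.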

    --
    Mark Sapiro <mark at msapiro.net> The highway is for gamblers,
    San Francisco Bay Area, California better use your sense - B. Dylan
  • Mark Sapiro at Sep 3, 2009 at 9:46 pm

    Barry Finkel wrote:
    I have a question about zipped list archives; the question arose from
    a subscriber to one of our lists. I am running Mailman 2.1.11 on
    Ubuntu from a package I built from the SourceForge source.

    mailman# pwd
    /var/lib/mailman/archives/private/LISTNAME
    mailman# ls -ald 2009-August*
    drwxrwsr-x 2 list list 4096 2009-08-31 11:34 2009-August
    -rw-rw-r-- 1 list list 91577 2009-08-31 11:34 2009-August.txt
    -rw-rw-r-- 1 list list 20708 2009-09-01 03:27 2009-August.txt.gz
    mailman#

    The .txt file looks fine, as does the .gz file.
    When I go to the list admin web interface and look at the archives,
    I see

    August 2009: [Thread] [Subject] [Author] [Date] [GZip'd text 20KB]

    That value (20KB) seems to be correct. When I click on the "[Gzip...]"
    link, Firefox/Solaris gives me a text file, not a .gz file. Maybe
    Firefox knows how to unzip the file, as vim does. When I click on
    the same link using IE8/XP, IE8 sees the .gz suffix and asks me what
    to do with the file. I save it on my desktop, and when I look at the
    file, I see that it is a plain text file. It is not a gzip'd file.
    Why? Thanks.

    Your web server is converting the gzipped file and serving it as plain
    text, but MSIE sees the .gz extension and thinks it can't display the
    content.

    However, I recommend you don't gzip the files at all. As you can see,
    doing so doesn't save space; it requires more space because the .txt
    files are kept even after gzipping. The .txt files for past periods,
    which will have no more messages added, can be removed, but you have
    to do that manually.
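
Using the listing from the original post, August 2009 occupies 91577 + 20708 = 112285 bytes with both files present, against 91577 bytes for the .txt alone. A small sketch for totalling that overhead across an archive directory follows; the function name is invented for this example, and the directory to pass in is whichever one holds the list's .txt and .txt.gz files.

```python
import os


def gzip_overhead(archive_dir):
    """Return (names, extra_bytes) for .txt files that also have a .txt.gz.

    `names` lists the .txt files with a gzipped twin beside them, and
    `extra_bytes` is the total size of those .txt.gz copies.
    """
    names = []
    extra_bytes = 0
    for name in sorted(os.listdir(archive_dir)):
        if not name.endswith(".txt"):
            continue
        gz_path = os.path.join(archive_dir, name + ".gz")
        if os.path.isfile(gz_path):
            names.append(name)
            extra_bytes += os.path.getsize(gz_path)
    return names, extra_bytes
```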

    Keeping a gzipped file can save some bandwidth when accessing the file
    on the web, but not if your web server converts and serves it as plain
    text, which appears to be the case.

    Also, unless you set GZIP_ARCHIVE_TXT_FILES = Yes in mm_cfg.py (don't
    do it; see below), the current day's posts are not in the .txt.gz file
    until cron runs Mailman's cron/nightly_gzip.

    Thus, I recommend not gzipping the archive .txt files at all. I.e., do
    not put GZIP_ARCHIVE_TXT_FILES = Yes in mm_cfg.py, and remove or
    comment out the cron/nightly_gzip entry in Mailman's crontab.

    This can be a bit tricky to do right because you have links on the
    archive TOC page that point to the .txt.gz files, and if you just
    comment out the cron/nightly_gzip entry, the current period's .txt.gz
    file will quickly be out of date.

    You can remove all the .txt.gz files, and the next archived post will
    rebuild the TOC with links to the .txt files, but for the period
    before the next archived post, the archive TOC will have links
    pointing to the removed .txt.gz files.

    One way around this is just to run bin/arch --wipe on a list or lists.
    This will remove all the list's .txt.gz files and build an archive TOC
    with correct links to the .txt files. The .txt.gz files will only be
    regenerated if cron/nightly_gzip is run. The usual caveats about
    running bin/arch --wipe, especially on older lists, apply. Namely,
    it's a good idea to first check the
    archives/private/LIST.mbox/LIST.mbox file with bin/cleanarch, and
    there is a possibility that messages can get renumbered, which
    invalidates externally saved links to existing messages.

    Another way around it is to remove the .txt.gz files manually and then
    run 'bin/arch LISTNAME /dev/null' to rebuild the archive TOC. Note no
    --wipe option and no input redirection - just /dev/null as a filename
    argument.
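
The manual route just described could be scripted roughly as follows. This is a sketch, not part of Mailman: the helper names, the dry-run default, the check that a .txt counterpart exists, and the /var/lib/mailman location of bin/arch are all assumptions for illustration. Only the `bin/arch LISTNAME /dev/null` invocation comes from the description above.

```python
import os


def remove_gzipped_archives(archive_dir, dry_run=True):
    """Remove each NAME.txt.gz that has a NAME.txt beside it.

    Returns the paths that were (or, with dry_run=True, would be) removed.
    """
    removed = []
    for name in sorted(os.listdir(archive_dir)):
        if not name.endswith(".txt.gz"):
            continue
        gz_path = os.path.join(archive_dir, name)
        txt_path = gz_path[:-len(".gz")]
        if not os.path.isfile(txt_path):
            # Extra caution added for this sketch: keep a .gz whose
            # plain-text original is missing.
            continue
        if not dry_run:
            os.remove(gz_path)
        removed.append(gz_path)
    return removed


def rebuild_command(listname, mailman_home="/var/lib/mailman"):
    """Build the argv for `bin/arch LISTNAME /dev/null` (no --wipe).

    `mailman_home` is an assumed install prefix; adjust it to your system.
    """
    return [os.path.join(mailman_home, "bin", "arch"), listname, "/dev/null"]
```

Calling `remove_gzipped_archives` with the default `dry_run=True` first shows which files would go; the list returned by `rebuild_command` can then be handed to `subprocess.run` under the Mailman user.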

    --
    Mark Sapiro <mark at msapiro.net> The highway is for gamblers,
    San Francisco Bay Area, California better use your sense - B. Dylan
  • LuKreme at Sep 4, 2009 at 3:03 am

    On 3-Sep-2009, at 14:45, Barry Finkel wrote:
    when I look at the file, I see that it is a plain text file. It is
    not a gzip'd file.
    Why? Thanks.

    This is not anything Mailman is doing. Either your system is
    automatically seeing the zipped data and uncompressing it, or (and
    this is more likely) Apache is set to automatically uncompress the .gz
    file and send it along.
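
A quick way to confirm what the browser actually saved is to look at the first two bytes of the file: every gzip stream starts with 0x1f 0x8b. The sketch below does only that; the function name is invented for this example, and it does not verify that the rest of the file is a valid gzip stream.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream (RFC 1952)


def is_gzip_file(path):
    """Return True if the file at `path` begins with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC
```

A file saved from the "[GZip'd text]" link that makes this return False has already been decompressed somewhere between the server's disk and the desktop.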

    If you can, see if your httpd.conf is using mod_gunzip and if so, what
    its settings are. If you want the gzip archives, then see if you can
    get it disabled for the mailman archives pages.

    --
    Everybody hates a tourist, especially one who thinks it's all such
    a laugh. Yeah, and the chip stains and grease will come out in the
    bath. You will never understand how it feels to live your life
    with no meaning or control, and with nowhere left to go. You
    are amazed that they exist, and they burn so bright whilst you
    can only wonder why.

Discussion Overview
group: mailman-users
category: python
posted: Sep 3, '09 at 8:45p
active: Sep 4, '09 at 3:03a
posts: 5
users: 4
website: list.org
