I administer a Mailman server with 280+ lists and I'm fighting
with archive sizes. Mailman is on a 5 GB partition, and 4 GB of that
is taken up by ~mailman/archives.

In some cases the archives are big because the list gets dozens of posts a day.
In other cases, people use the lists to send 15 MB attachments, which also get
archived.

The archive directories contain each month's mail in three formats:

1. a plaintext file: 2004-November.txt
2. a gzipped file: 2004-November.txt.gz
3. a directory: 2004-November - contains individual HTML messages.

The web archive uses the files in the directory, and links to the gzipped
file. Does anything use the plaintext file? It seems like a big waste of
disk space to keep the same file both gzipped and uncompressed side by side.

So, first off, can I delete the year-month.txt files without causing harm?
Second, once the current month is over, can I prevent the non-zipped files
from ever existing? Finally, is there a way to prevent the archiving of
attachments?

Any other suggestions on how to control or limit the disk space the archives use
would be greatly appreciated.

--
Michael Alberghini
Software Systems Engineer
Georgia State University
mike at gsu.edu


  • Mark Sapiro at Feb 12, 2005 at 1:54 am

    Mike Alberghini wrote:
    > The archive directories contain each month's mail in three formats:
    >
    > 1. a plaintext file: 2004-November.txt
    > 2. a gzipped file: 2004-November.txt.gz
    > 3. a directory: 2004-November - contains individual HTML messages.
    >
    > The web archive uses the files in the directory, and links to the gzipped
    > file. Does anything use the plaintext file? It seems like it's wasting a
    > ton of disk space having the same file gzipped and unzipped in the same space.

    How the .txt file is used depends on the setting of
    GZIP_ARCHIVE_TXT_FILES in mm_cfg.py. If this is set to Yes, the .txt
    file only exists temporarily while the archiver unzips the .txt.gz and
    appends the .txt into a new .txt.gz. With this setting, there are no
    permanent .txt files, but this is a very inefficient process (see
    comments in Defaults.py).

    If GZIP_ARCHIVE_TXT_FILES is No, then the archive is accumulated in the
    .txt file and is gzip'd by a nightly cron. In this case, the .txt
    files can be deleted for prior months if no new messages ever arrive
    for that month. This can't always be guaranteed as a message could be
    delayed in transit or have a bad date. In general though, old .txt
    files can be deleted, and if a "late" message did arrive and cause
    loss of the .txt.gz information, the archive could be rebuilt from the
    <list>.mbox/<list>.mbox file with bin/arch.
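
A minimal sketch of the cleanup described above, for the case where GZIP_ARCHIVE_TXT_FILES is No. It only lists candidates and deletes nothing. The function name `stale_txt_files` is hypothetical, and the sketch assumes the monthly files sit directly in one list's archive directory with English month names, as in the examples in this thread. A .txt file is reported only when it belongs to a month other than the current one and a .txt.gz sibling exists that is at least as new.

```python
import os
import re
import time

# Matches monthly text archives named like "2004-November.txt".
MONTHLY_TXT = re.compile(r"^\d{4}-[A-Z][a-z]+\.txt$")


def stale_txt_files(archive_dir, current_period=None):
    """Return prior-month .txt paths that have an up-to-date .txt.gz sibling."""
    if current_period is None:
        # e.g. "2005-February"; assumes the default (English) locale.
        current_period = time.strftime("%Y-%B")
    stale = []
    for name in sorted(os.listdir(archive_dir)):
        if not MONTHLY_TXT.match(name):
            continue
        if name[:-len(".txt")] == current_period:
            continue  # the current month is still being appended to
        txt_path = os.path.join(archive_dir, name)
        gz_path = txt_path + ".gz"
        # Skip files whose gzipped copy is missing or older than the text.
        if not os.path.isfile(gz_path):
            continue
        if os.path.getmtime(gz_path) < os.path.getmtime(txt_path):
            continue
        stale.append(txt_path)
    return stale
```

Calling `stale_txt_files` on a list's archive directory and reviewing the result before removing anything keeps the risk low. If a late message does damage a month afterwards, the rebuild from the mbox file with bin/arch mentioned above is still available.
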

    > So, first off, can I delete the year-month.txt files without causing harm?

    Generally yes, once the month is over.

    > Second, once the current month is over, can I prevent the non-zipped files
    > from ever existing?

    You can set

    GZIP_ARCHIVE_TXT_FILES = Yes

    in mm_cfg.py if you're willing to live with the additional processing
    to unzip/rezip the .txt.gz file for each message.

    > Finally, is there a way to prevent the archiving of
    > attachments?

    If you don't want to use content filtering to keep them off the list
    entirely, then I think it would require a somewhat tricky hack. You
    could modify the code in Mailman/Handlers/Scrubber.py, but this would
    also affect digests - that's where it gets tricky.
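
To illustrate what dropping attachments from a message involves, here is a standalone sketch using only Python's standard email package. It is not Mailman's Scrubber code and does not hook into Mailman; the function name `text_only_copy` and the choice of which headers to keep are assumptions made for the example. It builds a new single-part message that carries only the inline text/plain parts of the original.

```python
from email.message import Message

# Headers copied to the slimmed-down message (an arbitrary choice here).
KEPT_HEADERS = ("From", "To", "Subject", "Date", "Message-ID")


def text_only_copy(msg):
    """Return a new message holding only the inline text/plain parts of msg."""
    texts = []
    for part in msg.walk():
        # Skip containers, non-text parts, and text files sent as attachments.
        if part.get_content_type() != "text/plain":
            continue
        if part.get_filename() is not None:
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "us-ascii"
        try:
            texts.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset label: fall back to a lossless byte mapping.
            texts.append(payload.decode("latin-1"))
    slim = Message()
    for header in KEPT_HEADERS:
        if msg[header] is not None:
            slim[header] = msg[header]
    slim.set_payload("\n".join(texts), charset="utf-8")
    return slim
```

Because the function returns a copy and leaves the original message untouched, the same idea could in principle be applied to the archived copy only. Wiring that into Mailman without also changing digests is the tricky part noted above.
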

    --
    Mark Sapiro <msapiro at value.net>
    San Francisco Bay Area, California
    The highway is for gamblers, better use your sense - B. Dylan

group: mailman-users @
categories: python
posted: Feb 9, '05 at 9:07p
active: Feb 12, '05 at 1:54a