FAQ
Hi again, Mailman-users!

I was wondering why the attachments in the archive take more space
than the mbox file. I found that there are many duplicate copies
of attachments, like:

[root@lists ~]# cd ~mailman/arc*/*/ok-testi/attachments/*/*
[root@lists c69ef096]# ls -l
total 120
-rw-rw-r-- 1 root mailman 59714 May 3 14:48 attachment-0001.html
-rw-rw-r-- 1 root mailman 59714 May 3 14:48 attachment.html
[root@lists c69ef096]# diff attachment-0001.html attachment.html
[root@lists c69ef096]#

Any clue why?
The Mailman version is 2.1.5.

Osmo
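
One way to gauge how much space such duplicates take is to group the files under a list's attachments directory by a hash of their contents. The following is a minimal Python 3 sketch, not part of Mailman; the function names `find_duplicates` and `wasted_bytes` are hypothetical, and it assumes only that duplicate copies are byte-for-byte identical, as the empty `diff` output above shows.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(root):
    """Group files under `root` by the SHA-1 of their contents.

    Returns a dict mapping each digest that occurs more than once to the
    sorted list of file paths sharing that content.
    """
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.sha1()
            with open(path, 'rb') as f:
                # Read in chunks so large attachments are not loaded at once.
                for chunk in iter(lambda: f.read(65536), b''):
                    h.update(chunk)
            by_hash[h.hexdigest()].append(path)
    return {d: sorted(p) for d, p in by_hash.items() if len(p) > 1}


def wasted_bytes(dupes):
    """Bytes that keeping only one copy of each duplicate group would save."""
    return sum(os.path.getsize(paths[0]) * (len(paths) - 1)
               for paths in dupes.values())
```

For example, `wasted_bytes(find_duplicates(path))`, with `path` pointing at the list's archives/private/<list>/attachments directory, reports how much space the extra copies use. The sketch only reports; it deletes nothing, because archive pages and digests link to each copy by its own file name.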


  • Mark Sapiro at May 4, 2006 at 1:18 am

    Osmo Kujala wrote:
    > I was wondering why attachments in archive take space more
    > than mbox-file. Found out that there are many double copies
    > of attachments like:

    Scrubber scrubs the attachment twice: once for the archive and once for
    the 'plain' digest, so you wind up with two copies.

    One way to avoid this is to set scrub_nondigest to Yes for the list.
    Then the attachment will be scrubbed from all messages before
    digesting and archiving. Or, you can disable digests entirely for the
    list.

    -- 
    Mark Sapiro <msapiro at value.net>        The highway is for gamblers,
    San Francisco Bay Area, California    better use your sense - B. Dylan
  • Osmo Kujala at May 5, 2006 at 7:14 am
    Mark wrote:
    > Scrubber scrubs the attachment twice. Once for the archive and once for
    > the 'plain' digest, so you wind up with two copies.
    > One way to avoid this is to set scrub_nondigest to Yes for the list.

    Okay, thanks, but first we have to upgrade from 2.1.5.

    > Then the attachment will be scrubbed from all messages before
    > digesting and archiving. Or, you can disable digests entirely for the
    > list.

    Our problem is more complicated. Our disk is almost full of archives,
    and we have to use our own script to cut the archives at some date.
    The script uses /usr/lib/mailman/bin/arch to rebuild the archives
    after the mbox file has been cut. If arch also scrubs attachments
    for the digest, cutting the archive might end up taking more disk
    space when the purpose was to free space. (I haven't seen that happen.)

    Osmo
  • Mark Sapiro at May 5, 2006 at 4:14 pm

    Osmo Kujala wrote:
    >> Scrubber scrubs the attachment twice. Once for the archive and once for
    >> the 'plain' digest, so you wind up with two copies.
    >> One way to avoid this is to set scrub_nondigest to Yes for the list.
    > Okay thanks, but first we have to upgrade from 2.1.5.

    I think you can just add Scrubber to the global pipeline in mm_cfg.py
    or to a list pipeline. See
    <http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq04.067.htp>
    for ways to do that.

    > Our problem is more complicated. We have almost disk full of
    > archives and have to use own script to cut archives by some date.
    > Script uses /usr/lib/mailman/bin/arch to reconstruct archives
    > after mbox-file has been cut. If arch will scrub attachments
    > for digest too, then there might be situation that cutting
    > archive take's more disk space while purpose was to free space.
    > (I haven't seen that happen.)

    If you are concerned about space, and you don't care about preserving
    as valid the links to attachments that were previously in archives or
    digests, you can delete the archives/private/<list>/attachments
    directories prior to running bin/arch --wipe. Then only the
    attachments which are linked from the new archive will be stored, but
    their file names will probably be different.

  • Osmo Kujala at May 9, 2006 at 9:02 am

    > I think you can just add Scrubber to the global pipeline in mm_cfg.py
    > or to a list pipeline. See
    > <http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq04.067.htp>
    > for ways to do that.

    Interesting. I could have used this to add spam filtering based on
    SpamAssassin headers; instead, I did it by replacing
    /usr/lib/mailman/mail/mailman with a call to procmail...

    > If you are concerned about space, and you don't care about preserving
    > as valid the links to attachments that were previously in archives or
    > digests, ...

    But we do care, at least for non-digest mail. (Digest users can fall
    back on the archive when links are lost, and so can non-digest users,
    but the latter may have higher priority and expect the attachments to
    be intact.)

    > you can delete the archives/private/<list>/attachments
    > directories prior to running bin/arch --wipe. Then only the
    > attachments which are linked from the new archive will be stored, but
    > their file names will probably be different.

    Now I see that we must rewrite our archive-cutting program.
    List admins may turn archiving off, and then running
    bin/arch --wipe will delete all attachments (or only those in the
    mbox will survive). My tests so far show that bin/arch, with or
    without --wipe, deletes everything except the mbox in the list's
    archive. I have to figure out a safe way to keep recent attachments
    intact. (Maybe moving them to a safe place and back is the only
    approach that will keep working across Mailman versions.)

    Osmo
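
Mark's suggestion of adding Scrubber to the global pipeline amounts to editing a Python list in mm_cfg.py. The sketch below shows that list manipulation on an abbreviated, made-up stand-in for `GLOBAL_PIPELINE`; the helper `add_scrubber` is hypothetical, and placing Scrubber just before 'ToDigest' (so that both the digest and the archive receive the already scrubbed message) is an assumption, not something stated in the thread or in the FAQ entry it links to.

```python
def add_scrubber(pipeline, before='ToDigest'):
    """Return a copy of `pipeline` with 'Scrubber' placed before `before`.

    If 'Scrubber' is already present the copy is returned unchanged, so
    applying this twice never adds a second Scrubber handler.
    """
    result = list(pipeline)
    if 'Scrubber' not in result:
        result.insert(result.index(before), 'Scrubber')
    return result


# Abbreviated, illustrative stand-in for Mailman's GLOBAL_PIPELINE; the real
# list is defined in Defaults.py and differs between versions.
EXAMPLE_PIPELINE = ['Approve', 'Hold', 'CookHeaders',
                    'ToDigest', 'ToArchive', 'ToUsenet', 'ToOutgoing']

print(add_scrubber(EXAMPLE_PIPELINE))
# → ['Approve', 'Hold', 'CookHeaders', 'Scrubber', 'ToDigest', 'ToArchive', 'ToUsenet', 'ToOutgoing']
```

In mm_cfg.py itself, which begins with `from Defaults import *`, the same effect would come from `GLOBAL_PIPELINE.insert(GLOBAL_PIPELINE.index('ToDigest'), 'Scrubber')`, guarded by a check that 'Scrubber' is not already in the list; the qrunners read mm_cfg.py at startup, so Mailman needs a restart afterwards.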

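Osmo's "move to a safe place and back" idea can be combined with Mark's advice to clear the attachments directory before `bin/arch --wipe`. The sketch below moves the attachments aside, rebuilds the archive, then moves back only the date directories at or after a cutoff, so links to recent attachments stay valid while older ones are discarded. Assumptions: the date subdirectories are named YYYYMMDD (as Mailman 2.1's Scrubber appears to do), the backup location lies outside the list's archive directory (Osmo observed that bin/arch removes everything there except the mbox), and the /usr/lib/mailman/bin/arch path is the one from the thread; `rebuild_keeping_recent` is a hypothetical name.

```python
import os
import shutil
import subprocess
import tempfile


def rebuild_keeping_recent(list_name, attach_dir, cutoff,
                           rebuild=None, backup_parent=None):
    """Rebuild a list's archive, keeping attachment date dirs >= cutoff.

    attach_dir    -- the list's archives/private/<list>/attachments directory
    cutoff        -- 'YYYYMMDD'; ASSUMES date subdirectories use that format
    rebuild       -- callable(list_name); defaults to running bin/arch --wipe
    backup_parent -- where to stash old attachments; must be OUTSIDE the
                     list's archive directory (None means the system temp dir)
    """
    if rebuild is None:
        def rebuild(name):
            # Path taken from the thread; adjust for the local installation.
            subprocess.run(['/usr/lib/mailman/bin/arch', '--wipe', name],
                           check=True)

    # 1. Move the existing attachments aside.
    backup = tempfile.mkdtemp(prefix='mm-attachments-', dir=backup_parent)
    if os.path.isdir(attach_dir):
        for entry in os.listdir(attach_dir):
            shutil.move(os.path.join(attach_dir, entry), backup)

    # 2. Rebuild the archive from the (already cut) mbox. If this raises,
    #    the backup directory is left in place for manual recovery.
    rebuild(list_name)
    os.makedirs(attach_dir, exist_ok=True)

    # 3. Move back recent date directories, never overwriting a file the
    #    rebuild has just written.
    for datedir in sorted(os.listdir(backup)):
        if datedir < cutoff:  # string order matches date order for YYYYMMDD
            continue
        for dirpath, _dirs, files in os.walk(os.path.join(backup, datedir)):
            dest_dir = os.path.join(attach_dir,
                                    os.path.relpath(dirpath, backup))
            os.makedirs(dest_dir, exist_ok=True)
            for name in files:
                dest = os.path.join(dest_dir, name)
                if not os.path.exists(dest):
                    shutil.move(os.path.join(dirpath, name), dest)

    # 4. Whatever is left (older dirs, skipped files) is discarded.
    shutil.rmtree(backup)
```

Attachments that the rebuild re-creates under new names will sit alongside the restored originals, so this trades some of the duplication discussed above for keeping old links valid; everything older than the cutoff is freed.
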
Discussion Overview
group: mailman-users
categories: python
posted: May 3, '06 at 2:25p
active: May 9, '06 at 9:02a
posts: 5
users: 2
website: list.org

2 users in discussion

Osmo Kujala: 3 posts, Mark Sapiro: 2 posts
