hi.

i recently upgraded my mailing list server from debian etch to lenny,
which included an upgrade from mailman 2.1.9 to 2.1.11;

due to some local hackery this somewhat broke my mailing list archives
(the hackery only makes monthly archives available as e.g. 2009-03
rather than 2009-March or 2009-Maerz or whatever).
i fixed my hacks, but in order to get the archives right i ran
"mmarch --wipe ...", and found myself surprised that this did not produce
the desired results: the new archives seemed to contain some more emails
than the original ones, all of them having "No subject" and appearing in
the current archive directory.

it turned out that these new emails were parts of old emails.

the problem seems to be within the parsing of the mbox file: at some (to
me) arbitrary points, Mailbox.py would decide that the mail has finished
and start a new one; since the new one had no proper header, it ended up
as "No subject" (and no author information).

i did not get any error messages during building of the archives.
(else i would have thought of out-of-memory problems or similar)


i noticed that this "bug" (or whatever it is) might have been available
for quite some time: after some searching of my original archives, i
found at least one similar case when i rebuilt the entire archive in
2006-03 (where part of an email from 2003 (or so) ended up in my 2006-03
folder with "No subject")

any ideas?


my archives are rather big by now (i think); e.g. one list has about
68000 emails archived; rebuilding the archives with the renumbering as
found above somehow breaks the entire archive; fixing it manually is no
real option :-(

fgmasdr
IOhannes




--
IEM - network operation center
mailto:noc at iem.at

  • Mark Sapiro at Mar 16, 2009 at 5:12 pm

    IEM - network operating center (IOhannes m zmoelnig) wrote:
    i fixed my hacks, but in order to get the archives right i ran
    "mmarch --wipe ...", and found myself surprised that this did not produce
    the desired results: the new archives seemed to contain some more emails
    than the original ones, all of them having "No subject" and appearing in
    the current archive directory.

    it turned out that these new emails were parts of old emails.

    the problem seems to be within the parsing of the mbox file: at some (to
    me) arbitrary points, Mailbox.py would decide that the mail has finished
    and start a new one; since the new one had no proper header, it ended up
    as "No subject" (and no author information).

    This is a well known issue. If a message body contains a line beginning
    with "From ", bin/arch takes that as a message separator. Very old
    Mailman didn't escape the From lines in the body.

    You need to first clean your .mbox files with Mailman's bin/cleanarch
    or some other process to escape the "From " lines that aren't message
    separators.
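
A minimal sketch of the kind of cleanup described above; it is not the code of bin/cleanarch. It assumes that a genuine separator has the shape "From address Day Mon DD HH:MM:SS YYYY" and follows a blank line (or starts the file), and it prefixes every other line beginning with "From " with ">". The SEPARATOR pattern and the name escape_stray_from_lines are made up for this illustration.

```python
import re

# Assumed shape of a genuine mbox separator, e.g.
# "From user@example.com Mon Mar 16 17:12:00 2009".
# This pattern is an illustration, not the one bin/cleanarch uses.
SEPARATOR = re.compile(
    r"^From \S+\s+"                      # envelope sender
    r"(Mon|Tue|Wed|Thu|Fri|Sat|Sun) "    # day of week
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    r"[ \d]\d \d\d:\d\d:\d\d \d{4}\s*$"  # day, time, year
)


def escape_stray_from_lines(lines):
    """Return a copy of lines with non-separator "From " lines prefixed by ">"."""
    cleaned = []
    previous_blank = True  # the start of the file counts as a blank line
    for line in lines:
        is_separator = previous_blank and SEPARATOR.match(line) is not None
        if line.startswith("From ") and not is_separator:
            line = ">" + line
        cleaned.append(line)
        previous_blank = line.strip() == ""
    return cleaned


sample = [
    "From alice@example.com Mon Mar 16 17:12:00 2009\n",
    "Subject: hello\n",
    "\n",
    "From here on the text used to be split off.\n",
    "\n",
    "From now on it stays in one piece.\n",
]
cleaned = escape_stray_from_lines(sample)
```

In this sample only the first line survives as a separator; the two body lines come back as ">From ...". On a real archive, work on a copy of the .mbox file and prefer bin/cleanarch where it is available.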

    --
    Mark Sapiro <mark at msapiro.net>     The highway is for gamblers,
    San Francisco Bay Area, California    better use your sense - B. Dylan
  • IEM - network operating center (IOhannes m zmoelnig) at Mar 17, 2009 at 9:42 am

    Mark Sapiro wrote:

    You need to first clean your .mbox files with Mailman's bin/cleanarch
    or some other process to escape the "From " lines that aren't message
    separators.

    ah thanks, that will hopefully do the trick.


    however, this leaves me with one remaining problem:
    rebuilding the archives will rename several tens of thousands of emails.
    while our archives do have a search functionality, this will still break
    links to most archived emails.

    how would i rebuild only the last month of an archive?

    and: is there a way to rebuild an archive in a "sandbox"; that is: can i
    somehow specify the destination directory of the archive-building
    process? (so i can rebuild the entire archive till it fits my needs and
    only then swap it with the current one)


    fgmasdr
    IOhannes



    --
    IEM - network operation center
    mailto:noc at iem.at
  • Mark Sapiro at Mar 17, 2009 at 4:25 pm

    IEM - network operating center (IOhannes m zmoelnig) wrote:
    however, this leaves me with one remaining problem:
    rebuilding the archives will rename several tens of thousands of emails.
    while our archives do have a search functionality, this will still break
    links to most archived emails.

    This is a known issue with rebuilding old archives. You may find that
    fixing the .mbox solves this because you will no longer be assigning
    message numbers to spurious partial messages, but probably this won't
    completely solve the problem.

    how would i rebuild only the last month of an archive?

    This is only a suggestion to try. I haven't actually tried it.

    0) backup everything
    1) remove all the archives/private/LISTNAME/* files and directories for
    the last month
    2) remove all the archives/private/LISTNAME/database/* files for the
    last month.
    3) consider removing the archives/private/LISTNAME/attachments/*
    directories for the last month, but don't do it if you are concerned
    about attachments scrubbed from the plain digest as you will break
    those links.
    4) using bin/withlist, unpickle the
    archives/private/LISTNAME/pipermail.pck data, and update 'archive',
    'archivedate', 'archives', 'firstdate', 'lastdate', 'sequence' and
    'size' to make it look like the last message in the archive is the
    last message for the prior month, and pickle it back.
    5) copy the last month's messages from
    archives/private/LISTNAME.mbox/LISTNAME.mbox to a separate file.
    6) run bin/arch without the --wipe option and with the separate last
    month's .mbox as input.
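
Step 5 above could be sketched with Python's standard mailbox module as follows. This is only an illustration: it assumes the Date header decides which month a message belongs to, which may not match how the archiver files it, and it assumes the .mbox has already been cleaned of stray "From " lines. The function name copy_month and the file names in the demonstration are hypothetical.

```python
import mailbox
import os
import tempfile
from email.utils import parsedate_to_datetime


def copy_month(source_path, target_path, year, month):
    """Copy messages whose Date header falls in (year, month); return the count."""
    source = mailbox.mbox(source_path, create=False)
    target = mailbox.mbox(target_path)
    copied = 0
    try:
        for message in source:
            try:
                sent = parsedate_to_datetime(message["Date"])
            except (TypeError, ValueError):
                continue  # missing or unparsable Date header: skip it
            if (sent.year, sent.month) == (year, month):
                target.add(message)
                copied += 1
        target.flush()
    finally:
        target.close()
        source.close()
    return copied


# Small demonstration on a throwaway two-message mbox.
with tempfile.TemporaryDirectory() as workdir:
    source_path = os.path.join(workdir, "LISTNAME.mbox")
    target_path = os.path.join(workdir, "last-month.mbox")
    with open(source_path, "w", newline="\n") as handle:
        handle.write(
            "From a@example.com Mon Feb 16 10:00:00 2009\n"
            "Date: Mon, 16 Feb 2009 10:00:00 +0000\n"
            "Subject: older\n"
            "\n"
            "body one\n"
            "\n"
            "From b@example.com Mon Mar 16 10:00:00 2009\n"
            "Date: Mon, 16 Mar 2009 10:00:00 +0000\n"
            "Subject: newer\n"
            "\n"
            "body two\n"
        )
    copied = copy_month(source_path, target_path, 2009, 3)
    check = mailbox.mbox(target_path, create=False)
    subjects = [message["Subject"] for message in check]
    check.close()
```

Messages without a usable Date header are skipped here, so compare the copied count against what you expect for the month before feeding the result to bin/arch.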

    and: is there a way to rebuild an archive in a "sandbox"; that is: can i
    somehow specify the destination directory of the archive-building
    process? (so i can rebuild the entire archive till it fits my needs and
    only then swap it with the current one)

    This is probably more practical. Just create another Mailman instance
    with a different prefix and copy the lists/ and archives/private/
    directories to this second instance.


    You could probably also do it by putting

    PRIVATE_ARCHIVE_FILE_DIR = '/some/path'

    in mm_cfg.py in your current installation, but this is risky. It
    probably won't affect the running qrunners as long as they don't
    restart for any reason, but it will affect any web or command line
    accesses so you wouldn't want to do it unless you stopped web access
    to Mailman.

    --
    Mark Sapiro <mark at msapiro.net>     The highway is for gamblers,
    San Francisco Bay Area, California    better use your sense - B. Dylan
  • Stephen J. Turnbull at Mar 16, 2009 at 5:48 pm

    IEM - network operating center (IOhannes m zmoelnig) writes:
    i noticed that this "bug" (or whatever it is) might have been available
    for quite some time: after some searching of my original archives,

    Most likely all the "new" messages start with the word "From".

    It turns out that the only portable way to parse Unix mailboxes into
    messages is to treat an empty line followed by a line starting with
    the word "From" as a message separator. For that reason, most Unix
    mailers "stuff" that word whenever encountered in the body by
    prefixing it with ">". If your mail system doesn't do that, you get
    the effect you've seen.
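
The effect can be reproduced with a toy splitter that applies exactly the rule above: a blank line followed by a line starting with "From " begins a new message. This is a sketch for illustration, not Mailman's or Mailbox.py's parser; split_mbox and subject_of are hypothetical helper names.

```python
def split_mbox(text):
    """Split mbox text into messages at a blank line followed by "From "."""
    messages = []
    current = []
    previous_blank = True  # the start of the file counts as a blank line
    for line in text.splitlines():
        if previous_blank and line.startswith("From ") and current:
            messages.append(current)
            current = []
        current.append(line)
        previous_blank = line == ""
    if current:
        messages.append(current)
    return messages


def subject_of(message):
    """Return the Subject header value, or "No subject" if there is none."""
    for line in message[1:]:
        if line == "":
            break  # end of the header block
        if line.lower().startswith("subject:"):
            return line[len("subject:"):].strip()
    return "No subject"


raw = "\n".join([
    "From alice@example.com Mon Mar 16 17:12:00 2009",
    "Subject: archive test",
    "",
    "first paragraph",
    "",
    "From here on, the rest lands in a message of its own.",
])

unstuffed = [subject_of(m) for m in split_mbox(raw)]
stuffed = [subject_of(m) for m in split_mbox(raw.replace("\nFrom here", "\n>From here"))]
```

With the body line left as is, the splitter reports two messages, the second with "No subject"; once the line is stuffed with ">", the same text yields a single message.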

    I believe that there's an option to mmarch to fix up this problem (or
    maybe a separate utility).

Discussion Overview
group: mailman-users
categories: python
posted: Mar 16, '09 at 4:22p
active: Mar 17, '09 at 4:25p
posts: 5
users: 3
website: list.org
