Hi all,

Newbie here.

I have recently moved a list over to Mailman running under Plesk on a VPS.
The version of Mailman that I currently have access to is 2.1.9.

I had an mbox file to import from my old mail-list system. Originally, I had
some problems - the mbox file was large (7 years of archives) and some
messages had a literal "\nFrom ", which caused the archiver to break the
messages inappropriately. For the moment, I escaped these occurrences with a
">", although that now means that ">From " appears in the archive, but I
thought it was better than having a corrupted archive, and I could rebuild
at a later date when I fully understood how Mailman copes with this.

However, I have very quickly discovered that if a post to the list contains
- in the actual message text - a newline (a single newline, not a double)
followed by the word From, Mailman interprets that as a new message and
breaks the message at that point, creating a fragment message with no
subject line.
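
A minimal sketch of why this happens, assuming only the standard mbox convention: a reader treats every line that begins with "From " as the start of a new message, with or without a blank line before it. It uses Python 3's standard `mailbox` module rather than Mailman's archiver, and the sample message, addresses and the `escape_from_lines`/`count_messages` helpers are made up for illustration.

```python
import mailbox
import os
import re
import tempfile

# Hypothetical sample message, split into its mbox separator, headers and body.
SEPARATOR = "From alice@example.com Tue Apr 20 09:14:00 2010\n"
HEADERS = "From: alice@example.com\nSubject: Song lyrics\n\n"
BODY = "First verse\nFrom the top, one more time\n\n"


def escape_from_lines(body: str) -> str:
    """Prefix '>' to every body line that starts with 'From '."""
    return re.sub(r"^From ", ">From ", body, flags=re.MULTILINE)


def count_messages(mbox_text: str) -> int:
    """Write mbox_text to a temporary file and count the messages mailbox sees."""
    fd, path = tempfile.mkstemp(suffix=".mbox")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(mbox_text)
        box = mailbox.mbox(path, create=False)
        try:
            return len(box)
        finally:
            box.close()
    finally:
        os.remove(path)


UNESCAPED = SEPARATOR + HEADERS + BODY
ESCAPED = SEPARATOR + HEADERS + escape_from_lines(BODY)

print(count_messages(UNESCAPED))  # 2: the body line was taken as a new message
print(count_messages(ESCAPED))    # 1: ">From the top" is ordinary body text
```

Escaping the body line with a leading ">" (the manual workaround described above) is what keeps the message whole.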

Note that this is not just the archive - this actually affects messages
being sent to the subscribers - i.e. a message containing a newline followed
by "From " will be split in two before going out.

This really surprised me, as it is not at all unlikely that sometime or
other someone will post "From " at the start of a line!

Is this a known bug (I did search, and couldn't spot anything), and is it
fixed in other versions? Or do I have a rogue version of Mailman installed
on my VPS? The behaviour is consistent and repeatable.

Regards

Chris

  • Chris Malme at Apr 20, 2010 at 9:33 am
    Further to my earlier post.
    > Note that this is not just the archive - this actually affects messages
    > being sent to the subscribers - i.e. a message containing a newline followed
    > by "From " will be split in two before going out.
    Further testing shows that this is incorrect. The problem affects only the
    archive, not the mail-list messages going to the subscribers.

    However, the archive problem is repeatable. Any instance in the message text
    of "From " following a single newline is interpreted as a new message.
  • Terri Oda at Apr 20, 2010 at 3:24 pm
    There's a program in bin called "cleanarch" that can be run on your
    archive to fix this problem. It cleans up the offending From lines from
    older mbox files so that you can run arch again and generate correct
    html versions of the archives.

    Terri
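
The idea behind cleanarch can be sketched as follows: apply a stricter test to every line starting with "From " and escape the ones that fail. This is an illustration, not Mailman's bin/cleanarch; the Unix-From regular expression is a simplified, hypothetical stand-in, and the extra check that a genuine separator is followed by a header line is modelled on what I understand cleanarch to do and may differ in detail. `clean_lines` and `SAMPLE` are made-up names.

```python
import re

# Simplified, hypothetical Unix-From pattern:
# "From <sender> <weekday> <month> <day> <hh:mm[:ss]> <year>"
UNIX_FROM = re.compile(
    r"From \S+ +(Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) +"
    r"\d{1,2} \d{1,2}:\d{2}(:\d{2})? \d{4}\s*$"
)
# A header line: a field name of printable ASCII excluding ':', then a colon.
HEADER = re.compile(r"[\x21-\x39\x3b-\x7e]+:")


def clean_lines(lines):
    """Return a copy of lines with fake 'From ' separators escaped as '>From '."""
    out = []
    for i, line in enumerate(lines):
        if line.startswith("From "):
            next_line = lines[i + 1] if i + 1 < len(lines) else ""
            # A real separator must match the pattern AND be followed by a header.
            real = UNIX_FROM.match(line) and HEADER.match(next_line)
            if not real:
                line = ">" + line
        out.append(line)
    return out


SAMPLE = [
    "From alice@example.com Tue Apr 20 09:14:00 2010\n",
    "From: alice@example.com\n",
    "Subject: Song lyrics\n",
    "\n",
    "First verse\n",
    "From the top, one more time\n",
    "\n",
]
cleaned = clean_lines(SAMPLE)
print("".join(cleaned))
```

The separator line is left alone, while the lyric line "From the top, one more time" fails the stricter test and is written as ">From the top, one more time", so running arch afterwards sees a single message.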

    Chris Malme wrote:
    > Further to my earlier post.
    >> Note that this is not just the archive - this actually affects messages
    >> being sent to the subscribers - i.e. a message containing a newline followed
    >> by "From " will be split in two before going out.
    > Further testing shows that this is incorrect. The problem affects only the
    > archive, not the mail-list messages going to the subscribers.
    >
    > However, the archive problem is repeatable. Any instance in the message text
    > of "From " following a single newline is interpreted as a new message.
    > ------------------------------------------------------
    > Mailman-Users mailing list Mailman-Users at python.org
    > http://mail.python.org/mailman/listinfo/mailman-users
    > Mailman FAQ: http://wiki.list.org/x/AgA3
    > Security Policy: http://wiki.list.org/x/QIA9
    > Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
    > Unsubscribe: http://mail.python.org/mailman/options/mailman-users/terri%40zone12.com
  • Mark Sapiro at Apr 20, 2010 at 3:42 pm

    Terri Oda wrote:
    > There's a program in bin called "cleanarch" that can be run on your
    > archive to fix this problem. It cleans up the offending From lines from
    > older mbox files so that you can run arch again and generate correct
    > html versions of the archives.

    Terri is correct. cleanarch will escape the unescaped From_ lines in
    the .mbox, but it seems you have done that yourself in some way, and
    the issue is with new messages.

    So the question is why is this happening with new messages? Again, what
    Mailman version is this?

    Also note that escaping From_ by preceding it with '>' is the accepted
    way to deal with this. Many MUAs will do it before sending the message
    and MDAs will do it too before delivering a message. It is unusual to
    be able to pass a From_ through email from end to end without it being
    escaped to >From_ somewhere between source and destination.

    --
    Mark Sapiro <mark at msapiro.net>
    San Francisco Bay Area, California
    The highway is for gamblers, better use your sense - B. Dylan
  • Mark Sapiro at Apr 20, 2010 at 3:26 pm

    Chris Malme wrote:
    > Further testing shows that this is incorrect. The problem affects only the
    > archive, not the mail-list messages going to the subscribers.
    >
    > However, the archive problem is repeatable. Any instance in the message text
    > of "From " following a single newline is interpreted as a new message.

    What Mailman version is this?

    This was a bug at one time, but it was fixed years ago.

    There was a change in this area between 2.1.12 and 2.1.13, but that
    change only affected outgoing messages, not archives or digests.

  • Chris Malme at Apr 21, 2010 at 9:35 am

    On 20/04/2010 18:14, Mark Sapiro wrote:

    > That is the normal way of dealing with messages containing From_ in the
    > message body. It's not just Mailman or pipermail, and it's problematic
    > to unescape them for display, because while the escaping is normal,
    > there is no standard for escaping/unescaping so when you see >From_ in
    > a message, you don't know if it is an escaped From_, a quoted From_ or
    > a literal >From_.
    OK - that is different behaviour from the software I had been using, but if
    that is how it works, then I am happy - I was more concerned that it was an
    indication my installation might be broken than anything else. I take your
    point about not knowing which it is, but in my experience, a quoted From_ is
    most usually >_From_ not >From_, and I had been happy to take a potential
    hit on the exceptions to this, as they would be far less common than the
    escaped From_. But this is fairly unimportant, compared to the other problem
    of archive corruption.
    > The issue with unescaped From_ in the body causing archive corruption
    > was fixed long before Mailman 2.1.9
    >> To go to a more recent version is not impossible, but not trivial for me
    >> (a Linux VPS newbie), so I wanted to see if it was the solution before
    >> rolling my sleeves up.
    > You shouldn't need to. Mailman 2.1.9 should not have this problem.
    OK - that means the weirdness is in my specific system, not in Mailman. Not
    ideal, but at least I have narrowed things down a bit.
    > I believe you, so I don't think I need to see the test list. The
    > question is why isn't Mailman escaping the From_ when it archives and
    > sends the message.
    >
    > It actually relies on the Python email library to do this, but Mailman
    > 2.1.9 should install its own version of the email package in Mailman's
    > pythonlib/ directory, and this should always escape From_ lines when
    > converting an email.Message.Message object to text. Why it doesn't is
    > the question.
    >
    > Also curious is that I think you said the problem occurs with
    > "text\nFrom " in the body, but not with "text\n\nFrom ". If I
    > understood that correctly, that is really strange.
    Ah, no, it is happening with both "text\nFrom " and "text\n\nFrom ". It's
    just that I encountered "text\nFrom " very early on. (The mail-list is
    primarily about songwriting, which means we get lyrics posted to the list,
    so there is a higher than normal chance of new lines starting with
    capitals.)

    Anyway, many thanks for your help. While it hasn't resolved my immediate
    problem, it has told me what it isn't, which is a great help. I'll go away
    and have a fiddle. I am actually tempted to reinstall 2.1.9 myself from
    scratch, which should cause fewer support issues than going to a new
    version. Originally, my VPS was supposed to have Mailman (not sure what
    version), but I had no access to it via Plesk (this was before I knew how to
    configure it directly). The VPS support then said they had upgraded it, which
    seemed to fix the problem. I can't help but wonder if they did a botched job
    of it. The first thing to check is whether the Mailman pythonlib/ directory
    you mention exists.

    If I get anywhere, I will let you know!

    Chris
  • Mark Sapiro at Apr 20, 2010 at 5:14 pm

    Chris Malme wrote:
    > That is correct, I did it manually (or rather, with a quick script I wrote),
    > preceding each message text line that begins with a "From " with a ">". This
    > enabled the mbox to be imported into the Mailman Archive without splitting
    > messages as it did when I first tried it. However, those "From " lines that
    > I manually escaped are showing clearly in the Archive *with* the escape
    > character - i.e. as ">From".

    That is the normal way of dealing with messages containing From_ in the
    message body. It's not just Mailman or pipermail, and it's problematic
    to unescape them for display, because while the escaping is normal,
    there is no standard for escaping/unescaping so when you see >From_ in
    a message, you don't know if it is an escaped From_, a quoted From_ or
    a literal >From_.
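
To make the ambiguity concrete, here is a minimal sketch of the usual ">From" escaping (the variant often called "mboxo"), with hypothetical helper names: a body that contained an unescaped From_ line and a body whose author literally typed ">From" end up as identical archive text, so a naive unescape cannot restore both.

```python
import re


def mboxo_escape(body: str) -> str:
    """Prefix '>' to every line that starts with 'From '."""
    return re.sub(r"^From ", ">From ", body, flags=re.MULTILINE)


def naive_unescape(body: str) -> str:
    """Strip one '>' from every line that starts with '>From '."""
    return re.sub(r"^>From ", "From ", body, flags=re.MULTILINE)


body_with_from = "Verse one\nFrom the top\n"      # needs escaping
body_with_literal = "Verse one\n>From the top\n"  # author really typed '>From'

archived_a = mboxo_escape(body_with_from)
archived_b = mboxo_escape(body_with_literal)

print(archived_a == archived_b)                            # True: same archive text
print(naive_unescape(archived_b) == body_with_literal)     # False: the literal '>' is lost
```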

    > Furthermore any new emails to the list that
    > have a newline/From in the message text are archiving incorrectly.
    >> So the question is why is this happening with new messages? Again, what
    >> Mailman version is this?
    > Sorry, I should have stated earlier. This is Mailman version 2.1.9, which is
    > the version that the folk who support my VPS (using Plesk) have rolled out.

    The issue with unescaped From_ in the body causing archive corruption
    was fixed long before Mailman 2.1.9

    > To go to a more recent version is not impossible, but not trivial for me (a
    > Linux VPS newbie), so I wanted to see if it was the solution before rolling
    > my sleeves up.

    You shouldn't need to. Mailman 2.1.9 should not have this problem.

    >> Also note that escaping From_ by preceding it with '>' is the accepted
    >> way to deal with this. Many MUAs will do it before sending the message
    >> and MDAs will do it too before delivering a message. It is unusual to
    >> be able to pass a From_ through email from end to end without it being
    >> escaped to >From_ somewhere between source and destination.
    > That is what is puzzling me. I am able to send an email from Thunderbird
    > through my VPS's mail server and see it go straight into the archive
    > unescaped, splitting into multiple messages at every incidence of
    > newline/From. I can also do it from Gmail.

    It doesn't happen with every MUA/MTA/MDA, but for example, I just sent
    such a message from Tbird 3.0.3 via Exim on localhost to a mailbox on
    localhost, and the From_ was unescaped in Tbird's Sent folder, but it
    was escaped in the recipient mailbox.

    > I have a Test list set up that I am happy for anyone to access, if it might
    > shed any light on the matter. The Test list does not have the imported
    > archive, but it demonstrates the same behaviour regarding new messages.
    >
    > I'm happy to post the list URL if that is appropriate.

    I believe you, so I don't think I need to see the test list. The
    question is why isn't Mailman escaping the From_ when it archives and
    sends the message.

    It actually relies on the Python email library to do this, but Mailman
    2.1.9 should install its own version of the email package in Mailman's
    pythonlib/ directory, and this should always escape From_ lines when
    converting an email.Message.Message object to text. Why it doesn't is
    the question.
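
For comparison, Python 3's standard library follows the same convention: when an `email.message.Message` is stored with `mailbox.mbox.add()`, body lines beginning with "From " are written as ">From ". This is a sketch of the expected behaviour using a modern stdlib, not Mailman 2.1.9's bundled email package; the sample message is hypothetical.

```python
import mailbox
import os
import tempfile
from email.message import Message

# Hypothetical post whose body has a line starting with "From ".
msg = Message()
msg["From"] = "alice@example.com"
msg["Subject"] = "Song lyrics"
msg.set_payload("First verse\nFrom the top, one more time\n")

fd, path = tempfile.mkstemp(suffix=".mbox")
os.close(fd)
try:
    box = mailbox.mbox(path)
    box.add(msg)  # writes a From_ separator, then the message with From_ lines escaped
    box.close()

    with open(path, "rb") as f:
        raw = f.read()

    reopened = mailbox.mbox(path, create=False)
    count = len(reopened)  # how many messages a reader finds in the file
    reopened.close()
finally:
    os.remove(path)

print(b"\n>From the top" in raw)  # True
print(count)                      # 1
```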

    Also curious is that I think you said the problem occurs with
    "text\nFrom " in the body, but not with "text\n\nFrom ". If I
    understood that correctly, that is really strange.

  • Chris Malme at Apr 21, 2010 at 12:38 pm
    Based on the comments received here, I have gone back and had another look
    at this, and discovered I was wrong on a number of important issues.
    Apologies for this, but I am (obviously) new to Mailman, and didn't
    completely realise what I was seeing, the first time.

    This time, having looked at the actual mbox file held in the Archive folder,
    I can see that instances of "\nFrom " in the message body of new messages
    have been received correctly escaped by a ">", and the mbox file clearly has
    them marked as >From_ lines. So that is all good.

    However, the Pipermail Archive does consistently split messages whenever a
    message-body "\nFrom " occurs, as I described earlier, with the second part
    being attributed to "bogus at does.not.exist.com".

    I've found that if I then run arch on the list (using the mbox file) the
    Archive is created correctly, without this splitting, although any
    subsequent messages with a message-body "\nFrom " cause further split messages.

    So it looks like my problem is with the dynamic creation of the Pipermail
    Archive, rather than the generation from the mbox file. I haven't yet pinned
    down what script/process is responsible for this.

    This suggests a perfectly acceptable "quick fix" of a daily cron job running
    arch on the list, but I will look into this further when I get time.

    Many thanks for your help, which pointed me in the right direction.

    Chris
  • Mark Sapiro at Apr 21, 2010 at 2:48 pm

    Chris Malme wrote:
    > However, the Pipermail Archive does consistently split messages whenever a
    > message-body "\nFrom " occurs, as I described earlier, with the second part
    > being attributed to "bogus at does.not.exist.com". [...]
    > So it looks like my problem is with the dynamic creation of the Pipermail
    > Archive, rather than the generation from the mbox file. I haven't yet pinned
    > down what script/process is responsible for this.

    A word of caution. The archiver is a tangled web of subclasses and
    overridden methods and is quite difficult to follow.

    That said, I suspect the underlying OS here is Debian/Ubuntu and
    Mailman is the Debian/Ubuntu package which has patches in this area
    which are causing this. The patch is to fix
    <http://bugs.debian.org/244673>. The 2.1.9 patch is at
    <http://patch-tracker.debian.org/patch/series/view/mailman/1:2.1.9-7/77_header_folding_in_attachments.patch>
    (if that URL doesn't work, go to
    <http://patch-tracker.debian.org/package/mailman> and navigate from
    there - the direct URL is not stable and changes every time there is a
    package update).

    I don't specifically recall if this patch causes your problem or not,
    but I'm pretty sure it does. I think you can fix it by finding the
    added code around line 200 of Mailman/Message.py and changing

    g = Generator(fp)

    to

    g = Generator(fp, mangle_from_=True)

    I have installed a refactored version of this patch upstream as of
    Mailman 2.1.13 which doesn't have this problem.

    If you're interested, I can provide more detail on this, but I think
    the above change will fix your problem. It will also cause From_ to be
    escaped in outgoing non-digest messages (it is already escaped in
    digests) which may be an esthetic issue for some recipients, but for
    others, it will have been escaped anyway by an MTA/MDA in the delivery
    path.
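
What the `mangle_from_=True` argument changes can be seen with the `Generator` class in Python 3's standard `email` package. This is a sketch, not the patched Mailman/Message.py (which uses the email package Mailman bundles in pythonlib/), and `flatten_body` is a hypothetical helper. In Python 3 the default for `mangle_from_` depends on the message policy, so passing it explicitly, as in the change above, avoids relying on the default.

```python
import io
from email.generator import Generator
from email.message import Message


def flatten_body(msg: Message, mangle: bool) -> str:
    """Serialize msg to text, optionally escaping body lines that start with 'From '."""
    buf = io.StringIO()
    Generator(buf, mangle_from_=mangle).flatten(msg)
    return buf.getvalue()


msg = Message()
msg["Subject"] = "Lyrics"
msg.set_payload("First verse\nFrom the top\n")

print(flatten_body(msg, mangle=False))  # body line stays "From the top"
print(flatten_body(msg, mangle=True))   # body line becomes ">From the top"
```

With mangling on, the body line is written as ">From the top", the same escaping that stops the archiver from treating it as a message separator.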

  • Chris Malme at Apr 21, 2010 at 3:12 pm
    What a star!

    Thanks Mark, I will take a look at it later today.

    Yes, it is Debian/Ubuntu - I must learn to specify these things from the start.

    Chris
  • Chris Malme at Apr 24, 2010 at 12:12 pm
    Regarding my earlier query about archive messages breaking due to "\nFrom "
    in the message text, I am pleased to say that Mark's suggestion worked a
    treat and fixed that archive woe. Many thanks.

    However, it uncovered another problem, less serious, but fascinating. One of
    my users uses a mail client that creates MessageIDs containing the character
    "-". As far as I can tell, this is completely legit.

    I've discovered that for every "-" in the MessageID, that message is moved
    one place across in the nesting of threads. As his MessageID can contain up
    to 5 "-" characters, this means any thread he participates in gets messed up
    somewhat.

    Looking at Mailman/Archiver/pipermail.py, I can see lines such as

    article.threadKey = parent.threadKey + article.date + '/' + article.msgid + '-'

    and

    self.write_threadindex_entry(article, artkey.count('-') - 1)

    which suggests the dash is being used as a delimiter/flag in Pipermail, but
    I haven't looked into it in any detail, yet.

    Before I do so, or begin to experiment, I thought I would ask if this is a
    known problem with an existing solution? I did do a quick search of the
    archive, but couldn't find anything obvious. If there isn't an existing fix,
    I might try something basic, like globally replacing "'-'" with "'~'" in
    pipermail.py and just see what it does.

    As before, running Mailman 2.1.9/Pipermail 0.09 on Debian/Ubuntu, running Plesk.

    Chris
  • Chris Malme at Apr 24, 2010 at 12:15 pm
    By the way, the mail client in question is Apple Mail 2.1, and I have
    confirmed that this is normal behaviour for it.
  • Mark Sapiro at Apr 24, 2010 at 2:20 pm

    Chris Malme wrote:
    > I've discovered that for every "-" in the MessageID, that message is moved
    > one place across in the nesting of threads. As his MessageID can contain up
    > to 5 "-" characters, this means any thread he participates in gets messed up
    > somewhat.

    Good work. It took me quite a bit of effort to figure that one out,
    even after I knew which bad Debian patch caused it.

    [...]
    > Before I do so, or begin to experiment, I thought I would ask if this is a
    > known problem with an existing solution?

    It is a known problem caused by another bad Debian patch. See some of
    the gory details in the post at
    <http://mail.python.org/pipermail/mailman-users/2009-July/066610.html>
    and related posts.

    The cure is to replace the Debian patch with the one at
    <http://bazaar.launchpad.net/~mailman-coders/mailman/2.1/revision/1186>.

    (That URL is currently returning "internal server error". This is
    generally a temporary Launchpad condition which will correct itself.
    If you can't get the patch, let me know and I'll send it.)

    The bad Debian patch takes statements similar to

    myThreadKey = parent.threadKey + article.date + '-'

    in five places in pipermail.py and replaces "article.date + '-'" with
    "article.date + '/' + article.msgid + '-'".

    The correct fix is to replace "article.date + '/' + article.msgid +
    '-'" in the Debian patch with "article.date + '.' +
    str(article.sequence) + '-'".

    Or, you can go to
    <http://bazaar.launchpad.net/~mailman-coders/mailman/2.1/annotate/head%3A/Mailman/Archiver/pipermail.py>
    and look at the 5 groups of one or two lines marked revision 1186.
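
A small model of the thread keys shows why each "-" in a Message-ID pushes a post one level further in. It assumes only what the snippets in this thread show: each level appends `<date><tag>-` to the parent's key, and the indent level is `key.count('-') - 1`. The date strings, Message-IDs and function names are hypothetical, and this is not pipermail.py itself.

```python
def make_key(parent_key: str, date: str, tag: str) -> str:
    """Append one thread level; the trailing '-' marks the end of the level."""
    return parent_key + date + tag + "-"


def indent_level(key: str) -> int:
    """Depth as in write_threadindex_entry(article, artkey.count('-') - 1)."""
    return key.count("-") - 1


def debian_tag(msgid: str, sequence: int) -> str:
    # Debian patch: date + '/' + msgid, and the msgid may itself contain '-'
    return "/" + msgid


def fixed_tag(msgid: str, sequence: int) -> str:
    # Suggested fix: date + '.' + str(sequence), an integer with no '-'
    return "." + str(sequence)


def depths(tag_fn):
    """Depths of a root post (Apple Mail-style msgid) and one direct reply."""
    root = make_key("", "01271754000", tag_fn("5A3B-11DF-9C2E@example.com", 0))
    reply = make_key(root, "01271757600", tag_fn("abc123@example.org", 1))
    return indent_level(root), indent_level(reply)


print(depths(debian_tag))  # (2, 3): each '-' in the root's msgid adds a level
print(depths(fixed_tag))   # (0, 1): root at top level, reply one level in
```

Assuming sequence numbers are non-negative integers, `str(article.sequence)` can never contribute a "-", so the depth count stays correct whatever the Message-ID looks like.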

  • Barry Finkel at Apr 21, 2010 at 4:28 pm

    Mark Sapiro wrote:
    > That said, I suspect the underlying OS here is Debian/Ubuntu and
    > Mailman is the Debian/Ubuntu package which has patches in this area
    > which are causing this.
    As I have written on this forum before, the Debian/Ubuntu package
    for Mailman has a large number of patches. One I know is definitely
    wrong. Most of the others had no documentation, so I could not
    determine what the patch was doing. For three security patches,
    two matched the SourceForge source, and one did not. As I could not
    determine exactly what I would be running, I decided to build my
    own Mailman package from the SourceForge source. The only patch
    I kept was one that places files in the proper directories for Ubuntu/Debian.
    I can provide details for anyone who is interested.
    ----------------------------------------------------------------------
    Barry S. Finkel
    Computing and Information Systems Division
    Argonne National Laboratory
    9700 South Cass Avenue
    Building 240, Room 5.B.8
    Argonne, IL 60439-4828
    Phone: +1 (630) 252-7277
    Facsimile: +1 (630) 252-4601
    Internet: BSFinkel at anl.gov
    IBMMAIL: I1004994

Discussion Overview
group: mailman-users
categories: python
posted: Apr 20, '10 at 9:14a
active: Apr 24, '10 at 2:20p
posts: 14
users: 4
website: list.org
