Recently several of the mailing lists we run have developed a strange
problem when generating the pipermail web archive pages. For a period of
about ten days or so, all mail messages posted to the mailing list appear in
the web archive with a subject of "No subject".

When I view the actual messages through the web, they're not complete - in
most cases the first few lines of the message are missing, and in some cases
the entire message is.

If I copy the relevant mbox for the list and open it using my mail client
(mutt), it opens fine and all the mail messages appear to be complete.
mutt indexes them as I remember the original postings, complete with
subjects. The same is true of all lists that have this problem.

I suspected that the indexes were corrupt, and so I ran arch --wipe on the
list to rebuild the archive in the hopes that it would fix the problem.
Unfortunately it didn't.

We're running mailman 2.1.3 with the July 2003 RSS feed patches from
http://sourceforge.net/tracker/index.php?func=detail&aid=657951&group_id=103&atid=300103

Any help or suggestions people might give that will help me regenerate the
archive indexes properly would be appreciated.

- guy
--
Systems Manager, IT Division, Rhodes University, Grahamstown, South Africa
Email: G.Halse at ru.ac.za Web: http://mombe.org/ IRC: rm-rf at irc.zanet.net
*** ANSI Standard Disclaimer *** J.A.P.H


  • Guy Antony Halse at Dec 24, 2003 at 10:29 am
    It seems that the problem is more interesting than I originally thought.
    What's happening is that arch isn't properly parsing the mbox, and is taking
    paragraphs that begin with a capitalised "From" to be new mail messages
    (i.e., it sees the "From" as an mbox From_ line).

    The "No subject" entries appear because these aren't complete messages;
    they're just paragraphs that happen to start with the word "From", for
    example "From the Cisco Report ( 20h00 -> 22h00 )[2]". It only happens if
    the "F" is capitalised and "From" is the first word after a blank line.

    mutt somehow correctly interprets the "From" as part of the text, not an
    mbox From_ line. I'm presuming that it uses a regex to match the envelope.

    I'm not sure how lines beginning with "From" that aren't From_ lines are
    supposed to be quoted, and whether it is mailman or exim (our MTA) that is
    supposed to be doing the quoting.

    It seems that this only started happening when we upgraded to mailman 2.1.3.
    Did anything change in the way it handles mboxes?

    - Guy
  • Simon White at Dec 24, 2003 at 10:33 am

    24-Dec-03 at 12:29, Guy Antony Halse (g.halse at ru.ac.za) wrote :

    > I'm not sure how lines beginning with "From" that aren't From_ lines are
    > supposed to be quoted, and whether it is mailman or exim (our MTA) that is
    > supposed to be doing the quoting.
    They are supposed to be quoted as ">From", i.e. with a ">" prepended to
    the line.
    However there should be a way to parse the lines a bit better. The From
    lines usually contain a date (should be an RFC formatted date string,
    don't remember which RFC) and so on.

    Quick workaround: use an external archiver such as MHonArc, which works
    wonderfully for me and handles strange character quoting and HTML
    messages better than pipermail.

    --
    Simon White. Internet Consultant, Linux/Windows Server Administration.
    email, dns and web servers; php javascript perl asp; MySQL MSSQL Access
    Bridging the gap between management, HR and the tech team.
  • Guy Antony Halse at Jan 14, 2004 at 8:56 am

    On Wed 2003-12-24 (11:33), Simon White wrote:
    > However there should be a way to parse the lines a bit better. The From
    > lines usually contain a date (should be an RFC formatted date string,
    > don't remember which RFC) and so on.
    I found a reasonably simple solution to this, although it is probably
    somewhat of a hack. I've replaced references to PortableUnixMailbox in
    Mailman/Mailbox.py with UnixMailbox.

    Looking at the source code for the python mailbox module,
    PortableUnixMailbox simply checks that the first five characters of the line
    are 'From ', while UnixMailbox uses a long, complicated regex.

    This solution might not be portable (as the class names suggest), but it
    seems to work with our MTA (exim). At least I don't have (no subject) in my archives
    any more.

    - Guy

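Guy's first follow-up traces the "No subject" entries to body paragraphs that begin with "From " being read as mbox From_ lines. The minimal Python sketch below reproduces that failure mode. The sample mailbox, the addresses, and the helper name naive_split are hypothetical, and the splitter only mimics a prefix-only From_ check; it is not pipermail's actual code.

```python
import email

# Hypothetical mailbox: one real message whose body contains an unquoted
# paragraph starting with "From ".
SAMPLE_MBOX = """\
From alice@example.org Wed Dec 24 10:29:00 2003
Subject: Nightly report

Summary of tonight's traffic.

From the Cisco Report ( 20h00 -> 22h00 )
Link utilisation peaked at 80%.
"""


def naive_split(mbox_text: str) -> list[str]:
    """Hypothetical splitter: any line starting with 'From ' opens a new message."""
    messages: list[list[str]] = []
    for line in mbox_text.splitlines(keepends=True):
        if line.startswith("From "):
            messages.append([])  # prefix-only check, like the permissive parser
        if messages:
            messages[-1].append(line)
    return ["".join(lines) for lines in messages]


parts = naive_split(SAMPLE_MBOX)
print(len(parts))  # 2: the body paragraph became a second "message"
# The bogus second "message" has no Subject header, hence "No subject".
print(email.message_from_string(parts[1])["Subject"])  # None
```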
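Simon's reply gives the usual convention: a body line that begins with "From " is written to the mbox as ">From ". Which component should apply it (Mailman or the MTA) is left open in the thread. As one illustration of the transformation itself, Python's standard email.generator.Generator applies this quoting when constructed with mangle_from_=True. The message content and the helper name render_for_mbox are hypothetical.

```python
import io
from email.generator import Generator
from email.message import Message


def render_for_mbox(subject: str, body: str) -> str:
    """Hypothetical helper: flatten a message with '>From ' quoting applied."""
    msg = Message()
    msg["Subject"] = subject
    msg.set_payload(body)
    buf = io.StringIO()
    # mangle_from_=True makes the generator prefix body lines that start
    # with "From " with a ">" character.
    Generator(buf, mangle_from_=True).flatten(msg)
    return buf.getvalue()


text = render_for_mbox(
    "Nightly report",
    "Summary of tonight's traffic.\n\nFrom the Cisco Report ( 20h00 -> 22h00 )\n",
)
print(text)  # the body line now reads ">From the Cisco Report ..."
```

Once quoted this way, even a prefix-only parser no longer mistakes the paragraph for the start of a new message.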
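Guy's fix swaps a prefix-only check (PortableUnixMailbox) for a regex-based one (UnixMailbox), and Simon points out that genuine From_ lines carry a date. The Python 3 sketch below contrasts the two approaches; those legacy classes are not part of Python 3's mailbox module, so both checks are written out by hand. FROM_LINE_RE is a hypothetical approximation that expects an asctime-style date such as "Wed Dec 24 10:29:00 2003"; it is not the regex UnixMailbox actually uses.

```python
import re

# Hypothetical stricter pattern: "From <sender> <asctime-style date>".
# An illustration of the idea only, not UnixMailbox's real regex.
FROM_LINE_RE = re.compile(
    r"^From \S+\s+"                        # envelope sender
    r"[A-Z][a-z]{2}\s+[A-Z][a-z]{2}\s+"    # weekday and month, e.g. "Wed Dec"
    r"\d{1,2}\s+\d{1,2}:\d{2}(?::\d{2})?"  # day and time
    r"(?:\s+\S+)?\s+\d{4}\s*$"             # optional zone, then the year
)


def is_from_line_permissive(line: str) -> bool:
    """Prefix-only check, as the thread describes PortableUnixMailbox."""
    return line.startswith("From ")


def is_from_line_strict(line: str) -> bool:
    """Require a sender and a date, in the spirit of UnixMailbox."""
    return FROM_LINE_RE.match(line) is not None


samples = [
    "From alice@example.org Wed Dec 24 10:29:00 2003",
    "From the Cisco Report ( 20h00 -> 22h00 )",
]
for s in samples:
    print(is_from_line_permissive(s), is_from_line_strict(s), s)
# True True From alice@example.org Wed Dec 24 10:29:00 2003
# True False From the Cisco Report ( 20h00 -> 22h00 )
```

A stricter pattern trades flexibility for safety: it stops body text such as "From the Cisco Report" from splitting messages, but it may also reject genuine From_ lines that an MTA writes in an unexpected format, which is presumably the portability concern Guy mentions.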
Discussion Overview
group: mailman-users
categories: python
posted: Dec 17, '03 at 12:33p
active: Jan 14, '04 at 8:56a
posts: 4
users: 2
website: list.org

2 users in discussion

Guy Antony Halse: 3 posts Simon White: 1 post
