Guys --

I have a problem with the monthly archives from Mailman 2.1.11 (using
Pipermail 0.09 [default]) and wonder if there is any help available.
I've read several FAQs, but no luck in understanding the problem yet.

The list is an announce-only type in Russian (Cyrillic), but the
default language is set to English (so I can read the admin pages and
complete the necessary tasks). As I believe Mark mentioned before,
this means that the messages themselves (sent by the Russian
Moderator team using Outlook Express or webmail and either
windows-1251 or KOI8-R encodings) arrive at Mailman and are
distributed in email with their original encodings. However, the
mailman archive in this configuration seems to save the messages as
HTML entity codes which display fine in the Mailman archive as single
messages, but are unreadable once they get to the monthly archive
(all "?????").

1) What can be done to get the monthly archive in a readable format?
2) Is there any way to correct the existing monthly archives?

Thanks in Advance (and have a good weekend),
Drew Tenenholz


  • Mark Sapiro at Mar 27, 2009 at 8:09 pm

    Drew Tenenholz wrote:
    The list is an announce-only type in Russian (Cyrillic), but the
    default language is set to English (so I can read the admin pages and
    complete the necessary tasks). As I believe Mark mentioned before,
    this means that the messages themselves (sent by the Russian
    Moderator team using Outlook Express or webmail and either
    windows-1251 or KOI8-R encodings) arrive at Mailman and are
    distributed in email with their original encodings. However, the
    mailman archive in this configuration seems to save the messages as
    HTML entity codes which display fine in the Mailman archive as single
    messages, but are unreadable once they get to the monthly archive
    (all "?????").

    By monthly archive, I assume you mean the .txt and/or .txt.gz files. Is
    that correct?

    1) What can be done to get the monthly archive in a readable format?

    Either set the list's preferred language to Russian (and navigate
    through the admin pages by position), or set Mailman's character set
    for English to UTF-8 by putting the following line in mm_cfg.py.

    add_language('en', 'English (USA)', 'utf-8', 'ltr')

    2) Is there any way to correct the existing monthly archives?

    The messages in the cumulative
    archives/private/LISTNAME.mbox/LISTNAME.mbox file are all in their
    original charset and encoding, so if you do 1), you can then rebuild
    the archive with bin/arch --wipe and that will rebuild the .txt files
    with the new charset.

    One thing to be aware of though is that although the monthly .txt files
    look like .mbox files, they don't contain complete message headers. In
    particular, even though the character set may now be utf-8 or koi8-r,
    there are no content-type or other headers in the file to so indicate.

    --
    Mark Sapiro <mark at msapiro.net>
    San Francisco Bay Area, California
    The highway is for gamblers, better use your sense - B. Dylan
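
The "?????" symptom discussed above can be reproduced in a few lines. This is only a sketch of the general mechanism, assuming the archiver re-encodes message text into the charset of the list's language with unrepresentable characters replaced; it is not Pipermail's actual code. The variable names and the sample text are invented for illustration, and the snippet is written for Python 3, whereas Mailman 2.1 itself runs on Python 2.

```python
# Illustration only: hypothetical sample text, not Pipermail's code path.
original = "Привет, мир"

# The message arrives in one of the senders' charsets and decodes cleanly.
raw_koi8 = original.encode("koi8-r")
decoded = raw_koi8.decode("koi8-r")

# Re-encoding into a charset with no Cyrillic letters replaces each one
# with a question mark; only the ASCII comma and space survive.
as_ascii = decoded.encode("us-ascii", errors="replace")

# Re-encoding as UTF-8 keeps every character.
as_utf8 = decoded.encode("utf-8")

print(as_ascii)
print(as_utf8.decode("utf-8") == original)
```

With a UTF-8 charset for the list language, the same text round-trips unchanged, which is consistent with the add_language workaround suggested in the reply.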
  • Stephen J. Turnbull at Mar 28, 2009 at 6:40 am
    Mark Sapiro writes:
    add_language('en', 'English (USA)', 'utf-8', 'ltr')

    Shouldn't this probably be default by now? I'm usually pretty
    hesitant to mess with existing defaults, but as I understand this
    there's actually corrupt data in the .txt files. This can be a real
    headache for people using cPanel or whatever, right?

    One thing to be aware of though is that although the monthly .txt files
    look like .mbox files, they don't contain complete message headers. In
    particular, even though the character set may now be utf-8 or koi8-r,
    there are no content-type or other headers in the file to so
    indicate.

    "Bad Pipermail! Baad, baaad Pipermail!" Or am I missing something?
    Shouldn't the .txt files have a simple text/plain;charset=WHATEVER
    MIME Content-Type?
  • Mark Sapiro at Mar 28, 2009 at 3:23 pm

    Stephen J. Turnbull wrote:
    Mark Sapiro writes:
    add_language('en', 'English (USA)', 'utf-8', 'ltr')
    Shouldn't this probably be default by now?

    Yes, it should. But, we have superstitious beliefs that something
    unintended will be broken by this. Yet, I continue to suggest it as a
    workaround, and I've never received a report of a problem, so I will
    at least test it as default for Mailman 2.2.

    One thing to be aware of though is that although the monthly .txt files
    look like .mbox files, they don't contain complete message headers. In
    particular, even though the character set may now be utf-8 or koi8-r,
    there are no content-type or other headers in the file to so
    indicate.
    "Bad Pipermail! Baad, baaad Pipermail!" Or am I missing something?
    Shouldn't the .txt files have a simple text/plain;charset=WHATEVER
    MIME Content-Type?

    The issue is the .txt files for public archives are served directly by
    the web server, not through a Mailman CGI, so it's entirely up to the
    web server to specify the charset.

    We could put a Content-Type: at the head of the file, but in most
    cases, this would just be served as part of the text. Still, it would
    be useful information for the recipient, so I will look at that too.

    --
    Mark Sapiro <mark at msapiro.net>
    San Francisco Bay Area, California
    The highway is for gamblers, better use your sense - B. Dylan
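
Because the .txt files carry no charset label, whoever reads them has to know the charset from somewhere else. The sketch below shows why guessing is unreliable: the same bytes decode without error under more than one Cyrillic charset, but only one of them gives the intended text. The helper name try_charsets and the sample text are hypothetical, and the code assumes Python 3.

```python
def try_charsets(raw, candidates):
    """Return {charset: decoded text, or None if the bytes are invalid}."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Hypothetical archive content, stored as KOI8-R bytes with no label.
text = "Привет"
raw = text.encode("koi8-r")

results = try_charsets(raw, ["koi8-r", "cp1251", "utf-8"])

# Only the first check should hold; cp1251 decodes to different letters,
# and UTF-8 rejects the byte sequence outright.
print(results["koi8-r"] == text)
print(results["cp1251"] == text)
print(results["utf-8"] is None)
```

A decode that succeeds is therefore not evidence that the right charset was chosen, which is why a declared charset (from the web server or otherwise) matters.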
  • Barry Warsaw at Mar 28, 2009 at 4:05 pm

    On Mar 28, 2009, at 10:23 AM, Mark Sapiro wrote:

    Stephen J. Turnbull wrote:
    Mark Sapiro writes:
    add_language('en', 'English (USA)', 'utf-8', 'ltr')
    Shouldn't this probably be default by now?

    Yes, it should. But, we have superstitious beliefs that something
    unintended will be broken by this. Yet, I continue to suggest it as a
    workaround, and I've never received a report of a problem, so I will
    at least test it as default for Mailman 2.2.

    +1 for making this the default in 2.2 (and 3.0). I do have concerns
    about changing this in 2.1 just because it's /potentially/ risky and
    we should be ultra-conservative with the 2.1 tree.

    BARRY
  • Stephen J. Turnbull at Mar 28, 2009 at 5:50 pm

    Barry Warsaw writes:
    On Mar 28, 2009, at 10:23 AM, Mark Sapiro wrote:

    Yes, it should. But, we have superstitious beliefs that something
    unintended will be broken by [setting the default encoding to
    UTF-8].

    Oh, me too! The Ghost of Charsets Past will undoubtedly rise up to
    remind us of our sins. ;-)

    we should be ultra-conservative with the 2.1 tree.

    Yes indeed. But that shouldn't be a constraint for 2.2 and definitely
    not for 3.0.
  • Stefan Förster at Mar 29, 2009 at 8:43 am

    * Mark Sapiro wrote:
    Stephen J. Turnbull wrote:
    Mark Sapiro writes:
    add_language('en', 'English (USA)', 'utf-8', 'ltr')
    Shouldn't this probably be default by now?

    Yes, it should. But, we have superstitious beliefs that something
    unintended will be broken by this. Yet, I continue to suggest it as a
    workaround, and I've never received a report of a problem, so I will
    at least test it as default for Mailman 2.2.

    If changing a language definition like that, doesn't that mean one
    will have to change the various predefined template files, all message
    codes for that language and after that recreate the archive files?


    Ciao
    Stefan
    --
    Stefan Förster http://www.incertum.net/ Public Key: 0xBBE2A9E9
    "The brain is a wonderful organ. It starts working the moment you get up in
    the morning, and does not stop until you get into the office."
    -- Robert Frost (1874-1963)
  • Stephen J. Turnbull at Mar 29, 2009 at 2:10 pm
    Stefan Förster writes:
    If changing a language definition like that, doesn't that mean one
    will have to change the various predefined template files, all message
    codes for that language and after that recreate the archive files?

    Normally, yes. But this is *English*. It's reasonably likely that
    the template files etc. are already 100% ASCII, encoded in ASCII, and
    thus encoded in UTF-8.

    Recreating the archive files, yes. But they're most likely currently
    broken anyway.
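
The point about ASCII templates can be checked directly: bytes that are pure 7-bit ASCII decode to the same text whether they are labelled us-ascii or utf-8, so relabelling them changes nothing. A small sketch follows, assuming Python 3; the helper is_pure_ascii and both sample strings are made up for illustration and are not taken from Mailman's templates.

```python
def is_pure_ascii(data):
    """True if every byte is in the 7-bit ASCII range."""
    return all(byte < 0x80 for byte in data)


# Hypothetical template text, plus a Latin-1 string for contrast.
template = b"Welcome to the list! Your subscription is confirmed.\n"
accented = "Förster".encode("iso-8859-1")

print(is_pure_ascii(template))
print(is_pure_ascii(accented))
# Pure ASCII bytes give identical text under either label.
print(template.decode("us-ascii") == template.decode("utf-8"))
```

A check like this could be run over a language's template files before changing its charset. Any file that fails it would need converting rather than just relabelling.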

Discussion Overview
group: mailman-users @
categories: python
posted: Mar 27, '09 at 7:37p
active: Mar 29, '09 at 2:10p
posts: 8
users: 6
website: list.org
