Recently, I was asked to switch our Mailman installation to UTF-8.
Our server runs the Debian Mailman 2.1.9 package.
I made the change and it "works", but I still have a question.

I started with the current settings: default language Finnish,
charset iso-8859-1.

The changes made for UTF-8 included:
- converting the template files and the Finnish translation file
to UTF-8 with iconv.
- editing mm_cfg.py: setting DEFAULT_CHARSET = 'UTF-8', copying
the relevant block of LC_DESCRIPTIONS entries from Defaults.py
into mm_cfg.py, and setting Finnish to utf-8.
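
For readers who want to script the conversion step instead of running iconv by hand, here is a minimal Python sketch. The function name convert_to_utf8 and its default source charset are illustrative assumptions, not part of Mailman. It skips files that already decode as UTF-8 so that running it twice does not double-encode anything; that check is a heuristic, since a Latin-1 file can occasionally happen to be valid UTF-8.

```python
from pathlib import Path


def convert_to_utf8(path, source_charset="iso-8859-1"):
    """Re-encode one file to UTF-8 in place.

    Returns True if the file was rewritten, False if it was left alone
    because it already decodes as UTF-8 (this includes plain ASCII).
    """
    path = Path(path)
    raw = path.read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8: do not convert a second time
    except UnicodeDecodeError:
        pass
    # Decode with the old charset, then write the same text back as UTF-8.
    path.write_bytes(raw.decode(source_charset).encode("utf-8"))
    return True
```

Calling it on each template file in turn would mirror the iconv step; keeping a backup of the originals first is advisable.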

Now, the problem that keeps puzzling me:

1. Before the change, mails with charset=iso-8859-1 and
charset=utf-8 were being distributed with the charset
untouched. I assumed this is how it should be, right?

2. As our server hosts a lot of lists (almost 400 of them)
I decided to try utf-8 out on a smaller scale first, on an
Ubuntu server, running only a couple of lists with its standard
mailman package, version 2.1.5. The utf-8 change was a success.
The webpages and archive were all okay, and the charset of
mails was untouched too.

3. As things looked promising, I decided to proceed with our
real list server. The result:
Web pages, archive, all okay, now in utf-8.
Mails... all of them in utf-8. And I mean ALL mails.

My question: was this to be expected? Is everything meant
to be in utf-8 from now on, including the forcing of
charset=utf-8 into all list mail headers? And why did this
happen neither with the iso-8859-1 settings nor on the
Ubuntu server?


- Eva
--
Eva Isaksson * eva at vihreat.fi * Eva.Isaksson at Helsinki.Fi
http://www.helsinki.fi/~eisaksso/


  • Mark Sapiro at Jan 10, 2008 at 9:22 pm

    Eva Isaksson wrote:
    The changes made for UTF-8 included:
    - converting the template files and the Finnish translation file
    to UTF-8 with iconv.
    - editing mm_cfg.py: setting DEFAULT_CHARSET = 'UTF-8', copying
    the relevant block of LC_DESCRIPTIONS entries from Defaults.py
    into mm_cfg.py, and setting Finnish to utf-8.

    You really only needed to copy the one add_language() call that you
    changed (and leave off the _() around the name), as in

    add_language('fi', 'Finnish', 'utf-8')
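
    Put together with the DEFAULT_CHARSET setting from the original post,
    the relevant part of mm_cfg.py might look roughly like the fragment
    below. This is a sketch, not a verified configuration: it assumes the
    file starts with the usual import from Defaults.py, and it contains
    only the two settings mentioned in this thread.

    ```python
    # mm_cfg.py (site-local overrides); assumed to begin with this import
    from Defaults import *

    # Site-wide default charset, as set in the original post.
    DEFAULT_CHARSET = 'UTF-8'

    # Re-register only the language whose charset changed. The name is a
    # plain string here, without the _() wrapper used in Defaults.py.
    add_language('fi', 'Finnish', 'utf-8')
    ```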

    Now, the problem that keeps puzzling me:

    1. Before the change, mails with charset=iso-8859-1 and
    charset=utf-8 were being distributed with the charset
    untouched. I assumed this is how it should be, right?

    Yes, but not necessarily in all cases.

    2. As our server hosts a lot of lists (almost 400 of them)
    I decided to try utf-8 out on a smaller scale first, on an
    Ubuntu server, running only a couple of lists with its standard
    mailman package, version 2.1.5. The utf-8 change was a success.
    The webpages and archive were all okay, and the charset of
    mails was untouched too.

    3. As things looked promising, I decided to proceed with our
    real list server. The result:
    Web pages, archive, all okay, now in utf-8.
    Mails... all of them in utf-8. And I mean ALL mails.

    My question: was this to be expected? Is everything meant
    to be in utf-8 from now on, including the forcing of
    charset=utf-8 into all list mail headers? And why did this
    happen neither with the iso-8859-1 settings nor on the
    Ubuntu server?

    I'm not certain about all of this, but there are places including
    Scrubber (removing attachments and flattening a message to plain text)
    and adding msg_header and msg_footer where the character set of a
    message can be coerced.

    I did do a very simple test, and I don't see the problem.

    Can you post an example of a test message as sent to a list and the
    corresponding message as received from the list with the character set
    coerced to UTF-8?


    --
    Mark Sapiro <mark at msapiro.net> The highway is for gamblers,
    San Francisco Bay Area, California better use your sense - B. Dylan
  • Eva Isaksson at Jan 10, 2008 at 11:35 pm

    Mark Sapiro:
    I'm not certain about all of this, but there are places including
    Scrubber (removing attachments and flattening a message to plain text)
    and adding msg_header and msg_footer where the character set of a
    message can be coerced.

    Yes, this seems to be the case. Our lists typically have umlaut
    characters in their descriptions. Here's an example:

    List-Id: =?utf-8?q?Vihreiden_vaikuttajien_sisäinen_keskustelulista?

    Using a test list, I was able to find out that a list that has
    only us-ascii in its headers and msg_footer will keep the original
    charset untouched. As soon as any of those contains non-ASCII
    (UTF-8) text, the message is forced to charset=utf-8.

    I had spent more than a week trying to figure this out,
    so it's good to find out why this happens.

    - Eva
    --
    Eva Isaksson * eva at vihreat.fi * Eva.Isaksson at Helsinki.Fi
    http://www.helsinki.fi/~eisaksso/
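
The behaviour Eva found with her test list can be modelled in a few lines of Python. This is not Mailman's actual code; append_footer is a hypothetical function that only reproduces what was observed in the thread: an ASCII-only footer leaves the message charset alone, while a footer containing non-ASCII text causes the combined body to be re-encoded in the list charset.

```python
def append_footer(body, body_charset, footer, list_charset="utf-8"):
    """Append a footer to an encoded message body.

    Returns (new_body_bytes, charset_used). Hypothetical model of the
    observed behaviour, not taken from Mailman's source.
    """
    combined = body.decode(body_charset) + "\n" + footer
    if footer.isascii():
        # An ASCII footer is representable in the message's own charset
        # (assuming an ASCII-compatible charset), so nothing is coerced.
        return combined.encode(body_charset), body_charset
    # A non-ASCII footer arrives in the list charset, so the whole body
    # is re-encoded and the charset label changes with it.
    return combined.encode(list_charset), list_charset
```

With an ISO-8859-1 body, an ASCII footer leaves the charset at iso-8859-1, while a footer such as "sisäinen keskustelulista" switches the result to utf-8, which matches what the test list showed.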

Discussion Overview
group: mailman-users @
categories: python
posted: Jan 9, '08 at 2:45p
active: Jan 10, '08 at 11:35p
posts: 3
users: 2
website: list.org