Hello world,

I have a very strange performance problem which only affects one small
announce-only list with approximately 11000 recipients: The smtp
logfile shows that it takes Mailman about 8400 seconds to deliver the
mails, which just doesn't make sense.

Setup: Mailman is configured to deliver outgoing mails to a dedicated
Postfix smtpd(8) daemon listening on port 10031. That daemon is
configured with all the usual stuff, no DNS lookups, no recipient
verification, no pre-queue filters, dedicated DNS caches and so on (if
it really matters, I can post the complete configuration to this
list). VERP and personalization are turned off on the Mailman side.
Postfix version is 2.8-20100213, but I ran a quick test with 2.6.5
just to be safe, and it doesn't change anything. Mailman version is
2.1.11 from Debian/stable.

After I first noticed the problem, I checked the logs - nothing
suspicious there. So I decided to take some TCP captures.

For all other lists on this server, the conversation between Postfix
and Mailman is very fast paced, but for that one list, it takes almost
one second for a recipient to be specified (which is then acknowledged
immediately by Postfix).

I really don't have any idea where I could start debugging, or how.
Normally it is the MTA's performance that people need to worry about,
but that particular mailserver isn't busy at all, not even handling
550k messages per day. Posting to all other lists only takes a
fraction of time, even ones that are much larger. There are no old
queue files around, no exceptions being thrown, no process running at
100% CPU load during delivery, nothing shunted - it's just slow.

As a quick workaround, I've increased the overall parallelism, as Ian
Eiloart pointed out in [1], to ensure that the one slow list doesn't
block anything, so the issue isn't really "top priority" - I'd be very
grateful for any hints, though.


Stefan

[1] http://mail.python.org/pipermail/mailman-developers/2009-June/020643.html


  • Mark Sapiro at Feb 20, 2010 at 5:02 pm

    On 2/20/2010 4:21 AM, Stefan Foerster wrote:
    For all other lists on this server, the conversation between Postfix
    and Mailman is very fast paced, but for that one list, it takes almost
    one second for a recipient to be specified (which is then acknowledged
    immediately by Postfix).

    So, without VERP or personalization, you should be seeing SMTP
    transactions that look like

    HELO
    response
    MAIL FROM
    response
    RCPT TO
    response
    (repeated for up to SMTP_MAX_RCPTS recipients)
    DATA
    response
    (message data)
    (MAIL FROM through DATA repeats until all recipients are delivered)
    QUIT

    And, if I understand what you're saying, the delay is in the RCPT
    TO/response loop and it occurs between the response and the next RCPT TO.
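
The transaction shape described above can be sketched in a few lines of Python 3. This is only an illustration under stated assumptions: `smtp_command_plan` and its `max_rcpts` parameter (standing in for SMTP_MAX_RCPTS) are hypothetical names, server responses are left out, and the addresses are placeholders.

```python
def smtp_command_plan(sender, recipients, max_rcpts):
    """Return the client-side commands for one message sent in chunks.

    max_rcpts stands in for SMTP_MAX_RCPTS: the upper bound on RCPT TO
    commands per MAIL FROM ... DATA transaction (hypothetical helper).
    """
    if max_rcpts < 1:
        raise ValueError("max_rcpts must be at least 1")
    plan = ["HELO"]
    for start in range(0, len(recipients), max_rcpts):
        chunk = recipients[start:start + max_rcpts]
        plan.append("MAIL FROM:<%s>" % sender)
        plan.extend("RCPT TO:<%s>" % rcpt for rcpt in chunk)
        plan.append("DATA")
    plan.append("QUIT")
    return plan


if __name__ == "__main__":
    recipients = ["a@example.com", "b@example.com", "c@example.com"]
    for command in smtp_command_plan("list-bounces@example.com", recipients, 2):
        print(command)
```

With the figures from the original post (about 11000 recipients delivered in about 8400 seconds), the average works out to roughly 8400 / 11000, or about 0.76 seconds per RCPT TO, which matches the "almost one second" seen in the captures.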

    This is really weird. There is not even any Mailman code involved in
    this. The entire sequence from MAIL FROM to end of DATA is done by one
    call to the Python smtplib.SMTP.sendmail() method.

    There is nothing list specific other than the envelope sender in the
    sendmail() call/MAIL FROM command in Mailman's interaction with smtplib,
    and if it were related somehow to that, I would expect the delay to be
    in Postfix between the RCPT TO and response.

    I really don't have any idea where I could start debugging, or how.

    Nor do I really. You could look at the FAQ at
    <http://wiki.list.org/x/-IA9> for the way to enable smtplib debugging
    (as noted in the FAQ, only for Python 2.4.x and newer). This will
    produce voluminous Mailman error log output which may help pinpoint
    where in smtplib.py the delay is, but probably the time-stamp
    granularity is not fine enough.

    --
    Mark Sapiro <mark at msapiro.net> The highway is for gamblers,
    San Francisco Bay Area, California better use your sense - B. Dylan
  • Stefan Foerster at Feb 20, 2010 at 6:27 pm

    * Mark Sapiro <mark at msapiro.net>:
    On 2/20/2010 4:21 AM, Stefan Foerster wrote:
    So, without VERP or personalization, you should be seeing SMTP
    transactions that look like

    HELO
    response
    MAIL FROM
    response
    RCPT TO
    response
    (repeated for up to SMTP_MAX_RCPTS recipients)
    DATA
    response
    (message data)
    (MAIL FROM through DATA repeats until all recipients are delivered)
    QUIT
    Yes. I assume you wanted me to check if there were any errors in this
    dialogue - but there are none, neither in the logfiles from the Python
    smtplib nor in the Postfix logs. The TCP capture didn't show any
    errors, either. And yes, I verified that there really isn't any VERP or
    personalization involved (again, I read Postfix logs ("nrcpt=<large
    number>") and TCP streams).
    And, if I understand what you're saying, the delay is in the RCPT
    TO/response loop and it occurs between the response and the next RCPT TO.
    Yes. From debuglevel(1) logs:

    Feb 20 19:03:15 2010 qrunner(7551): send: 'rcpt TO:<recipient1 at example.com>\r\n'
    Feb 20 19:03:15 2010 qrunner(7551): reply: '250 2.1.5 Ok\r\n'
    Feb 20 19:03:15 2010 qrunner(7551): reply: retcode (250); Msg: 2.1.5 Ok
    Feb 20 19:03:17 2010 qrunner(7551): send: 'rcpt TO:<recipient2 at example.com>\r\n'
    Feb 20 19:03:17 2010 qrunner(7551): reply: '250 2.1.5 Ok\r\n'
    Feb 20 19:03:17 2010 qrunner(7551): reply: retcode (250); Msg: 2.1.5 Ok

    If you want me to, I can gather detailed timing data with tcpdump
    and/or wireshark.
    I really don't have any idea where I could start debugging, or how.

    Nor do I really.
    Humorous remark: "That's not what you want to hear from the guy who
    actually wrote the application!" ;-)

    I'm running out of ideas. My Postfix smtpd(8) for mailman looks like that:

    127.0.0.1:10031 inet n - - - - smtpd
      -o mynetworks=127.0.0.0/8
      -o content_filter=
      -o smtpd_proxy_filter=
      -o receive_override_options=no_header_body_checks,no_address_mappings,no_unknown_recipient_checks
      -o smtpd_client_connection_count_limit=0
      -o smtpd_client_connection_rate_limit=0
      -o smtpd_error_sleep_time=0
      -o smtpd_soft_error_limit=1001
      -o smtpd_hard_error_limit=1000
      -o smtpd_restriction_classes=
      -o smtpd_client_restrictions=
      -o smtpd_helo_restrictions=
      -o smtpd_sender_restrictions=
      -o smtpd_recipient_restrictions=permit_mynetworks,reject
      -o smtpd_data_restrictions=
      -o smtpd_end_of_data_restrictions=
      -o smtpd_authorized_xforward_hosts=127.0.0.0/8
      -o syslog_name=postfix-mm

    No magic involved here. My mm_cfg.py:

    MAILMAN_SITE_LIST = 'mailman'
    DEFAULT_URL_PATTERN = 'http://%s/mailman/'
    PRIVATE_ARCHIVE_URL = '/mailman/private'
    IMAGE_LOGOS = '/images/mailman/'
    DEFAULT_EMAIL_HOST = 'lists.example.com'
    DEFAULT_URL_HOST = 'lists.example.com'
    add_virtualhost(DEFAULT_URL_HOST, DEFAULT_EMAIL_HOST)
    DEFAULT_SERVER_LANGUAGE = 'en'
    DEFAULT_SEND_REMINDERS = 0
    USE_ENVELOPE_SENDER = 0
    MTA=None # Misnomer, suppresses alias output on newlist
    DEB_LISTMASTER='postmaster at example.net'
    SMTPPORT = 10031

    I am not able to find anything that is specific to this list, and I'm
    not sure where I could look further.


    Stefan
  • Mark Sapiro at Feb 20, 2010 at 8:17 pm

    On 2/20/2010 10:27 AM, Stefan Foerster wrote:
    Yes. From debuglevel(1) logs:

    Feb 20 19:03:15 2010 qrunner(7551): send: 'rcpt TO:<recipient1 at example.com>\r\n'
    Feb 20 19:03:15 2010 qrunner(7551): reply: '250 2.1.5 Ok\r\n'
    Feb 20 19:03:15 2010 qrunner(7551): reply: retcode (250); Msg: 2.1.5 Ok
    Feb 20 19:03:17 2010 qrunner(7551): send: 'rcpt TO:<recipient2 at example.com>\r\n'
    Feb 20 19:03:17 2010 qrunner(7551): reply: '250 2.1.5 Ok\r\n'
    Feb 20 19:03:17 2010 qrunner(7551): reply: retcode (250); Msg: 2.1.5 Ok

    So, in the above we see greater than 1 second between

    reply: retcode (250); Msg: 2.1.5 Ok

    and the next

    send: 'rcpt TO:<recipient2 at example.com>\r\n'

    but virtually nothing occurs between those two events. We are in the
    sendmail method in a for loop over the recipient list. The first of
    those two messages is written at the end of getreply() which returns to
    rcpt() which returns to the for loop which checks the status and calls
    rcpt() again with the next recipient. rcpt() calls putcmd() which calls
    send() which writes the second message before doing anything else. There
    are no system calls of any kind (other than writing the messages
    themselves, but the delay exists without logging) in between those two
    messages.
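
The gap discussed above can be pulled out of a debug log mechanically. The following Python 3 sketch assumes log lines shaped exactly like the excerpt quoted in this thread; the names `LINE_RE`, `client_side_gaps` and `SAMPLE` are hypothetical, and because the timestamps only have one-second resolution, each measured gap is accurate to about a second at best.

```python
import re
from datetime import datetime

# Matches lines such as:
#   Feb 20 19:03:15 2010 qrunner(7551): send: 'rcpt TO:<...>\r\n'
LINE_RE = re.compile(
    r"^(\w{3} \d{1,2} \d{2}:\d{2}:\d{2} \d{4}) qrunner\(\d+\): (send|reply): (.*)$"
)


def client_side_gaps(lines):
    """Return (seconds, command) for each pause between a parsed reply
    ("reply: retcode ...") and the next command the client sends."""
    gaps = []
    last_reply = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%b %d %H:%M:%S %Y")
        kind, rest = match.group(2), match.group(3)
        if kind == "reply" and rest.startswith("retcode"):
            last_reply = stamp
        elif kind == "send" and last_reply is not None:
            gaps.append(((stamp - last_reply).total_seconds(), rest))
            last_reply = None
    return gaps


# The excerpt from this thread, kept verbatim as raw strings.
SAMPLE = [
    r"Feb 20 19:03:15 2010 qrunner(7551): send: 'rcpt TO:<recipient1 at example.com>\r\n'",
    r"Feb 20 19:03:15 2010 qrunner(7551): reply: '250 2.1.5 Ok\r\n'",
    r"Feb 20 19:03:15 2010 qrunner(7551): reply: retcode (250); Msg: 2.1.5 Ok",
    r"Feb 20 19:03:17 2010 qrunner(7551): send: 'rcpt TO:<recipient2 at example.com>\r\n'",
    r"Feb 20 19:03:17 2010 qrunner(7551): reply: '250 2.1.5 Ok\r\n'",
    r"Feb 20 19:03:17 2010 qrunner(7551): reply: retcode (250); Msg: 2.1.5 Ok",
]

if __name__ == "__main__":
    for seconds, command in client_side_gaps(SAMPLE):
        print(seconds, command)
```

Run over a whole delivery, the list of gaps would also show whether the pause is uniform or clustered around particular recipients.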

    If you want me to, I can gather detailed timing data with tcpdump
    and/or wireshark.

    Presumably it will just show the delay between the response to one RCPT
    TO and the sending of the next RCPT TO. The delay in the above log
    narrows it even further.

    And none of this is list specific, yet it only affects one list.

    You could try strace or ?? on the OutgoingRunner, but I don't know what
    that might show beyond what we already know.

    Does this delay occur uniformly over the entire list, or only within
    some group of recipients?

    You could try running OutgoingRunner with Python's trace module
    <http://docs.python.org/library/trace.html#command-line-usage>, e.g.

    python -m trace [trace opts] bin/qrunner --runner=OutgoingRunner:0:1

    To do this, you'd probably want to stop OutgoingRunner(s), post to the
    list and then stop Mailman so you have only the one message to this list
    in the out/ queue, and then run the trace as above, but I would only do
    this as a last ditch effort, because I'm not sure it would be helpful.
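
For reference, the trace module can also be driven from Python code rather than from the command line. This is a minimal Python 3 sketch, not the procedure described above: `slow_step` is a hypothetical stand-in for whatever callable should be traced, and `run_traced` is a hypothetical helper name. The `timing=True` argument corresponds to the command-line `--timing` option and prefixes each traced line with the time elapsed since the tracer was created.

```python
import trace


def slow_step(n):
    """Hypothetical stand-in for the code whose timing is of interest."""
    total = 0
    for i in range(n):
        total += i
    return total


def run_traced(func, *args):
    """Run func under line tracing with elapsed-time prefixes."""
    tracer = trace.Trace(count=False, trace=True, timing=True)
    return tracer.runfunc(func, *args)


if __name__ == "__main__":
    print(run_traced(slow_step, 3))
```

Line tracing slows execution considerably, so the elapsed times are only useful for spotting a step that is slow relative to its neighbours.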

    --
    Mark Sapiro <mark at msapiro.net> The highway is for gamblers,
    San Francisco Bay Area, California better use your sense - B. Dylan
  • Stefan Foerster at Feb 20, 2010 at 8:56 pm

    * Mark Sapiro <mark at msapiro.net>:
    Does this delay occur uniformly over the entire list, or only within
    some group of recipients?
    It occurs for all recipients, more or less - sometimes, it gets about
    5 recipients done per second, but that's still far too slow.
    You could try running OutgoingRunner with Python's trace module
    <http://docs.python.org/library/trace.html#command-line-usage>, e.g.

    python -m trace [trace opts] bin/qrunner --runner=OutgoingRunner:0:1

    To do this, you'd probably want to stop OutgoingRunner(s), post to the
    list and then stop Mailman so you have only the one message to this list
    in the out/ queue, and then run the trace as above, but I would only do
    this as a last ditch effort, because I'm not sure it would be helpful.
    I fear I've got a decision to make here: To "fix" that problem, I'd
    normally simply export the recipient list and recreate the mailing
    list thereafter. But since we don't know what causes this behaviour, I
    can't be sure that my backups include all files I need to recreate
    that problem on a different machine for debugging purposes.

    So, if you are personally interested in this, I would talk to a lawyer
    to find a way how I can legally provide you with a copy of every file
    that is in any way related to this list.

    If you are not _that_ interested, I'd just go ahead and wipe that
    list (and cross fingers).

    Thank you for your time and your insightful comments.


    Stefan
  • Ralf Hildebrandt at Feb 20, 2010 at 9:12 pm

    * Stefan Foerster <cite+mailman-users at incertum.net>:

    I fear I've got a decision to make here: To "fix" that problem, I'd
    normally simply export the recipient list and recreate the mailing list
    thereafter.
    Is this guaranteed to help?
    Have you tried this?

    --
    Ralf Hildebrandt
    Geschäftsbereich IT | Abteilung Netzwerk
    Charité - Universitätsmedizin Berlin
    Campus Benjamin Franklin
    Hindenburgdamm 30 | D-12203 Berlin
    Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962
    ralf.hildebrandt at charite.de | http://www.charite.de
  • Barry Warsaw at Feb 20, 2010 at 9:14 pm

    On Feb 20, 2010, at 09:56 PM, Stefan Foerster wrote:
    I fear I've got a decision to make here: To "fix" that problem, I'd
    normally simply export the recipient list and recreate the mailing
    list thereafter. But since we don't know what causes this behaviour, I
    can't be sure that my backups include all files I need to recreate
    that problem on a different machine for debugging purposes.

    So, if you are personally interested in this, I would talk to a lawyer
    to find a way how I can legally provide you with a copy of every file
    that is in any way related to this list.

    If you are not _that_ interested, I'd just go ahead and wipe that
    list (and cross fingers).

    Thank you for your time and your insightful comments.
    Have you tried any of the Postfix debugging strategies?

    http://www.postfix.org/DEBUG_README.html

    -Barry
  • Ralf Hildebrandt at Feb 20, 2010 at 9:17 pm

    * Barry Warsaw <barry at python.org>:

    Have you tried any of the Postfix debugging strategies?

    http://www.postfix.org/DEBUG_README.html
    Yes he did.
    Stefan usually knows what he's doing :)

    --
    Ralf Hildebrandt
    Geschäftsbereich IT | Abteilung Netzwerk
    Charité - Universitätsmedizin Berlin
    Campus Benjamin Franklin
    Hindenburgdamm 30 | D-12203 Berlin
    Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962
    ralf.hildebrandt at charite.de | http://www.charite.de
  • Barry Warsaw at Feb 20, 2010 at 9:20 pm

    On Feb 20, 2010, at 10:17 PM, Ralf Hildebrandt wrote:

    Have you tried any of the Postfix debugging strategies?

    http://www.postfix.org/DEBUG_README.html
    Yes he did.
    Stefan usually knows what he's doing :)
    Ah, sorry about that!

    culling-inbox-during-pycon-talk-ly y'rs,
    -Barry
  • Stefan Foerster at Feb 20, 2010 at 10:01 pm

    * Barry Warsaw <barry at python.org>:
    On Feb 20, 2010, at 09:56 PM, Stefan Foerster wrote:
    So, if you are personally interested in this, I would talk to a lawyer
    to find a way how I can legally provide you with a copy of every file
    that is in any way related to this list.

    If you are not _that_ interested, I'd just go ahead and wipe that
    list (and cross fingers).

    Thank you for your time and your insightful comments.
    Have you tried any of the Postfix debugging strategies?

    http://www.postfix.org/DEBUG_README.html
    As expected, Postfix is not the culprit. Delivery to smtp-sink is
    running at the speed of molasses, too.


    Stefan
  • Barry Warsaw at Feb 21, 2010 at 2:38 pm

    On Feb 20, 2010, at 11:01 PM, Stefan Foerster wrote:
    As expected, Postfix is not the culprit. Delivery to smtp-sink is
    running at the speed of molasses, too.
    Now this is getting interesting <wink>.

    http://mail.python.org/pipermail/mailman-users/2010-February/068829.html

    has some perplexing numbers. If you're really seeing a 2 second delay between
    the reading of one RCPT reply to the next, then this points to problems in
    Python or its smtplib module. I did a quick search through the Python bug
    tracker and nothing jumped out at me.

    As Mark said, Mailman basically just calls SMTP.sendmail() to send the message
    to each chunk of recipients. The part of that method that sends the RCPTs to
    Postfix is this code (in Py2.6):

    for each in to_addrs:
        (code,resp)=self.rcpt(each, rcpt_options)
        if (code != 250) and (code != 251):
            senderrs[each]=(code,resp)

    It's hard to see what would cause that loop to sit there between the 19:03:15
    retcode and 19:03:17 send. You're not even touching the socket between these
    calls. Looking at putcmd() and getreply() and the way they're called, I just
    don't see any opportunity for hanging. I suppose it's possible you're seeing
    Python issues, but that doesn't really explain why it would affect only this
    list.
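
One way to separate time spent inside each rcpt() call from time spent between calls is to wrap the method on the client object. The Python 3 sketch below is an assumption-laden illustration, not part of Mailman or smtplib: `instrument_rcpt` and `summarize` are hypothetical helpers, and they work on any object that has an `rcpt` method, including an `smtplib.SMTP` instance, because `sendmail()` looks the method up on the instance.

```python
import time


def instrument_rcpt(smtp, clock=time.perf_counter):
    """Wrap smtp.rcpt so each call appends (recipient, start, end)."""
    samples = []
    original = smtp.rcpt

    def timed_rcpt(recip, *args, **kwargs):
        start = clock()
        try:
            return original(recip, *args, **kwargs)
        finally:
            samples.append((recip, start, clock()))

    # Instance attribute shadows the class method for later lookups.
    smtp.rcpt = timed_rcpt
    return samples


def summarize(samples):
    """Return (seconds inside rcpt calls, seconds between rcpt calls)."""
    in_call = sum(end - start for _, start, end in samples)
    between = sum(nxt[1] - cur[2] for cur, nxt in zip(samples, samples[1:]))
    return in_call, between
```

A large "between" total with a small "in call" total would confirm that the delay sits on the client side of the loop, outside any socket activity; the reverse would point back at the server or the network.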

    I probably missed it but what platform are you running on? What version of
    Python?

    I see that you've worked around the problem, which of course only adds oddness.
    If you're still able and interested in debugging this, I can think of a couple
    of things to do. Let me know and I'll lay out a few ideas.

    -Barry

  • Mark Sapiro at Feb 20, 2010 at 9:27 pm

    On 2/20/2010 12:56 PM, Stefan Foerster wrote:
    I fear I've got a decision to make here: To "fix" that problem, I'd
    normally simply export the recipient list and recreate the mailing
    list thereafter. But since we don't know what causes this behaviour, I
    can't be sure that my backups include all files I need to recreate
    that problem on a different machine for debugging purposes.

    Assuming your list doesn't use any custom MemberAdaptor, the
    lists/LISTNAME/config.pck file is the only list specific thing that
    could be involved. These get continuously updated, but since the problem
    is persistent, any one since the problem started should do.

    Of course, if you just drop this config.pck into some other Mailman
    installation for testing, there's no guarantee you'd see the problem. At
    a minimum, you'd want the same Mailman version and Python version. I
    think you said you'd tried a different Postfix and it didn't change things.

    So, if you are personally interested in this, I would talk to a lawyer
    to find a way how I can legally provide you with a copy of every file
    that is in any way related to this list.

    As I said, I think it would just be the config.pck. Everything else is
    open source software, but I don't think I want it. It's not that I'm not
    curious because I definitely am, but I don't want to accidentally send
    mail to any of the list members. I suppose I could just create a pseudo
    MTA to listen on the SMTPPORT you use and just respond with 250 to every
    message.

    Actually, you could try that too and see what it does with your list.
    I'll make a little Python script for that.
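
A pseudo MTA of the kind described above can be quite small. The following is a Python 3 sketch written for illustration, not the script referred to in this message: the names `SinkHandler`, `SinkServer`, `start_sink` and `probe` are hypothetical, it accepts every command, discards all message data, and makes no attempt at full SMTP compliance.

```python
import smtplib
import socketserver
import threading


class SinkHandler(socketserver.StreamRequestHandler):
    """Answer every SMTP command positively and throw the message away."""

    def reply(self, text):
        self.wfile.write(text.encode("ascii") + b"\r\n")

    def handle(self):
        self.reply("220 sink ready")
        in_data = False
        for raw in self.rfile:
            line = raw.rstrip(b"\r\n")
            if in_data:
                # A lone dot ends the message body.
                if line == b".":
                    in_data = False
                    self.reply("250 2.0.0 Ok: discarded")
                continue
            verb = line[:4].upper()
            if verb == b"DATA":
                in_data = True
                self.reply("354 End data with <CR><LF>.<CR><LF>")
            elif verb == b"QUIT":
                self.reply("221 2.0.0 Bye")
                break
            else:
                self.reply("250 Ok")


class SinkServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True


def start_sink(host="127.0.0.1", port=0):
    """Start the sink in a background thread; port 0 picks a free port."""
    server = SinkServer((host, port), SinkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(port, recipients, host="127.0.0.1"):
    """Send one throwaway message; return smtplib's dict of refusals."""
    client = smtplib.SMTP(host, port, timeout=5)
    try:
        return client.sendmail("sender@example.com", recipients,
                               "Subject: probe\r\n\r\nbody\r\n")
    finally:
        client.quit()
```

Calling `start_sink(port=10123)` (the port number is an arbitrary example) and pointing SMTPPORT at it would let a delivery run to completion without any mail reaching real list members.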

    --
    Mark Sapiro <mark at msapiro.net> The highway is for gamblers,
    San Francisco Bay Area, California better use your sense - B. Dylan
  • Barry Warsaw at Feb 20, 2010 at 9:38 pm

    On Feb 20, 2010, at 01:27 PM, Mark Sapiro wrote:
    As I said, I think it would just be the config.pck. Everything else is
    open source software, but I don't think I want it. It's not that I'm not
    curious because I definitely am, but I don't want to accidentally send
    mail to any of the list members. I suppose I could just create a pseudo
    MTA to listen on the SMTPPORT you use and just respond with 250 to every
    message.

    Actually, you could try that too and see what it does with your list.
    I'll make a little Python script for that.
    Take a look at lazr.smtptest, which is what MM3 uses in its test framework.

    https://edge.launchpad.net/lazr.smtptest

    -Barry
  • Mark Sapiro at Feb 20, 2010 at 10:00 pm

    On 2/20/2010 1:38 PM, Barry Warsaw wrote:
    Take a look at lazr.smtptest, which is what MM3 uses in its test framework.

    https://edge.launchpad.net/lazr.smtptest

    Thanks Barry,

    That's helpful.

    --
    Mark Sapiro <mark at msapiro.net> The highway is for gamblers,
    San Francisco Bay Area, California better use your sense - B. Dylan
  • Brad Knowles at Feb 20, 2010 at 9:51 pm

    On Feb 20, 2010, at 3:27 PM, Mark Sapiro wrote:

    As I said, I think it would just be the config.pck. Everything else is
    open source software, but I don't think I want it. It's not that I'm not
    curious because I definitely am, but I don't want to accidentally send
    mail to any of the list members.
    Another test would be to break up the large list into a number of smaller sub-lists with an umbrella list. That would allow Mailman to have a lot more internal parallelism, and not get into lock synchronization issues over config.pck. You could also run multiple sets of qrunners, if you split things correctly according to the "powers of 2" rule.
    I suppose I could just create a pseudo
    MTA to listen on the SMTPPORT you use and just respond with 250 to every
    message.

    Actually, you could try that too and see what it does with your list.
    I'll make a little Python script for that.
    A much simpler solution here is to use the "smtp-sink" program that Wietse supplies as part of the test harness for Postfix.

    --
    Brad Knowles <bradknowles at shub-internet.org>
    LinkedIn Profile: <http://tinyurl.com/y8kpxu>
  • Stefan Foerster at Feb 20, 2010 at 10:15 pm

    * Brad Knowles <brad at shub-internet.org>:
    On Feb 20, 2010, at 3:27 PM, Mark Sapiro wrote:

    As I said, I think it would just be the config.pck. Everything
    else is open source software, but I don't think I want it. It's
    not that I'm not curious because I definitely am, but I don't want
    to accidentally send mail to any of the list members.
    Another test would be to break up the large list into a number of
    smaller sub-lists with an umbrella list. That would allow Mailman
    to have a lot more internal parallelism, and not get into lock
    synchronization issues over config.pck. You could also run multiple
    sets of qrunners, if you split things correctly according to the
    "powers of 2" rule.
    What is a "smaller sub-list"? The list in question does only hold 11k
    recipients, which is not exactly large. Some of my SVN announce lists
    are much larger.


    Stefan
  • Chr. von Stuckrad at Feb 20, 2010 at 11:05 pm
    An embedded and charset-unspecified text was scrubbed...
    Name: not available
    URL: <http://mail.python.org/pipermail/mailman-users/attachments/20100221/debf275a/attachment.ksh>
  • Brad Knowles at Feb 21, 2010 at 6:13 am

    On Feb 20, 2010, at 4:15 PM, Stefan Foerster wrote:

    What is a "smaller sub-list"? The list in question does only hold 11k
    recipients, which is not exactly large. Some of my SVN announce lists
    are much larger.
    Yeah, but an announce-only list that is larger doesn't really compare to a discussion list which is smaller. The smaller discussion list is likely to be much more active, and if you multiply the number of unique messages posted to the list by the number of subscribers, you may find that the smaller discussion list actually results in considerably more traffic than the larger announce-only list.

    Now, I'm not saying that this is definitely the case. And for just 11k users, it does seem unlikely. But this is a possibility that would be useful to eliminate.

    In your case, I think I'd probably first try the multiple queue-runners thing by doing powers-of-2 splits. Regrettably, this is not well documented, but I think there are one or two FAQ Wiki entries that discuss it.
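
The "powers of 2" rule can be made concrete with a small check. This Python 3 sketch assumes that Mailman 2.1 describes its queue runners as a list of (runner name, slice count) pairs in a setting named QRUNNERS; that shape is recalled from Defaults.py and should be verified against the installed version. The helper names and the three-entry example list are hypothetical and deliberately incomplete.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... and False for everything else."""
    return n > 0 and (n & (n - 1)) == 0


def with_slices(qrunners, runner_name, slices):
    """Return a copy of qrunners with runner_name set to `slices` slices."""
    if not is_power_of_two(slices):
        raise ValueError("slice count must be a power of 2, got %r" % slices)
    if runner_name not in [name for name, _ in qrunners]:
        raise KeyError(runner_name)
    return [(name, slices if name == runner_name else count)
            for name, count in qrunners]


# Illustrative subset only; a real installation lists more runners.
EXAMPLE_QRUNNERS = [
    ("IncomingRunner", 1),
    ("OutgoingRunner", 1),
    ("VirginRunner", 1),
]

if __name__ == "__main__":
    print(with_slices(EXAMPLE_QRUNNERS, "OutgoingRunner", 4))
```

The restriction exists because the runners divide the queue's hash space evenly among themselves, which only works cleanly when the number of slices is a power of two.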

    --
    Brad Knowles <bradknowles at shub-internet.org>
    LinkedIn Profile: <http://tinyurl.com/y8kpxu>
  • Stefan Foerster at Feb 20, 2010 at 10:10 pm

    * Mark Sapiro <mark at msapiro.net>:
    On 2/20/2010 12:56 PM, Stefan Foerster wrote:
    So, if you are personally interested in this, I would talk to a lawyer
    to find a way how I can legally provide you with a copy of every file
    that is in any way related to this list.
    As I said, I think it would just be the config.pck. Everything else is
    open source software, but I don't think I want it. It's not that I'm not
    curious because I definitely am, but I don't want to accidentally send
    mail to any of the list members. I suppose I could just create a pseudo
    MTA to listen on the SMTPPORT you use and just respond with 250 to every
    message.
    A plan! I will have a look at the wiki to see how I go about moving a
    list to another host. Then, tomorrow morning (it's 11pm here), I'll
    setup a VM, install the same set of packages and copy over anything
    that is closely mailman related.
    Actually, you could try that too and see what it does with your list.
    I'll make a little Python script for that.
    I did the testing with smtp-sink, which is running just fine. I'll
    report back if I can reproduce the problem on the virtual machine.

    Ralf just had the idea to compare the output from "config_list" to
    that of another announce-only list, but apart from owners, messages
    and the usual stuff, there is absolutely no difference.
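
A comparison like this can be automated with Python's difflib. The Python 3 sketch below assumes the two dumps were saved as text with one setting per line and `#` comment lines, which is how config_list output generally looks; `significant_lines` and `config_diff` are hypothetical helper names, and multi-line string values are not treated specially.

```python
import difflib


def significant_lines(dump):
    """Drop blank lines and comment lines from a configuration dump."""
    return [line.rstrip() for line in dump.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]


def config_diff(dump_a, dump_b, name_a="list-a.cfg", name_b="list-b.cfg"):
    """Return unified-diff lines covering only the settings that differ."""
    return list(difflib.unified_diff(
        significant_lines(dump_a), significant_lines(dump_b),
        fromfile=name_a, tofile=name_b, lineterm="", n=0))


if __name__ == "__main__":
    a = "# a comment\nreal_name = 'slow-list'\nmax_message_size = 40\n"
    b = "real_name = 'fast-list'\nmax_message_size = 40\n"
    for line in config_diff(a, b):
        print(line)
```

An empty result means the two dumps agree on every setting once comments are ignored.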


    Stefan
  • Stefan Foerster at Feb 21, 2010 at 10:15 am

    * Stefan Foerster <cite+mailman-users at incertum.net>:
    * Mark Sapiro <mark at msapiro.net>:
    As I said, I think it would just be the config.pck. Everything else is
    open source software, but I don't think I want it. It's not that I'm not
    curious because I definitely am, but I don't want to accidentally send
    mail to any of the list members. I suppose I could just create a pseudo
    MTA to listen on the SMTPPORT you use and just respond with 250 to every
    message.
    A plan! I will have a look at the wiki to see how I go about moving a
    list to another host. Then, tomorrow morning (it's 11pm here), I'll
    setup a VM, install the same set of packages and copy over anything
    that is closely mailman related.
    Bad news. I was not able to reproduce the problem on a VM, using
    backups from the day the problem first occurred. And worse, this night,
    while I slept a troubled, disturbed sleep, dreaming of SMTP dialogues,
    the list roster changed (one new member) - and the problem is gone.

    I've been doing system administration tasks since 1997, and this still
    feels more like voodoo than science, sometimes.


    Stefan
  • Mark Sapiro at Feb 21, 2010 at 9:05 pm

    On 2/21/2010 2:15 AM, Stefan Foerster wrote:
    Bad news. I was not able to reproduce the problem on a VM, using
    backups from the day the problem first occurred. And worse, this night,
    while I slept a troubled, disturbed sleep, dreaming of SMTP dialogues,
    the list roster changed (one new member) - and the problem is gone.

    Since I couldn't understand what possibly caused the problem in the
    first place, I'm not totally surprised.

    I've been doing system administration tasks since 1997, and this still
    feels more like voodoo than science, sometimes.

    Yes, it does, but I've found that there usually is an explanation. It's
    just that finding it may not be easy.

    You could try taking the problem list's config.pck from the backup and
    dropping that into a lists/ directory with a different name (effectively
    creating a new list with the exact configuration of the old one).

    Then you could install this withlist script in Mailman's bin/ directory
    as bin/test_smtp.py

    -----------------------------------------------------------
    from Mailman import mm_cfg
    mm_cfg.SMTPPORT = 10123 # or whatever you want
    from Mailman import Message
    from Mailman.Handlers import CalcRecips
    from Mailman.Handlers import SMTPDirect

    def test_smtp(mlist):
        msg = Message.Message()
        msg['From'] = 'the usual poster to the list'
        msg.set_payload('message body')
        msgdata = {}
        CalcRecips.process(mlist, msg, msgdata)
        SMTPDirect.process(mlist, msg, msgdata)
    -----------------------------------------------------------

    And then run smtp-sink on the port defined in the script and run

    bin/withlist -r test_smtp listname

    where listname is the name of the new lists/ directory into which you
    put the old config.pck. This will short circuit a lot of Mailman stuff
    and strip it down to building the recipient list and sending the mail to
    the smtp-sink port.

    Possibly you've already done something similar, but this would give you
    a low impact way to determine if you can duplicate the problem with the
    old config.pck.

    Unfortunately, I don't have any good ideas as to how to proceed from
    there, even if this does duplicate the problem, but Barry indicated he
    has a couple of ideas.

    --
    Mark Sapiro <mark at msapiro.net> The highway is for gamblers,
    San Francisco Bay Area, California better use your sense - B. Dylan
  • Stefan Foerster at Mar 10, 2010 at 4:51 am

    * Mark Sapiro <mark at msapiro.net>:
    On 2/21/2010 2:15 AM, Stefan Foerster wrote:

    Bad news. I was not able to reproduce the problem on a VM, using
    backups from the day the problem first occurred. And worse, this night,
    while I slept a troubled, disturbed sleep, dreaming of SMTP dialogues,
    the list roster changed (one new member) - and the problem is gone.
    Since I couldn't understand what possibly caused the problem in the
    first place, I'm not totally surprised.
    Good news (kinda) - another list on that server just started to slow
    down, and this time, it is a very unimportant and small list (472
    members, 466 of them have mail delivery enabled), so I can take all
    the time in the world to try and debug this issue.

    [instructions for list duplication /SMTP redirection]
    Unfortunately, I don't have any good ideas as to how to proceed from
    there, even if this does duplicate the problem, but Barry indicated he
    has a couple of ideas.
    Well, unfortunately, this doesn't reproduce the problem. Neither does
    stopping Mailman and copying every single file to another server.
    However, restarting Mailman (something I don't do very often) does
    _not_ solve the problem, either.

    Do you think I can drop Barry a PM off-list and ask him for further
    advice if he doesn't read this? I'm really interested in debugging
    this, and as I said, this time I really don't care about the list
    delivery being slow.


    Stefan
  • Mark Sapiro at Mar 10, 2010 at 3:11 pm

    Stefan Foerster wrote:
    * Mark Sapiro <mark at msapiro.net>:

    [instructions for list duplication /SMTP redirection]
    Unfortunately, I don't have any good ideas as to how to proceed from
    there, even if this does duplicate the problem, but Barry indicated he
    has a couple of ideas.
    Well, unfortunately, this doesn't reproduce the problem. Neither does
    stopping Mailman and copying every single file to another server.
    However, restarting Mailman (something I don't do very often) does
    _not_ solve the problem, either.

    Can you update/upgrade or simply reinstall Python on this server? The
    delays you observed _must_ be occurring in the Python interpreter
    itself, but this seems _impossible_ since the interpreter shouldn't be
    affected by which list or a change in list membership.

    I wonder if there could somehow be some interaction through the file
    system.

    Do you think I can drop Barry a PM off-list and ask him for further
    advice if he doesn't read this? I'm really interested in debugging
    this, and as I said, this time I really don't care about the list
    delivery being slow.

    Barry is often on the #mailman irc channel at freenode.net. It might be
    best to ping him there.

    --
    Mark Sapiro <mark at msapiro.net> The highway is for gamblers,
    San Francisco Bay Area, California better use your sense - B. Dylan
  • Barry Warsaw at Mar 18, 2010 at 2:25 pm

    On Mar 10, 2010, at 05:51 AM, Stefan Foerster wrote:
    Good news (kinda) - another list on that server just started to slow
    down, and this time, it is a very unimportant and small list (472
    members, 466 of them have mail delivery enabled), so I can take all
    the time in the world to try and debug this issue.
    I agree with Mark that this sounds like a problem with the Python interpreter.
    I just don't see what could be causing Mailman to slow down. I think if you
    want to continue to debug this, it will involve hacking SMTPDirect.py or
    replacing it with a simpler but instrumented handler for the list in question.
    Do you want to go down that route? (Installing say Python 2.6.5 and
    rebuilding Mailman might be an easier first step.)
    Do you think I can drop Barry a PM off-list and ask him for further
    advice if he doesn't read this? I'm really interested in debugging
    this, and as I said, this time I really don't care about the list
    delivery being slow.
    I read this list, but usually just skim it and sometimes it can take a long
    while to respond.

    -Barry

Discussion Overview
group: mailman-users
categories: python
posted: Feb 20, '10 at 12:21p
active: Mar 18, '10 at 2:25p
posts: 24
users: 7
website: list.org
