FAQ
We have a problem: in recent weeks, Mailman has been stalling when sending email. The messages end up in the archives and are correctly placed in qfiles/out, but they are simply never sent. Restarting the service "fixes" the problem. We have also upgraded to the latest Mailman release to rule that out as the cause.

In the log files, we get the following:

Aug 08 09:44:41 2006 (11723) Cannot connect to SMTP server localhost on port smtp

Once this occurs, outgoing mail ceases until a restart of the service/runners.
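To catch this failure before the out queue backs up, a monitoring job could scan Mailman's logs for this message. The sketch below assumes the exact line format quoted above. Which log file receives it is an assumption here (the smtp-failure log is a likely candidate, but check your installation), and `connect_failures` and `failure_count` are hypothetical helper names, not part of Mailman.

```python
import re

# Matches log lines in the format quoted above, e.g.:
#   Aug 08 09:44:41 2006 (11723) Cannot connect to SMTP server localhost on port smtp
CONNECT_FAILURE = re.compile(
    r"^(?P<stamp>\w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}) "
    r"\((?P<pid>\d+)\) "
    r"Cannot connect to SMTP server (?P<host>\S+) on port (?P<port>\S+)$"
)


def connect_failures(lines):
    """Yield (timestamp, pid, host, port) for each SMTP connect failure."""
    for line in lines:
        match = CONNECT_FAILURE.match(line.rstrip("\r\n"))
        if match:
            yield (match["stamp"], int(match["pid"]),
                   match["host"], match["port"])


def failure_count(path):
    """Count the SMTP connect failures recorded in one log file."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return sum(1 for _ in connect_failures(log))
```

For example, `failure_count("/usr/local/mailman/logs/smtp-failure")` (a hypothetical path) could feed an alert in whatever monitoring system is already in place.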

I believe the box is hitting its limit of sendmail processes and refusing connections, and that Mailman is not gracefully retrying later, either by design or because of a bug. Is this expected behavior, or is there a setting I have missed?
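For reference, this is roughly what "gracefully retrying later" means for a client whose connection to the local MTA is refused. It is not Mailman's delivery code; it is a generic sketch using Python's standard smtplib, and `connect_with_retry`, its parameters, and the exponential backoff schedule are all hypothetical.

```python
import smtplib
import time


def connect_with_retry(host="localhost", port=25, attempts=5, base_delay=2.0,
                       connect=smtplib.SMTP, sleep=time.sleep):
    """Open an SMTP connection, backing off exponentially after failures.

    Returns the connected client, or re-raises the last error once every
    attempt has failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            # smtplib.SMTP(host, port) connects and reads the server greeting.
            return connect(host, port)
        except (OSError, smtplib.SMTPException) as exc:
            # Covers refused connections, timeouts, and SMTPConnectError
            # (a greeting other than 220, e.g. from an overloaded MTA).
            last_error = exc
            if attempt < attempts - 1:
                # Wait base_delay, 2*base_delay, 4*base_delay, ... seconds.
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

The `connect` and `sleep` parameters exist only so the retry logic can be exercised without a real MTA; a caller would normally use the defaults and call `.quit()` on the returned client when finished.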

Any thoughts?

Regards,
KAM


  • Brad Knowles at Aug 9, 2006 at 2:20 am

    At 6:37 PM -0400 2006-08-08, Kevin A. McGrail wrote:

    > Once this occurs, outgoing mail ceases until a restart of the
    > service/runners.

    Have you looked at the section of the FAQ discussing performance
    tuning of your system? See FAQs 6.3, 6.6, and 6.8.

    > I believe the box is hitting its limit of sendmail services and refusing
    > service and that the mailman service is not gracefully retrying later,
    > either by design or a bug. Is this expected behavior or is there a setting
    > I have missed?

    That's a reasonable conclusion, but with proper tuning of the MTA and
    configuration of Mailman, you shouldn't reach that point. Start with
    the FAQ entries mentioned above.

    --
    Brad Knowles, <brad at stop.mail-abuse.org>

    "Those who would give up essential Liberty, to purchase a little
    temporary Safety, deserve neither Liberty nor Safety."

    -- Benjamin Franklin (1706-1790), reply of the Pennsylvania
    Assembly to the Governor, November 11, 1755

    Founding Individual Sponsor of LOPSA. See <http://www.lopsa.org/>.
  • Kevin A. McGrail at Aug 9, 2006 at 4:39 am

    > Have you looked at the section of the FAQ discussing performance tuning of
    > your system? See FAQs 6.3, 6.6, and 6.8.

    Thanks. I've made some tweaks in line with these FAQs and have some
    comments.

    FAQ 6.6 mentions SMTP_MAX_RCPTS = 10, but 6.3 mentions 2-5. I'm using VERP,
    so I believe this is irrelevant to my installation, but it might be good to
    clarify for consistency.
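    Both settings are overridden in Mailman 2.1's mm_cfg.py, which takes precedence over Defaults.py. The fragment below is a sketch, not tuned advice: VERP_DELIVERY_INTERVAL = 1 asks Mailman to VERP every regular delivery, which is the case in which SMTP_MAX_RCPTS stops mattering, and the value 5 is only an illustration taken from the 2-5 range mentioned for FAQ 6.3.

```python
# mm_cfg.py fragment (sketch). In a real mm_cfg.py these assignments follow
# "from Defaults import *"; the values are illustrative, not recommendations.

# VERP every regular delivery: each outgoing copy is addressed to a single
# recipient, so per-transaction recipient batching no longer applies.
VERP_DELIVERY_INTERVAL = 1

# Only consulted for non-VERP (bulk) deliveries: the maximum number of
# recipients Mailman puts into one SMTP transaction.
SMTP_MAX_RCPTS = 5
```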

    Also, FAQ 6.6 mentions changes needed because Mailman did not previously use
    a FIFO queue. The FAQ specifies this was a target for Mailman 2.1.x, and to
    my knowledge QRUNNER_PROCESS_LIFETIME and QRUNNER_MAX_MESSAGES were removed
    from Defaults.py.

    >> I believe the box is hitting its limit of sendmail services and
    >> refusing service and that the mailman service is not gracefully
    >> retrying later, either by design or a bug. Is this expected behavior or
    >> is there a setting I have missed?

    > That's a reasonable conclusion, but with proper tuning of the MTA and
    > configuration of Mailman, you shouldn't reach that point. Start with the
    > FAQ entries mentioned above.

    The box is an incoming mail server as well, so while we can tweak things to
    make running out of child daemons rare, with dictionary attacks, spammers,
    and normal mail traffic, I see no way to guarantee that the box will never
    run out of available connections. Am I correct that if this occurs, the
    qfiles/out runner will stall?

    Thanks again,
    KAM
  • Brad Knowles at Aug 9, 2006 at 5:42 am

    At 12:33 AM -0400 2006-08-09, Kevin A. McGrail wrote:

    > FAQ 6.6 mentions SMTP_MAX_RCPTS = 10, but 6.3 mentions 2-5. I'm using
    > VERP, so I believe this is irrelevant to my installation, but it might
    > be good to clarify for consistency.

    FAQ 6.6 is probably a little older, and appears to need updating. You
    are correct that if you are VERPing everything, then this particular
    parameter is not relevant to your site.

    > Also, FAQ 6.6 mentions changes needed because Mailman did not
    > previously use a FIFO queue. The FAQ specifies this was a target for
    > Mailman 2.1.x, and to my knowledge QRUNNER_PROCESS_LIFETIME and
    > QRUNNER_MAX_MESSAGES were removed from Defaults.py.

    I'm not familiar with the code at that depth, so I would have to leave
    those modifications to the FAQ to someone who can answer those
    questions.

    Mark, are you listening?

    > The box is an incoming mail server as well, so while we can tweak
    > things to make running out of child daemons rare, with dictionary
    > attacks, spammers, and normal mail traffic, I see no way to guarantee
    > that the box will never run out of available connections. Am I correct
    > that if this occurs, the qfiles/out runner will stall?

    For a large site, you want to split incoming and outgoing mail
    services onto separate clusters of machines.

    Moreover, if you're doing any amount of anti-spam processing or
    anti-virus scanning, then you'll probably want to run multiple
    instances of your MTA on your machines. The primary instance would
    listen on port 25 on all interfaces, with all scanning intact. The
    secondary instance would listen on some other port, only on the
    loopback (127.0.0.1) interface, and would be used exclusively for
    outbound e-mail from that server. You would want to make sure that all
    output from Mailman was directed at this second instance of your MTA,
    so that you don't end up scanning all your outbound mail as well as
    all your inbound mail. On this second instance, you also generally
    want to remove any kind of resource limiting that you may have in
    place, because you have presumably done all that sort of stuff on the
    primary instance.
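    In Mailman 2.1, the SMTPHOST and SMTPPORT settings in mm_cfg.py determine where outgoing mail is handed off, so pointing them at the second instance is one way to direct all of Mailman's output there. A sketch, assuming the second instance listens on 127.0.0.1 port 2525 (the port number is a hypothetical choice):

```python
# mm_cfg.py fragment (sketch): hand all of Mailman's outgoing mail to the
# second, scan-free MTA instance that listens only on the loopback interface.
# Port 2525 is a hypothetical example; use the port your second instance binds.
SMTPHOST = "127.0.0.1"
SMTPPORT = 2525
```

    The primary instance keeps port 25 on all interfaces for inbound mail; only Mailman's own deliveries take this loopback path.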

    IIRC, these issues are discussed in the FAQ under the respective
    "performance tuning" sections, but I may be wrong. If so, please let
    me know and I'll try to update the relevant FAQ entry to be more
    correct/up-to-date.


    Even after you've done all of this, there is still a chance that
    your MTA may run out of available connections. If that happens, I
    don't see any way to resolve the issue other than to monitor the
    server(s) closely (using tools like rrdtool, munin, bb4, nagios,
    etc.), and to use mailmanctl to restart Mailman itself and any
    stalled queue runners.
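    One way to automate the restart is a cron job that treats an old entry in qfiles/out as a sign of a stalled runner. This is a minimal sketch under several assumptions: the paths are hypothetical, queue entries are assumed to be `*.pck` files, the 15-minute threshold is arbitrary, and `oldest_entry_age` and `check_and_restart` are illustrative names, not Mailman tools.

```python
import os
import subprocess
import time

# Hypothetical paths; adjust to wherever your OS installs Mailman.
OUT_QUEUE = "/usr/local/mailman/qfiles/out"
MAILMANCTL = "/usr/local/mailman/bin/mailmanctl"


def oldest_entry_age(queue_dir, now=None):
    """Age in seconds of the oldest *.pck file in queue_dir (0.0 if none)."""
    now = time.time() if now is None else now
    oldest = None
    with os.scandir(queue_dir) as entries:
        for entry in entries:
            if not entry.name.endswith(".pck"):
                continue
            try:
                mtime = entry.stat().st_mtime
            except FileNotFoundError:
                continue  # dequeued by the runner while we were scanning
            if oldest is None or mtime < oldest:
                oldest = mtime
    return 0.0 if oldest is None else now - oldest


def check_and_restart(queue_dir=OUT_QUEUE, max_age=900.0, restart=True):
    """Restart the qrunners if the out queue looks stalled; return True if so."""
    stalled = oldest_entry_age(queue_dir) > max_age
    if stalled and restart:
        # Ask mailmanctl to restart Mailman's queue runners.
        subprocess.run([MAILMANCTL, "restart"], check=True)
    return stalled


if __name__ == "__main__":
    check_and_restart()
```

    A very busy list can legitimately hold messages in the out queue for a while, so max_age should be tuned against normal queue behavior before this is trusted to restart anything.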

  • Brad Knowles at Aug 9, 2006 at 5:42 am

    At 11:47 PM -0500 2006-08-08, Brad Knowles wrote:

    > Moreover, if you're doing any amount of anti-spam processing or
    > anti-virus scanning, then you'll probably want to run multiple
    > different instances of your MTA on your machines. The primary
    > instance would be running on port 25 on all interfaces, with all
    > scanning intact. The secondary instance would be listening to
    > some other port only on the loopback (127.0.0.1) interface, and
    > would be used exclusively for outbound e-mail from that server.

    Sorry, I should have been a little clearer: this second instance does
    not do any scanning of any sort, and has all checks turned off for
    things like looking up the reverse DNS of incoming connections. In
    other words, the one and only thing it is good at is accepting mail as
    quickly as possible from other programs on the system, and then working
    to deliver that mail as quickly as possible to the remote recipients.

    By the time a mail message reaches this second instance of your MTA,
    all anti-spam processing, anti-virus scanning, and so on should
    already have been done on input, and there shouldn't be anything else
    to scan for on output. That's why it can be tuned for maximum
    acceptance speed.
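    A quick way to confirm that the loopback-only instance is up and accepting connections is to connect and read its SMTP greeting; a refused or timed-out connection suggests the instance is down or saturated. A minimal sketch using Python's standard socket module, where `smtp_banner` is a hypothetical helper and 127.0.0.1 port 2525 is an assumed address for the second instance:

```python
import socket


def smtp_banner(host="127.0.0.1", port=2525, timeout=5.0):
    """Connect to an SMTP listener and return its greeting line.

    Raises OSError (for example ConnectionRefusedError or a timeout) when
    nothing is accepting connections on host:port.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rb") as stream:
            banner = stream.readline()
    return banner.decode("ascii", "replace").rstrip("\r\n")
```

    A healthy MTA answers with a line starting with "220"; a monitoring check could alert on any OSError or on a greeting with a different code.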


    In addition, if you're running a really large mailing list system,
    you will want to off-load all outgoing e-mail onto a cluster of
    secondary machines at your site (or maybe provided by your ISP), so
    that your mailing list server can dump things onto other systems as
    quickly as possible. In that case, you will probably also want to
    pre-process all the incoming messages on a separate cluster of
    machines, so that the only thing the mailing list server has to worry
    about is accepting mail messages from the front-end inbound mail
    servers, handling web user interface interaction with the
    subscribers, moderators, and list owners, and transmitting approved
    messages as quickly as possible to the cluster of outbound mail
    handlers. Except for the web interaction stuff, pretty much all
    interaction with the outside world is handled by other machines.

    You can push this even further by setting up a reverse-proxy system
    in front of the web user interface. The next step would be to isolate
    the back-end mail handling facilities on a completely separate
    machine, which shares the /usr/local/mailman directory structure (or
    wherever your OS puts the Mailman files) via NFS with one or more
    front-end servers.


    Believe me, you can scale this thing to amazing heights if you split
    the functionality correctly onto separate clusters of machines. It
    does take some knowledge of how to build and configure
    higher-performance web and mail clusters, but that's not too hard to
    come by: we've put as much information as we can into the FAQs, and
    if there's anything not already covered there, we can try to add
    more.


Discussion Overview
group: mailman-users @
categories: python
posted: Aug 8, '06 at 10:37p
active: Aug 9, '06 at 5:42a
posts: 5
users: 2
website: list.org

2 users in discussion

Brad Knowles: 3 posts; Kevin A. McGrail: 2 posts
