I'm hoping someone has some ideas for fixing a problem with very
slowwwww outgoing mail delivery for one particular list I host. I'm
running Mailman 2.1.5 and Exim 4.40. I host a couple dozen Mailman
lists on my Linux server, most are less than 300 members except for
one list with about 1,100 members, about 850 set to individual emails
and 250 on digest.

I've hosted this list (call it ABC list) for almost two years now and
up until about a month ago, outgoing mail for ABC list (and all other
outgoing mail, list or otherwise) got processed very quickly, getting
delivered to the recipients' servers at a rate of about 1,000-1,200 per
hour. The ABC list will get posts in several messages, usually two to
four per topic, and once a month there's a posting that's 12 to 14
messages.

About a month ago, at a time when there were NO updates to anything
on my server (there hadn't been any for about two weeks prior), outgoing
mail on the ABC list *only* suddenly slowed down to a quarter of the
usual speed, processing about 300 messages per hour, even if there
were no other messages in the Exim queue. The messages get passed
from Mailman to Exim at the usual speed: it takes about ten minutes for
Exim to accept 850 messages, queue them and start delivering them.
But instead of being sent out at 1,000-1,200 per hour, they now
stick in Exim, even ones for delivery to local email addresses,
and go out at only about 300 per hour.
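For scale, the figures above can be turned into per-post drain times. This is a back-of-the-envelope sketch only: it assumes a steady delivery rate, and the function name is made up for illustration.

```python
def drain_minutes(recipients: int, rate_per_hour: float) -> float:
    """Minutes needed to deliver one post to all recipients at a steady rate."""
    return recipients / rate_per_hour * 60


RECIPIENTS = 850  # members on individual delivery, as stated in the post

for label, rate in [("before (low)", 1000), ("before (high)", 1200), ("now", 300)]:
    minutes = drain_minutes(RECIPIENTS, rate)
    print(f"{label:14s} {rate:5d}/hour -> {minutes:6.1f} min per post")
```

At about 300 per hour a single post needs close to three hours to drain, and a 12-part posting (12 x 850 = 10,200 queued deliveries) would need roughly 34 hours.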

All other lists and private mail process at the normal faster levels.
If there are outgoing messages from other Mailman lists, those messages
will get processed and sent out while the messages for ABC list
languish in the Exim queue. (Note: almost all of the lists I host, and
especially the busier ones, including ABC list, are set in Mailman to
VERP all messages; it's been set like that for over a year.) Forcing
additional Exim queue runners doesn't do anything but increase the
load on my server as it tries to process even more very slow messages.

About two weeks ago, one of my users created a new list (call it XYZ
list) and directly added about 950 users in one go (she was creating a
backup list for a Yahoo group). She neglected to turn off the new-member
notices to the list owner and she didn't turn off the welcome
message either, so approximately 1,900 messages got sent out
immediately. At the same time, there were about 10,000 messages in
the Exim queue for ABC list (the 12-part monthly posting), processing
at their very slow pace. The individual welcome messages and new-member
notices for XYZ list flew right on past: the 950+ welcome
messages and the new-member notices went thru the Exim queue in about
20 minutes with just a handful left waiting on problems at the
recipients' servers.

At first glance I would think this is strictly an Exim problem but
since it's affecting a single Mailman list, I keep coming back to
something going wrong between the two. I have checked Exim logs and
Mailman logs, can't see anything out of place. I've gone thru the
Mailman FAQ and also the Exim/Mailman HowTo and tweaked things but
still no difference, every other list flies on by while the ABC list
plods along slowly.

This problem applies to all the outgoing messages on the ABC list;
it doesn't matter what the receiving ISP is, they all get treated just as
slowly. I've checked that in the logs and watched it in the queue: even
messages destined for other users I host will sit and wait. I've
watched the individual delivery attempts. When I force delivery of one
of the ABC list messages, it'll take at least 30 seconds before Exim
even attempts to look up the recipient's ISP, but once it gets to that
point, it flies right on thru, speedy as usual. It's that 30-45
second delay as it's pulling up the message (or whatever) that I can't
determine the cause of. If it was all mail or even all Mailman mail, I'd
know it was some problem with Exim or DNS (which I checked and changed
anyway; it didn't make any difference) but it's just this one list: no
other lists are affected, and other non-list mail is not affected at all.

The only thing I can think of that I haven't tried yet is exporting
all the list settings and member list (there's no archives thankfully)
and deleting the list and re-creating it but I have no idea if that
would help...any suggestions, hints, tips, wild ideas, anything at
all, is very welcome.

--
hth,
Stephanie

Links blog: http://alice.ttlg.net/links/
Glenfinnan Web Hosting: http://www.glenfinnan.net/

The machine does not isolate man from the great problems of nature but
plunges him more deeply into them.
-- Antoine De Saint-Exupery [Wind, Sand, Stars]

  • Mark Sapiro at Mar 28, 2005 at 8:40 pm

    Stephanie wrote:
    > At first glance I would think this is strictly an Exim problem but
    > since it's affecting a single Mailman list, I keep coming back to
    > something going wrong between the two.

    I agree this certainly seems to be a strictly Exim problem. See
    http://mail.python.org/pipermail/mailman-users/2005-January/041815.html

    This addresses delays within Mailman, but the underlying problem might
    apply to Exim queues as well as Mailman queues. I have no idea really,
    but it might be worth a look.

    --
    Mark Sapiro <msapiro at value.net> The highway is for gamblers,
    San Francisco Bay Area, California better use your sense - B. Dylan
  • John W. Baxter at Mar 29, 2005 at 3:44 pm

    On 3/28/2005 12:40, "Mark Sapiro" wrote:

    > Stephanie wrote:
    >> At first glance I would think this is strictly an Exim problem but
    >> since it's affecting a single Mailman list, I keep coming back to
    >> something going wrong between the two.
    >
    > I agree this certainly seems to be a strictly Exim problem. See
    > http://mail.python.org/pipermail/mailman-users/2005-January/041815.html
    >
    > This addresses delays within Mailman, but the underlying problem might
    > apply to Exim queues as well as Mailman queues. I have no idea really,
    > but it might be worth a look.

    Earlier in the thread the environment was said to be unchanged. I would ask
    Stephanie whether that extends to the local caching name server: is it
    still working? Have there been changes to firewalling (specifically, has
    Ident been foolishly dropped (instead of rejected) in a router or firewall,
    if Exim is set up to use it)?

    --John
  • Stephanie at Mar 30, 2005 at 4:51 am

    John W. Baxter wrote:
    > Earlier in the thread the environment was said to be unchanged. I would ask
    > Stephanie whether that extends to the local caching name server: is it
    > still working? Have there been changes to firewalling (specifically, has
    > Ident been foolishly dropped (instead of rejected) in a router or firewall,
    > if Exim is set up to use it)?

    The firewall I use is APF and other than adding some ROKSO spammer IP
    addresses and the Spamhaus DROP list to it about once a week, there
    hadn't been any changes to it. I'm not using ident in Exim either.
    Nothing else on the server had changed or been updated except for my
    normal maintenance of spam filters for Exim. Mailman hadn't had any
    changes at all for months: I installed the security patch from
    February 10th a few days later and that's it. I use cPanel/WHM for
    managing my hosting customers but that hadn't had any updates in the
    week to ten days before the mail delivery slowdown started.

    As for DNS, Bind seems to be running fine. Nameserver issues were the
    first thing I thought of, and after doing some checking I found my
    server was using external name servers hosted by my server host. I
    changed that to use my local nameserver but there was no change to the
    mail delivery performance for that list.

    I did uncover one bit of info today: this list is on a domain that
    only hosts one other list (and practically nothing else, just one tiny
    website with very little activity, and neither list keeps Mailman
    archives) and that other list is also having the same slow mail
    delivery. It hadn't had any posts in about a month and a message was
    posted today, processed by Mailman and sent to Exim at 16:38 EST
    today, and it took 50 minutes for the message to go out from Exim to
    the 226 members on individual mail delivery. That's pretty much the
    same delivery stats as on the other larger list.
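As a quick cross-check of that comparison, the two numbers just quoted (226 recipients, 50 minutes) can be converted to an hourly rate. This is plain arithmetic; the function name is illustrative only.

```python
def deliveries_per_hour(recipients: int, minutes: float) -> float:
    """Average hourly delivery rate for one post."""
    return recipients / minutes * 60


rate = deliveries_per_hour(226, 50)
print(f"{rate:.0f} per hour")  # prints "271 per hour"
```

That is in the same range as the roughly 300 per hour seen on the larger list, and well below the 1,000-1,200 per hour seen elsewhere on the server.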

    So now I suspect it's related to that domain. I'm going to do some
    digging in its hosting account settings, see if something got set
    oddly for it. It's one of my domains and for my own, I usually give
    them all options and features with very few restrictions but I may
    have mucked up something without realizing it.

    Jeremy wrote:
    > What's your system activity like during this 30 second pause? If it's
    > stuck in kernel or doing lots of disk I/O then I'd suspect the
    > filesystem directory structure..... Can you shut down for long enough
    > to copy the exim spool to a new tree and then rename it back into place?

    I thought about the spool directory. Exim was set to
    "split_spool_directory = yes", and had been ever since the server was set
    up in fall of 2003. I changed it to "no" and that made things worse (far
    too many files in one folder), so I changed it back to "yes". I'll try
    your suggestion this weekend when traffic is lower, thanks much!
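One way to see whether any spool directory has grown unusually large is to count the files sitting directly in each directory. The following is a minimal sketch, not an Exim tool: the helper names are hypothetical, and the spool path mentioned afterwards is an assumption that varies by installation.

```python
import os


def entries_per_directory(root):
    """Return {directory path: number of files directly inside it} under root."""
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        counts[dirpath] = len(filenames)
    return counts


def largest(counts, top=5):
    """Return the `top` directories holding the most files, biggest first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:top]
```

On a machine where the queue lives under, say, /var/spool/exim/input (an assumed path), `largest(entries_per_directory("/var/spool/exim/input"))` would list the fullest directories. Each queued message normally accounts for two files there (a -H and a -D file), so the counts are about twice the number of messages.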

    --
    hth,
    Stephanie

  • Nigel Metheringham at Mar 30, 2005 at 8:08 am

    On Tue, 2005-03-29 at 22:51 -0600, Stephanie wrote:
    > The firewall I use is APF and other than adding some ROKSO spammer IP
    > addresses and the Spamhaus DROP list to it about once a week, there
    > hadn't been any changes to it. I'm not using ident in Exim either.

    If other sites are using ident, and your firewall drops ident packets,
    then you would get a 30 second delay on many outgoing connection
    attempts.

    > I did uncover one bit of info today: this list is on a domain that
    > only hosts one other list (and practically nothing else, just one tiny
    > website with very little activity, and neither list keeps Mailman
    > archives) and that other list is also having the same slow mail
    > delivery. It hadn't had any posts in about a month and a message was
    > posted today, processed by Mailman and sent to Exim at 16:38 EST
    > today, and it took 50 minutes for the message to go out from Exim to
    > the 226 members on individual mail delivery. That's pretty much the
    > same delivery stats as on the other larger list.

    In that case I would be looking very carefully at DNS - if your
    externally visible DNS for that domain is odd you may find that many
    systems you are sending mail to have a long delay on basic verification
    - which will slow down your outgoing speed.

    Look at your log entries and work out where the delay is - if there is a
    huge delay between reception and first delivery then the problem is
    likely somewhere in your routing. If the delay is instead spread out
    between deliveries then there is something, most likely DNS related,
    that's slowing down the other systems taking mail from you.
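The two cases described above can be told apart by reading timestamps out of the Exim main log. The sketch below is one possible way to do that, assuming the usual main-log layout (date, time, message ID, then `<=` for arrival and `=>` or `->` for deliveries) and the 6-6-2 character message ID form. The sample lines, addresses and function name are invented for illustration.

```python
import re
from datetime import datetime

# timestamp, message ID, then an arrival (<=) or delivery (=>, ->) flag
LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(\w{6}-\w{6}-\w{2}) (<=|=>|->) "
)


def delivery_profile(lines):
    """Map message ID -> (seconds from arrival to first delivery,
    largest gap in seconds between consecutive deliveries)."""
    arrived, delivered = {}, {}
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue  # "Completed", deferrals and other lines are ignored
        when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        msg_id, flag = match.group(2), match.group(3)
        if flag == "<=":
            arrived[msg_id] = when
        else:
            delivered.setdefault(msg_id, []).append(when)
    profile = {}
    for msg_id, start in arrived.items():
        times = sorted(delivered.get(msg_id, []))
        if not times:
            continue  # nothing delivered yet
        first = (times[0] - start).total_seconds()
        gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        profile[msg_id] = (first, max(gaps, default=0.0))
    return profile


# Hypothetical log excerpt, not taken from the server in this thread.
SAMPLE = [
    "2005-03-30 16:38:00 1DGabc-0001aa-01 <= abc-bounces@example.org H=localhost P=esmtp S=2048",
    "2005-03-30 16:38:40 1DGabc-0001aa-01 => user1@example.com R=lookuphost T=remote_smtp",
    "2005-03-30 16:38:41 1DGabc-0001aa-01 -> user2@example.com R=lookuphost T=remote_smtp",
    "2005-03-30 16:38:41 1DGabc-0001aa-01 Completed",
]

for msg_id, (first, worst_gap) in delivery_profile(SAMPLE).items():
    print(f"{msg_id}: first delivery after {first:.0f}s, largest gap {worst_gap:.0f}s")
```

With VERP enabled in Mailman each recipient arrives at Exim as its own message, so for this list the arrival-to-first-delivery figure is likely the more telling of the two numbers.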

    Nigel.

    --
    [ Nigel Metheringham Nigel.Metheringham at InTechnology.co.uk ]
    [ - Comments in this message are my own and not ITO opinion/policy - ]
  • Stephanie at Mar 30, 2005 at 3:50 pm

    On Wed, 30 Mar 2005 09:08:43 +0100, Nigel Metheringham wrote:
    > On Tue, 2005-03-29 at 22:51 -0600, Stephanie wrote:
    >> The firewall I use is APF and other than adding some ROKSO spammer IP
    >> addresses and the Spamhaus DROP list to it about once a week, there
    >> hadn't been any changes to it. I'm not using ident in Exim either.
    >
    > If other sites are using ident, and your firewall drops ident packets,
    > then you would get a 30 second delay on many outgoing connection
    > attempts.

    Which would affect all other outgoing mail on my server, right? The
    firewall isn't dropping ident tho and all other list mail and private
    mail goes right out at about triple the speed of this one list's
    outgoing mail. The list in question has members from all the major
    ISPs, AOL, Earthlink, Comcast, Roadrunner, Verizon and several dozen
    small domains as well. All the mail from that list has the same slow
    delivery time, I've watched the queues carefully to see if it's one or
    two or even three particular ISPs but it's not, it's all across the
    board.

    > In that case I would be looking very carefully at DNS - if your
    > externally visible DNS for that domain is odd you may find that many
    > systems you are sending mail to have a long delay on basic verification
    > - which will slow down your outgoing speed.

    Yes, that's my thought now too since it's affecting both lists hosted
    on the same domain and nothing else. I'm going to look at that
    domain's DNS tonight (have to go to $DayJob now) and see if there's
    anything wrong there that may be causing this.

    --
    hth,
    Stephanie

  • Philip Hazel at Mar 30, 2005 at 3:59 pm

    On Wed, 30 Mar 2005, Stephanie wrote:

    >> If other sites are using ident, and your firewall drops ident packets,
    >> then you would get a 30 second delay on many outgoing connection
    >> attempts.
    >
    > Which would affect all other outgoing mail on my server, right?

    No, only outgoing mail to sites that use ident.
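The difference between a dropped and a rejected ident connection shows up in a plain TCP connect to port 113: a rejected connection fails at once, while a silently dropped one fails only when the timeout expires. Below is a minimal sketch of such a probe, meant to be run from an outside host against the mail server's public address. The function name and the injectable `connect` parameter are illustrative choices, not part of any existing tool.

```python
import socket
import time


def probe_port(host, port=113, timeout=5.0, connect=socket.create_connection):
    """Classify a TCP port as 'open', 'rejected' or 'dropped (timed out)'.

    Returns (outcome, seconds the attempt took).
    """
    started = time.monotonic()
    try:
        conn = connect((host, port), timeout)
    except ConnectionRefusedError:
        outcome = "rejected"  # the far end answered with a reset: fast failure
    except socket.timeout:
        outcome = "dropped (timed out)"  # no answer at all: the slow case
    except OSError as exc:
        outcome = f"error: {exc}"
    else:
        conn.close()
        outcome = "open"
    return outcome, time.monotonic() - started
```

A call such as `probe_port("mail.example.org")` (placeholder hostname) that comes back as "dropped (timed out)" after the full timeout would match the delay pattern described above; "rejected" or "open" returning almost immediately would argue against ident being the cause.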

    --
    Philip Hazel University of Cambridge Computing Service,
    ph10 at cus.cam.ac.uk Cambridge, England. Phone: +44 1223 334714.
    Get the Exim 4 book: http://www.uit.co.uk/exim-book
  • Brad Knowles at Mar 30, 2005 at 5:22 pm

    At 9:50 AM -0600 2005-03-30, Stephanie wrote:

    >> If other sites are using ident, and your firewall drops ident packets,
    >> then you would get a 30 second delay on many outgoing connection
    >> attempts.
    >
    > Which would affect all other outgoing mail on my server, right?

    If the problem was reverse DNS or ident, or somesuch, then it
    should affect everything on the same server. Unless you're serving
    that data inside of different virtual servers, different instances of
    the same software (or different software), etc....

    >> In that case I would be looking very carefully at DNS - if your
    >> externally visible DNS for that domain is odd you may find that many
    >> systems you are sending mail to have a long delay on basic verification
    >> - which will slow down your outgoing speed.
    >
    > Yes, that's my thought now too since it's affecting both lists hosted
    > on the same domain and nothing else. I'm going to look at that
    > domain's DNS tonight (have to go to $DayJob now) and see if there's
    > anything wrong there that may be causing this.

    It's starting to look more like a DNS problem that is specific to
    that domain, but not necessarily across the entire machine.

    --
    Brad Knowles, <brad at stop.mail-abuse.org>

    "Those who would give up essential Liberty, to purchase a little
    temporary Safety, deserve neither Liberty nor Safety."

    -- Benjamin Franklin (1706-1790), reply of the Pennsylvania
    Assembly to the Governor, November 11, 1755

    SAGE member since 1995. See <http://www.sage.org/> for more info.
  • Brad Knowles at Mar 30, 2005 at 2:23 pm

    At 10:51 PM -0600 2005-03-29, Stephanie wrote:

    > So now I suspect it's related to that domain. I'm going to do some
    > digging in its hosting account settings, see if something got set
    > oddly for it.

    You may have reverse DNS problems for that IP address, or lame
    delegation problems for the domain, or any number of other
    DNS-related issues. There are lots of ways to cause things to slow
    down if your DNS is messed up.

    --
    Brad Knowles, <brad at stop.mail-abuse.org>
  • Stephanie at Mar 30, 2005 at 3:45 pm

    On Wed, 30 Mar 2005 16:23:47 +0200, Brad Knowles wrote:
    > At 10:51 PM -0600 2005-03-29, Stephanie wrote:
    >
    >> So now I suspect it's related to that domain. I'm going to do some
    >> digging in its hosting account settings, see if something got set
    >> oddly for it.
    >
    > You may have reverse DNS problems for that IP address, or lame
    > delegation problems for the domain, or any number of other
    > DNS-related issues. There are lots of ways to cause things to slow
    > down if your DNS is messed up.

    Which would affect ALL mail going out from my server, not just one
    list, right? DNS issues were my first thought when this initially
    cropped up but mail from other lists I host and all private mail
    continues to fly right on thru at about triple the speed of this one
    list's outgoing mail.

    And I do have proper rDNS set up, that was something I got done as
    soon as the server was set up almost two years ago and nothing has
    changed there.

    --
    hth,
    Stephanie

  • Heather Madrone at Mar 30, 2005 at 5:45 pm

    At 9:45 AM -0600 3/30/05, Stephanie wrote:
    > And I do have proper rDNS set up, that was something I got done as
    > soon as the server was set up almost two years ago and nothing has
    > changed there.

    I'd check it though, because it's possible that someone else broke it
    for you. Reverse DNS lookups to my server were failing last week
    because someone else had an error and so the lookup was ambiguous.

    If you go to <www.dnsstuff.com> or a similar site, you can see what
    the world sees.

    If you're having reverse DNS problems, you should also have a lot of
    refused deliveries for those lists, and it doesn't sound like you do.

    --
    Heather Madrone (heather at madrone.com) http://www.madrone.com

    A rolling stone gathers no mass.
  • Stephanie at Mar 30, 2005 at 11:00 pm

    Heather Madrone wrote:
    > If you go to <www.dnsstuff.com> or a similar site, you can see what
    > the world sees.

    Aha! I had to run out of the house this morning so I didn't even think
    of trying DNSstuff! Thanks for the reminder, and since it shows "No
    PTR records exist for brillig.net" for PTR lookup, obviously there is
    something wrong there. I'll fix that as soon as I get home!

    > If you're having reverse DNS problems, you should also have a lot of
    > refused deliveries for those lists, and it doesn't sound like you do.

    They probably got recorded as bounces - and it may be part of the
    reason there's been some problem with deliveries to the 4 or 5 Verizon
    members (which I had written off as Verizon's bugginess).
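A reverse DNS check of the kind discussed here can also be scripted. PTR records belong to IP addresses rather than to domain names, so the sketch below starts from the server's IP, looks up its PTR name, and then confirms that the name resolves back to the same IP. It is a minimal illustration only: the function name and the injectable resolver parameters are hypothetical, the forward lookup used is IPv4 only, and the answers come from whatever resolver the local machine uses, which may differ from what outside servers see.

```python
import socket


def check_fcrdns(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Return (ok, detail) for a forward-confirmed reverse DNS check of ip."""
    try:
        ptr_name = reverse(ip)[0]  # primary host name from the PTR lookup
    except (socket.herror, socket.gaierror) as exc:
        return False, f"no PTR record for {ip}: {exc}"
    try:
        addresses = forward(ptr_name)[2]  # IPv4 addresses the name points to
    except socket.gaierror as exc:
        return False, f"PTR name {ptr_name} does not resolve: {exc}"
    if ip not in addresses:
        return False, f"{ptr_name} resolves to {addresses}, not {ip}"
    return True, f"{ip} <-> {ptr_name}"
```

Calling `check_fcrdns("192.0.2.10")` with the mail server's real address in place of the placeholder returns a pass/fail flag and a short explanation.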

    --
    hth,
    Stephanie


Discussion Overview
group: mailman-users @
categories: python
posted: Mar 28, '05 at 7:27p
active: Mar 30, '05 at 11:00p
posts: 12
users: 7
website: list.org
