Hey folks,

I've been running a FreeBSD / Postfix / Mailman list server for some 7
or 8 years now. The last time I updated FreeBSD was about 3 years
ago so it is only at 6.0. That was also the last time I updated
Postfix and Mailman until this recent spate of problems which prompted
me to upgrade both to the latest stable versions (as of yesterday).
So 2.1.13 in the case of Mailman.

Things have been running fine on this server for years with nary an
issue. I have not made any changes recently, so it was quite
troublesome when this problem started a few days ago. My lists
started experiencing 10 and 12 hour delays between mail coming in, and
going back out again. If I look at message headers I can see Postfix
gets the message, then hands off to mailman, then it sits there for
hours on end, and finally gets sent out.

I should mention that this is a pretty lightly loaded system. The
lists are basically personal hobby lists - the busiest of which gets
maybe 100 to 200 emails a day. It is also a web server but traffic is
pretty light. Load average is usually under 1.

I spent a good 12 to 14 hours yesterday looking into this - went
through some of the archives for this list, as well as the FAQ wiki.
Found a few similar problems but nothing just the same.

I look in the Mailman logs dir. Nothing really onerous that I see.
I look in smtp and see what looks like normal processing. e.g. one of
my lists looks like it had a number of messages just a few minutes
ago, but I have not seen them yet where normally I would expect to see
them almost instantly.

Jan 04 12:08:18 2010 (85805)
<3f1854170912301503l375e2259k439e633f55b8bffa at mail.gmail.com> smtp to
brewers for 3 recips, completed in 0.055 seconds
Jan 04 12:08:18 2010 (85805)
<3f1854170912301511m39e7eb9djc157df16d753bef1 at mail.gmail.com> smtp to
brewers for 3 recips, completed in 0.041 seconds
Jan 04 12:08:18 2010 (85805)
<6e40ba2b0912301514j5bbd84b5maef5a93e4c46b11 at mail.gmail.com> smtp to
brewers for 3 recips, completed in 0.053 seconds
Jan 04 12:08:18 2010 (85805)
<54a6f3cb0912301520r2dd77b03i5ebd8492b4af5bbc at mail.gmail.com> smtp to
brewers for 3 recips, completed in 0.040 seconds
Jan 04 12:08:18 2010 (85805)
<b9ff8c7d0912301529r1411fbdtb1860e06ef539bb0 at mail.gmail.com> smtp to
brewers for 3 recips, completed in 0.050 seconds
Jan 04 12:08:18 2010 (85805)
<8d0a26d70912301627o1571ababscf8396703e9bbb27 at mail.gmail.com> smtp to
brewers for 3 recips, completed in 0.052 seconds

smtp-failure seems to contain not much else other than reports of
bogus email addresses. I've used this as an opportunity to weed out
most of them, but there are still messages that had been queued up for
them so I'm not sure how long it will take those to fail delivery.
For example if I "tail -10000" (ten thousand) that file and grep out
the 5 main bogus email addresses, I get one single line :

Jan 04 12:07:50 2010 (50034) Low level smtp error: (4, 'Interrupted
system call'), msgid:
<b100e8e71001021626l470dc99fmcea7aceba072823b at mail.gmail.com>

But I have been unable to find out what this means. But given that I
only see it once I am not too worried about it really.

Can someone tell me where to look as to why stuff is getting backed up
in the outgoing queue?



--
"Don't eat anything you've ever seen advertised on TV"
- Michael Pollan, author of "In Defense of Food"
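
The smtp log lines quoted in this post follow a regular shape, so they can be parsed to see how many recipients each handoff covered and how long it took. The following is a minimal Python sketch, not part of Mailman: the pattern is inferred only from the six lines shown here (with the archive's wrapped lines rejoined), and the name `parse_smtp_line` is made up for illustration.

```python
import re
from datetime import datetime

# Pattern inferred from the excerpt in this post; a real Mailman smtp log may
# contain other line shapes, which this sketch simply skips.
SMTP_LINE = re.compile(
    r"^(?P<stamp>\w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}) \((?P<pid>\d+)\) "
    r"<(?P<msgid>[^>]+)> smtp to (?P<list>\S+) for (?P<recips>\d+) recips, "
    r"completed in (?P<secs>[\d.]+) seconds$"
)


def parse_smtp_line(line):
    """Return a dict for one delivery line, or None if it does not match."""
    # Collapse all whitespace so an entry wrapped over several lines still matches.
    match = SMTP_LINE.match(" ".join(line.split()))
    if match is None:
        return None
    return {
        "when": datetime.strptime(match.group("stamp"), "%b %d %H:%M:%S %Y"),
        "pid": int(match.group("pid")),
        "msgid": match.group("msgid"),
        "list": match.group("list"),
        "recips": int(match.group("recips")),
        "seconds": float(match.group("secs")),
    }
```

As the first reply in this thread points out, such a line only records the handoff from Mailman to the MTA, so the "completed in" figure says nothing about how long the message waited in Mailman's queues beforehand.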

  • Mark Sapiro at Jan 4, 2010 at 6:05 pm

    Alan McKay wrote:
    > I look in the Mailman logs dir. Nothing really onerous that I see.
    > I look in smtp and see what looks like normal processing. e.g. one of
    > my lists looks like it had a number of messages just a few minutes
    > ago, but I have not seen them yet where normally I would expect to see
    > them almost instantly.
    >
    > Jan 04 12:08:18 2010 (85805)
    > <3f1854170912301503l375e2259k439e633f55b8bffa at mail.gmail.com> smtp to
    > brewers for 3 recips, completed in 0.055 seconds
    > Jan 04 12:08:18 2010 (85805)
    > <3f1854170912301511m39e7eb9djc157df16d753bef1 at mail.gmail.com> smtp to
    > brewers for 3 recips, completed in 0.041 seconds
    > Jan 04 12:08:18 2010 (85805)
    > <6e40ba2b0912301514j5bbd84b5maef5a93e4c46b11 at mail.gmail.com> smtp to
    > brewers for 3 recips, completed in 0.053 seconds
    > Jan 04 12:08:18 2010 (85805)
    > <54a6f3cb0912301520r2dd77b03i5ebd8492b4af5bbc at mail.gmail.com> smtp to
    > brewers for 3 recips, completed in 0.040 seconds
    > Jan 04 12:08:18 2010 (85805)
    > <b9ff8c7d0912301529r1411fbdtb1860e06ef539bb0 at mail.gmail.com> smtp to
    > brewers for 3 recips, completed in 0.050 seconds
    > Jan 04 12:08:18 2010 (85805)
    > <8d0a26d70912301627o1571ababscf8396703e9bbb27 at mail.gmail.com> smtp to
    > brewers for 3 recips, completed in 0.052 seconds

    Those messages have been delivered to the MTA. If the list has just 3
    eligible recipients, that looks backlogged but otherwise normal. If
    the messages haven't been delivered, check the MTA (mailq or
    whatever).

    > smtp-failure seems to contain not much else other than reports of
    > bogus email addresses. I've used this as an opportunity to weed out
    > most of them, but there are still messages that had been queued up for
    > them so I'm not sure how long it will take those to fail delivery.
    > For example if I "tail -10000" (ten thousand) that file and grep out
    > the 5 main bogus email addresses, I get one single line :
    >
    > Jan 04 12:07:50 2010 (50034) Low level smtp error: (4, 'Interrupted
    > system call'), msgid:
    > <b100e8e71001021626l470dc99fmcea7aceba072823b at mail.gmail.com>
    >
    > But I have been unable to find out what this means. But given that I
    > only see it once I am not too worried about it really.
    >
    > Can someone tell me where to look as to why stuff is getting backed up
    > in the outgoing queue?

    Your MTA is checking too much during SMTP from Mailman. It shouldn't be
    doing address verification or even domain verification on remote
    recipients.

    What is in Mailman's out/ and retry/ queues at this point? The out/
    queue is almost certainly backlogged.

    You could try moving all the out/ and retry/ queue entries aside and
    then removing the bad addresses from them with the script at
    <http://www.msapiro.net/scripts/remove_recips> and then replacing the
    entries a few at a time.

    If the system is still slow, search the faq at
    <http://wiki.list.org/x/AgA3> for "performance".

    --
    Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
    San Francisco Bay Area, California    better use your sense - B. Dylan
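
The move-aside-and-replace procedure suggested in this reply can be sketched in Python. This is a rough illustration, not a Mailman tool: the function names, the holding directory, and the batch size are all hypothetical, it assumes a current Python 3 and that Mailman's qrunners are not working on the entries while they are moved, and it leaves out the remove_recips step, whose invocation is not given in this thread.

```python
import os
import shutil


def move_aside(queue_dir, holding_dir):
    """Move every entry from queue_dir into holding_dir; return the count."""
    os.makedirs(holding_dir, exist_ok=True)
    names = sorted(os.listdir(queue_dir))
    for name in names:
        shutil.move(os.path.join(queue_dir, name),
                    os.path.join(holding_dir, name))
    return len(names)


def replace_batch(holding_dir, queue_dir, batch_size=10):
    """Move up to batch_size of the oldest held entries back; return their names."""
    # Oldest first, judged by modification time (an assumption of this sketch).
    names = sorted(
        os.listdir(holding_dir),
        key=lambda n: os.path.getmtime(os.path.join(holding_dir, n)),
    )[:batch_size]
    for name in names:
        shutil.move(os.path.join(holding_dir, name),
                    os.path.join(queue_dir, name))
    return names
```

One would call `move_aside("qfiles/out", "qfiles/out.held")` once, clean the held entries, and then call `replace_batch("qfiles/out.held", "qfiles/out")` repeatedly, waiting for the out/ queue to drain between calls. The directory name `out.held` is only an example.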
  • Alan McKay at Jan 4, 2010 at 6:49 pm

    On Mon, Jan 4, 2010 at 1:05 PM, Mark Sapiro wrote:
    > Those messages have been delivered to the MTA. If the list has just 3
    > eligible recipients, that looks backlogged but otherwise normal. If
    > the messages haven't been delivered, check the MTA (mailq or
    > whatever).
    Here it is almost 2 hours later and those messages are still not out yet!

    I forgot to mention in all of this that regular email to and from the
    system works fine. Emails to and from this Gmail account go back and
    forth in just a few seconds.
    > Your MTA is checking too much during SMTP from Mailman. It shouldn't be
    > doing address verification or even domain verification on remote
    > recipients.
    Hmmm, now, this is one thing that does change on my system. I use
    Postfix and do helo checking with a file of spammers to reject. I
    add a few entries to that file every month (maybe 3 or 4 at most). At
    present there are only 72 entries in that file. I also do general
    RBL checking which could well be also taking place from MM to Postfix.
    I'll have to go have a look. But it has been like this for years -
    at least 3 years since I last went through my Postfix config and
    tweaked it. The only change since then is spammer domains going into
    my helo_access file.
    > What is in Mailman's out/ and retry/ queues at this point? The out/
    > queue is almost certainly backlogged.
    mailman@heimat$ ls qfiles/in/ | wc -l
    8
    mailman@heimat$ ls qfiles/out/ | wc -l
    332
    > You could try moving all the out/ and retry/ queue entries aside and
    > then removing the bad addresses from them with the script at
    > <http://www.msapiro.net/scripts/remove_recips> and then replacing the
    > entries a few at a time.
    OK, I'll give that a try.
    > If the system is still slow, search the faq at
    > <http://wiki.list.org/x/AgA3> for "performance".
    Will do. I had been searching on a number of terms, but not that one.

    Thanks for your time!


    --
    "Don't eat anything you've ever seen advertised on TV"
    - Michael Pollan, author of "In Defense of Food"
  • Mark Sapiro at Jan 4, 2010 at 7:08 pm

    Alan McKay wrote:
    > On Mon, Jan 4, 2010 at 1:05 PM, Mark Sapiro wrote:
    >> Those messages have been delivered to the MTA. If the list has just 3
    >> eligible recipients, that looks backlogged but otherwise normal. If
    >> the messages haven't been delivered, check the MTA (mailq or
    >> whatever).
    > Here it is almost 2 hours later and those messages are still not out yet!

    The specific ones in the smtp log were delivered to the MTA. They're
    either still queued in the MTA or they were delivered. Of course, they
    were the oldest of the 332 or whatever entries in the out queue.

    >> What is in Mailman's out/ and retry/ queues at this point? The out/
    >> queue is almost certainly backlogged.
    > mailman@heimat$ ls qfiles/in/ | wc -l
    > 8

    I asked about retry/, not in/, but unless your lists are getting mail
    bombed or there is some kind of mail loop, 8 messages in the in/ queue
    is a lot.

    > mailman@heimat$ ls qfiles/out/ | wc -l
    > 332

    How old is the oldest of these?

    --
    Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
    San Francisco Bay Area, California    better use your sense - B. Dylan
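
The question of how old the oldest out/ entry is can be answered by looking at file modification times. Here is a minimal Python sketch, assuming the queue directories live under qfiles/ relative to the current directory (as in the `ls` commands quoted in this thread) and that a file's modification time is a fair proxy for when it was queued; `queue_summary` is a name invented here.

```python
import os
import time


def queue_summary(queue_dir, now=None):
    """Return (entry_count, age_in_seconds_of_oldest_entry or None)."""
    now = time.time() if now is None else now
    mtimes = [
        os.path.getmtime(os.path.join(queue_dir, name))
        for name in os.listdir(queue_dir)
    ]
    if not mtimes:
        return 0, None
    return len(mtimes), now - min(mtimes)


if __name__ == "__main__":
    # Run from Mailman's base directory so that qfiles/ resolves.
    for queue in ("in", "out", "retry"):
        path = os.path.join("qfiles", queue)
        if os.path.isdir(path):
            count, age = queue_summary(path)
            oldest = "n/a" if age is None else "%.1f hours" % (age / 3600.0)
            print("%-5s %4d entries, oldest: %s" % (queue, count, oldest))
```

An oldest out/ entry that is hours old on a list with light traffic would point at a backlog in Mailman's outgoing queue rather than in the MTA.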
  • Alan McKay at Jan 4, 2010 at 7:33 pm
    Well, I seem to have gotten it going again with some tweaks in my
    postfix. But that is still a mystery to me since it has worked fine
    like this for years.

    Thanks for your ears!

    Most notable below is my "new" IP vs my "old" IP. I changed ISPs a
    good 3 years ago now and evidently forgot to update my postfix config.
    But as mentioned, it has worked fine since then.

    < new
    > old
    < mynetworks = 172.30.99.0/24, 127.0.0.0/8, 206.248.138.32/32
    ---
    > mynetworks = 172.30.99.0/24, 127.0.0.0/8, 72.1.199.184/32
    244c244
    < #local_recipient_maps =
    ---
    > local_recipient_maps =
    261c261
    < in_flow_delay = 0
    ---
    > #in_flow_delay = 1s
    478,479c478,479
    < local_destination_concurrency_limit = 2
    < default_destination_concurrency_limit = 4
    ---
    > #local_destination_concurrency_limit = 2
    > #default_destination_concurrency_limit = 10
    583c583
    < smtpd_recipient_restrictions = permit_mynetworks,
    ---
    > smtpd_recipient_restrictions =
    631,632d630
    < html_directory = no
    < data_directory = /var/lib/postfix


    --
    "Don't eat anything you've ever seen advertised on TV"
    - Michael Pollan, author of "In Defense of Food"
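
One change in the diff in this post puts `permit_mynetworks` at the head of `smtpd_recipient_restrictions`. In Postfix, a restriction that permits a request ends evaluation of that list, so anything listed after it (the rest of the list is not shown in the thread) would be skipped for clients in `mynetworks`, which here includes 127.0.0.0/8. The Python sketch below checks what a main.cf puts first in that list. It is only an illustration with invented function names, and it follows just the basic main.cf rules: `name = value` lines, continuation lines that start with whitespace, `#` comments, and the last assignment winning.

```python
import re


def read_parameter(main_cf_text, name):
    """Return the last value assigned to `name`, joining continuation lines."""
    value = None
    collecting = False
    for raw in main_cf_text.splitlines():
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue  # blank and comment lines do not end a continuation
        if raw[0] in " \t":
            if collecting:
                value += " " + raw.strip()
            continue
        key, _, rest = raw.partition("=")
        collecting = key.strip() == name
        if collecting:
            value = rest.strip()
    return value


def restrictions(main_cf_text, name="smtpd_recipient_restrictions"):
    """Split a restriction list on commas and whitespace."""
    value = read_parameter(main_cf_text, name)
    return [item for item in re.split(r"[,\s]+", value or "") if item]
```

For example, `restrictions(open("/etc/postfix/main.cf").read())[:1]` shows the first entry; the path is an assumption and varies by system. Restrictions that take an argument, such as a lookup table, come back as two separate items, which is good enough for checking what comes first.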
  • Mark Sapiro at Jan 4, 2010 at 7:43 pm

    Alan McKay wrote:
    > Well, I seem to have gotten it going again with some tweaks in my
    > postfix. But that is still a mystery to me since it has worked fine
    > like this for years.

    It is possible that it has been marginal for some time and just some
    little extra pushed it over the edge into a seriously backlogged
    state. Once a backlog starts, it tends to get worse because people
    don't see their posts right away and repost.

    --
    Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
    San Francisco Bay Area, California    better use your sense - B. Dylan

Discussion Overview
group: mailman-users @
categories: python
posted: Jan 4, '10 at 5:16p
active: Jan 4, '10 at 7:43p
posts: 6
users: 2
website: list.org