FAQ
Hi,

I'm having some problems with my lists recently. I see in my mail log
that "mailman post" has been called. However, when I look in the
logs/post file, it's been almost 24 hours since anything has been
written there.

I'm running mailman on Ubuntu Hardy, and another weird thing is that
whenever I stop mailman, it always leaves at least one process hanging
around. I have to forcefully kill it. After I stop it, I still see:

list 3833 0.0 0.4 83076 7540 ? Ss 08:29
0:00 /usr/bin/python /usr/lib/mailman/bin/mailmanctl -s -q start
list 3842 0.1 2.5 105372 38148 ? S 08:29
0:01 /usr/bin/python /var/lib/mailman/bin/qrunner
--runner=OutgoingRunner:0:1 -s

Even more strange when I reboot the machine, I'll see 2 entire sets of
mailman processes, almost as if the mailman start had been called twice.

In any case, right now it seems that mailman has stopped accepting
posts. Is there a way to get more debug from mailman when "mailman post"
is called? I don't see any errors, yet I don't see the post log file
updating.

Ricardo

Search Discussions

  • Mark Sapiro at Nov 28, 2008 at 5:43 pm

    Ricardo Kleemann wrote:
    I'm having some problems with my lists recently. I see in my mail log
    that "mailman post" has been called. However, when I look in the
    logs/post file, it's been almost 24 hours since anything has been
    written there.

    The post log is written by SMTPDirect (under control of OutgoingRunner)
    when the outgoing message is delivered to the MTA.
    I'm running mailman on Ubuntu Hardy, and another weird thing is that
    whenever I stop mailman, it always leaves at least one process hanging
    around. I have to forcefully kill it. After I stop it, I still see:

    list 3833 0.0 0.4 83076 7540 ? Ss 08:29
    0:00 /usr/bin/python /usr/lib/mailman/bin/mailmanctl -s -q start
    list 3842 0.1 2.5 105372 38148 ? S 08:29
    0:01 /usr/bin/python /var/lib/mailman/bin/qrunner
    --runner=OutgoingRunner:0:1 -s

    It appears that SMTPDirect (actually the underlying Python smtplib) is
    hung waiting for a response from the MTA that isn't coming.

    Even more strange when I reboot the machine, I'll see 2 entire sets of
    mailman processes, almost as if the mailman start had been called twice.

    It seems like you have two init scripts for Mailman.

    In any case, right now it seems that mailman has stopped accepting
    posts. Is there a way to get more debug from mailman when "mailman post"
    is called? I don't see any errors, yet I don't see the post log file
    updating.

    "mailman post" just puts the message in the in/ queue. I assume from
    what you say above that it gets processed by IncomingRunner and even
    archived and the problem is in OutgoingRunner.

    See the FAQs at <http://wiki.list.org/x/A4E9> and
    <http://wiki.list.org/x/-IA9>.

    --
    Mark Sapiro <mark at msapiro.net> The highway is for gamblers,
    San Francisco Bay Area, California better use your sense - B. Dylan
  • Ricardo Kleemann at Nov 28, 2008 at 6:05 pm
    Hello Mark,

    Thanks for your reply.

    I'm running mailman on Ubuntu Hardy, and another weird thing is that
    whenever I stop mailman, it always leaves at least one process hanging
    around. I have to forcefully kill it. After I stop it, I still see:

    list 3833 0.0 0.4 83076 7540 ? Ss 08:29
    0:00 /usr/bin/python /usr/lib/mailman/bin/mailmanctl -s -q start
    list 3842 0.1 2.5 105372 38148 ? S 08:29
    0:01 /usr/bin/python /var/lib/mailman/bin/qrunner
    --runner=OutgoingRunner:0:1 -s

    It appears that SMTPDirect (actually the underlying Python smtplib) is
    hung waiting for a response from the MTA that isn't coming.
    But the strange thing here is that I have no issues at all connecting to
    the smtp server at localhost. Defaults.py has the standard config, using
    localhost with SMTPDirect.

    Whatever the OutgoingRunner is stuck on, it's definitely stuck. It won't
    go away unless I do a kill -9
    Even more strange when I reboot the machine, I'll see 2 entire sets of
    mailman processes, almost as if the mailman start had been called twice.

    It seems like you have two init scripts for Mailman.
    I would have thought so... but there's only 1 script under /etc/init.d/
    and no other scripts there reference mailman
    In any case, right now it seems that mailman has stopped accepting
    posts. Is there a way to get more debug from mailman when "mailman post"
    is called? I don't see any errors, yet I don't see the post log file
    updating.

    "mailman post" just puts the message in the in/ queue. I assume from
    what you say above that it gets processed by IncomingRunner and even
    archived and the problem is in OutgoingRunner.

    See the FAQs at <http://wiki.list.org/x/A4E9> and
    <http://wiki.list.org/x/-IA9>.
    I can see that the in queue is probably working. There are currently 129
    files in the out/ queue.

    There are 2 OutgoingRunner processes and apparently BOTH of them are
    doing something because strace does show some activity

    Process 14078 attached - interrupt to quit
    recvfrom(7, "250 Ok. 0000000049303165.000045C"..., 8192, 0, NULL, NULL)
    = 35
    sendto(7, "mail FROM:<reflexoes3-bounces at li"..., 65, 0, NULL, 0) = 65
    recvfrom(7, "250 Ok.\r\n", 8192, 0, NULL, NULL) = 9
    sendto(7, "rcpt TO:<chitopan_2000 at yahoo.com"..., 35, 0, NULL, 0) = 35
    recvfrom(7, "250 Ok.\r\n", 8192, 0, NULL, NULL) = 9
    sendto(7, "rcpt TO:<bernadete.sorianoadv at gm"..., 42, 0, NULL, 0) = 42
    recvfrom(7, "250 Ok.\r\n", 8192, 0, NULL, NULL) = 9
    sendto(7, "rcpt TO:<iviv_8a at hotmail.com>\r\n", 31, 0, NULL, 0) = 31
    recvfrom(7, "250 Ok.\r\n", 8192, 0, NULL, NULL) = 9
    sendto(7, "data\r\n", 6, 0, NULL, 0) = 6
    recvfrom(7, "354 Ok.\r\n", 8192, 0, NULL, NULL) = 9
    sendto(7, "Received: from sr05-01.mta.terra"..., 8256, 0, NULL, 0) =
    8256
    recvfrom(7, "250 Ok. 0000000049303172.000045D"..., 8192, 0, NULL, NULL)
    = 35


    root at email1:/var/lib/mailman# strace -p13800
    Process 13800 attached - interrupt to quit
    recvfrom(8, "250 Ok. 00000000493031A7.0000464"..., 8192, 0, NULL, NULL)
    = 35
    sendto(8, "mail FROM:<reflexoes3-bounces at li"..., 65, 0, NULL, 0) = 65
    recvfrom(8, "250 Ok.\r\n", 8192, 0, NULL, NULL) = 9
    sendto(8, "rcpt TO:<fesayao at gmail.com>\r\n", 29, 0, NULL, 0) = 29
    recvfrom(8, "250 Ok.\r\n", 8192, 0, NULL, NULL) = 9
    sendto(8, "rcpt TO:<bruno8d at hotmail.com>\r\n", 31, 0, NULL, 0) = 31
    recvfrom(8, "250 Ok.\r\n", 8192, 0, NULL, NULL) = 9
    sendto(8, "rcpt TO:<evangelismoemissoes at hot"..., 43, 0, NULL, 0) = 43
    recvfrom(8, "250 Ok.\r\n", 8192, 0, NULL, NULL) = 9
    sendto(8, "data\r\n", 6, 0, NULL, 0) = 6
    recvfrom(8, "354 Ok.\r\n", 8192, 0, NULL, NULL) = 9


    Yet even though it's being processed, the logs/post file isn't getting
    updated, and the number of messages in out/ doesn't decrease.

    Could this be because mailman is processing a very large list (20,000
    members) and it is just stuck on processing one message, while the other
    messages wait around?

    But I've been handling these lists for a long time and never had these
    problems. Mailman doesn't seem to be getting much cpu usage.
  • Mark Sapiro at Nov 28, 2008 at 6:27 pm

    Ricardo Kleemann wrote:
    Hello Mark,

    Thanks for your reply.

    I'm running mailman on Ubuntu Hardy, and another weird thing is that
    whenever I stop mailman, it always leaves at least one process hanging
    around. I have to forcefully kill it. After I stop it, I still see:

    list 3833 0.0 0.4 83076 7540 ? Ss 08:29
    0:00 /usr/bin/python /usr/lib/mailman/bin/mailmanctl -s -q start
    list 3842 0.1 2.5 105372 38148 ? S 08:29
    0:01 /usr/bin/python /var/lib/mailman/bin/qrunner
    --runner=OutgoingRunner:0:1 -s

    Note that the above "--runner=OutgoingRunner:0:1" indicates that
    OutgoingRunner is not sliced - more below.

    It appears that SMTPDirect (actually the underlying Python smtplib) is
    hung waiting for a response from the MTA that isn't coming.
    But the strange thing here is that I have no issues at all connecting to
    the smtp server at localhost. Defaults.py has the standard config, using
    localhost with SMTPDirect.

    What do you see in Mailman's smtp-failure log.

    Whatever the OutgoingRunner is stuck on, it's definitely stuck. It won't
    go away unless I do a kill -9
    Even more strange when I reboot the machine, I'll see 2 entire sets of
    mailman processes, almost as if the mailman start had been called twice.

    It seems like you have two init scripts for Mailman.
    I would have thought so... but there's only 1 script under /etc/init.d/
    and no other scripts there reference mailman

    Well, something is starting Mailman twice.

    In any case, right now it seems that mailman has stopped accepting
    posts. Is there a way to get more debug from mailman when "mailman post"
    is called? I don't see any errors, yet I don't see the post log file
    updating.

    "mailman post" just puts the message in the in/ queue. I assume from
    what you say above that it gets processed by IncomingRunner and even
    archived and the problem is in OutgoingRunner.

    See the FAQs at <http://wiki.list.org/x/A4E9> and
    <http://wiki.list.org/x/-IA9>.
    I can see that the in queue is probably working. There are currently 129
    files in the out/ queue.

    There are 2 OutgoingRunner processes and apparently BOTH of them are
    doing something because strace does show some activity

    Since OutgoingRunner is not sliced, there should be only one
    OutgoingRunner process. This needs to be corrected. See
    <http://wiki.list.org/x/_4A9>.

    Process 14078 attached - interrupt to quit
    recvfrom(7, "250 Ok. 0000000049303165.000045C"..., 8192, 0, NULL, NULL)
    = 35
    sendto(7, "mail FROM:<reflexoes3-bounces at li"..., 65, 0, NULL, 0) = 65
    recvfrom(7, "250 Ok.\r\n", 8192, 0, NULL, NULL) = 9
    sendto(7, "rcpt TO:<chitopan_2000 at yahoo.com"..., 35, 0, NULL, 0) = 35
    recvfrom(7, "250 Ok.\r\n", 8192, 0, NULL, NULL) = 9
    sendto(7, "rcpt TO:<bernadete.sorianoadv at gm"..., 42, 0, NULL, 0) = 42
    recvfrom(7, "250 Ok.\r\n", 8192, 0, NULL, NULL) = 9
    sendto(7, "rcpt TO:<iviv_8a at hotmail.com>\r\n", 31, 0, NULL, 0) = 31
    recvfrom(7, "250 Ok.\r\n", 8192, 0, NULL, NULL) = 9
    sendto(7, "data\r\n", 6, 0, NULL, 0) = 6
    recvfrom(7, "354 Ok.\r\n", 8192, 0, NULL, NULL) = 9
    sendto(7, "Received: from sr05-01.mta.terra"..., 8256, 0, NULL, 0) =
    8256
    recvfrom(7, "250 Ok. 0000000049303172.000045D"..., 8192, 0, NULL, NULL)
    = 35


    root at email1:/var/lib/mailman# strace -p13800
    Process 13800 attached - interrupt to quit
    recvfrom(8, "250 Ok. 00000000493031A7.0000464"..., 8192, 0, NULL, NULL)
    = 35
    sendto(8, "mail FROM:<reflexoes3-bounces at li"..., 65, 0, NULL, 0) = 65
    recvfrom(8, "250 Ok.\r\n", 8192, 0, NULL, NULL) = 9
    sendto(8, "rcpt TO:<fesayao at gmail.com>\r\n", 29, 0, NULL, 0) = 29
    recvfrom(8, "250 Ok.\r\n", 8192, 0, NULL, NULL) = 9
    sendto(8, "rcpt TO:<bruno8d at hotmail.com>\r\n", 31, 0, NULL, 0) = 31
    recvfrom(8, "250 Ok.\r\n", 8192, 0, NULL, NULL) = 9
    sendto(8, "rcpt TO:<evangelismoemissoes at hot"..., 43, 0, NULL, 0) = 43
    recvfrom(8, "250 Ok.\r\n", 8192, 0, NULL, NULL) = 9
    sendto(8, "data\r\n", 6, 0, NULL, 0) = 6
    recvfrom(8, "354 Ok.\r\n", 8192, 0, NULL, NULL) = 9


    Yet even though it's being processed, the logs/post file isn't getting
    updated, and the number of messages in out/ doesn't decrease.

    So OutgoingRunner never finishes delivery of even one post. However, I
    see 'data' commands in the traces above, so presumably, some
    recipients are being delivered. Then when you kill OutgoingRunner and
    restart it, It recovers the .bak file from the queue and redelivers to
    the same recipients who by now have received multiple copies of the
    message.

    Could this be because mailman is processing a very large list (20,000
    members) and it is just stuck on processing one message, while the other
    messages wait around?

    It could be, but this would indicate that you need to do something to
    allow SMTP between Mailman and the MTA to proceed faster. Search the
    FAQ for "tuning".
    But I've been handling these lists for a long time and never had these
    problems. Mailman doesn't seem to be getting much cpu usage.
    --
    Mark Sapiro <mark at msapiro.net> The highway is for gamblers,
    San Francisco Bay Area, California better use your sense - B. Dylan
  • Ricardo Kleemann at Nov 28, 2008 at 6:23 pm
    I have some more data on this...

    I enabled the debug in SMTPDirect.py, and it shows that there really
    aren't any problems. It shows that slowly a message is being sent out.

    I followed the performance tuning suggestions that had the MAX_RCPTS in
    Defaults.py at an optimal value from 2-5, so I set it to 3.

    But does that mean that mailman is going to simply get stuck on one
    single message distribution and won't process any others until this one
    is finished?

    I used to have these lists on another server and over there I had the
    MAX_RCPTS set to a high number, but my mail server is set to reject
    above 25 rcpts anyway so the end result that at max it would handle 25
    rcpts. I remember that whenever a message arrived for the list (again
    20,000 members) on the other server, the load average on the server
    would go pretty high as it processed the list.

    But now on this new server I never see the load avg go up. Is this
    because of the MAX_RCPTS setting? What else would keep mailman from
    efficiently handling the messages?

    The OutgoingRunner is just sitting there slowly distributing the message
    and never seems to get to the next one.
    On Fri, 2008-11-28 at 09:43 -0800, Mark Sapiro wrote:
    Ricardo Kleemann wrote:
    I'm having some problems with my lists recently. I see in my mail log
    that "mailman post" has been called. However, when I look in the
    logs/post file, it's been almost 24 hours since anything has been
    written there.

    The post log is written by SMTPDirect (under control of OutgoingRunner)
    when the outgoing message is delivered to the MTA.
    I'm running mailman on Ubuntu Hardy, and another weird thing is that
    whenever I stop mailman, it always leaves at least one process hanging
    around. I have to forcefully kill it. After I stop it, I still see:

    list 3833 0.0 0.4 83076 7540 ? Ss 08:29
    0:00 /usr/bin/python /usr/lib/mailman/bin/mailmanctl -s -q start
    list 3842 0.1 2.5 105372 38148 ? S 08:29
    0:01 /usr/bin/python /var/lib/mailman/bin/qrunner
    --runner=OutgoingRunner:0:1 -s

    It appears that SMTPDirect (actually the underlying Python smtplib) is
    hung waiting for a response from the MTA that isn't coming.

    Even more strange when I reboot the machine, I'll see 2 entire sets of
    mailman processes, almost as if the mailman start had been called twice.

    It seems like you have two init scripts for Mailman.

    In any case, right now it seems that mailman has stopped accepting
    posts. Is there a way to get more debug from mailman when "mailman post"
    is called? I don't see any errors, yet I don't see the post log file
    updating.

    "mailman post" just puts the message in the in/ queue. I assume from
    what you say above that it gets processed by IncomingRunner and even
    archived and the problem is in OutgoingRunner.

    See the FAQs at <http://wiki.list.org/x/A4E9> and
    <http://wiki.list.org/x/-IA9>.
  • Mark Sapiro at Nov 28, 2008 at 6:35 pm

    Ricardo Kleemann wrote:
    I have some more data on this...

    I enabled the debug in SMTPDirect.py, and it shows that there really
    aren't any problems. It shows that slowly a message is being sent out.

    So your real issue is why is it proceeding so slowly.

    I followed the performance tuning suggestions that had the MAX_RCPTS in
    Defaults.py at an optimal value from 2-5, so I set it to 3.

    Is the MTA doing DNS verification on incoming mail from Mailman? Are
    you having some DNS issue?

    But does that mean that mailman is going to simply get stuck on one
    single message distribution and won't process any others until this one
    is finished?

    Yes.

    I used to have these lists on another server and over there I had the
    MAX_RCPTS set to a high number, but my mail server is set to reject
    above 25 rcpts anyway so the end result that at max it would handle 25
    rcpts. I remember that whenever a message arrived for the list (again
    20,000 members) on the other server, the load average on the server
    would go pretty high as it processed the list.

    But now on this new server I never see the load avg go up. Is this
    because of the MAX_RCPTS setting? What else would keep mailman from
    efficiently handling the messages?

    Slow response from the MTA. Even with SMTP_MAX_RCPTS set to 3, you
    should be delivering on the order of 100 or more recipients per
    second. What do you see in Mailman's smtp log for processing tomes for
    messages. How do the latest ones compare to those from days or a week
    ago?

    The OutgoingRunner is just sitting there slowly distributing the message
    and never seems to get to the next one.

    It will when it finishes this one.

    --
    Mark Sapiro <mark at msapiro.net> The highway is for gamblers,
    San Francisco Bay Area, California better use your sense - B. Dylan
  • Ricardo Kleemann at Dec 1, 2008 at 7:50 pm
    Mark, thanks for your help.

    For the most part I did get around the SMTP issues last week, I disabled
    ident lookup on the smtp server. But it seems to be back today and I don't
    quite understand what could be the problem.

    I wrote a tiny perl script to test the connection to the server via
    localhost and I can consistently do 500 connections in 30 seconds. I don't
    see any other evidence of the smtp hanging up.

    But if I do an strace on the outgoing runner, it basically is working very
    slowly through a large memberlist. It will process a certain number, then
    hang a few seconds before it processes the next batch. The server is not
    loaded and I don't see any other evidence of problems.

    Do you have a small python script I can run to test out the localhost smtp
    that can maybe output some diagnostic information?

    Thanks
    Ricardo

    ----- Original Message -----
    From: "Mark Sapiro" <mark at msapiro.net>
    To: "Ricardo Kleemann" <ricardo at americasnet.com>
    Cc: "mailman-users" <mailman-users at python.org>
    Sent: Friday, November 28, 2008 10:35 AM
    Subject: Re: [Mailman-Users] when does logs/post get updated?

    Ricardo Kleemann wrote:
    I have some more data on this...

    I enabled the debug in SMTPDirect.py, and it shows that there really
    aren't any problems. It shows that slowly a message is being sent out.

    So your real issue is why is it proceeding so slowly.

    I followed the performance tuning suggestions that had the MAX_RCPTS in
    Defaults.py at an optimal value from 2-5, so I set it to 3.

    Is the MTA doing DNS verification on incoming mail from Mailman? Are
    you having some DNS issue?

    But does that mean that mailman is going to simply get stuck on one
    single message distribution and won't process any others until this one
    is finished?

    Yes.

    I used to have these lists on another server and over there I had the
    MAX_RCPTS set to a high number, but my mail server is set to reject
    above 25 rcpts anyway so the end result that at max it would handle 25
    rcpts. I remember that whenever a message arrived for the list (again
    20,000 members) on the other server, the load average on the server
    would go pretty high as it processed the list.

    But now on this new server I never see the load avg go up. Is this
    because of the MAX_RCPTS setting? What else would keep mailman from
    efficiently handling the messages?

    Slow response from the MTA. Even with SMTP_MAX_RCPTS set to 3, you
    should be delivering on the order of 100 or more recipients per
    second. What do you see in Mailman's smtp log for processing tomes for
    messages. How do the latest ones compare to those from days or a week
    ago?

    The OutgoingRunner is just sitting there slowly distributing the message
    and never seems to get to the next one.

    It will when it finishes this one.

    --
    Mark Sapiro <mark at msapiro.net> The highway is for gamblers,
    San Francisco Bay Area, California better use your sense - B. Dylan
  • Mark Sapiro at Dec 1, 2008 at 10:46 pm

    Ricardo Kleemann wrote:
    I wrote a tiny perl script to test the connection to the server via
    localhost and I can consistently do 500 connections in 30 seconds. I don't
    see any other evidence of the smtp hanging up.

    The connection rate is generally not an issue as Mailman normall does
    all the transactions over a single connection. I.e. it connects, does
    a "MAIL FROM" one or more "RCPT TO", and "DATA", and then begins
    another "MAIL FROM", etc transaction without quitting or
    disconnectiing.

    But if I do an strace on the outgoing runner, it basically is working very
    slowly through a large memberlist. It will process a certain number, then
    hang a few seconds before it processes the next batch. The server is not
    loaded and I don't see any other evidence of problems.

    If there are significant delays between sending RCPT TO and the
    responce from the server, the server is probably doing DNS verifies on
    the recipients. This kills performance on large lists.

    What's it doing when it "hangs for a few seconds"? Presumably it is
    either sending the message content or waiting for a reply from the MTA.

    Do you have a small python script I can run to test out the localhost smtp
    that can maybe output some diagnostic information?

    No. I don't have any such script. As I said in previous posts
    Even with SMTP_MAX_RCPTS set to 3, you
    should be delivering on the order of 100 or more recipients per
    second. What do you see in Mailman's smtp log for processing tomes for
    messages. How do the latest ones compare to those from days or a week
    ago? and
    Search the FAQ <http://wiki.list.org/x/AgA3> for "tuning".
    --
    Mark Sapiro <mark at msapiro.net> The highway is for gamblers,
    San Francisco Bay Area, California better use your sense - B. Dylan
  • Brad Knowles at Dec 2, 2008 at 6:08 am

    on 12/1/08 1:50 PM, Ricardo Kleemann said:

    But if I do an strace on the outgoing runner, it basically is working
    very slowly through a large memberlist. It will process a certain
    number, then hang a few seconds before it processes the next batch. The
    server is not loaded and I don't see any other evidence of problems.
    That sounds like your MTA is doing DNS lookups for each and every
    envelope recipient as you try to transmit mail. That needs to be fixed,
    and is one of the many "performance" tuning techniques that is discussed
    in the FAQ Wiki.

    --
    Brad Knowles <brad at shub-internet.org>
    LinkedIn Profile: <http://tinyurl.com/y8kpxu>

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupmailman-users @
categoriespython
postedNov 28, '08 at 4:54p
activeDec 2, '08 at 6:08a
posts9
users3
websitelist.org

People

Translate

site design / logo © 2022 Grokbase