For some reason our Mailman installation has gotten very slow. One
email distro took about 17 hours according to the user. Data is
flowing, as I can see by running the "mmdsr" stat script. I googled and
found something about how qfiles can grow too large, but I don't have
any in my /var/lib/mailman directory. I also went through a checklist I
found online to tell if Mailman was hung and didn't see anything that
matched.

The version of Mailman we are running is:

mailman-2.1.9-4.el5 on Red Hat Enterprise Linux Server release 5.2
(Tikanga).

Also, bouncing Mailman "appears" to make it run faster for a short while,
judging by the maillog, but I could be imagining things.

I saw some of these errors from the mmdsr script:

mailman.2010_09_26: 1 Failed to unlink backup file:
/var/spool/mailman/virgin/1285535430.9693439+c15f2ea8cc51d4a0768ac9501e2cb46604bca247.bak

mailman.2010_09_26: 1 Failed to unlink backup file:
/var/spool/mailman/in/1285358418.500119+88f4ebd1cd94a4b61b2422cf92d196b8e30b7c3d.bak

mailman.2010_09_26: 1 Failed to unlink backup file:
/var/spool/mailman/in/1285358403.1655681+02fde9a4b5b52413bbf130b5591edc855b3697b4.bak

mailman.2010_09_26: 1 Failed to unlink backup file:
/var/spool/mailman/archive/1285357860.2852571+261eec64e3bbc89300e941420bdb7bbbbc63024d.bak

mailman.2010_09_26: 1 Failed to unlink backup file:
/var/spool/mailman/archive/1285357858.8509741+1978e52ebf4961eae874fd32983473ec5b928a92.bak

mailman.2010_09_26: 1 Failed to unlink backup file:
/var/spool/mailman/archive/1285357851.4396069+6c0356985765abcdb511eaf973a1a08af2f411eb.bak

mailman.2010_09_26: 1 Failed to unlink backup file:
/var/spool/mailman/archive/1285357851.2718539+65d7680e2649c275aff5be0bf4d447732180726a.bak

It appears those files do not exist now. The Hourly summary of Posts,
Post count by List, and Post Count by Sender figures are in the hundreds
or low thousands. Before investigating the stats, I bounced Mailman
multiple times, and Postfix as well, thinking the system was hung. I'm
going home now, as I've been fighting this all day, but I will check
back later.

Any help is truly appreciated,

Troy

  • Mark Sapiro at Sep 27, 2010 at 2:22 am

    Troy Campbell wrote:
    > For some reason our Mailman installation has gotten very slow. One
    > email distro took about 17 hours according to the user. Data is
    > flowing, as I can see by running the "mmdsr" stat script. I googled
    > and found something about how qfiles can grow too large, but I don't
    > have any in my /var/lib/mailman directory. I also went through a
    > checklist I found online to tell if Mailman was hung and didn't see
    > anything that matched.
    >
    > The version of Mailman we are running is:
    >
    > mailman-2.1.9-4.el5 on Red Hat Enterprise Linux Server release 5.2
    > (Tikanga).

    In Red Hat, queues are in /var/spool/mailman/. See the FAQ at
    <http://wiki.list.org/x/KYCB>.

    > Also, bouncing Mailman "appears" to make it run faster for a short
    > while, judging by the maillog, but I could be imagining things.

    What does this (bouncing mailman) mean?

    > I saw some of these errors from the mmdsr script:

    > [...]

    > It appears those files do not exist now. The Hourly summary of Posts,
    > Post count by List, and Post Count by Sender figures are in the
    > hundreds or low thousands. Before investigating the stats, I bounced
    > Mailman multiple times, and Postfix as well, thinking the system was
    > hung. I'm going home now, as I've been fighting this all day, but I
    > will check back later.

    Do you have multiple copies of the qrunners running? That could account
    for the errors you see. See the FAQ at <http://wiki.list.org/x/_4A9>.
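
    A rough way to check for duplicate qrunners is to scan the process
    list for qrunners that share the same runner slice. This Python 3
    sketch assumes that Mailman 2.1's mailmanctl starts each qrunner with a
    `--runner=Name:slice:count` argument (compare with your own `ps`
    output before trusting it); `duplicate_runners` is a hypothetical name.

```python
import re
from collections import Counter

# Assumed qrunner command line, as started by Mailman 2.1's mailmanctl, e.g.
#   /usr/bin/python /usr/lib/mailman/bin/qrunner --runner=IncomingRunner:0:1 -s
RUNNER_RE = re.compile(r"--runner=(\w+):(\d+):(\d+)")


def duplicate_runners(ps_lines):
    """Return {(runner, slice, count): copies} for runner slices seen more than once."""
    seen = Counter()
    for line in ps_lines:
        match = RUNNER_RE.search(line)
        if match:
            name, slice_no, count = match.groups()
            seen[(name, int(slice_no), int(count))] += 1
    # Keep only slices that have more than one process behind them.
    return {key: n for key, n in seen.items() if n > 1}
```

    Feed it the lines printed by `ps -eo pid,args`; an empty result means
    every runner slice has exactly one process.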

    Is Mailman's out/ queue backlogged? The symptoms are several to many
    files in (in your case) /var/spool/mailman/out/, and entries in
    Mailman's smtp log whose time stamp equals the prior entry's time stamp
    plus the processing time for this entry.
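
    The smtp log symptom described above can be checked mechanically. A
    Python 3 sketch, assuming smtp log lines look like
    `Sep 26 20:22:35 2010 (04321) <message-id> smtp to mylist for 250 recips, completed in 12.500 seconds`;
    verify that format against your own /var/log/mailman/smtp first. The
    function names and the 2-second slack are hypothetical choices.

```python
import re
from datetime import datetime, timedelta

# Assumed smtp log line format (check against your own log):
#   Sep 26 20:22:35 2010 (04321) <id@host> smtp to mylist for 250 recips, completed in 12.500 seconds
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d \d{4}) \(\d+\) .*completed in (?P<secs>[\d.]+) seconds"
)


def parse(line):
    """Return (timestamp, processing_seconds), or None if the line doesn't match."""
    m = LINE_RE.match(line)
    if not m:
        return None
    ts = datetime.strptime(" ".join(m.group("ts").split()), "%b %d %H:%M:%S %Y")
    return ts, float(m.group("secs"))


def backlog_entries(lines, slack=2.0):
    """Yield lines stamped at roughly (previous stamp + own processing time).

    A long run of such lines suggests the out/ runner is delivering
    back-to-back from a backlog instead of idling between messages.
    """
    prev_ts = None
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue
        ts, secs = parsed
        if prev_ts is not None:
            expected = prev_ts + timedelta(seconds=secs)
            if abs((ts - expected).total_seconds()) <= slack:
                yield line
        prev_ts = ts
```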

    --
    Mark Sapiro <mark at msapiro.net> The highway is for gamblers,
    San Francisco Bay Area, California better use your sense - B. Dylan
  • Troy Campbell at Sep 27, 2010 at 2:55 am
    Thanks for the reply, Mark. What I meant by "bouncing" was
    "restarting"; sorry for the slang. The emails I sent to the list went
    out, but it took about 3 hours. There is nothing in the "out" directory
    right now, but there are 187 ".pck" files in the "in" directory, if
    that means anything, and 7624 ".pck" files in the "archive" directory.
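
    Per-queue counts like these can be collected with a short script. A
    Python 3 sketch, assuming the Red Hat layout in which every queue is a
    subdirectory of /var/spool/mailman/ holding ".pck" queue files (and
    ".bak" backups like those in the mmdsr errors); `queue_counts` is a
    hypothetical name.

```python
import os
from collections import Counter


def queue_counts(spool="/var/spool/mailman"):
    """Return {queue_name: Counter({file_extension: count})} for each queue directory."""
    result = {}
    for name in sorted(os.listdir(spool)):
        qdir = os.path.join(spool, name)
        if not os.path.isdir(qdir):
            continue  # skip stray files at the top of the spool
        counts = Counter()
        for entry in os.listdir(qdir):
            ext = os.path.splitext(entry)[1] or "(none)"
            counts[ext] += 1
        result[name] = counts
    return result
```

    For example, `queue_counts()["archive"][".pck"]` gives the number of
    queued archive messages.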

    I restarted Mailman carefully to verify that all processes stopped.

    I'm not exactly sure what to look for in the smtp log, which is
    /var/log/maillog in my case.

    Thanks again,
    Troy

    -----Original Message-----
    From: Mark Sapiro [mailto:mark at msapiro.net]
    Sent: Sunday, September 26, 2010 8:22 PM
    To: Troy Campbell; mailman-users at python.org
    Subject: Re: [Mailman-Users] mailman is very slow...

    [...]
  • Mark Sapiro at Sep 27, 2010 at 3:39 am

    Troy Campbell wrote:
    > Thanks for the reply, Mark. What I meant by "bouncing" was
    > "restarting"; sorry for the slang. The emails I sent to the list went
    > out, but it took about 3 hours. There is nothing in the "out"
    > directory right now, but there are 187 ".pck" files in the "in"
    > directory, if that means anything, and 7624 ".pck" files in the
    > "archive" directory.
    >
    > I restarted Mailman carefully to verify that all processes stopped.
    >
    > I'm not exactly sure what to look for in the smtp log, which is
    > /var/log/maillog in my case.

    Mailman's 'smtp' log is with Mailman's other logs in /var/log/mailman/,
    but this is not the issue if you have no files in the out/ queue.

    The large number of files in the in/ and archive/ queues indicates you
    have a mail loop of some sort or you are the victim of a DoS attack.

    Stop Mailman. Move those queues aside in their entirety (e.g. mv
    /var/spool/mailman/in somewhere/else), and examine the messages with
    bin/show_qfiles.
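
    Moving a queue aside can be scripted. This Python 3 sketch assumes
    Mailman has already been stopped; recreating an empty directory with
    the old permission bits is an extra step not in the advice above, and
    ownership is not copied, so fix it by hand (e.g. with chown) if the
    script does not run as the queue's owner. `move_queue_aside` is a
    hypothetical name. The moved files can then be examined with
    bin/show_qfiles.

```python
import os
import shutil
import time


def move_queue_aside(queue_dir, dest_root):
    """Move a whole queue directory into dest_root under a timestamped name.

    Only run this with Mailman stopped. Returns the new path. An empty
    directory with the same permission bits is recreated in place; the
    owner and group are NOT copied.
    """
    mode = os.stat(queue_dir).st_mode & 0o7777
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base = os.path.basename(queue_dir.rstrip("/"))
    target = os.path.join(dest_root, f"{base}-{stamp}")
    shutil.move(queue_dir, target)
    os.mkdir(queue_dir)
    os.chmod(queue_dir, mode)
    return target
```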

    See if more /var/spool/mailman/in/ message files are created with
    Mailman stopped (new posts will create them even with Mailman stopped).
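
    Whether new message files keep arriving while Mailman is stopped can
    be checked by comparing two listings of the queue directory. A minimal
    Python 3 sketch; the function names and the 60-second default interval
    are hypothetical.

```python
import os
import time


def snapshot(queue_dir):
    """Return the set of file names currently in queue_dir."""
    return set(os.listdir(queue_dir))


def new_files(queue_dir, before):
    """Return, sorted, the names that appeared since the snapshot `before`."""
    return sorted(snapshot(queue_dir) - before)


def watch(queue_dir, interval=60):
    """Snapshot queue_dir, wait `interval` seconds, and report new arrivals."""
    before = snapshot(queue_dir)
    time.sleep(interval)
    return new_files(queue_dir, before)
```

    For example, `watch("/var/spool/mailman/in")` returns the queue files
    created during the following minute.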

    Once you figure out what's going on, start Mailman.

  • Troy Campbell at Sep 27, 2010 at 4:13 am
    The "in" directory went empty soon after I created the "null" list (I
    just didn't add any members); I didn't even have to stop Mailman. I'm
    looking at the "archive" directory now, trying to figure out why those
    files are there. It looks like it's one list over and over (a different
    one than the one generating the "deferred" entries above). Is there
    anything in particular to look at in the message? Why are they ending
    up here? Side note: it's kind of odd that I can't get into the list
    through the web interface using the site password, but I can see its
    contents using the command line. I wonder if the list is corrupt
    somehow and should be recreated?

  • Troy Campbell at Sep 27, 2010 at 4:17 am
    Sorry, I realized I hadn't sent a follow-up email, which is why you
    might find my previous reply a little confusing. Here is that follow-up:

    I've got a little more information. I noticed that there were a lot of
    "deferred" Postfix connections. When I dumped out the deferred queue
    using "postqueue -p | more" and then looked at an individual entry
    using "postcat -q 677F0FE66", for example, I saw that someone is trying
    over and over again to send an email to a non-existent list, and the
    list server is replying back, getting a "connection refused", and
    putting that on the "deferred" queue. Is there an easy way to send
    incoming requests for the non-existent list to /dev/null until I can
    get hold of the admin of the server sending me this? I'm tempted to
    create the list that they are trying to reach and then add a member
    translated to /dev/null in Postfix, but I'm wondering if there might be
    an even easier way?

    Thanks,
    Troy
  • Mark Sapiro at Sep 27, 2010 at 10:53 am

    Troy Campbell wrote:
    > I've got a little more information. I noticed that there were a lot of
    > "deferred" Postfix connections. When I dumped out the deferred queue
    > using "postqueue -p | more" and then looked at an individual entry
    > using "postcat -q 677F0FE66", for example, I saw that someone is
    > trying over and over again to send an email to a non-existent list,
    > and the list server is replying back, getting a "connection refused",
    > and putting that on the "deferred" queue. Is there an easy way to send
    > incoming requests for the non-existent list to /dev/null until I can
    > get hold of the admin of the server sending me this? I'm tempted to
    > create the list that they are trying to reach and then add a member
    > translated to /dev/null in Postfix, but I'm wondering if there might
    > be an even easier way?

    This is entirely a Postfix issue. If mail is sent to a non-existent
    list, Postfix should refuse the mail at incoming SMTP time. You may need

    unknown_local_recipient_reject_code = 550

    in Postfix main.cf.

    In any case, Mailman should not be involved no matter what, even if
    Postfix is responding with a 450.

    What is Postfix's method of delivery to Mailman? Why is mail for a
    non-existent list being responded to at all?

    > The "in" directory went empty soon after I created the "null" list (I
    > just didn't add any members); I didn't even have to stop Mailman.

    OK, but something is wrong, as Mailman shouldn't be involved with mail
    to a non-existent list any more than with mail to any non-existent user.

    > I'm looking at the "archive" directory now, trying to figure out why
    > those files are there. It looks like it's one list over and over (a
    > different one than the one generating the "deferred" entries above).
    > Is there anything in particular to look at in the message? Why are
    > they ending up here?

    They are messages that have been posted to a list and are queued to be
    added to the list's archive.

    > Side note: it's kind of odd that I can't get into the list through the
    > web interface using the site password, but I can see its contents
    > using the command line. I wonder if the list is corrupt somehow and
    > should be recreated?

    The list may be locked. See the FAQ at <http://wiki.list.org/x/noA9>.

    What happens for an attempted web access? What's in Mailman's 'error',
    'qrunner' and 'locks' logs?

  • Troy Campbell at Sep 27, 2010 at 4:23 pm
    Interestingly, I have

    unknown_local_recipient_reject_code = 550

    in my Postfix configuration file, so I'll have to investigate why
    that's not working.

    There were several PID lock files for the list. One of them looked like
    its PID still belonged to a running process, so I kept that one and
    removed the others.
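
    Sorting stale lock files from live ones comes down to checking whether
    each file's PID is still running. This Python 3 sketch assumes Mailman
    2.1's per-process lock names have the form
    `<listname>.lock.<hostname>.<pid>.<number>` (check against your own
    locks directory); the bare `<listname>.lock` file is the shared lock
    itself and is skipped. Function names are hypothetical, and
    `os.kill(pid, 0)` only probes whether the process exists; no signal is
    sent.

```python
import errno
import os
import re

# Assumed per-process lock name: <listname>.lock.<hostname>.<pid>.<number>
LOCK_RE = re.compile(r"^(?P<base>.+\.lock)\.(?P<host>.+)\.(?P<pid>\d+)\.(?P<n>\d+)$")


def parse_lock_name(name):
    """Return (base, host, pid) for a per-process lock file name, or None."""
    m = LOCK_RE.match(name)
    if not m:
        return None
    return m.group("base"), m.group("host"), int(m.group("pid"))


def pid_alive(pid):
    """True if a process with this PID exists on this host."""
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except OSError as e:
        # EPERM means the process exists but belongs to another user.
        return e.errno == errno.EPERM
    return True


def classify_locks(names, hostname):
    """Split this host's per-process lock files into (live, stale) lists."""
    live, stale = [], []
    for name in names:
        parsed = parse_lock_name(name)
        if parsed is None or parsed[1] != hostname:
            continue
        (live if pid_alive(parsed[2]) else stale).append(name)
    return live, stale
```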

    Thanks again for your valuable support,
    Troy

Discussion Overview
group: mailman-users
categories: python
posted: Sep 26, '10 at 10:23p
active: Sep 27, '10 at 4:23p
posts: 8
users: 2
website: list.org

2 users in discussion

Troy Campbell: 5 posts; Mark Sapiro: 3 posts
