I'm using Mailman 2.1.2 on FreeBSD v4.8-Release, built using the port. MTA
is sendmail 8.12.8p1

Very frequently I will see the ArchRunner process using 99+ % of cpu. I have
searched the archives and found lots of messages about qrunners using large
percentages of cpu, but they all seem to talk about the fixes being related
to actual mail processing (sendmail), not archRunner. I am assuming that if
the problem was mail delivery or reception I would be seeing the large cpu
use on a different qrunner process. My issue is specific to the archrunner
process which I don't find much on in the archives/faq.

I am using a pretty default install, haven't tweaked anything. If it
helps... here are some possibly germane things:

1) I never seem to be able to catch anything in
/usr/local/mailman/qfiles/archive, but that may be a timing thing, as my
archives do appear to be getting updated.
2) I looked in the /usr/local/mailman/archives/private/*.mbox directories,
and find listname.mbox at 33 MB and listname.mbox.1 at 54 MB. Could it be that
these files are just so big that it takes huge amounts of cpu to add posts
to these? I'm guessing they are the archives. This gives rise to several
questions (someone else maintained this setup before I did). Does mailman
split them (the .1 file), or can I just rename listname.mbox to
listname.mbox.2 and mailman will have a smaller chunk to deal with?

Any thoughts? Thanks in advance!!!

I have another question or two but will post separately for them.

Jay West

---
[This E-mail scanned for viruses by Declude Virus]


  • Jon Carnes at Oct 31, 2003 at 2:40 pm
    Well you've pegged it. That was a bug in version 2.1.2 which is fixed
    in 2.1.3. The patch for 2.1.2 should still be available - you could
    probably patch your running system and just leave it at that (an upgrade
    will bring the patch in anyway).

    Good Luck - Jon Carnes

    ------------------------------------------------------
    Mailman-Users mailing list
    Mailman-Users at python.org
    http://mail.python.org/mailman/listinfo/mailman-users
    Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py
    Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/

    This message was sent to: jonc at nc.rr.com
    Unsubscribe or change your options at
    http://mail.python.org/mailman/options/mailman-users/jonc%40nc.rr.com
  • Scott Lambert at Oct 31, 2003 at 8:52 pm

    On Fri, Oct 31, 2003 at 09:40:11AM -0500, Jon Carnes wrote:
    Well you've pegged it. That was a bug in version 2.1.2 which is fixed
    in 2.1.3. The patch for 2.1.2 should still be available - you could
    probably patch your running system and just leave it at that (an upgrade
    will bring the patch in anyway).
    I still see this problem with Mailman 2.1.3 for a high-volume list.

    PID   USERNAME PRI NICE SIZE RES  STATE C TIME   WCPU   CPU    COMMAND
    66428 mailman  64  0    168M 147M CPU1  0 376.7H 99.02% 99.02% python2.3

    That's the archiver process. There are 1318 messages in the archive
    queue...

    12:00:28 Fri Oct 31 # truss -p 66428
    break(0x114f6000) = 0 (0x0)
    break(0x1302c000) = 0 (0x0)
    break(0x114f8000) = 0 (0x0)
    break(0x13030000) = 0 (0x0)
    break(0x114fa000) = 0 (0x0)
    break(0x13034000) = 0 (0x0)
    break(0x114fc000) = 0 (0x0)
    break(0x13038000) = 0 (0x0)
    break(0x114fe000) = 0 (0x0)
    break(0x1303c000) = 0 (0x0)
    break(0x11500000) = 0 (0x0)
    break(0x13040000) = 0 (0x0)
    break(0x11502000) = 0 (0x0)
    break(0x13044000) = 0 (0x0)
    break(0x11504000) = 0 (0x0)
    break(0x13048000) = 0 (0x0)
    break(0x11506000) = 0 (0x0)
    break(0x1304c000) = 0 (0x0)

    Once I kill off the mailman queue runners and clean up the several lock
    files for this mailing list, it runs just fine and manages to empty the
    archive queue.

    Two days worth of mailman cron jobs were still stuck in the process list.

    Supposition: Maybe they were blocked by the list's lockfile?

    So, it seems that the archRunner process went off the deep end somewhere
    between two and three days ago.

    I have the htdig patches for 2.1.3 installed. Which might be germane...

    --
    Scott Lambert KC5MLE Unix SysAdmin
    lambert at lambertfam.org
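
The two checks described in the post above (how many entries sit in the archive queue, and which lock files exist for the affected list) can be sketched in Python. This is only a rough sketch: the paths assume the /usr/local/mailman prefix mentioned earlier in the thread, the `locks` directory location and the `<listname>.lock...` naming pattern are assumptions, and `mylist` is a hypothetical list name. The script only reports; it does not delete anything.

```python
import os

# Assumed locations, based on the /usr/local/mailman prefix in this thread.
QUEUE_DIR = "/usr/local/mailman/qfiles/archive"
LOCK_DIR = "/usr/local/mailman/locks"


def count_queue_entries(queue_dir):
    """Return the number of regular files in the queue directory."""
    return sum(
        1
        for name in os.listdir(queue_dir)
        if os.path.isfile(os.path.join(queue_dir, name))
    )


def lock_files_for_list(lock_dir, listname):
    """Return the sorted names of files that look like locks for one list.

    Assumption: lock file names start with "<listname>.lock".
    """
    prefix = listname + ".lock"
    return sorted(
        name for name in os.listdir(lock_dir) if name.startswith(prefix)
    )


# Only touch the real directories if they exist on this machine.
if __name__ == "__main__" and os.path.isdir(QUEUE_DIR) and os.path.isdir(LOCK_DIR):
    print("archive queue entries:", count_queue_entries(QUEUE_DIR))
    for name in lock_files_for_list(LOCK_DIR, "mylist"):  # hypothetical list name
        print("lock file:", name)
```

Stop the queue runners before removing any lock file by hand, as described in the post.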
  • Richard Barrett at Oct 31, 2003 at 9:35 pm

    On Friday, October 31, 2003, at 08:52 pm, Scott Lambert wrote:
    I have the htdig patches for 2.1.3 installed. Which might be
    germane...
    If you are referring to patch #444884 then, while I would never say
    never, it is not highly likely to be the cause. The code inserted by
    patch #444884 impinges very little on the execution path taken when
    mail is being archived and archive pages are being generated by
    pipermail. If you discover any different let me know and I'll take
    another look at the htdig integration patch.

    You say you have the problem with a high-volume list. What sort of
    message sizes and traffic volume is the list handling? Do the messages
    tend to have large attachments? I have found that the internal
    pipermail archiver starts to choke on high-volume lists, and on at least
    one of the lists I run the solution I adopted was to reduce the archiving
    period from a month to a week, which seemed to alleviate the problem. I
    suspect the problem is partially related to the pickled data structures
    that pipermail uses to control archiver operation and index generation.

    I'm now using a fairly tight Mailman/MHonArc integration for such
    lists; I developed it because MHonArc has a reputation for handling
    large archives better than pipermail but I still wanted MM list archive
    privacy, my htdig integration, etc. A patch for this is available at
    http://www.openinfo.co.uk/mailman/patches/mhonarc/index.html or as MM
    patch #820723 on sourceforge. It subcontracts MHonArc to generate the
    message and period index pages in the normal
    $prefix/archives/private/<listname>/<archive-period> directory
    structure while the pipermail/MM code looks after the top level index,
    archive control and access control. The integration makes the choice of
    pipermail or MHonArc a per-list option, so if you change your mind or
    decide it was all a big mistake it is not a disaster: select the
    archiver of choice and run $prefix/bin/arch --wipe to have that
    archiver regenerate the list archive from its mbox file.

    So far this MM/MH integration has worked OK for me but that's a single
    data point.

    Enough over-selling of a free product and the usual caveat emptor :)
    but if you give it a try let me know how you get on.
    -----------------------------------------------------------------------
    Richard Barrett http://www.openinfo.co.uk
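
Reducing the archiving period from a month to a week, as suggested in the post above, corresponds in Mailman 2.1 to the per-list `archive_volume_frequency` setting. Below is a minimal sketch of an input file for `bin/config_list`. The value mapping in the comment is quoted from memory of Mailman 2.1's Defaults.py and is an assumption, so check it against your own copy before applying it.

```python
# Input file for: $prefix/bin/config_list -i <this file> <listname>
# Assumed mapping (verify in $prefix/Mailman/Defaults.py):
#   0 = yearly, 1 = monthly, 2 = quarterly, 3 = weekly, 4 = daily
archive_volume_frequency = 3
```

The same option can also be changed on the list's web admin pages under the archiving options.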
  • Scott Lambert at Oct 31, 2003 at 11:21 pm

    On Fri, Oct 31, 2003 at 03:52:34PM -0500, Scott Lambert wrote:
    Once I kill off the mailman queue runners and clean up the several lock
    files for this mailing list, it runs just fine and manages to empty the
    archive queue.
    Well, the above statement is not entirely accurate. It was working
    quickly immediately after restart but went downhill. I logged out and
    took care of other things after seeing it move a good number of messages
    in a short amount of time. Five hours later, it still had 377 messages
    in the archive queue and was taking several minutes per message. I
    trussed it again and saw more of the incredibly long series of breaks,
    but watched it longer than I did this morning. After a lot of breaks it
    goes to a lot of writes, then does some file stuff quickly and repeats for
    the next message.

    I restarted the queue runners again and it processed forty or so
    messages quickly, then began the downward spiral again. By the time the
    queue was down to 177 entries, it was back to 3 minutes per message and
    climbing. Restarting knocked it down pretty quickly for a while, then it
    started taking longer again. I was watching more closely this time.
    After a couple more restart cycles, the queue was cleaned out quickly
    and all is well.

    I haven't looked at the code yet, and probably won't (ENOTIME), but it
    almost sounds to me like it's not pruning its list of handled messages
    and has to walk all of them each time. I would have expected queue
    handling to get faster as the queue got smaller due to fewer files
    in the directory that it needs to search through. Maybe it's just a
    function of the Python data structure being used.

    The fast after restart part makes me doubt that it is the size of the
    archive that is at issue.

    The server we are using is a dual PIII450 machine. I would guess this
    would not show as such a big problem on a more modern system, but other
    than the archiver, this box is more than enough for the load on it.

    The dual processor aspect of this box is what allows us to miss the
    archiver running off the deep end until someone complains that the
    archive search feature is broken. The mail passes through the system
    just fine using the other processor.

    38M 2003-October.txt
    13M 2003-October.txt.gz
    48M portsidelist.mbox

    --
    Scott Lambert KC5MLE Unix SysAdmin
    lambert at lambertfam.org
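
The "fast after restart, then slower and slower" pattern in the post above is easier to pin down with numbers than by eyeballing the queue. The sketch below samples the queue depth at intervals and converts the samples into seconds per processed message. The function names are hypothetical, and the calculation assumes no new mail arrives in the queue between samples; on a live list the figures are only approximate.

```python
import os
import time


def queue_depth(queue_dir):
    """Return the number of entries currently in the queue directory."""
    return len(os.listdir(queue_dir))


def seconds_per_message(samples):
    """Convert [(timestamp, depth), ...] into seconds per processed message.

    One value is returned per interval between consecutive samples.
    Intervals in which the depth did not fall yield None.
    """
    rates = []
    for (t0, d0), (t1, d1) in zip(samples, samples[1:]):
        processed = d0 - d1
        if processed > 0:
            rates.append((t1 - t0) / float(processed))
        else:
            rates.append(None)
    return rates


def watch(queue_dir, interval_seconds, rounds):
    """Sample the queue depth `rounds` times, sleeping between samples."""
    samples = []
    for i in range(rounds):
        if i:
            time.sleep(interval_seconds)
        samples.append((time.time(), queue_depth(queue_dir)))
    return samples
```

For example, `seconds_per_message(watch("/usr/local/mailman/qfiles/archive", 60, 30))` would give one figure per minute for half an hour; a steadily rising series right after a restart would support the observation that the slowdown builds up inside the running process.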
  • Brad Knowles at Oct 31, 2003 at 11:59 pm

    At 6:21 PM -0500 2003/10/31, Scott Lambert wrote:

    I haven't looked at the code yet, and probably won't (ENOTIME), but it
    almost sounds to me like it's not pruning it's list of handled messages
    and has to walk all of them each time. I would have expected queue
    handling to get faster as the queue got smaller due to fewer files
    in the directory that it needs to search through. Maybe it's just a
    function of the python datastructure being used.
    If it's using files as the queue mechanism, then deleting a file
    simply marks the entry in the directory as "available", and it still
    takes just as long to scan the directory afterwards as it did before.

    This is a known problem with many MTAs handling large amounts of
    messages, and is one reason why you should use a hashed directory
    scheme for your mail queue (a la postfix), or you should periodically
    stop the MTA, move the mail queue directory aside, create a new mail
    queue directory (with appropriate ownership and permissions), then
    move what messages may remain from the old queue back into the new
    one (or fire up queue runners to clear the old queue while the new
    one is being used for new mail).

    Mailman could very easily be suffering from the same sort of
    problem -- once you get a directory with a large number of entries in
    it, it takes a long time to scan it even if there are only a few
    files that are currently visible. Same problem, perhaps the same
    solution?

    --
    Brad Knowles, <brad.knowles at skynet.be>

    "They that can give up essential liberty to obtain a little temporary
    safety deserve neither liberty nor safety."
    -Benjamin Franklin, Historical Review of Pennsylvania.

    GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+
    !w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++)
    tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)
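
The claim above, that a directory which once held many files stays slow to scan after most of them are deleted, can be tested on the filesystem in question. The following is a rough sketch, not a rigorous benchmark: it works in a temporary directory, uses 1300 files because that is the queue size mentioned in this thread, and the file names are made up. Results depend heavily on the filesystem and on caching, so no particular outcome is implied.

```python
import os
import shutil
import tempfile
import time


def scan_after_churn(n_files, keep):
    """Time a directory scan before and after deleting most of its files."""
    workdir = tempfile.mkdtemp(prefix="queue-scan-")
    try:
        names = ["%08d.pck" % i for i in range(n_files)]
        for name in names:
            open(os.path.join(workdir, name), "w").close()

        # Scan while the directory is full.
        start = time.perf_counter()
        full_count = len(os.listdir(workdir))
        full_seconds = time.perf_counter() - start

        # Delete everything except the first `keep` files.
        for name in names[keep:]:
            os.remove(os.path.join(workdir, name))

        # Scan again now that the directory is nearly empty.
        start = time.perf_counter()
        remaining_count = len(os.listdir(workdir))
        remaining_seconds = time.perf_counter() - start
    finally:
        shutil.rmtree(workdir)
    return {
        "full_count": full_count,
        "full_seconds": full_seconds,
        "remaining_count": remaining_count,
        "remaining_seconds": remaining_seconds,
    }


if __name__ == "__main__":
    result = scan_after_churn(1300, 10)
    for key in sorted(result):
        print(key, result[key])
```

If the second scan is not dramatically slower than a scan of a fresh directory with ten files, directory bloat is unlikely to explain multi-minute per-message times.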
  • Richard Barrett at Nov 1, 2003 at 12:52 am

    On Friday, October 31, 2003, at 11:59 pm, Brad Knowles wrote:
    If it's using files as the queue mechanism, then deleting a file
    simply marks the entry in the directory as "available", and it still
    takes just at long to scan the directory afterwards as it did before.

    This is a known problem with many MTAs handling large amounts of
    messages, and is one reason why you should use a hashed directory
    scheme for your mail queue (a la postfix), or you should periodically
    stop the MTA, move the mail queue directory aside, create a new mail
    queue directory (with appropriate ownership and permissions), then
    move what messages may remain from the old queue back into the new one
    (or fire up queue runners to clear the old queue while the new one is
    being used for new mail).

    In MM 2.1.3, the relevant code is the files() function in
    $prefix/Mailman/Queue/Switchboard.py, starting at line 204. It is
    called from line 89 of $prefix/Mailman/Queue/Runner.py, the base class
    that $prefix/Mailman/Queue/ArchRunner.py subclasses.

    Rather than just theorize, feel free to make specific suggestions about
    the deficiencies and appropriate remedies based on the code being
    executed. Dare I say it, you could even submit a patch to fix any
    obvious errors in the code.
  • Brad Knowles at Nov 1, 2003 at 10:40 pm

    At 12:52 AM +0000 2003/11/01, Richard Barrett wrote:

    Rather than just theorize, feel free to make specific suggestions
    about the deficiencies and appropriate remedies based on the code
    being executed. Dare I say it, you could even submit a patch to
    fix any obvious errors in the code.
    I have said before, and I will say again, that I am not a
    programmer. The last time I did any "real" programming was when I
    was a senior in college, before I graduated -- 1989.

    I can talk intelligently about mechanisms and techniques that are
    known to have specific flaws, but don't ask me to write or comment on
    code. If you do, please restrict your languages to Bourne shell or
    maybe a bit of Perl (not too obfuscated, please).

    --
    Brad Knowles, <brad.knowles at skynet.be>
  • Scott Lambert at Nov 1, 2003 at 2:29 am

    On Sat, Nov 01, 2003 at 12:59:24AM +0100, Brad Knowles wrote:
    If it's using files as the queue mechanism, then deleting a file
    simply marks the entry in the directory as "available", and it still
    takes just at long to scan the directory afterwards as it did before.
    If we were talking about more than 10,000 files, I might buy it. But we
    are talking about 1300 files. Also the processing goes something like
    O(n), in reverse, slower as it processes the files in the directory. I
    might buy it staying slow if it started slow but it doesn't.

    --
    Scott Lambert KC5MLE Unix SysAdmin
    lambert at lambertfam.org
  • Jon Carnes at Nov 1, 2003 at 2:54 am

    On Fri, 2003-10-31 at 21:29, Scott Lambert wrote:
    If we were talking about more than 10,000 files, I might buy it. But we
    are talking about 1300 files. Also the processing goes something like
    O(n), in reverse, slower as it processes the files in the directory. I
    might buy it staying slow if it started slow but it doesn't.
    To me it sounds like a memory problem.

    I wonder how fast we can fix it?
  • Brad Knowles at Nov 1, 2003 at 10:45 pm

    At 9:29 PM -0500 2003/10/31, Scott Lambert wrote:

    If we were talking about more than 10,000 files, I might buy it. But we
    are talking about 1300 files.
    Many filesystems start significantly slowing down around 1,000
    files, not 10,000. Moreover, are you sure that this is the largest
    number of files you've ever had in that directory?
    Also the processing goes something like
    O(n), in reverse, slower as it processes the files in the directory.
    That is a bit strange, but might be explained by holes in the
    directory structure that need to be skipped.
    I might buy it staying slow if it started slow but it doesn't.
    I've seen mail servers at large freemail providers that had
    previously grown to very large sizes, and worked reasonably well for
    numbers of files in the low thousands, but seriously flaked out when
    pushed much beyond that.

    Move the directory aside, move the files to a new directory, and
    restart -- suddenly everything works like magic again.


    Unless you know the filesystem code intimately, as well as the
    code that is using the filesystem, it can be difficult to predict how
    or when things will break or how badly they will break.

    --
    Brad Knowles, <brad.knowles at skynet.be>
  • Richard Barrett at Nov 3, 2003 at 10:39 pm
    Scott

    Further to my earlier post on this topic, I have taken a look at the
    pipermail archiver code.

    I concluded that there is a bug (or is it a feature?) which bloats the
    size of the -article file in the pipermail "database" for each list.
    This bloat will affect archiving performance, particularly for lists
    with large amounts of traffic and/or those that have large text
    postings to them.

    I think the bug has been around for a number of releases and it would
    explain why I had previously found shortening the archive period
    improved matters.

    This may or may not be part of the problem you reported. I have posted
    a patch to correct this problem here which you might like to try if you
    are feeling particularly brave:

    http://www.openinfo.co.uk/mailman/patches/835332/index.html

    and here:

    http://sourceforge.net/tracker/?func=detail&aid=835332&group_id=103&atid=300103

    Feedback either +ve or -ve would be appreciated if you try the patch.

    Richard
    -----------------------------------------------------------------------
    Richard Barrett http://www.openinfo.co.uk
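
Why would a bloated pickled store make every new message slower? The toy sketch below is one way to see it. It is not pipermail's code, and it assumes the whole store is re-serialized each time a message is added, which this thread does not confirm. Under that assumption the total work grows roughly with the square of the number of messages, while an append-only scheme grows linearly.

```python
import pickle


def rewrite_whole_store(messages):
    """Re-pickle the entire store after each message; return total bytes."""
    store = {}
    total = 0
    for i, body in enumerate(messages):
        store[i] = body
        # Every addition serializes all earlier messages again.
        total += len(pickle.dumps(store, protocol=2))
    return total


def append_only(messages):
    """Pickle each message once on its own; return total bytes."""
    total = 0
    for i, body in enumerate(messages):
        total += len(pickle.dumps((i, body), protocol=2))
    return total


if __name__ == "__main__":
    # 200 distinct synthetic messages of about 1 KB each (made-up data).
    msgs = ["%d:" % i + "x" * 1000 for i in range(200)]
    print("rewrite whole store:", rewrite_whole_store(msgs), "bytes")
    print("append only:        ", append_only(msgs), "bytes")
```

Anything that inflates the per-message size of the store, such as keeping full message text in it, multiplies the cost of the first approach. That would be consistent with shorter archive periods helping.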
  • Jay West at Oct 31, 2003 at 9:35 pm
    Jon wrote...
    Well you've pegged it. That was a bug in version 2.1.2 which is fixed
    in 2.1.3. The patch for 2.1.2 should still be available - you could
    probably patch your running system and just leave it at that (an upgrade
    will bring the patch in anyway).
    I am having trouble finding that specific patch (for archrunner performance)
    in the patch area of the website. I see one patch specifically for
    archrunner, but the things that it fixes don't mention anything that sounds
    outwardly similar to my performance problem. Can someone confirm or deny if
    this is the correct patch? If not, maybe point me to the right one?

    Thanks!

    Jay West


Discussion Overview
group: mailman-users @
categories: python
posted: Oct 31, '03 at 2:26p
active: Nov 3, '03 at 10:39p
posts: 13
users: 6
website: list.org