FAQ
We had an issue with a failed disk on our Mailman server recently. As a
result, some of our lists' config.pck and config.pck.last files were
corrupted. I have restored files from older backups or dropped and
recreated the lists.

All of the Mailman admin web pages and user web pages appear to work at
this time.

The issue I have is that at least one list is not automatically updating
its archive when new posts come in. I have verified by sending posts to
other lists that their archives are updated at that time.

If I run "arch --wipe <list> <path to list mbox file>", the list archive
is re-built successfully and includes the latest posts that have come
in. New posts to this list are being appended to the .mbox file.

Here is the section of error messages that are showing up in the
"logs/error" file:

=====

Aug 10 22:56:54 2008 (23548) Uncaught runner exception: invalid load key, '^_'.
Aug 10 22:56:54 2008 (23548) Traceback (most recent call last):
  File "/home/mailman/Mailman/Queue/Runner.py", line 111, in _oneloop
    self._onefile(msg, msgdata)
  File "/home/mailman/Mailman/Queue/Runner.py", line 167, in _onefile
    keepqueued = self._dispose(mlist, msg, msgdata)
  File "/home/mailman/Mailman/Queue/IncomingRunner.py", line 130, in _dispose
    more = self._dopipeline(mlist, msg, msgdata, pipeline)
  File "/home/mailman/Mailman/Queue/IncomingRunner.py", line 153, in _dopipeline
    sys.modules[modname].process(mlist, msg, msgdata)
  File "/home/mailman/Mailman/Handlers/Moderate.py", line 109, in process
    Hold.hold_for_approval(mlist, msg, msgdata, Hold.NonMemberPost)
  File "/home/mailman/Mailman/Handlers/Hold.py", line 218, in hold_for_approval
    id = mlist.HoldMessage(msg, reason, msgdata)
  File "/home/mailman/Mailman/ListAdmin.py", line 186, in HoldMessage
    self.__opendb()
  File "/home/mailman/Mailman/ListAdmin.py", line 86, in __opendb
    self.__db = cPickle.load(fp)
UnpicklingError: invalid load key, '^_'.

Aug 10 22:56:54 2008 (23548) SHUNTING: 1218409013.986546+40eb47837328b00ded0f971e044058c916c0e0c3

=====
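
In the log, '^_' is caret notation for the control byte 0x1F. Python's unpickler raises "invalid load key" when it reads a byte that is not a pickle opcode where it expects one, so the file being loaded held 0x1F at that point instead of pickle data. The short Python 3 sketch below is an illustration added here, not code from the thread (Mailman 2.1 itself used Python 2's cPickle); it reproduces the same kind of error with a made-up byte string standing in for a damaged file:

```python
import pickle


def load_key_error(data: bytes) -> str:
    """Try to unpickle `data`; return the UnpicklingError text, or '' on success."""
    try:
        pickle.loads(data)
    except pickle.UnpicklingError as exc:
        return str(exc)
    return ""


# Caret notation: '^_' stands for chr(ord('_') - 0x40), i.e. the byte 0x1F,
# which is not a pickle opcode in any protocol.
assert ord("_") - 0x40 == 0x1F

ok = load_key_error(pickle.dumps({"requests": []}))  # empty string: loaded fine
bad = load_key_error(b"\x1f" + b"\x00" * 15)         # an "invalid load key" message
print(repr(ok), bad)
```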

If I run "check_db --all", I get no error messages. If I run
"check_db --all --verbose", the only errors I get are about missing
config.db and config.db.last files, which I believe are not required.

Is there an easy way to add some debugging statements to the
Mailman/ListAdmin.py code to see what file (assuming it is one of the
.pck ones) is generating the error message?

Any suggestions as to how to determine what file might be corrupt and
hopefully how to fix it?
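
Short of patching ListAdmin.py, one way to find out which file is damaged is to check every .pck file offline. The sketch below is an illustration added here, not something from the thread. It assumes one directory per list under Mailman's lists/ directory, holding files such as config.pck and request.pck. It uses Python 3's pickletools.dis, which walks a pickle's opcodes and checks its stack and memo use without rebuilding any objects, so Mailman's own classes do not need to be importable; a file containing a stray byte like 0x1F, or one that was cut short, fails the check. Only the first pickle in each file is examined.

```python
import io
import pickletools
from pathlib import Path
from typing import List, Optional, Tuple


def pickle_problem(path: Path) -> Optional[str]:
    """Return why `path` is not a well-formed pickle, or None if it looks fine.

    pickletools.dis decodes opcodes and tracks the stack and memo, but it
    never imports classes, so Mailman's modules need not be importable.
    """
    try:
        with path.open("rb") as fp:
            pickletools.dis(fp, out=io.StringIO())  # discard the listing itself
    except Exception as exc:  # unknown opcode, truncated data, bad stack, ...
        return f"{type(exc).__name__}: {exc}"
    return None


def scan_lists_dir(lists_dir: Path) -> List[Tuple[Path, str]]:
    """Check every <list>/*.pck file directly under lists_dir."""
    if not lists_dir.is_dir():
        return []
    bad = []
    for path in sorted(lists_dir.glob("*/*.pck")):
        problem = pickle_problem(path)
        if problem is not None:
            bad.append((path, problem))
    return bad
```

For example, `for path, problem in scan_lists_dir(Path("/home/mailman/lists")): print(path, problem)`; the /home/mailman prefix is an assumption based on the paths in the traceback.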

Thanks,
Dave

  • Mark Sapiro at Aug 11, 2008 at 2:10 am

    David Goldsmith wrote:
    > The issue I have is that at least one list is not automatically updating
    > its archive when new posts come in. I have verified by sending posts to
    > other lists that their archives are updated at that time.
    >
    > If I run "arch --wipe <list> <path to list mbox file>", the list archive
    > is re-built successfully and includes the latest posts that have come
    > in. New posts to this list are being appended to the .mbox file.
    >
    > Here is the section of error messages that are showing up in the
    > "logs/error" file:
    >
    > =====
    >
    > Aug 10 22:56:54 2008 (23548) Uncaught runner exception: invalid load key, '^_'.
    > Aug 10 22:56:54 2008 (23548) Traceback (most recent call last):
    >   File "/home/mailman/Mailman/Queue/Runner.py", line 111, in _oneloop
    >     self._onefile(msg, msgdata)
    >   File "/home/mailman/Mailman/Queue/Runner.py", line 167, in _onefile
    >     keepqueued = self._dispose(mlist, msg, msgdata)
    >   File "/home/mailman/Mailman/Queue/IncomingRunner.py", line 130, in _dispose
    >     more = self._dopipeline(mlist, msg, msgdata, pipeline)
    >   File "/home/mailman/Mailman/Queue/IncomingRunner.py", line 153, in _dopipeline
    >     sys.modules[modname].process(mlist, msg, msgdata)
    >   File "/home/mailman/Mailman/Handlers/Moderate.py", line 109, in process
    >     Hold.hold_for_approval(mlist, msg, msgdata, Hold.NonMemberPost)
    >   File "/home/mailman/Mailman/Handlers/Hold.py", line 218, in hold_for_approval
    >     id = mlist.HoldMessage(msg, reason, msgdata)
    >   File "/home/mailman/Mailman/ListAdmin.py", line 186, in HoldMessage
    >     self.__opendb()
    >   File "/home/mailman/Mailman/ListAdmin.py", line 86, in __opendb
    >     self.__db = cPickle.load(fp)
    > UnpicklingError: invalid load key, '^_'.
    >
    > Aug 10 22:56:54 2008 (23548) SHUNTING: 1218409013.986546+40eb47837328b00ded0f971e044058c916c0e0c3
    >
    > =====

    I don't think this is the "not archiving" issue. This is a post that
    was going to be held for the moderator because it was from a
    non-member and got shunted instead. It wouldn't have gotten to the
    listname.mbox file.

    > If I run "check_db --all", I get no error messages. If I run
    > "check_db --all --verbose", the only errors I get are about missing
    > config.db and config.db.last files, which I believe are not required.

    Correct. Also, check_db only checks config.* files which aren't the
    problem here.

    > Is there an easy way to add some debugging statements to the
    > Mailman/ListAdmin.py code to see what file (assuming it is one of the
    > .pck ones) is generating the error message?
    >
    > Any suggestions as to how to determine what file might be corrupt and
    > hopefully how to fix it?

    UTSL (Use the Source, Luke).

    The corrupt file is lists/listname/request.pck - the file that holds
    the outstanding moderator requests. You should also be seeing some
    error if you go to the admindb interface for this list. If you just
    remove the request.pck, that will fix this problem, but I doubt it
    will fix the non-archiving problem. That sounds like a corrupt archive
    database, but bin/arch --wipe should fix that.
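
Mark's fix is simply to remove request.pck. A slightly more cautious variant, sketched below in Python 3 as an illustration rather than a Mailman tool, renames the file only if it fails a pickletools sanity check, so a healthy file is never touched and the damaged copy stays available for inspection. From Mailman's point of view a renamed file is the same as a removed one. The .corrupt-<unix time> suffix is an arbitrary choice made for this sketch.

```python
import io
import pickletools
import time
from pathlib import Path
from typing import Optional


def is_well_formed_pickle(path: Path) -> bool:
    """True if pickletools can walk the file's first pickle without error."""
    try:
        with path.open("rb") as fp:
            pickletools.dis(fp, out=io.StringIO())
    except Exception:
        return False
    return True


def quarantine_if_corrupt(path: Path) -> Optional[Path]:
    """Rename a damaged pickle to <name>.corrupt-<unix time>; return the new path.

    Missing or well-formed files are left alone and None is returned.
    """
    if not path.exists() or is_well_formed_pickle(path):
        return None
    target = path.with_name(f"{path.name}.corrupt-{int(time.time())}")
    path.rename(target)  # renaming in place keeps the file's owner and permissions
    return target
```

Usage would look like `quarantine_if_corrupt(Path("/home/mailman/lists/LISTNAME/request.pck"))`, where LISTNAME and the /home/mailman prefix are placeholders.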

    BTW, the list that has the corrupt request.pck probably isn't the one
    that's not archiving. Do

    bin/show_qfiles qfiles/shunt/1218409013.986546+40eb47837328b00ded0f971e044058c916c0e0c3.pck

    to see the shunted post which should indicate (in To:) what list it's
    for. If it's To: more than one list, use bin/dumpdb instead to see the
    listname in the metadata.
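
bin/dumpdb is the supported way to read a queue file's metadata. For scripting, a Python 3 sketch follows; it rests on an assumption the thread does not state, namely that a Mailman 2.1 queue file holds two pickles back to back, the message first and then a metadata dictionary that includes a 'listname' key. pickletools.genops steps over the first pickle opcode by opcode without building the message object (which may be an instance of a Mailman class), and only the metadata dictionary is unpickled; encoding="latin1" lets Python 3 decode byte strings pickled by Python 2. Because pickle.load can run arbitrary code embedded in a pickle, use this only on files Mailman itself wrote.

```python
import pickle
import pickletools
from typing import Any, Dict


def read_qfile_metadata(path: str) -> Dict[str, Any]:
    """Return the metadata dict from a queue file.

    Assumed layout (not confirmed in the thread): pickle #1 is the message,
    pickle #2 is the metadata dictionary.
    """
    with open(path, "rb") as fp:
        # Walk pickle #1 without executing it; genops stops right after the
        # STOP opcode, leaving the file positioned at the start of pickle #2.
        for _ in pickletools.genops(fp):
            pass
        # latin1 maps Python 2 byte strings one-to-one onto Python 3 str.
        return pickle.load(fp, encoding="latin1")
```

For example, `read_qfile_metadata("qfiles/shunt/<name>.pck").get("listname")` would give the list name under that assumption.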

    There may be several of these shunted posts. After you remove the bad
    request.pck, you can run bin/unshunt to reprocess them. Before running
    unshunt it is always a good idea to look at all the files in
    qfiles/shunt to make sure they are all current and relevant and remove
    the ones that aren't.
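
Reviewing qfiles/shunt is easier with each entry's age at hand. The shunted file named in the log starts with 1218409013.986546, which as a Unix timestamp is 22:56:53 UTC on Aug 10, 2008, within a second of the logged SHUNTING line, so the part of the name before the + appears to be a time stamp. The Python 3 sketch below relies on that reading (an inference, not something stated in the thread) to list the .pck files in a shunt directory that are older than a given number of days. It only lists candidates; deciding what to delete stays a manual step, as Mark suggests.

```python
import time
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Tuple


def qfile_time(name: str) -> datetime:
    """Parse the leading '<unix time>+' part of a queue file name as UTC."""
    stamp, _sep, _rest = name.partition("+")
    return datetime.fromtimestamp(float(stamp), tz=timezone.utc)


def old_shunt_files(shunt_dir: Path, max_age_days: float) -> List[Tuple[Path, datetime]]:
    """List *.pck files whose name-encoded time is older than max_age_days."""
    cutoff = time.time() - max_age_days * 86400
    old = []
    for path in sorted(shunt_dir.glob("*.pck")):
        try:
            when = qfile_time(path.name)
        except ValueError:
            continue  # name does not follow the '<time>+<hash>.pck' pattern
        if when.timestamp() < cutoff:
            old.append((path, when))
    return old
```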

    Anyway, after you take care of all this, if you still have the archive
    issue, check the error log for errors relating to that (from
    ArchRunner) and post a traceback if you can't figure it out (it may be
    a permissions issue).

    --
    Mark Sapiro <mark at msapiro.net>
    San Francisco Bay Area, California
    The highway is for gamblers, better use your sense - B. Dylan
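
Mark suggests checking the error log for ArchRunner errors. The log excerpt earlier in the thread shows each new entry starting with a line such as "Aug 10 22:56:54 2008 (23548) ...", while traceback lines carry no such prefix. The Python 3 sketch below assumes that format holds for the whole file, groups lines into entries, and keeps the entries that mention a given string such as "ArchRunner".

```python
import re
from typing import Iterable, List

# Entry header as seen in the thread's excerpt, e.g.
# "Aug 10 22:56:54 2008 (23548) Uncaught runner exception: ..."
ENTRY_START = re.compile(r"^[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4} \(\d+\) ")


def split_entries(lines: Iterable[str]) -> List[str]:
    """Group log lines into entries; lines without a header join the previous entry."""
    entries: List[str] = []
    for line in lines:
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += line
    return entries


def entries_mentioning(lines: Iterable[str], needle: str) -> List[str]:
    """Return the entries whose text contains `needle`."""
    return [entry for entry in split_entries(lines) if needle in entry]
```

For example, `with open("/home/mailman/logs/error") as fp: print("".join(entries_mentioning(fp, "ArchRunner")))`; the log path is an assumption based on the install prefix in the traceback.
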
  • David Goldsmith at Aug 11, 2008 at 6:19 pm

    Mark Sapiro wrote:
    > The corrupt file is lists/listname/request.pck - the file that holds
    > the outstanding moderator requests. You should also be seeing some
    > error if you go to the admindb interface for this list. If you just
    > remove the request.pck, that will fix this problem, but I doubt it
    > will fix the non-archiving problem. That sounds like a corrupt archive
    > database, but bin/arch --wipe should fix that.

    The request.pck file has been removed.

    > BTW, the list that has the corrupt request.pck probably isn't the one
    > that's not archiving. Do
    >
    > bin/show_qfiles qfiles/shunt/1218409013.986546+40eb47837328b00ded0f971e044058c916c0e0c3.pck
    >
    > to see the shunted post which should indicate (in To:) what list it's
    > for. If it's To: more than one list, use bin/dumpdb instead to see the
    > listname in the metadata.
    >
    > There may be several of these shunted posts. After you remove the bad
    > request.pck, you can run bin/unshunt to reprocess them. Before running
    > unshunt it is always a good idea to look at all the files in
    > qfiles/shunt to make sure they are all current and relevant and remove
    > the ones that aren't.

    Yikes. Didn't realize that Mailman was maintaining a queue like that.
    Nuked all of the really old messages, cleaned out the spam, ran
    "unshunt" and got various errors due to permissions. Ran
    "check_perms -v" and "check_perms -f" to correct that. Re-ran "unshunt"
    and most of the messages were picked up and delivered to the lists.

    > Anyway, after you take care of all this, if you still have the archive
    > issue, check the error log for errors relating to that (from
    > ArchRunner) and post a traceback if you can't figure it out (it may be
    > a permissions issue).

    The list with problems has an updated archive now. I'll wait and see
    what happens when the next message comes in. Looking at the
    "logs/error" file, I did get some errors when I unshunted the files.
    It turns out that one of the other lists had a corrupt pipermail.pck
    file. After nuking and rebuilding that list's archives, I think things
    are running as they should be right now.

    Thanks for the info,
    Dave

Discussion Overview
group: mailman-users @
categories: python
posted: Aug 11, '08 at 12:34a
active: Aug 11, '08 at 6:19p
posts: 3
users: 2
website: list.org

2 users in discussion

David Goldsmith: 2 posts
Mark Sapiro: 1 post
