Hi,

I have lots of problems with out-of-office replies. I tried to set up a few
filter rules using 2.1.10. Unfortunately they don't catch them. Are the
expressions case sensitive? Are the expressions basic or extended?

What I have tried so far:

^subject:.*Accepted.*
^subject:.*Declined.*
^subject:.*is out of office.*

Thanks, Helmut

  • Mark Sapiro at Nov 26, 2008 at 4:12 am

    There are two different filters at Privacy options... -> Spam filters,
    and they work differently.

    The more flexible of the two is header_filter_rules. For
    header_filter_rules the regexps are matched against a multi-line
    string containing all the unfolded headers in the message, both
    message headers and sub-part headers. The regexp is a python regexp
    <http://docs.python.org/library/re.html#regular-expression-syntax> and
    the headers are searched
    <http://docs.python.org/library/re.html#re.search> for a match of the
    regexp in MULTILINE and IGNORECASE mode. This means the '^' matches
    at the beginning of the string or at the position immediately
    following a newline, and the match is case insensitive. Thus your
    above expressions look good.
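
    As a rough illustration of that matching mode (a sketch only, not
    Mailman's actual code; the sample headers and the helper name
    first_match are made up for the example):

```python
import re

# Hypothetical unfolded headers, one complete header per line.
headers = (
    "From: someone@example.com\n"
    "Subject: [Somelist] Declined: Invitation to workshop\n"
    "To: somelist@example.com\n"
)

patterns = [
    r"^subject:.*Accepted.*",
    r"^subject:.*Declined.*",
    r"^subject:.*is out of office.*",
]

def first_match(patterns, headers):
    """Return the first pattern found anywhere in the headers, or None."""
    for pattern in patterns:
        # MULTILINE lets '^' match at the start of every header line;
        # IGNORECASE makes 'subject:' match 'Subject:'.
        if re.search(pattern, headers, re.MULTILINE | re.IGNORECASE):
            return pattern
    return None

matched = first_match(patterns, headers)
```

    Because '^' anchors at the start of each line, "subject:" appearing
    in the middle of some other header line would not satisfy these
    patterns.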

    The other is bounce_matching_headers which works differently. It
    expects a header name followed by a colon followed by a regexp to
    match against the contents of that header - e.g.

    subject:is out of office

    would match any subject: header that contained 'is out of office'. This
    match too is case insensitive.

    Also, with bounce_matching_headers, you can't specify an action. The
    action is always 'Hold'.
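
    The "header name, colon, regexp" form can be sketched like this. It
    is an approximation under stated assumptions, not the code Mailman
    runs; parse_rule and has_matching_header are names invented for the
    example, and the sample message is hypothetical:

```python
import re
from email.message import Message

def parse_rule(line):
    """Split 'header: regexp' at the first colon."""
    name, _, pattern = line.partition(":")
    return name.strip().lower(), re.compile(pattern.strip(), re.IGNORECASE)

def has_matching_header(msg, rule_lines):
    """True if any rule's regexp is found in the header it names."""
    for line in rule_lines:
        if ":" not in line:
            continue  # not a 'header: regexp' line, ignore it
        name, regexp = parse_rule(line)
        # Only the contents of the named header are searched.
        for value in msg.get_all(name, []):
            if regexp.search(value):
                return True
    return False

msg = Message()
msg["Subject"] = "John Doe is Out of Office until Monday"
held = has_matching_header(msg, ["subject:is out of office"])
```

    Note the difference from header_filter_rules: here the regexp sees
    only the header's value, so no '^subject:' prefix is needed.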

    --
    Mark Sapiro <mark at msapiro.net>        The highway is for gamblers,
    San Francisco Bay Area, California    better use your sense - B. Dylan
  • Helmut Schneider at Nov 26, 2008 at 9:40 am
    That's weird. Messages still pass with e.g.

    Subject: [Somelist] Declined: Invitation to workshop on 13rd Dec. 2008

    in the Header. Do I need to escape the colon? Or something else?

    Thanks, Helmut
  • Mark Sapiro at Nov 26, 2008 at 11:21 pm

    I just tested a rule with the three regexps

    ^subject:.*Accepted.*
    ^subject:.*Declined.*
    ^subject:.*is out of office.*

    copied from your post and Action set to Reject, and a message with

    Subject: [Somelist] Declined: Invitation to workshop on 13rd Dec. 2008

    was rejected for matching the rule. Perhaps you didn't set the rule
    action. Note that Action = Defer does not mean defer the post; it
    means defer the rule - i.e. don't enforce it.
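
    To make the Defer point concrete, here is a minimal sketch of rule
    evaluation. The (regexps, action) layout, the action strings and the
    function name action_for are assumptions for illustration, not
    Mailman's internal representation:

```python
import re

# Hypothetical rules: (list of regexps, action).
rules = [
    ([r"^x-spam-flag:\s*yes"], "Defer"),   # deferred, so never enforced
    ([r"^subject:.*Accepted.*",
      r"^subject:.*Declined.*",
      r"^subject:.*is out of office.*"], "Reject"),
]

def action_for(headers, rules):
    """Return the action of the first enforced rule that matches, else None."""
    for patterns, action in rules:
        if action == "Defer":
            continue  # the rule itself is deferred: skip it entirely
        for pattern in patterns:
            if re.search(pattern, headers, re.MULTILINE | re.IGNORECASE):
                return action
    return None

headers = (
    "X-Spam-Flag: YES\n"
    "Subject: [Somelist] Declined: Invitation to workshop on 13rd Dec. 2008\n"
)
result = action_for(headers, rules)
```

    With the second rule's action left at Defer as well, the same
    message would pass the filter untouched.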

  • Michael Welch at Nov 27, 2008 at 12:09 am
    Hi friends.

    Helmut Schneider wrote:
    ^subject:.*is out of office.*

    I just added this rule as a "Reject" since we have had a few of these come to the list lately:

    ^subject:.*out of office.*

    Would folks be willing to share the rules they have developed that would apply to general business lists? We have not had any problems with spam, as our list is very tight. This would be for accidental or automated sendings.

    Also, what does the sender receive upon rejection? I am hesitant to test lest something accidentally gets through.

    Is the list owner notified of these spam filter rejections? Are the spam rules applied after testing for list membership?


    - - - - - - - - - - - -
    Michael Welch, volunteer
    Redwood Alliance
    PO Box 293
    Arcata, CA 95518
    707-822-7884
    mwelch at redwoodalliance.org
    www.redwoodalliance.org
  • Mark Sapiro at Nov 27, 2008 at 12:23 am

    Michael Welch wrote:
    Also, what does the sender receive upon rejection? I am hesitant to test lest something accidentally gets through.

    That's what test lists are for ;)

    The post is sent back to the poster attached to a message with the
    original subject which says "Message rejected by filter rule match".

    Is the list owner notified of these spam filter rejections?

    No

    Are the spam rules applied after testing for list membership?

    No. In the default pipeline, header_filter_rules are the first thing
    done, even before checking for an Approved: header.

    OTOH, bounce_matching_headers are not checked until after membership
    checks.

  • Helmut Schneider at Nov 27, 2008 at 9:31 am

    Interesting, with "^subject:.*Declined.*"

    Subject: Declined: [Somelist] Invitation to workshop on 13rd Dec. 2008

    matches while

    Subject: [Somelist] Declined: Invitation to workshop on 13rd Dec. 2008

    does not. Huh?!
  • Mark Sapiro at Nov 27, 2008 at 3:09 pm

    It turns out that RFC 2047 encoded headers are not decoded before
    matching against the regexps. Is that the issue here? What do the raw
    headers look like?

    I think that the headers should be decoded, but I wonder if people are
    currently working around this with regexps that match encoded headers
    and wouldn't match decoded headers.
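
    The effect is easy to reproduce outside Mailman. The snippet below
    is only a demonstration with a made-up header (whether the headers
    in question were actually encoded is not known at this point); it
    base64-encodes a subject per RFC 2047 and searches it before and
    after decoding:

```python
import base64
import re
from email.header import decode_header, make_header

pattern = re.compile(r"^subject:.*Declined.*", re.MULTILINE | re.IGNORECASE)

# Build a hypothetical RFC 2047 "B" (base64) encoded Subject header.
text = "[Somelist] Declined: Invitation to workshop"
encoded_word = "=?utf-8?b?%s?=" % base64.b64encode(text.encode("utf-8")).decode("ascii")
raw = "Subject: " + encoded_word

# The raw header only contains the base64 text, so the regexp misses it.
raw_hit = pattern.search(raw)

# After decoding the header value, the same regexp matches.
name, _, value = raw.partition(": ")
decoded = name + ": " + str(make_header(decode_header(value)))
decoded_hit = pattern.search(decoded)
```

    Quoted-printable ("Q") encoded words can leave plain ASCII words
    readable in the raw header, so whether a rule misses an encoded
    subject depends on how the sending client encoded it.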

  • Mark Sapiro at Nov 27, 2008 at 8:20 pm

    I have developed a patch for SpamDetect.py which will decode RFC 2047
    encoded headers. This is somewhat problematic because the decoded
    headers will presumably contain non-ascii characters, and while the
    character sets of the headers are known (and there can be different
    headers or even different parts of a single header encoded in different
    character sets), the character set of the regexps in header_filter_rules
    is not known.

    The patch creates a unicode object containing all the headers unfolded
    and RFC 2047 decoded with one complete header per line and then encodes
    it into the character set of the list's preferred_language, and this
    result is what the regexps will search. As long as the regexps contain
    only ascii and the raw headers contain no non-ascii characters, this
    should give expected results. If the regexps contain non-ascii
    characters or the headers contain non-ascii not RFC 2047 encoded,
    results may be unexpected.
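
    The general idea can be sketched as follows. This is not the
    attached patch: it is a Python 3 approximation, the function name
    decoded_header_text and the sample message are invented, and the
    charset argument stands in for the character set of the list's
    preferred_language:

```python
import re
from email import message_from_string
from email.header import decode_header, make_header

def decoded_header_text(msg, charset="us-ascii"):
    """Unfold and RFC 2047 decode all headers, one complete header per line."""
    lines = []
    for part in msg.walk():  # the message itself, then any sub-parts
        for name, value in part.items():
            # Unfold: drop line breaks that are followed by whitespace.
            unfolded = re.sub(r"\r?\n(?=[ \t])", "", str(value))
            text = str(make_header(decode_header(unfolded)))
            lines.append("%s: %s" % (name, text))
    joined = "\n".join(lines)
    # Round-trip through the target charset; characters it cannot
    # represent are replaced instead of raising an error.
    return joined.encode(charset, "replace").decode(charset)

raw_message = (
    "From: someone@example.com\n"
    "Subject: =?iso-8859-1?q?Abgelehnt=3A_Gespr=E4ch?=\n"
    "\n"
    "body\n"
)
msg = message_from_string(raw_message)
searchable = decoded_header_text(msg, charset="iso-8859-1")
hit = re.search(r"^subject:.*abgelehnt", searchable, re.MULTILINE | re.IGNORECASE)
```

    A real implementation would also need to cope with unknown or
    mislabeled character sets in the encoded words, which this sketch
    does not attempt.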

    If, in fact, the original issue is due to RFC 2047 encoded headers, try
    the patch and let us know how it works.


    -------------- next part --------------
    An embedded and charset-unspecified text was scrubbed...
    Name: SpamDetect.patch.txt
    URL: <http://mail.python.org/pipermail/mailman-users/attachments/20081127/4332a243/attachment.txt>
  • Helmut Schneider at Nov 28, 2008 at 3:24 pm
    As far as I can see this patch works great. As a positive side effect, is it
    possible that this patch also affects uncaught bounces? I receive lots of
    uncaught bounces now where a SPAM-filter was required before the patch.

    Thanks a lot, Helmut
  • Mark Sapiro at Nov 28, 2008 at 4:02 pm

    No. The patch has absolutely no effect on uncaught bounces. Uncaught
    bounces are messages sent to a LIST-bounces address that are not
    VERPed and are not recognized as DSNs. If spam is sent to a
    LIST-bounces address and makes it to Mailman, it will be an
    unrecognized bounce. SpamDetect.py and header_filter_rules are not
    involved at all in processing mail received at a LIST-bounces address.

    Any change you observed in uncaught bounces is just a coincidence.


Discussion Overview
group: mailman-users @
categories: python
posted: Nov 25, '08 at 10:55a
active: Nov 28, '08 at 4:02p
posts: 11
users: 3
website: list.org
