Hello,
I wrote a program that worked on Python 2.x. I'd like to move to a newer
version, but I have a problem with how emails are parsed.
In particular, I'd like to extract the significant parts of the headers, but
the responses from the servers are now lists of bytes.
Is there a method that will parse the headers and return them as ASCII text
if I pass the headers as bytes? I don't even know whether I can pass them as
they arrive in the program.

For example if I try:

import poplib

_pop = poplib.POP3(srvr)
_pop.user(args[1])
_pop.pass_(args[2])

header = _pop.top(nmuid, 0)

This returns a tuple whose second element is a list of byte strings, and I
have no idea how to process them to get a dictionary with
'from', 'to', 'cc', 'bcc', 'date', 'subject', 'reply-to', 'message-id'
as keys.

--
goto /dev/null


  • Steven D'Aprano at Jun 12, 2011 at 11:45 am

    On Sun, 12 Jun 2011 19:20:00 +0800, TheSaint wrote:

    > Hello
    > I wrote a program which was working on python 2.x. I'd like to go for
    > newer version but I face the problem on how the emails are parsed. In
    > particular I'd like to extract the significant parts of the headers, but
    > the query to the servers had turned in to list of bytes. What could be a
    > method that will parse and return the headers into ascii if I'll pass
    > the headers as bytes. Even I don't know whether I can pass as they
    > arrive to the program.
    >
    > For example if I try:
    >
    > import poplib.POP3
    > _pop= poplib.POP3(srvr)
    > _pop.user(args[1])
    > _pop.pass_(args[2])
    >
    > header =_pop.top(nmuid, 0)
    >
    > This will return a list of bytes string and I don't have idea to process
    > them in order to have a dictionary containing 'from', 'to', 'cc', 'bcc',
    > 'date', 'subject', 'reply-to', 'message-id' as keys.

    To parse emails, you should use the email package. It already handles
    bytes and strings.
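
As a rough illustration of the email-package approach, the sketch below joins the byte lines and parses them with `email.parser.BytesParser`. It assumes a recent Python 3 where `email.policy.default` is available; `headers_to_dict`, `WANTED`, and `sample_lines` are hypothetical names, and `sample_lines` only stands in for the list of lines returned by `POP3.top()`.

```python
from email import policy
from email.parser import BytesParser

# Header names the original poster wants as dictionary keys
WANTED = ('from', 'to', 'cc', 'bcc', 'date', 'subject',
          'reply-to', 'message-id')


def headers_to_dict(lines):
    """Build a dict of selected headers from a list of bytes lines."""
    # Rejoin the lines and end the header block with an empty line
    raw = b'\r\n'.join(lines) + b'\r\n\r\n'
    msg = BytesParser(policy=policy.default).parsebytes(raw, headersonly=True)
    # Header lookup is case-insensitive; missing headers return None
    return {key: str(msg[key]) for key in WANTED if msg[key] is not None}


# Stand-in for the list of lines in the tuple returned by POP3.top()
sample_lines = [
    b'From: sender@example.com',
    b'To: python-list@python.org',
    b'Subject: Parsing headers',
]
headers = headers_to_dict(sample_lines)
print(headers['subject'])
```

With a live connection, the list would presumably come from something like `_pop.top(nmuid, 0)[1]`.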

    Other than that, I'm not entirely sure I understand your problem. In
    general, if you have some bytes, you can decode them into a string by hand:

    >>> header = b'To: python-list at python.org\n'
    >>> s = header.decode('ascii')
    >>> s
    'To: python-list at python.org\n'


    If this is not what you mean, perhaps you should give an example of what
    header looks like, what you hope to get, and a concrete example of how it
    differs in Python 3.


    --
    Steven
  • TheSaint at Jun 12, 2011 at 1:57 pm
    Steven D'Aprano wrote:

    First of all, thanks for the reply.

    > header =_pop.top(nmuid, 0)
    > To parse emails, you should use the email package. It already handles
    > bytes and strings.

    I read a lot of documentation this afternoon, but most of what I tried
    led to errors. That may be down to my own ignorance :)
    From what I could work out, I decided to write my own code.

    def msg_parser(listOfBytes):
        header = {}
        for lin in listOfBytes:
            try:
                line = lin.decode()
            except UnicodeDecodeError:
                continue
            for key in _FULLhdr:
                if key in line:
                    header[key] = line
                    continue
        return header

    listOfBytes is the header content, which is the second element of the
    tuple returned by poplib.POP3.top(num_msg, how_much).

    However, some lines fail to decode correctly. I can't imagine why emails
    don't comply with a standard.
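
One possible way to avoid skipping lines that fail to decode is to try a few encodings and fall back to Latin-1, which accepts every byte value. This is only a sketch: `decode_line` is a hypothetical helper, and the choice and order of encodings is an assumption, not something the thread prescribes.

```python
def decode_line(raw, encodings=('ascii', 'utf-8')):
    """Decode one header line, trying each encoding in turn."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            pass
    # latin-1 maps every byte to a code point, so this cannot fail,
    # although the text may be wrong if the real encoding differs
    return raw.decode('latin-1')


print(decode_line(b'To: python-list@python.org'))
print(decode_line(b'Subject: caff\xe8'))
```

The fallback keeps the line available for filtering, at the cost of possibly mis-rendered characters.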
    > Other than that, I'm not entirely sure I understand your problem. In
    > general, if you have some bytes, you can decode it into a string by hand:

    I see. My English isn't very good yet :P. I'm Italian :)

    > header = b'To: python-list at python.org\n'
    > s = header.decode('ascii')
    > s
    > 'To: python-list at python.org\n'

    I know this, but it isn't applicable when handling the entire message
    header and envelope.
    The libraries that handle emails and their headers seem very confusing to
    me, and I think I should take a different, smaller approach.

    I'll try to show a header (if the content doesn't breach privacy), but
    the output of *_pop.top(nmuid, 0)* won't fit into your example above.

    > If this is not what you mean, perhaps you should give an example of what
    > header looks like

    The difference is that the previous version returned text strings, and the
    subsequent processing is based on string manipulation.
    Just to mention, my program reads headers from a POP3 or IMAP4 server and
    applies some regex filtering in order to remove unwanted emails from the
    server. All the filters treat I/O as ASCII strings.

    I ran my modules through 2to3 to convert them to the newer Python, but on
    the first run it complained that the downloaded header is not a string.
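
For the regex filtering described above, one option is to decode the headers to `str` first and keep the patterns as ordinary string patterns. The sketch below assumes the headers are already in a dict of decoded strings; `FILTERS`, `is_unwanted`, and the patterns themselves are hypothetical and only illustrate the idea.

```python
import re

# Hypothetical rules: header name -> pattern that marks unwanted mail
FILTERS = {
    'subject': re.compile(r'\b(lottery|prize)\b', re.IGNORECASE),
    'from': re.compile(r'@spam\.example$', re.IGNORECASE),
}


def is_unwanted(headers):
    """Return True if any decoded header matches its filter pattern."""
    return any(
        pattern.search(headers.get(name, ''))
        for name, pattern in FILTERS.items()
    )


print(is_unwanted({'subject': 'You won the LOTTERY', 'from': 'a@b.example'}))
print(is_unwanted({'subject': 'Meeting notes', 'from': 'a@b.example'}))
```

An alternative would be to compile bytes patterns (for example `re.compile(rb'...')`) and match the raw lines directly, but then every pattern and every input has to stay bytes.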

    --
    goto /dev/null
  • Nobody at Jun 13, 2011 at 1:46 am

    On Sun, 12 Jun 2011 21:57:38 +0800, TheSaint wrote:

    > However, some line will fail to decode correctly. I can't imagine why emails
    > don't comply to a standard.

    Any headers should be in ASCII; non-ASCII characters should be encoded
    using quoted-printable and/or base-64 encoding.
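
Headers encoded this way (RFC 2047 "encoded words") can be turned back into readable text with the standard `email.header` module. A minimal sketch follows; `decode_header_value` is a hypothetical helper name and the sample values are made up.

```python
from email.header import decode_header, make_header


def decode_header_value(value):
    """Decode RFC 2047 encoded words in a header value to a str."""
    # decode_header splits the value into (data, charset) pairs;
    # make_header reassembles them into a Header that converts to str
    return str(make_header(decode_header(value)))


# Quoted-printable and base-64 encoded words, plus a plain ASCII value
print(decode_header_value('=?iso-8859-1?q?p=F6stal?='))
print(decode_header_value('=?utf-8?b?Q2lhbw==?='))
print(decode_header_value('Plain subject'))
```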

    Any message with non-ASCII characters in the headers can safely be
    discarded as spam (I've never seen this bug in "legitimate" email).
    Many MTAs will simply reject such messages.

    The message body can be in any encoding, or in multiple encodings (e.g.
    for multipart/mixed content), or none (e.g. the body may be binary data
    rather than text).
  • Dan Stromberg at Jun 13, 2011 at 3:29 am

    On Sun, Jun 12, 2011 at 6:46 PM, Nobody wrote:

    > Any message with non-ASCII characters in the headers can safely be
    > discarded as spam (I've never seen this bug in "legitimate" email).
    > Many MTAs will simply reject such messages.

    http://en.wikipedia.org/wiki/Email_address#Internationalization

    It may not yet be in common use, but tossing international e-mails is
    probably not a great policy going forward.

    The reign of ASCII is coming to an end, security concerns about unicode's
    complexity notwithstanding.

Discussion Overview
group: python-list
category: python
posted: Jun 12, '11 at 11:20a
active: Jun 13, '11 at 3:29a
posts: 5
users: 4
website: python.org
