Hi,

I need to extract the HTML part from an MHT file. For that I use the
"net/mail", "mime" and "mime/multipart" packages. The part has these headers:

------=_NextPart_01CABE02.6F15EB80
...
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset="us-ascii"

This is the main idea of my code:

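// Open the MHT file and parse it as a mail message.
mht, err := os.Open("example.mht")
panicIfError(err)
defer mht.Close()

msg, err := mail.ReadMessage(mht)
panicIfError(err)

// The multipart boundary is a parameter of the top-level Content-Type header.
_, prm, err := mime.ParseMediaType(msg.Header.Get("Content-Type"))
panicIfError(err)
bnd := prm["boundary"]

// Read the first part of the multipart body.
rdr := multipart.NewReader(msg.Body, bnd)
prt, err := rdr.NextPart()
panicIfError(err)

// Copy the part into memory, checking the copy error as well.
buf := &bytes.Buffer{}
_, err = io.Copy(buf, prt)
panicIfError(err)

err = ioutil.WriteFile("example.html", buf.Bytes(), os.ModePerm)
panicIfError(err)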

My troubles:
1. The HTML source I get has lost all its line endings.
2. Many words in the text are joined together.

For example, this fragment contains 4 words, but I get only 1 word (the words are joined):
<p class=3DMsoNormal align=3Dcenter style=3D'text-align:center'><b style=3D=
'mso-bidi-font-weight:
normal'><span lang=3DUK style=3D'mso-ansi-language:UK'>&#1076;&#1083;&#1103;
&#1084;&#1077;&#1076;&#1080;&#1095;&#1085;&#1086;&#1075;&#1086;
&#1079;&#1072;&#1089;&#1090;&#1086;&#1089;&#1091;&#1074;&#1072;&#1085;&#108=
5;&#1103;
&#1087;&#1088;&#1077;&#1087;&#1072;&#1088;&#1072;&#1090;&#1091;</span><o:p>=
</o:p></b></p>

Where am I going wrong? Thanks.

Also I wrote a comment for issue 4411:
https://code.google.com/p/go/issues/detail?id=4411

--
Dmitriy

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


  • Minux at Feb 6, 2013 at 1:02 pm

    On Wed, Feb 6, 2013 at 8:32 PM, Dmitriy Kovalenko wrote:

    Where am I going wrong? Thanks.

    Which version of Go are you using?

    Note that the fix for issue 4411 is currently included only in the tip version, so if
    you're using a release version of Go, mime/multipart won't decode quoted-printable.
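To illustrate what decoding quoted-printable means for the fragment in the question: "=3D" stands for a literal "=", and a "=" at the very end of a line is a soft line break that joins the line to the next one without adding any character, while ordinary line endings are kept. The following is only a minimal sketch, assuming a later Go toolchain that ships the standard mime/quotedprintable package (it is not part of the release discussed here); decodeQP and the sample string are hypothetical names and data chosen for illustration.

```go
package main

import (
	"fmt"
	"io"
	"mime/quotedprintable"
	"strings"
)

// decodeQP is a hypothetical helper: it decodes a quoted-printable
// string with the standard library reader and returns the plain text.
func decodeQP(s string) (string, error) {
	b, err := io.ReadAll(quotedprintable.NewReader(strings.NewReader(s)))
	return string(b), err
}

func main() {
	// "=3D" is an encoded "=", and the "=" before the first newline
	// is a soft line break. The second newline is a real line ending.
	out, err := decodeQP("<b style=3D=\n'x'>one\ntwo</b>")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", out)
}
```

In the decoded text the soft break after style=3D disappears, while the newline between "one" and "two" survives, so the words stay separated.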

    If you want to install the latest development version of Go (tip), follow the
    instructions at http://golang.org/doc/install/source, and drop "-u release" from the
    "hg clone" command.


Discussion Overview
group: golang-nuts
categories: go
posted: Feb 6, '13 at 12:32p
active: Feb 6, '13 at 1:02p
posts: 2
users: 2
website: golang.org

2 users in discussion

Minux: 1 post Dmitriy Kovalenko: 1 post
