Dear All

While doing some HTML hacking today (yes I do understand
that XML and HTML are not the same thing), I came across
two behaviours of encoding/xml that indicate at least one of

* I don't actually understand xml as much as I thought
* I don't understand encoding/xml as much as I'd hoped
* I don't understand HTML as well as I thought
* CASE NIGHTMARE GREEN is upon us

Code at

     http://play.golang.org/p/o4staWriiX

which parses some XML from a string and then encodes it
back to stdout.

(a) If an xmlns=WOSSNAME is present then when encoding
     EVERY element has been given an xmlns=WOSSNAME
     attribute.

     Is this intended? If so, what's the reason? Is it controllable
     and can I switch it off?

(a') How does Encode know what xmlns to (mis?) write? My
     struct Result doesn't have a slot for it and the decoder and
     encoder don't obviously share store. Did I miss some
     documentation about shared state in encoding/xml?

(b) The (documented in encoding/xml Marshal) behaviour
     that all the chardata fields are smooshed up into one lump
     seems strange. It means a structure such as the example

     <p>This is the <i>first</i> verse.</p>

     loses track completely of where the <i> subelement appeared
     within the <p>. Is this an XML thing that all the chardata of
     an element forms one text, and does this also in fact apply to
     HTML but I never knew?

Chris

[The HTML that provoked this discovery was from an External
  Source, in somewhat iffy state, and had been tempered by passing
  it through tidy. Answers reminiscent of "you should write proper
  HTML in the first place!" would be pointmissing.]

--
Chris "allusive" Dollin

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

  • Chris dollin at Nov 30, 2014 at 7:17 pm
    (fx:rereads-golang-dev)

    (Hmm, looks like Roger Peppe has a related fix ...)

    Chris

    --
    Chris "allusive" Dollin
  • Matt Harden at Nov 30, 2014 at 9:38 pm

    On Sun Nov 30 2014 at 1:04:31 PM chris dollin wrote:


    Code at

    http://play.golang.org/p/o4staWriiX

    which parses some XML from a string and then encodes it
    back to stdout.

    (a) If an xmlns=WOSSNAME is present then when encoding
    EVERY element has been given an xmlns=WOSSNAME
    attribute.

    Is this intended?

    Yes.

    If so, what's the reason?


    The encoder uses the XMLName field of type xml.Name to specify the
    namespaces.

    Is it controllable
    and can I switch it off?
    Zero out your XMLName field when encoding.

    (a') How does Encode know what xmlns to (mis?) write? My
    struct Result doesn't have a slot for it and the decoder and
    encoder don't obviously share store. Did I miss some
    documentation about shared state in encoding/xml?
    No shared state. Just your Result value. It's in the XMLName field.

    (b) The (documented in encoding/xml Marshal) behaviour
    that all the chardata fields are smooshed up into one lump
    seems strange. It means a structure such as the example

    <p>This is the <i>first</i> verse.</p>

    loses track completely of where the <i> subelement appeared
    within the <p>. Is this an XML thing that all the chardata of
    an element forms one text, and does this also in fact apply to
    HTML but I never knew?
    It's not so much an XML thing AFAIK. It's just the way Marshal works. It's
    definitely not true that the output XML is equivalent, as HTML, to the
    input.
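If the goal is only to carry the markup through unchanged, one option is to stop encoding/xml from parsing the content at all. A ",innerxml" field receives the raw inner XML on Unmarshal, and Marshal writes it back verbatim. Below is a sketch with a hypothetical RawPara type. The trade-off is that <i> is no longer reachable as a field, only as text.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// RawPara is hypothetical: Inner holds the element's content as raw markup.
type RawPara struct {
	XMLName xml.Name `xml:"p"`
	Inner   string   `xml:",innerxml"`
}

// roundTripRaw decodes src and marshals it back, leaving the inner markup untouched.
func roundTripRaw(src string) (string, error) {
	var p RawPara
	if err := xml.Unmarshal([]byte(src), &p); err != nil {
		return "", err
	}
	out, err := xml.Marshal(p)
	return string(out), err
}

func main() {
	out, err := roundTripRaw(`<p>This is the <i>first</i> verse.</p>`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```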

    You mentioned that you know that XML and HTML are not the same thing. So
    why are you trying to treat HTML as XML? They are not the same thing.

    Maybe you want http://godoc.org/golang.org/x/net/html.
  • Chris dollin at Dec 1, 2014 at 9:06 am

    On 30 November 2014 at 21:38, Matt Harden wrote:
    On Sun Nov 30 2014 at 1:04:31 PM chris dollin wrote:

    (a) If an xmlns=WOSSNAME is present then when encoding
    EVERY element has been given an xmlns=WOSSNAME
    attribute.

    Is this intended?
    (etc)
    No shared state. Just your Result value. It's in the XMLName field.
    *headdesk* Thank you, I completely overlooked that
    an xml.Name could be storing information other than
    the element name, and that I could clear it out before
    encoding.
    (b) The (documented in encoding/xml Marshal) behaviour
    that all the chardata fields are smooshed up into one lump
    seems strange. It means a structure such as the example

    <p>This is the <i>first</i> verse.</p>

    loses track completely of where the <i> subelement appeared
    within the <p>. Is this an XML thing that all the chardata of
    an element forms one text, and does this also in fact apply to
    HTML but I never knew?
    It's not so much an XML thing AFAIK. It's just the way Marshal works. It's
    definitely not true that the output XML is equivalent, as HTML, to the
    input.
    I understand it's the way Marshal works but I wonder(ed)
    why. Ignoring the HTML origins of the snippet

         <p>This is the <i>first</i> verse.</p>

    is the usual XML interpretation that the <p> has one
    text component and one <i> component? I suppose
    I have some reading to do ...
    You mentioned that you know that XML and HTML are
    not the same thing. So why are you trying to treat HTML
    as XML? They are not the same thing.
    To solve the motivating problem [1] I was intending to
    read the HTML into something, hack as required,
    and then render back to text. The two natural choices
    were to (a) use net/html and learn the API for fiddling
    with HTML trees or (b) render the HTML into XML,
    hack the XML, then render back to something that
    looked like HTML again.

    I guessed that I'd get on better with (b). Until I've tried
    something (a)like, I won't know whether I guessed wrong
    or right ...
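Option (b) can also work without structs. Tokens stream from an xml.Decoder to an xml.Encoder and are edited on the way, which keeps text and children in their original order. This is a minimal sketch that assumes namespace-free, well-formed input, such as tidy's XHTML output. renameElements and the i-to-em edit are purely illustrative. If the input carried a default xmlns, the namespace behaviour from (a) would reappear here, because each decoded start element has its Space set.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// renameElements copies src token by token, renaming elements whose local
// name is from to the name to. Text and children keep their order.
func renameElements(src, from, to string) (string, error) {
	d := xml.NewDecoder(strings.NewReader(src))
	var buf bytes.Buffer
	enc := xml.NewEncoder(&buf)
	for {
		tok, err := d.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if t.Name.Local == from {
				t.Name.Local = to
			}
			tok = t
		case xml.EndElement:
			if t.Name.Local == from {
				t.Name.Local = to
			}
			tok = t
		}
		// CopyToken detaches the token from the decoder's internal buffer.
		if err := enc.EncodeToken(xml.CopyToken(tok)); err != nil {
			return "", err
		}
	}
	if err := enc.Flush(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := renameElements(`<p>This is the <i>first</i> verse.</p>`, "i", "em")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

HTML that isn't well-formed XML will make the Decoder fail. Its Strict, AutoClose and Entity fields, used with xml.HTMLAutoClose and xml.HTMLEntity, relax parsing somewhat. The net/html package Matt mentions is the more robust route for real-world HTML.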

    Chris

    [1] "Can you help me clean up this HTML we've got?"

    --
    Chris "allusive" Dollin
  • Martin Schnabel at Dec 1, 2014 at 6:47 pm

    On 12/01/2014 10:06 AM, chris dollin wrote:
    You mentioned that you know that XML and HTML are
    not the same thing. So why are you trying to treat HTML
    as XML? They are not the same thing.
    To solve the motivating problem [1] I was intending to
    read the HTML into something, hack as required,
    and then render back to text. The two natural choices
    were to (a) use net/html and learn the API for fiddling
    with HTML trees or (b) render the HTML into XML,
    hack the XML, then render back to something that
    looked like HTML again.
    you might want to look at
    http://godoc.org/code.google.com/p/go-html-transform

Discussion Overview
group: golang-nuts
categories: go
posted: Nov 30, '14 at 7:04p
active: Dec 1, '14 at 6:47p
posts: 5
users: 3
website: golang.org