I am currently implementing a general fuzzer, which I am testing on JSON
implementations. I stumbled upon a difference in Go's implementation that
made me curious. The current JSON RFC, https://tools.ietf.org/html/rfc7159,
defines short backslash escapes for some characters in strings as an
alternative to writing the character's code point. For example, the line
feed character can be written as "\n" instead of "\u000A". Go's
implementation currently uses the short form for \, ", \n and \r but not for
\b, \f and \t. I am wondering why. Is it a performance issue?
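Both spellings are interchangeable for a decoder. A small sketch (not from the thread) that decodes each pair with encoding/json's json.Unmarshal and compares the results; the helper name decodeJSONString is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeJSONString parses a JSON string literal (including its quotes)
// with encoding/json and returns the decoded Go string.
func decodeJSONString(lit string) (string, error) {
	var s string
	err := json.Unmarshal([]byte(lit), &s)
	return s, err
}

func main() {
	// Each pair writes the same character two ways: as an RFC 7159
	// two-character escape and as a \uXXXX escape.
	pairs := [][2]string{
		{`"\n"`, `"\u000A"`}, // line feed
		{`"\t"`, `"\u0009"`}, // tab
		{`"\b"`, `"\u0008"`}, // backspace
		{`"\f"`, `"\u000C"`}, // form feed
	}
	for _, p := range pairs {
		short, err1 := decodeJSONString(p[0])
		long, err2 := decodeJSONString(p[1])
		if err1 != nil || err2 != nil {
			fmt.Println("decode error:", err1, err2)
			continue
		}
		fmt.Printf("%s and %s decode equal: %v\n", p[0], p[1], short == long)
	}
}
```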

I found CL https://codereview.appspot.com/4678046/, which added escaping for
\r and \n. \b and \f were intentionally left out, with the reason "no one
cares about \f and more people know \b as word boundary than as backspace".
(I agree, but it made me wonder why they are defined in Go's language
specification at all, since \b in an interpreted string literal also
conflicts with \b as a regexp word boundary. Maybe they should be removed
from the spec as well?) \t, on the other hand, was mentioned but never
included in the patch.
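To illustrate the regexp conflict: in a Go interpreted string literal, \b is the backspace character U+0008, so a word-boundary pattern has to be written as a raw string (or with a doubled backslash, "\\b"). A minimal sketch using the standard regexp package; the variable names are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

// In an interpreted string literal, \b is backspace (U+0008), so this
// pattern contains two literal backspace characters, not word boundaries.
var interpreted = regexp.MustCompile("\bgo\b")

// In a raw string literal the backslash is kept, so the regexp package
// sees \b and treats it as a word boundary.
var raw = regexp.MustCompile(`\bgo\b`)

func main() {
	s := "let go now"
	fmt.Println(interpreted.MatchString(s)) // false
	fmt.Println(raw.MatchString(s))         // true
	fmt.Println("\b" == "\x08")             // true
}
```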

I do not have any statistics to back this up, but if \n and \r are common
enough to be handled specially in encoding/json, I suggest that \t should be
added too. This would reduce the encoding of each \t from 6 bytes (\u0009)
to 2 (\t), which could be significant for servers that, for example, send
source code.
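A minimal sketch of the proposed policy, assuming a simplified encoder: escapeJSONString is a hypothetical helper, not the encoding/json implementation, and it omits the HTML-safe escaping and other details of the real encoder. It writes two-byte escapes for \\, \", \n, \r and \t and the six-byte \u00XX form for the remaining control characters, which RFC 7159 requires to be escaped.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeJSONString is a hypothetical, simplified encoder that quotes s as a
// JSON string: short escapes for \\, \", \n, \r and \t, and \u00XX for every
// other control character (including \b and \f). Invalid UTF-8 bytes come
// out of the range loop as U+FFFD and are written as that replacement rune.
func escapeJSONString(s string) string {
	var b strings.Builder
	b.WriteByte('"')
	for _, r := range s {
		switch r {
		case '\\', '"':
			b.WriteByte('\\')
			b.WriteRune(r)
		case '\n':
			b.WriteString(`\n`)
		case '\r':
			b.WriteString(`\r`)
		case '\t':
			b.WriteString(`\t`)
		default:
			if r < 0x20 {
				// Other control characters must be escaped; use \u00XX.
				fmt.Fprintf(&b, `\u%04x`, r)
			} else {
				b.WriteRune(r)
			}
		}
	}
	b.WriteByte('"')
	return b.String()
}

func main() {
	fmt.Println(escapeJSONString("\tif x {")) // "\tif x {"
	// Cost per tab: two-character escape vs. \u escape.
	fmt.Println(len(`\t`), len(`\u0009`)) // 2 6
}
```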

I also found a mistake in encoding/json/encode.go. encodeState.stringBytes,
which is marked "keep in sync with encodeState.string", has the comment
"// as well as < and >. The latter are escaped because they", which should
be "// as well as <, > and &. The latter are escaped because they".
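For reference, the escaping that the corrected comment describes is visible through the public API: json.Marshal writes <, > and & as \u003c, \u003e and \u0026, while a json.Encoder with SetEscapeHTML(false) leaves them as-is. A small sketch; the helper names marshalDefault and marshalNoHTMLEscape are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// marshalDefault uses json.Marshal, which escapes <, > and & so the output
// is safe to embed in HTML.
func marshalDefault(v interface{}) (string, error) {
	b, err := json.Marshal(v)
	return string(b), err
}

// marshalNoHTMLEscape uses an Encoder with HTML escaping turned off.
// Encode appends a trailing newline, which is trimmed here.
func marshalNoHTMLEscape(v interface{}) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(v); err != nil {
		return "", err
	}
	return string(bytes.TrimRight(buf.Bytes(), "\n")), nil
}

func main() {
	s, err := marshalDefault("<a&b>")
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // "\u003ca\u0026b\u003e"

	s, err = marshalNoHTMLEscape("<a&b>")
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // "<a&b>"
}
```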

Cheers
- Markus

--
You received this message because you are subscribed to the Google Groups "golang-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


  • Russ Cox at Oct 27, 2014 at 11:00 pm

    Thanks. I fixed the comment and I added \t. I don't see any discussion of
    \t in the CL you found. I think we all just forgot about it. I left out \b
    and \f because they are obscure, and printing the Unicode code point may
    actually be clearer (you don't have to look up what it means when you see
    it).
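A quick check of the resulting behavior through the public API, as a sketch: in current Go releases json.Marshal writes \t, \n and \r as two-character escapes and other control characters, such as U+0001, as \u00XX. The helper quote is hypothetical; \b and \f are deliberately not checked, because later Go releases changed how those two are written.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// quote marshals s with encoding/json and returns the JSON text.
func quote(s string) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err) // marshaling a plain string is not expected to fail
	}
	return string(b)
}

func main() {
	fmt.Println(quote("a\tb\nc\rd")) // "a\tb\nc\rd"
	fmt.Println(quote("\x01"))       // "\u0001"
}
```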

    Russ

  • Zimmski at Oct 28, 2014 at 2:45 pm
    "bradfitz" did mention \t, but only in his review: "If we're doing \n,
    should probably also do \r and \t at least."

    Thanks for the changes. I am wondering whether the decision about \b and
    \f should be documented in the code. The RFC states that the short
    escapes are alternatives, but I think every deviation from the RFC should
    be documented, as is already done for the escaping of <, > and &.

    I will send a CL if I find any other problems.
  • Russ Cox at Oct 28, 2014 at 2:54 pm

    Thanks, but I'm going to leave them unmentioned. It's not a deviation,
    just an implementation choice; clearly either is allowed.


Discussion Overview
group: golang-dev
categories: go
posted: Oct 25, '14 at 9:28p
active: Oct 28, '14 at 2:54p
posts: 4
users: 2
website: golang.org

Russ Cox: 2 posts
Zimmski: 2 posts
