http://codereview.appspot.com/6500075/diff/2002/src/cmd/gc/lex.c
File src/cmd/gc/lex.c (right):

http://codereview.appspot.com/6500075/diff/2002/src/cmd/gc/lex.c#newcode1597
src/cmd/gc/lex.c:1597: if(rune == 0xfeff) {
I thought it's only a BOM if it's at the start of the input. I don't see
the problem with a string literal starting with a BOM, for instance.

http://codereview.appspot.com/6500075/


  • Rob Pike at Sep 5, 2012 at 2:58 pm
    I'm still thinking about what's right and don't want to make a
    decision yet, but I believe David is correct. My current working
    hypothesis is:

    1. BOM should be legal and discarded if it is the first code point in the file.
    2. BOM should be legal and preserved inside strings both raw and double-quoted.
    3. BOM should be illegal everywhere else.

    It's possible that point 1 should read "preserved".

    This seems to meet what the Unicode Consortium asks, codifying idiocy
    with the goal of making progress.

    -rob
  • Russ Cox at Sep 6, 2012 at 12:45 am

    On Wed, Sep 5, 2012 at 10:58 AM, Rob Pike wrote:
    > I'm still thinking about what's right and don't want to make a
    > decision yet, but I believe David is correct. My current working
    > hypothesis is:
    >
    > 1. BOM should be legal and discarded if it is the first code point in the file.
    > 2. BOM should be legal and preserved inside strings both raw and double-quoted.
    > 3. BOM should be illegal everywhere else.
    >
    > It's possible that point 1 should read "preserved".

    I assume you're talking about gofmt / gofix. I don't believe it should
    be preserved by those tools. We don't preserve Windows line endings
    either: we assume that there is a single unique encoding of a
    particular AST.

    I do think that if we want to start allowing beginning of file BOMs in
    source code, we need to mention it in the spec. It could be added onto
    this paragraph:

        Implementation restriction: For compatibility with other tools, a
        compiler may disallow the NUL character (U+0000) in the source text.
        A compiler may also ignore a leading Unicode byte-order mark (U+FEFF)
        in a source file.

    Russ
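
The "ignore a leading byte-order mark" behaviour Russ proposes for the spec, and the gofmt stripping he and Andrew favour, can be illustrated with a minimal sketch. It assumes the source is already UTF-8 encoded bytes; `stripLeadingBOM` and `utf8BOM` are hypothetical names chosen for illustration, not code from the compiler or gofmt.

```go
package main

import (
	"bytes"
	"fmt"
)

// utf8BOM is the UTF-8 encoding of U+FEFF, the byte-order mark.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// stripLeadingBOM (hypothetical helper) returns src without a single
// leading BOM. A BOM anywhere else in src is left untouched.
func stripLeadingBOM(src []byte) []byte {
	return bytes.TrimPrefix(src, utf8BOM)
}

func main() {
	src := []byte("\uFEFFpackage main\n")
	fmt.Printf("%q\n", stripLeadingBOM(src)) // prints "package main\n"
}
```

Because only a prefix is trimmed, a BOM inside a string literal later in the file survives, which matches point 2 of the working hypothesis.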
  • Andrew Gerrand at Sep 6, 2012 at 1:15 am

    On 6 September 2012 10:45, Russ Cox wrote:
    > On Wed, Sep 5, 2012 at 10:58 AM, Rob Pike wrote:
    >> I'm still thinking about what's right and don't want to make a
    >> decision yet, but I believe David is correct. My current working
    >> hypothesis is:
    >>
    >> 1. BOM should be legal and discarded if it is the first code point in the file.
    >> 2. BOM should be legal and preserved inside strings both raw and double-quoted.
    >> 3. BOM should be illegal everywhere else.
    >>
    >> It's possible that point 1 should read "preserved".
    >
    > I assume you're talking about gofmt / gofix. I don't believe it should
    > be preserved by those tools. We don't preserve Windows line endings
    > either: we assume that there is a single unique encoding of a
    > particular AST.

    +1 to this. It would be a great benefit for gofmt to strip the BOM.

    Andrew
  • Rob Pike at Sep 6, 2012 at 1:28 am
    Sounds like my proposal has resonance.

    -rob
  • Rob Pike at Sep 6, 2012 at 3:39 am
    I had dinner with Ken and he talked me into a slightly different
    proposal. Rules are changed to discard BOM everywhere outside string
    literals. The argument is that if you're going to toss them at the
    beginning, might as well toss them elsewhere too, so concatenated
    files don't crop up later. (You won't concatenate mid-string literal.)

    We now have two simple rules, both implemented in the lexer:

    1. BOM should be legal and preserved inside strings both raw and double-quoted.
    2. BOM should be discarded everywhere else.

    We believe that these rules mean we won't have to have this design
    discussion again.

    -rob
  • Russ Cox at Sep 6, 2012 at 3:49 am
    If you do the spec change I'll fix the compiler.
  • David Symonds at Sep 6, 2012 at 4:18 am

    On Thu, Sep 6, 2012 at 1:38 PM, Rob Pike wrote:
    > 1. BOM should be legal and preserved inside strings both raw and double-quoted.

    What about a BOM character literal?
  • Rob Pike at Sep 6, 2012 at 6:28 am
    Rune literal you mean. Yes, that too.

    I'll do the spec tomorrow.

    -rob
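
The two rules Rob settles on, together with the rune-literal amendment, can be sketched as a single pass over the source. This is a simplified illustration, not the change made to lex.c: `discardBOMs` is a hypothetical name, the input is assumed to be valid UTF-8, and comments are not recognised, so a quote character inside a comment would be misread as opening a literal.

```go
package main

import (
	"fmt"
	"strings"
)

const bom = '\uFEFF'

// discardBOMs (hypothetical helper) removes every U+FEFF from src except
// those inside interpreted string literals ("..."), raw string literals
// (`...`), and rune literals ('...').
// Assumption: comments are not handled in this sketch.
func discardBOMs(src string) string {
	var out strings.Builder
	var quote rune   // 0 outside a literal, otherwise the closing delimiter
	escaped := false // true right after a backslash in a non-raw literal
	for _, r := range src {
		switch {
		case quote == 0:
			if r == bom {
				continue // rule 2: discard outside literals
			}
			if r == '"' || r == '`' || r == '\'' {
				quote = r // entering a literal
			}
		case escaped:
			escaped = false // the escaped rune is taken as-is
		case r == '\\' && quote != '`':
			escaped = true // raw strings have no escapes
		case r == quote:
			quote = 0 // leaving the literal
		}
		out.WriteRune(r) // rule 1: everything inside a literal is preserved
	}
	return out.String()
}

func main() {
	// Two "files" concatenated, each starting with a BOM, plus one
	// BOM inside a string literal that must survive.
	src := "\uFEFFpackage p\n\uFEFFvar s = \"a\uFEFFb\"\n"
	fmt.Println(strings.Count(src, "\uFEFF"), "BOMs before")
	fmt.Println(strings.Count(discardBOMs(src), "\uFEFF"), "BOMs after")
}
```

The example input mirrors the concatenated-files argument from the thread: the BOM at the start of each fragment is dropped, while the one inside the string literal is kept.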

Discussion Overview
group: golang-dev
categories: go
posted: Sep 5, '12 at 4:32a
active: Sep 6, '12 at 6:28a
posts: 9
users: 4
website: golang.org