Dean Rasheed wrote:
...
So the current code in escape_yaml() is inadequate for producing valid
YAML. I think it would have to also consider at least the following
characters as special "-" ":" "[" "]" "{" "}" "," "\"" "'"
"|" "*" "&". Technically, it would also need to trap empty strings,
and strings with leading or trailing whitespace.

Making escape_yaml() completely bulletproof with this approach would
be quite difficult, and (IMO) not worth the effort
...

Doesn't seem like a lot of effort to me. You've already laid out most of
the exceptions above, although they require a few tweaks.
The rules should be:

Requires quoting only if the first character:
& * ! | > ' " % @ ` #

Same as above, but no quoting if the second character is "safe":
- ? :

Always requires quoting:
":<space>" "<space>#" aka ': ' ' #'

Always requires quoting:
, [ ] { }

Always require quoting:
(leading space) (trailing space) (empty string)

See:
http://yaml.org/spec/1.2/spec.html section 5.3 and 7.3.3
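
For illustration only, the rules listed above can be sketched as a single predicate. This is a minimal Python sketch, not the C code of escape_yaml(); the name needs_quoting is hypothetical, and it assumes a "safe" second character simply means any non-whitespace character. It implements only the rules as stated in this message, not the full YAML 1.2 plain-scalar grammar.

```python
# Hypothetical helper: decides whether a value must be quoted under the
# rules listed in this message (not a complete YAML implementation).
FIRST_CHAR_INDICATORS = set("&*!|>'\"%@`#")
SAFE_SECOND_CHAR_INDICATORS = set("-?:")
ALWAYS_QUOTE_CHARS = set(",[]{}")


def needs_quoting(value: str) -> bool:
    # Empty strings and leading/trailing whitespace always need quotes.
    if value == "":
        return True
    if value[0].isspace() or value[-1].isspace():
        return True
    # Indicators that only matter as the first character.
    if value[0] in FIRST_CHAR_INDICATORS:
        return True
    # "-", "?" and ":" are fine up front if followed by a "safe" character.
    # Assumption: "safe" means "not whitespace".
    if value[0] in SAFE_SECOND_CHAR_INDICATORS:
        if len(value) == 1 or value[1].isspace():
            return True
    # Sequences that are special anywhere in the value.
    if ": " in value or " #" in value:
        return True
    # Flow indicators are special anywhere in the value.
    if any(ch in ALWAYS_QUOTE_CHARS for ch in value):
        return True
    return False
```

Under these rules a filter expression such as (customerid > 1000) stays unquoted, because ">" is only special as the first character.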


- --
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201006070943
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

  • Tom Lane at Jun 7, 2010 at 2:18 pm

    "Greg Sabino Mullane" <greg@turnstep.com> writes:
    The rules should be:
    Requires quoting only if the first character:
    & * ! | > ' " % @ ` #
    Same as above, but no quoting if the second character is "safe":
    - ? :
    Always requires quoting:
    ":<space>" "<space>#" aka ': ' ' #'
    Always requires quoting:
    , [ ] { }
    Always require quoting:
    (leading space) (trailing space) (empty string)
    Egad ... this is supposed to be an easily machine-generatable format?

    If it's really as broken as the above suggests, I think we should
    rip it out while we still can.

    regards, tom lane
  • Greg Sabino Mullane at Jun 7, 2010 at 2:37 pm
    Tom Lane wrote:
    ...
    Egad ... this is supposed to be an easily machine-generatable format?

    If it's really as broken as the above suggests, I think we should
    rip it out while we still can.
    Heh ... not like you to shrink from a challenge. ;)

    I don't think the above would be particularly hard to implement myself,
    but if it becomes a really big deal, we can certainly punt by simply
    quoting anything containing an indicator (the special characters above).
    It will still be 100% valid YAML, just with some excess quoting for the
    very rare case when a value contains one of the special characters.
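
A minimal sketch of that fallback, in Python purely for illustration (the real code would be C inside PostgreSQL). The function name yaml_scalar and the exact indicator set are assumptions; the set is taken from the characters listed earlier in this thread, and control characters such as embedded newlines are deliberately out of scope here.

```python
# Assumption: the "indicators" are the special characters listed earlier
# in this thread; any of them anywhere in the value triggers quoting.
INDICATORS = set("&*!|>'\"%@`#-?:,[]{}")


def yaml_scalar(value: str) -> str:
    """Return value unquoted if it is clearly safe, else double-quoted."""
    unsafe = (
        value == ""                      # empty string
        or value != value.strip()        # leading or trailing whitespace
        or any(ch in INDICATORS for ch in value)
    )
    if not unsafe:
        return value
    # Inside double quotes, backslash and double quote must be escaped.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'
```

The cost of the simpler rule is over-quoting: a value like (customerid > 1000) gets quoted because it contains ">", even though ">" is only an indicator at the start of a scalar.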

    - --
    Greg Sabino Mullane greg@turnstep.com
    End Point Corporation http://www.endpoint.com/
    PGP Key: 0x14964AC8 201006071035
    http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
  • Robert Haas at Jun 7, 2010 at 2:49 pm

    On Mon, Jun 7, 2010 at 10:37 AM, Greg Sabino Mullane wrote:
    Tom Lane wrote:
    I don't think the above would be particularly hard to implement myself,
    but if it becomes a really big deal, we can certainly punt by simply
    quoting anything containing an indicator (the special characters above).
    It will still be 100% valid YAML, just with some excess quoting for the
    very rare case when a value contains one of the special characters.
    Since you're the main advocate of this feature, I think you should
    implement it rather than leaving it to Tom or me.

    The reason why I was initially skeptical of adding a YAML output
    format is that JSON is a subset of YAML. Therefore, the JSON output
    format ought to be perfectly sufficient for anyone using a YAML
    parser. If it's not, that's because their YAML processor is broken,
    and they should get a new one, or because the YAML spec is defective.
    The YAML format got voted in by consensus because people thought that
    it would also make a nice alternative to the text format for human
    readable output. I don't believe that (it uses way too much vertical
    space) but even if you accept the argument, the more we make the YAML
    format look like the JSON format, the less water that argument holds.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Andrew Dunstan at Jun 7, 2010 at 2:54 pm

    Robert Haas wrote:
    On Mon, Jun 7, 2010 at 10:37 AM, Greg Sabino Mullane wrote:

    Tom Lane wrote:
    I don't think the above would be particularly hard to implement myself,
    but if it becomes a really big deal, we can certainly punt by simply
    quoting anything containing an indicator (the special characters above).
    It will still be 100% valid YAML, just with some excess quoting for the
    very rare case when a value contains one of the special characters.
    Since you're the main advocate of this feature, I think you should
    implement it rather than leaving it to Tom or me.
    Or anyone else :-)
    The reason why I was initially skeptical of adding a YAML output
    format is that JSON is a subset of YAML. Therefore, the JSON output
    format ought to be perfectly sufficient for anyone using a YAML
    parser.
    There is some debate on this point, IIRC.

    cheers

    andrew
  • Tom Lane at Jun 7, 2010 at 2:56 pm

    "Greg Sabino Mullane" <greg@turnstep.com> writes:
    I don't think the above would be particularly hard to implement myself,
    but if it becomes a really big deal, we can certainly punt by simply
    quoting anything containing an indicator (the special characters above).
    I would go with that. The quoting rules you proposed previously seem
    way too complicated --- meaning potentially buggy, and even if they're
    not buggy, the behavior would seem unpredictable to most users.

    regards, tom lane
  • Dean Rasheed at Jun 7, 2010 at 3:18 pm

    On 7 June 2010 15:56, Tom Lane wrote:
    "Greg Sabino Mullane" <greg@turnstep.com> writes:
    I don't think the above would be particularly hard to implement myself,
    but if it becomes a really big deal, we can certainly punt by simply
    quoting anything containing an indicator (the special characters above).
    I would go with that.  The quoting rules you proposed previously seem
    way too complicated --- meaning potentially buggy, and even if they're
    not buggy, the behavior would seem unpredictable to most users.
    Well actually it's not just everything containing a special character,
    it's also anything with leading or trailing whitespace, and empty
    strings (not sure that can ever happen in practice).

    It's because of the potential for bugs in this area, that I'd propose
    just quoting everything (except numeric values) as in my original
    patch.
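
A sketch of the "quote everything except numeric values" approach, in Python for illustration only; it is not the code from the patch. The name yaml_value is hypothetical, and "numeric" is assumed here to mean an optionally signed decimal integer or fraction.

```python
import re

# Assumption: a numeric value is an optionally signed run of digits with an
# optional fractional part, e.g. 19000, 0.00 or -1.5.
NUMERIC = re.compile(r"-?\d+(\.\d+)?")


def yaml_value(value: str) -> str:
    """Emit numbers as-is and every other value as a double-quoted scalar."""
    if NUMERIC.fullmatch(value):
        return value
    # Inside double quotes, backslash and double quote must be escaped.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'
```

Because every non-numeric value takes the same path, there are no special cases to get wrong: empty strings, leading whitespace and indicator characters are all covered by the quotes.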

    Regards,
    Dean
  • Josh Berkus at Jun 7, 2010 at 6:26 pm

    It's because of the potential for bugs in this area, that I'd propose
    just quoting everything (except numeric values) as in my original
    patch.
    I don't see a problem with this.

    I supported YAML output because I find it easier to read and copy&paste
    than the other outputs. This is still the case even with quoting. And
    it's not exactly a hugely intrusive patch.

    --
    -- Josh Berkus
    PostgreSQL Experts Inc.
    http://www.pgexperts.com
  • Florian Weimer at Jun 7, 2010 at 4:39 pm

    * Tom Lane:

    Egad ... this is supposed to be an easily machine-generatable format?
    Perhaps you could surround all strings with "" in the generator, and
    escape all potentially special characters (which seems to include some
    whitespace even in quoted strings, unfortunately)?

    It has been claimed before that YAML is a superset of JSON, so why
    can't the YAML folks use the existing JSON output instead?

    --
    Florian Weimer <fweimer@bfk.de>
    BFK edv-consulting GmbH http://www.bfk.de/
    Kriegsstraße 100 tel: +49-721-96201-1
    D-76133 Karlsruhe fax: +49-721-96201-99
  • Greg Smith at Jun 7, 2010 at 5:34 pm

    Florian Weimer wrote:
    It has been claimed before that YAML is a superset of JSON, so why
    can't the YAML folks use the existing JSON output instead?
    Because JSON just crosses the line where it feels like there's so much
    markup that people expect a tool is necessary to read it, which has
    always been the issue with XML too--bad human readability. I was on the
    fence about YAML until I used it for a client issue over the weekend. I
    was able to hack together a quick tool to work on the issue that parsed
    enough YAML *without using an external library* well enough for my
    purposes in an hour, one that was still far more robust than a similar
    hack trying to read plain old text format for EXPLAIN. And the client
    was able to follow what was going on as I passed YAML output back and
    forth with them. Just having every field labeled clearly cut off all
    the usual "which of these is the startup cost again?" questions I'm used
    to getting.
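
To give a rough idea of how little code such a quick hack needs, here is a minimal Python sketch. It is not the tool described above; the name parse_fields is hypothetical, and it assumes the usual indented layout of EXPLAIN (FORMAT YAML) output with one "Key: value" pair per line, optionally followed by psql's trailing "+" continuation marker.

```python
def parse_fields(plan_text: str):
    """Return (indent, key, value) for each 'Key: value' line of a YAML plan."""
    fields = []
    for raw in plan_text.splitlines():
        line = raw.rstrip()
        if line.endswith("+"):           # psql's line-continuation marker
            line = line[:-1].rstrip()
        stripped = line.lstrip(" ")
        indent = len(line) - len(stripped)
        if stripped.startswith("- "):    # sequence entry: count "- " as indent
            stripped = stripped[2:]
            indent += 2
        key, sep, value = stripped.partition(":")
        if not sep:
            continue                     # bare list item such as "zip"
        fields.append((indent, key.strip(), value.strip()))
    return fields


sample = """\
- Plan:
    Node Type: Sort
    Startup Cost: 4449.30
    Plans:
      - Node Type: Seq Scan
        Filter: (customerid > 1000)
"""

for indent, key, value in parse_fields(sample):
    print(" " * indent + f"{key} = {value!r}")
```

A sketch like this only works because the plan output uses a small, regular subset of YAML; it would not cope with quoted scalars, anchors or flow syntax.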

    The complaints about YAML taking up too much vertical space are
    understandable, but completely opposite of what I care about. I can
    e-mail a customer a YAML plan and it will survive to the other side and
    even in a reply back to me. Whereas any non-trivial text format one is
    guaranteed to be utterly destroyed by line wrapping along the way.

    I think this thread could use a fresh example to remind anyone who
    hasn't played with the current YAML format what it looks like. Here's
    one from a query against the Dell Store 2 database:

    EXPLAIN SELECT * FROM customers WHERE customerid>1000 ORDER BY zip;
    QUERY PLAN
    ----------
    Sort (cost=4449.30..4496.80 rows=19000 width=268)
      Sort Key: zip
      -> Seq Scan on customers (cost=0.00..726.00 rows=19000 width=268)
            Filter: (customerid > 1000)

    EXPLAIN (FORMAT YAML) SELECT * FROM customers WHERE customerid>1000
    ORDER BY zip;
    QUERY PLAN
    -------------------------------------
    - Plan: +
        Node Type: Sort +
        Startup Cost: 4449.30 +
        Total Cost: 4496.80 +
        Plan Rows: 19000 +
        Plan Width: 268 +
        Sort Key: +
          - zip +
        Plans: +
          - Node Type: Seq Scan +
            Parent Relationship: Outer +
            Relation Name: customers +
            Alias: customers +
            Startup Cost: 0.00 +
            Total Cost: 726.00 +
            Plan Rows: 19000 +
            Plan Width: 268 +
            Filter: (customerid > 1000)

    --
    Greg Smith 2ndQuadrant US Baltimore, MD
    PostgreSQL Training, Services and Support
    greg@2ndQuadrant.com www.2ndQuadrant.us
  • Tom Lane at Jun 7, 2010 at 5:44 pm

    Greg Smith writes:
    The complaints about YAML taking up too much vertical space are
    understandable, but completely opposite of what I care about. I can
    e-mail a customer a YAML plan and it will survive to the other side and
    even in a reply back to me. Whereas any non-trivial text format one is
    guaranteed to be utterly destroyed by line wrapping along the way.
    I think this thread could use a fresh example to remind anyone who
    hasn't played with the current YAML format what it looks like.
    So? This doesn't look amazingly unlike the current JSON output,
    and to the extent that we have to add more quoting to it, it's
    going to look even more like the JSON output.

    Given the lack of any field separators other than newlines, I'm also
    finding myself extremely doubtful about the claim that it survives
    line-wrapping mutilations well. For instance this bit:

          - Node Type: Seq Scan
            Parent Relationship: Outer

    doesn't appear to have anything but whitespace to distinguish it from

          - Node Type: Seq Scan Parent
            Relationship: Outer

    regards, tom lane
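
The ambiguity in the two fragments above can be made concrete with a small Python sketch. The helper key_values is hypothetical and only splits each line at the first ": "; it stands in for any reader, human or machine, that has nothing but line breaks to go on.

```python
def key_values(fragment: str):
    """Split each line of a YAML mapping fragment into a (key, value) pair."""
    pairs = []
    for line in fragment.splitlines():
        text = line.strip()
        if text.startswith("- "):        # drop the sequence-entry marker
            text = text[2:]
        key, sep, value = text.partition(": ")
        if sep:
            pairs.append((key, value))
    return pairs


original = "- Node Type: Seq Scan\n  Parent Relationship: Outer"
rewrapped = "- Node Type: Seq Scan Parent\n  Relationship: Outer"

print(key_values(original))
print(key_values(rewrapped))
```

Both fragments split cleanly into key/value pairs, so nothing flags the rewrapped version as damaged; it simply yields a different node type and a different key.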
  • Greg Smith at Jun 7, 2010 at 6:39 pm

    Tom Lane wrote:
    This doesn't look amazingly unlike the current JSON output,
    and to the extent that we have to add more quoting to it, it's
    going to look even more like the JSON output.
    I don't know about that; here's the JSON one:

    EXPLAIN (FORMAT JSON) SELECT * FROM customers WHERE customerid>1000
    ORDER BY zip;
    QUERY PLAN
    -------------------------------------------
    [ +
      { +
        "Plan": { +
          "Node Type": "Sort", +
          "Startup Cost": 4449.30, +
          "Total Cost": 4496.80, +
          "Plan Rows": 19000, +
          "Plan Width": 268, +
          "Sort Key": ["zip"], +
          "Plans": [ +
            { +
              "Node Type": "Seq Scan", +
              "Parent Relationship": "Outer",+
              "Relation Name": "customers", +
              "Alias": "customers", +
              "Startup Cost": 0.00, +
              "Total Cost": 726.00, +
              "Plan Rows": 19000, +
              "Plan Width": 268, +
              "Filter": "(customerid > 1000)"+
            } +
          ] +
        } +
      } +
    ]

    From the perspective of how that's less useful as a human form of
    output, it's longer, wider, and has redundant punctuation that gets in
    the way.

    I think that YAML quoting will need to respect one of the special cases
    to keep from ruining its readability: "Requires quoting only if the
    first character" for " will make its current format look terrible if
    that rule is applied to the whole line instead. That sounds like a
    necessary special case to include: don't quote any quote characters
    that appear unless they're the first character on the line. Everything
    else could switch back to really aggressive quoting in every spot and
    that wouldn't hurt the readability of the format very much IMHO.
    Given the lack of any field separators other than newlines, I'm also
    finding myself extremely doubtful about the claim that it survives
    line-wrapping mutilations well.
    All I was claiming there is that the output is dramatically less wide
    than the standard text format of the same plan, and therefore far less
    likely to get nailed by a mail client that wraps at normal line widths.
    Agreed that once wrapping does occur, it has serious problems too.

    Here are the stats for this plan, leaving off the QUERY PLAN header from
    each:

    TEXT: 4 vertical, 69 horizontal
    YAML: 18 vertical, 36 horizontal
    JSON: 25 vertical, 43 horizontal
    XML[1]: 27 vertical, 60 horizontal

    Quote the TEXT line with "> " or get a plan with one more level of
    indentation, and you're likely to get wrapped badly at the 72 character
    line limit some clients use. Quite a bit more headroom before the YAML
    format will wrap like that; JSON is in the middle.
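
The headroom argument is simple arithmetic, sketched here in Python for illustration. The names plan_dimensions and survives_quoting are hypothetical; the 72-column limit and the "> " reply prefix are the ones mentioned above, and a line is assumed to wrap only when it exceeds the limit.

```python
def plan_dimensions(plan_text: str):
    """Return (vertical, horizontal): line count and widest line length."""
    lines = plan_text.splitlines()
    return len(lines), max((len(line) for line in lines), default=0)


def survives_quoting(plan_text: str, depth: int = 1,
                     limit: int = 72, prefix: str = "> ") -> bool:
    """True if the widest line still fits after `depth` levels of quoting."""
    _, width = plan_dimensions(plan_text)
    return width + depth * len(prefix) <= limit
```

With the widths listed above, a 69-column plan has 3 columns to spare under a 72-column limit, while a 36-column plan has 36, which is room for many levels of "> " quoting.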

    I now see plenty of use for YAML when exchanging plans over e-mail, and
    it's a bonus that it should survive that format and still be parseable on
    the other side. JSON and XML are certainly the preferred way to feed plans
    into analysis tools unambiguously.

    [1] Might as well make this a complete example:

    <explain xmlns="http://www.postgresql.org/2009/explain"> +
      <Query> +
        <Plan> +
          <Node-Type>Sort</Node-Type> +
          <Startup-Cost>4449.30</Startup-Cost> +
          <Total-Cost>4496.80</Total-Cost> +
          <Plan-Rows>19000</Plan-Rows> +
          <Plan-Width>268</Plan-Width> +
          <Sort-Key> +
            <Item>zip</Item> +
          </Sort-Key> +
          <Plans> +
            <Plan> +
              <Node-Type>Seq Scan</Node-Type> +
              <Parent-Relationship>Outer</Parent-Relationship>+
              <Relation-Name>customers</Relation-Name> +
              <Alias>customers</Alias> +
              <Startup-Cost>0.00</Startup-Cost> +
              <Total-Cost>726.00</Total-Cost> +
              <Plan-Rows>19000</Plan-Rows> +
              <Plan-Width>268</Plan-Width> +
              <Filter>(customerid &gt; 1000)</Filter> +
            </Plan> +
          </Plans> +
        </Plan> +
      </Query> +
    </explain>

    --
    Greg Smith 2ndQuadrant US Baltimore, MD
    PostgreSQL Training, Services and Support
    greg@2ndQuadrant.com www.2ndQuadrant.us
  • Florian Weimer at Jun 8, 2010 at 8:01 am

    * Greg Smith:

    Florian Weimer wrote:
    It has been claimed before that YAML is a superset of JSON, so why
    can't the YAML folks use the existing JSON output instead?
    Because JSON just crosses the line where it feels like there's so much
    markup that people expect a tool is necessary to read it, which has
    always been the issue with XML too--bad human readability.
    But YAML is not human-readable. There are human-readable subsets of
    it, but the general serializers do not produce them, and specific
    serializers are difficult to get right (as we've seen).
    EXPLAIN (FORMAT YAML) SELECT * FROM customers WHERE customerid>1000
    ORDER BY zip;
    QUERY PLAN
    -------------------------------------
    - Plan: +
        Node Type: Sort +
        Startup Cost: 4449.30 +
        Total Cost: 4496.80 +
        Plan Rows: 19000 +
        Plan Width: 268 +
        Sort Key: +
          - zip +
        Plans: +
          - Node Type: Seq Scan +
            Parent Relationship: Outer +
            Relation Name: customers +
            Alias: customers +
            Startup Cost: 0.00 +
            Total Cost: 726.00 +
            Plan Rows: 19000 +
            Plan Width: 268 +
            Filter: (customerid > 1000)
    What does your parser do with this (equivalent but shorter) YAML
    output?

    - Plan: !!map
    &0 Node Type: Sort
    &1 Startup Cost: 4449.30
    &2 Total Cost: 4496.80
    &3 Plan Rows: &5 19000
    &4 Plan Width: &6 268
    Sort Key: ["zip"]
    Plans: !!seq
    - *0: Seq Scan
    Parent Relationship: Outer
    Relation Name: &7 customers
    Alias: *7
    *1: 0.00
    *2: 726.00
    *3: *5
    *4: *6
    Filter: (customerid > 1000)

    Looking at the spec, it's rather difficult to come up with a readable
    subset which can be parsed easily and is general in the sense that it can
    express empty strings, strings with embedded newlines, and so on.
    YAML's rules for dealing with whitespace are fairly complex, but are
    probably needed to get a more compact notation than JSON.

    --
    Florian Weimer <fweimer@bfk.de>
    BFK edv-consulting GmbH http://www.bfk.de/
    Kriegsstraße 100 tel: +49-721-96201-1
    D-76133 Karlsruhe fax: +49-721-96201-99
  • Greg Sabino Mullane at Jun 8, 2010 at 1:37 pm

    But YAML is not human-readable. There are human-readable subsets of
    it, but the general serializers do not produce them, and specific
    serializers are difficult to get right (as we've seen).
    No, it *is* human readable. Indeed, that's one of the things that
    differentiates it from JSON: readability is the main goal, whereas
    JSON's goals are different. The readability necessarily makes
    the parsing rules more complex, but that's the implicit tradeoff.
    (Did you miss the part where the other Greg is sending explain
    plans via email?)
    What does your parser do with this (equivalent but shorter)
    YAML output?

    - Plan: !!map
    &0 Node Type: Sort
    &1 Startup Cost: 4449.30
    &2 Total Cost: 4496.80
    &3 Plan Rows: &5 19000
    &4 Plan Width: &6 268
    Sort Key: ["zip"]
    Plans: !!seq
    - *0: Seq Scan
    Parent Relationship: Outer
    Relation Name: &7 customers
    Alias: *7
    *1: 0.00
    *2: 726.00
    *3: *5
    *4: *6
    Filter: (customerid > 1000)
    But we're not using alias nodes (nor would we ever want to), so I'm not
    sure what the point of your contrived example is. That's shorter, but
    certainly not easier to read by human /or/ machine.
    Looking at the spec, it's rather difficult to come up with a readable
    subset which can be parsed easily and is general in the sense that it can
    express empty strings, strings with embedded newlines, and so on.
    YAML's rules for dealing with whitespace are fairly complex, but are
    probably needed to get a more compact notation than JSON.
    I'll state that both embedded newlines and column names and values with
    funny characters like '*' and '|' are rare events, and the great majority
    of things you'll see in an explain plan are plain ol' ASCII, in which
    YAML produces a very good representation. But you are right that we need
    to make sure we are handling the whitespace correctly.

    When I get some free time, I'll make a patch to implement as much of
    the spec as we sanely can. As I said before, I don't think we need to
    strive for putting everything we possibly can into "plain scalar"
    objects, as we can cover 99% of the cases easy enough and fall back to
    'when in doubt, quote' for the rest.

    - --
    Greg Sabino Mullane greg@turnstep.com
    End Point Corporation http://www.endpoint.com/
    PGP Key: 0x14964AC8 201006080931
    http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
  • Robert Haas at Jun 9, 2010 at 1:51 am

    On Tue, Jun 8, 2010 at 9:37 AM, Greg Sabino Mullane wrote:
    When I get some free time, I'll make a patch to implement as much of
    the spec as we sanely can.
    Saying that you'll fix it but not on any particular timetable is
    basically equivalent to saying that you're not willing to fix it at
    all. We are trying to get a release out the door. I'm not trying to
    be rude, but it's frustrating to me when people object to having their
    code ripped out but also won't commit to getting it fixed in a timely
    fashion.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Greg Sabino Mullane at Jun 9, 2010 at 3:57 pm

    Robert Haas wrote:
    When I get some free time, I'll make a patch to implement as
    much of the spec as we sanely can.
    Saying that you'll fix it but not on any particular timetable is
    basically equivalent to saying that you're not willing to fix it at
    all.
    It's not equivalent at all. If I wasn't willing to fix it at all,
    I'd say so.
    We are trying to get a release out the door. I'm not trying to
    be rude, but it's frustrating to me when people object to having their
    code ripped out but also won't commit to getting it fixed in a timely
    fashion.
    You might not be trying, but you are coming across as quite rude. The
    bug was only reported Monday morning, and you are yelling at me
    on a Tuesday night for not being willing to drop everything I'm doing
    and fix it right now? Yes, we're heading towards 9.0 and yes, I'd
    sure hate to see YAML ripped out (especially now that it's been
    listed near and far as one of our new features), but I've got bills
    to pay and writing a patch is a volunteer effort for me.

    Since you seem so keen on telling other people what they should be
    doing, here's some of your own medicine: why not focus on something
    other than YAML, which myself and many other people can write, and
    work more on the 9.0 open issues that your energy and expertise
    would be more suited for?

    - --
    Greg Sabino Mullane greg@turnstep.com
    PGP Key: 0x14964AC8 201006091156
    http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
  • Robert Haas at Jun 9, 2010 at 5:32 pm

    On Wed, Jun 9, 2010 at 11:57 AM, Greg Sabino Mullane wrote:
    The bug was only reported Monday morning, and you are yelling at me
    on a Tuesday night for not being willing to drop everything I'm doing
    and fix it right now?
    I am not saying and have not said that you needed to drop everything
    you were doing and fix it right now. Had you said, "I will get a
    patch for this out this week", that would have been fine with me.
    What I did and do object to is that your commitment to fix it was
    completely open-ended. From reading your email, there's no way for
    someone to know whether you'll get to this in two days or a month, and
    a month, at least in my opinion, is too long, maybe at any time but
    certainly at this point in the release cycle.
    Since you seem so keen on telling other people what they should be
    doing, here's some of your own medicine: why not focus on something
    other than YAML, which myself and many other people can write, and
    work more on the 9.0 open issues that your energy and expertise
    would be more suited for?
    I think this comment is a little snide, but it deserves a serious
    response. I spent most of yesterday afternoon and evening working on
    every open item that I had a clue about, and another two hours this
    morning. Most of the remaining items are either things that I am not
    qualified to fix or for which there are currently patches out for
    comment.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company

Discussion Overview
group: pgsql-hackers
categories: postgresql
posted: Jun 7, '10 at 2:00p
active: Jun 9, '10 at 5:32p
posts: 17
users: 8
website: postgresql.org...
irc: #postgresql
