Here is a patch for basic JSON support. It adds only those features:
* Add "json" data type, that is binary-compatible with text.
* Syntax checking on text to JSON conversion.
* json_pretty() -- print JSON tree with indentation.

We have "JSON datatype (WIP) 01" item:
https://commitfest.postgresql.org/action/patch_view?id=351
but it seems too complex for me to apply all of the feature
at once, especially JSON-Path support. So, I'd like to submit
only basic and minimal JSON datatype support at first.

The most different point of my patch is that JSON parser is
reimplemented with flex and bison to make maintenance easier.

Note that the JSON parser accept top-level scalar values
intensionally, because of requirement when we add functions
to extract values from JSON array/object. For example,
CREATE FUNCTION json_to_array(json) RETURNS json[]
might return naked scalar values in the result array.

--
Itagaki Takahiro

Search Discussions

  • David E. Wheeler at Sep 15, 2010 at 4:46 pm

    On Sep 14, 2010, at 7:32 PM, Itagaki Takahiro wrote:

    Here is a patch for basic JSON support. It adds only those features:
    * Add "json" data type, that is binary-compatible with text.
    * Syntax checking on text to JSON conversion.
    * json_pretty() -- print JSON tree with indentation.

    We have "JSON datatype (WIP) 01" item:
    https://commitfest.postgresql.org/action/patch_view?id=351
    but it seems too complex for me to apply all of the feature
    at once, especially JSON-Path support. So, I'd like to submit
    only basic and minimal JSON datatype support at first.
    So should this be added to the commitfest, or replace this one?

    Best,

    David
  • Itagaki Takahiro at Sep 15, 2010 at 11:24 pm

    On Thu, Sep 16, 2010 at 1:45 AM, David E. Wheeler wrote:
    We have "JSON datatype (WIP) 01" item:
    https://commitfest.postgresql.org/action/patch_view?id=351
    but it seems too complex for me to apply all of the feature
    at once, especially JSON-Path support. So, I'd like to submit
    only basic and minimal JSON datatype support at first.
    So should this be added to the commitfest, or replace this one?
    I added my patch, but the previous one will be returned with feedback
    if the author doesn't have a new version adjusted to existing comments.

    --
    Itagaki Takahiro
  • Robert Haas at Sep 16, 2010 at 12:44 am

    On Wed, Sep 15, 2010 at 7:23 PM, Itagaki Takahiro wrote:
    On Thu, Sep 16, 2010 at 1:45 AM, David E. Wheeler wrote:
    We have "JSON datatype (WIP) 01" item:
    https://commitfest.postgresql.org/action/patch_view?id=351
    but it seems too complex for me to apply all of the feature
    at once, especially JSON-Path support. So, I'd like to submit
    only basic and minimal JSON datatype support at first.
    So should this be added to the commitfest, or replace this one?
    I added my patch, but the previous one will be returned with feedback
    if the author doesn't have a new version adjusted to existing comments.
    Did you extract this from his work, or is this completely independent?
    I'm a bit disinclined to say we should just toss overboard all the
    work that's already been done. Why did you write a new patch?

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Itagaki Takahiro at Sep 16, 2010 at 1:53 am

    On Thu, Sep 16, 2010 at 9:44 AM, Robert Haas wrote:
    Did you extract this from his work, or is this completely independent?
    I'm a bit disinclined to say we should just toss overboard all the
    work that's already been done.  Why did you write a new patch?
    I read his patch and am inspired by the work, but the code is almost
    rewritten. As my previous comment,
    http://archives.postgresql.org/pgsql-hackers/2010-08/msg01773.php
    I think it's not a good idea to develop JSON datatype based on the
    complex node links. The point of my patch is to simplify it.

    I'd like to encourage use of the simplified structure to implement
    his other works, including JSONPath.

    --
    Itagaki Takahiro
  • Robert Haas at Sep 16, 2010 at 2:16 pm

    On Wed, Sep 15, 2010 at 9:53 PM, Itagaki Takahiro wrote:
    On Thu, Sep 16, 2010 at 9:44 AM, Robert Haas wrote:
    Did you extract this from his work, or is this completely independent?
    I'm a bit disinclined to say we should just toss overboard all the
    work that's already been done.  Why did you write a new patch?
    I read his patch and am inspired by the work, but the code is almost
    rewritten. As my previous comment,
    http://archives.postgresql.org/pgsql-hackers/2010-08/msg01773.php
    I think it's not a good idea to develop JSON datatype based on the
    complex node links. The point of my patch is to simplify it.

    I'd like to encourage use of the simplified structure to implement
    his other works, including JSONPath.
    I think that if we make a habit of rewriting the contributions of
    first-time contributors in toto, we will have fewer second-time
    contributors. I think it would have been a good idea to discuss this
    on the list before you went and did it.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Josh Berkus at Sep 16, 2010 at 6:04 pm

    I think that if we make a habit of rewriting the contributions of
    first-time contributors in toto, we will have fewer second-time
    contributors. I think it would have been a good idea to discuss this
    on the list before you went and did it.
    To be fair to Itagaki-san, he DID ask about the status of the SOC
    project, and didn't get much of an answer.

    --
    -- Josh Berkus
    PostgreSQL Experts Inc.
    http://www.pgexperts.com
  • Joseph Adams at Sep 20, 2010 at 4:38 am

    On Tue, Sep 14, 2010 at 10:32 PM, Itagaki Takahiro wrote:
    Here is a patch for basic JSON support. It adds only those features:
    * Add "json" data type, that is binary-compatible with text.
    * Syntax checking on text to JSON conversion.
    * json_pretty() -- print JSON tree with indentation.

    We have "JSON datatype (WIP) 01" item:
    https://commitfest.postgresql.org/action/patch_view?id=351
    but it seems too complex for me to apply all of the feature
    at once, especially JSON-Path support. So, I'd like to submit
    only basic and minimal JSON datatype support at first.

    The most different point of my patch is that JSON parser is
    reimplemented with flex and bison to make maintenance easier.

    Note that the JSON parser accept top-level scalar values
    intensionally, because of requirement when we add functions
    to extract values from JSON array/object. For example,
    CREATE FUNCTION json_to_array(json) RETURNS json[]
    might return naked scalar values in the result array.

    --
    Itagaki Takahiro


    --
    Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-hackers
    I have written a patch that amends the basic_json-20100915.patch . It
    does the following:

    * Fixes bugs in the lexer (namely, ' -0 '::JSON was not accepted, and
    ' """ '::JSON was accepted).
    * Adds a json_validate() function.
    * Adds test strings from original JSON patch along with some new ones.
    * Tweaks the error message for invalid JSON input.
    * Fixes the Makefile so json.c won't fail to build because
    json_parser.h and json_scanner.h haven't been built yet.

    In the lexer, the following regular expression for parsing a number
    was incorrect:

    0|-?([1-9][0-9]*(\.[0-9]+)?|0\.[0-9]+)([Ee][+-]?[0-9]+)?

    This regex doesn't accept '-0', but that's a valid JSON number
    according to the standard.

    Also, the exclusive state <str> didn't have an <<EOF>> rule, causing
    the lexer to treat all characters from an unclosed quote to EOF as
    nothing at all.

    A less important issue with the lexer is that it doesn't allow the
    control character \x7F to appear in strings unescaped. The JSON RFC
    doesn't mention \x7F as a control character, and many implementations
    of JSON and JavaScript don't treat it as one either.

    I went ahead and added json_validate() now because it's useful for
    testing (my test strings use it).

    I made the error message say "ERROR: invalid input syntax for JSON"
    rather than the redundant "ERROR: syntax error in json: syntax
    error". However, the specific parsing error (we haven't defined any
    yet) is simply ignored by json_validate and company.

    As for the Makefile, I think the reason you weren't getting build
    problems but I was is because I normally build with make -j2 (run 2
    processes at once), which tends to test makefile rule correctness.

    Here's one thing I'm worried about: the bison/flex code in your patch
    looks rather similar to the code in
    http://www.jsonlint.com/bin/jsonval.tgz , which is licensed under the
    GPL. In particular, the incorrect number regex I discussed above can
    also be found in jsonval verbatim. However, because there are a lot
    of differences in both the bison and flex code now, I'm not sure
    they're close enough to be "copied", but I am not a lawyer. It might
    be a good idea to contact Ben Spencer and ask him for permission to
    license our modified version of the code under PostgreSQL's more
    relaxed license, just to be on the safe side.

    Finally, could you post a patch with your latest work (with or without
    my contribution) so we can tinker with it more? Thanks!


    Joey Adams
  • Itagaki Takahiro at Sep 21, 2010 at 12:38 pm

    On Mon, Sep 20, 2010 at 1:38 PM, Joseph Adams wrote:
    I have written a patch that amends the basic_json-20100915.patch .
    Thanks. I merged your patch and added json_to_array(), as a demonstration
    of json_stringify(). As the current code, json_stringify(json) just returns
    the input text as-is, but json_stringify(json, NULL) trims all of unnecessary
    whitespaces. We could do it in json_in() and json_parse() and always store
    values in compressed representation instead. We leave room for discussion.

    I also merge json_test_strings.sql into the main test file.
    I slimed some tests -- many test cases seem to be duplicated for me.
    I went ahead and added json_validate() now because it's useful for
    testing (my test strings use it).
    Good idea, but how about calling it json_is_well_formed()? We have
    similar name of functions for xml type. I renamed it in the patch.

    Here's one thing I'm worried about: the bison/flex code in your patch
    looks rather similar to the code in
    http://www.jsonlint.com/bin/jsonval.tgz , which is licensed under the
    GPL.  In particular, the incorrect number regex I discussed above can
    also be found in jsonval verbatim.  However, because there are a lot
    of differences in both the bison and flex code now,  I'm not sure
    they're close enough to be "copied", but I am not a lawyer.  It might
    be a good idea to contact Ben Spencer and ask him for permission to
    license our modified version of the code under PostgreSQL's more
    relaxed license, just to be on the safe side.
    Sorry for my insincere manner. Surely I read his code.
    Do you know his contact address? I cannot find it...

    --
    Itagaki Takahiro
  • Robert Haas at Sep 21, 2010 at 12:54 pm

    On Tue, Sep 21, 2010 at 8:38 AM, Itagaki Takahiro wrote:
    Sorry for my insincere manner. Surely I read his code.
    Do you know his contact address? I cannot find it...
    It alarms me quite a bit that someone who is a committer on this
    project would accidentally copy code from another project with a
    different license into PostgreSQL. How does that happen? And how
    much got copied, besides the regular expression? I would be inclined
    to flush this patch altogether rather than take ANY risk of GPL
    contamination.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Itagaki Takahiro at Sep 21, 2010 at 1:31 pm

    On Tue, Sep 21, 2010 at 9:54 PM, Robert Haas wrote:
    It alarms me quite a bit that someone who is a committer on this
    project would accidentally copy code from another project with a
    different license into PostgreSQL.  How does that happen?  And how
    much got copied, besides the regular expression?  I would be inclined
    to flush this patch altogether rather than take ANY risk of GPL
    contamination.
    Only regular expressions in the scanner. So I've thought it's OK,
    but I should have been more careful. Sorry.

    --
    Itagaki Takahiro
  • Joseph Adams at Oct 4, 2010 at 6:51 pm

    On Mon, Sep 20, 2010 at 12:38 AM, Joseph Adams wrote:
    Here's one thing I'm worried about: the bison/flex code in your patch
    looks rather similar to the code in
    http://www.jsonlint.com/bin/jsonval.tgz , which is licensed under the
    GPL.  In particular, the incorrect number regex I discussed above can
    also be found in jsonval verbatim.  However, because there are a lot
    of differences in both the bison and flex code now,  I'm not sure
    they're close enough to be "copied", but I am not a lawyer.  It might
    be a good idea to contact Ben Spencer and ask him for permission to
    license our modified version of the code under PostgreSQL's more
    relaxed license, just to be on the safe side.
    With the help and motivation of David Fetter, I sent an e-mail to Ben
    Spencer (google "jsonval", check out the git repo, and look in the git
    log to find his e-mail address) a few minutes ago, asking if he would
    license jsonval under a license compatible with PostgreSQL, and am
    currently awaiting a response.

    If he doesn't respond, or outright refuses (which I, for one, doubt
    will happen), my fallback plan is to rewrite the JSON validation code
    by drawing from my original code (meaning it won't be in bison/flex)
    and post a patch for it. Unfortunately, it seems to me that there
    aren't very many ways of expressing a JSON parser in bison/flex, and
    thus the idea of JSON parsing with bison/flex is effectively locked
    down by the GPL unless we can get a more permissive license for
    jsonval. But, I am not a lawyer.
  • Robert Haas at Oct 4, 2010 at 7:06 pm

    On Mon, Oct 4, 2010 at 2:50 PM, Joseph Adams wrote:
    On Mon, Sep 20, 2010 at 12:38 AM, Joseph Adams
    wrote:
    Here's one thing I'm worried about: the bison/flex code in your patch
    looks rather similar to the code in
    http://www.jsonlint.com/bin/jsonval.tgz , which is licensed under the
    GPL.  In particular, the incorrect number regex I discussed above can
    also be found in jsonval verbatim.  However, because there are a lot
    of differences in both the bison and flex code now,  I'm not sure
    they're close enough to be "copied", but I am not a lawyer.  It might
    be a good idea to contact Ben Spencer and ask him for permission to
    license our modified version of the code under PostgreSQL's more
    relaxed license, just to be on the safe side.
    With the help and motivation of David Fetter, I sent an e-mail to Ben
    Spencer (google "jsonval", check out the git repo, and look in the git
    log to find his e-mail address) a few minutes ago, asking if he would
    license jsonval under a license compatible with PostgreSQL, and am
    currently awaiting a response.

    If he doesn't respond, or outright refuses (which I, for one, doubt
    will happen), my fallback plan is to rewrite the JSON validation code
    by drawing from my original code (meaning it won't be in bison/flex)
    and post a patch for it.  Unfortunately, it seems to me that there
    aren't very many ways of expressing a JSON parser in bison/flex, and
    thus the idea of JSON parsing with bison/flex is effectively locked
    down by the GPL unless we can get a more permissive license for
    jsonval.  But, I am not a lawyer.
    If someone who hasn't looked at the GPL code sits down and codes
    something up based on the json.org home page, it's hard to imagine how
    anyone could be grumpy about that. But we do need to be at some
    points to make sure that anything we do is truly clean-room, because
    we simply can't have GPL code creeping into our code base.

    Thanks for continuing to pursue this!

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Tom Lane at Oct 4, 2010 at 11:45 pm

    Robert Haas writes:
    On Mon, Oct 4, 2010 at 2:50 PM, Joseph Adams wrote:
    If he doesn't respond, or outright refuses (which I, for one, doubt
    will happen), my fallback plan is to rewrite the JSON validation code
    by drawing from my original code (meaning it won't be in bison/flex)
    and post a patch for it.  Unfortunately, it seems to me that there
    aren't very many ways of expressing a JSON parser in bison/flex, and
    thus the idea of JSON parsing with bison/flex is effectively locked
    down by the GPL unless we can get a more permissive license for
    jsonval.  But, I am not a lawyer.
    If someone who hasn't looked at the GPL code sits down and codes
    something up based on the json.org home page, it's hard to imagine how
    anyone could be grumpy about that.
    Yeah. Joseph seems to be confusing copyrights with patents. The idea
    of "parse JSON with bison/flex" is not patentable by any stretch of the
    imagination.

    But having said that, I wonder whether bison/flex are really the best
    tool for the job in the first place. From what I understand of JSON
    (which admittedly ain't much) a bison parser seems like overkill:
    it'd probably be both bloated and slow compared to a simple handwritten
    recursive-descent parser.

    regards, tom lane
  • Josh Berkus at Oct 5, 2010 at 12:00 am
    All,
    But having said that, I wonder whether bison/flex are really the best
    tool for the job in the first place. From what I understand of JSON
    (which admittedly ain't much) a bison parser seems like overkill:
    it'd probably be both bloated and slow compared to a simple handwritten
    recursive-descent parser.
    This appears not to be necessary. The author of JSONval has indicated
    that, should we choose to include it in PostgreSQL 9.1, he is open to
    re-licensing.

    So on a completely *technical* basis, do we want to use JSONval?

    --
    -- Josh Berkus
    PostgreSQL Experts Inc.
    http://www.pgexperts.com
  • Andrew Dunstan at Oct 5, 2010 at 1:09 am

    On 10/04/2010 08:00 PM, Josh Berkus wrote:
    All,
    But having said that, I wonder whether bison/flex are really the best
    tool for the job in the first place. From what I understand of JSON
    (which admittedly ain't much) a bison parser seems like overkill:
    it'd probably be both bloated and slow compared to a simple handwritten
    recursive-descent parser.
    This appears not to be necessary. The author of JSONval has indicated
    that, should we choose to include it in PostgreSQL 9.1, he is open to
    re-licensing.

    So on a completely *technical* basis, do we want to use JSONval?
    I agree with Tom that a hand-cut RD parser is much more likely to be the
    way to go. We should not use bison/flex for parsing data values.

    cheers

    andrew
  • Joseph Adams at Oct 5, 2010 at 2:30 am

    On Mon, Oct 4, 2010 at 7:45 PM, Tom Lane wrote:
    Yeah.  Joseph seems to be confusing copyrights with patents.  The idea
    of "parse JSON with bison/flex" is not patentable by any stretch of the
    imagination.
    What I meant is, anyone who sets out to write a JSON parser with
    bison/flex is probably going to end up with something very similar to
    jsonval even without looking at it.
    But having said that, I wonder whether bison/flex are really the best
    tool for the job in the first place.  From what I understand of JSON
    (which admittedly ain't much) a bison parser seems like overkill:
    it'd probably be both bloated and slow compared to a simple handwritten
    recursive-descent parser.
    I agree, and also because JSON isn't a moving target. However, in my
    opinion, the differences in quality between a bison parser and a
    handwritten parser are to small to be a big deal right now.

    A bison parser is probably slower and bigger than a simple
    hand-written one, but I haven't benchmarked it, and it's probably not
    worth benchmarking at this point.

    In correctness, I suspect a bison parser and a hand-written one to be
    about the same. Both pass the same army of testcases I built up while
    writing my original JSON patch. Granted, there could be weird
    semantics of bison/flex that make the parser wrong in subtle ways (for
    instance, the . regex char doesn't match \n and EOF), but there can
    also be weird errors in hand-written code. Pick your poison.

    In safety, a handwritten recursive descent parser could stack overflow
    and crash the server unless the proper guards are in place (e.g if
    someone tries to encode '[[[[[[...' as JSON). bison guards against
    this (chopping the maximum depth of JSON trees to 9997 levels or so),
    but I don't know if the real stack is smaller than bison thinks it is,
    or if bison uses its own buffer. A handwritten parser could also
    accept trees of any depth using a manually managed stack, but it
    wouldn't do much good because code that works with JSON trees (e.g.
    json_stringify) is currently recursive and has the same problem. In
    any case, I think limiting the depth of JSON, to something generous
    like 1000, is the right way to go. The RFC allows us to do that.

    In my opinion, the path of least resistance is the best path for now.
    If we use the bison parser now, then we can replace it with a
    hand-written one that's functionally equivalent later.


    Joey Adams
  • Pavel Stehule at Oct 5, 2010 at 5:13 am

    2010/10/5 Tom Lane <tgl@sss.pgh.pa.us>:
    Robert Haas <robertmhaas@gmail.com> writes:
    On Mon, Oct 4, 2010 at 2:50 PM, Joseph Adams wrote:
    If he doesn't respond, or outright refuses (which I, for one, doubt
    will happen), my fallback plan is to rewrite the JSON validation code
    by drawing from my original code (meaning it won't be in bison/flex)
    and post a patch for it.  Unfortunately, it seems to me that there
    aren't very many ways of expressing a JSON parser in bison/flex, and
    thus the idea of JSON parsing with bison/flex is effectively locked
    down by the GPL unless we can get a more permissive license for
    jsonval.  But, I am not a lawyer.
    If someone who hasn't looked at the GPL code sits down and codes
    something up based on the json.org home page, it's hard to imagine how
    anyone could be grumpy about that.
    Yeah.  Joseph seems to be confusing copyrights with patents.  The idea
    of "parse JSON with bison/flex" is not patentable by any stretch of the
    imagination.

    But having said that, I wonder whether bison/flex are really the best
    tool for the job in the first place.  From what I understand of JSON
    (which admittedly ain't much) a bison parser seems like overkill:
    it'd probably be both bloated and slow compared to a simple handwritten
    recursive-descent parser.
    PostgreSQL use a bison for same simple things - cube, PostGis,
    bootstrap .. so I don't see some special overhead. I am thinking so
    lex can be shared from core parser. bison parser for JSON means a less
    rows for maintaining a repeating some a basic pattern in pg source
    code.

    Regards

    Pavel Stehule

    ./contrib/seg/segparse.y
    ./contrib/cube/cubeparse.y
    ./src/backend/bootstrap/bootparse.y
    ./src/backend/parser/gram.y
    ./src/pl/plpgsql/src/gram.y
    ./src/interfaces/ecpg/preproc/preproc.y

    regards, tom lane

    --
    Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-hackers

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-hackers @
categoriespostgresql
postedSep 15, '10 at 2:33a
activeOct 5, '10 at 5:13a
posts18
users8
websitepostgresql.org...
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase