Here is a patch for the first part of the JSON API that was recently
discussed. It includes the json parser hook infrastructure and
functions for json_get and friends, plus json_keys.

As is, this exposes the json lexer fully for use by the hook functions.
But I could easily be persuaded that this should be an opaque structure
with some constructor and getter functions - I don't think the hook
functions need or should be able to set anything in the lexer.

Work is proceeding on some of the more advanced functionality discussed.

cheers

andrew

  • Robert Haas at Jan 2, 2013 at 9:45 pm

    On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan wrote:
    Here is a patch for the first part of the JSON API that was recently
    discussed. It includes the json parser hook infrastructure and functions
    for json_get and friends, plus json_keys.

    As is, this exposes the json lexer fully for use by the hook functions. But
    I could easily be persuaded that this should be an opaque structure with
    some constructor and getter functions - I don't think the hook functions
    need or should be able to set anything in the lexer.

    Work is proceeding on some of the more advanced functionality discussed.
    This seems to contain a large number of spurious whitespace changes.

    And maybe some other spurious changes. For example, I'm not sure why
    this comment is truncated:

         }
        }

    ! /*
    ! * Check for trailing garbage. As in json_lex(), any alphanumeric stuff
    ! * here should be considered part of the token for error-reporting
    ! * purposes.
    ! */
        for (p = s; JSON_ALPHANUMERIC_CHAR(*p); p++)
         error = true;
        lex->token_terminator = p;
        if (error)
         report_invalid_token(lex);
    --- 730,739 ----
         }
        }

    ! /* Check for trailing garbage. */
        for (p = s; JSON_ALPHANUMERIC_CHAR(*p); p++)
         error = true;
    + lex->prev_token_terminator = lex->token_terminator;
        lex->token_terminator = p;
        if (error)
         report_invalid_token(lex);


    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Andrew Dunstan at Jan 2, 2013 at 10:52 pm

    On 01/02/2013 04:45 PM, Robert Haas wrote:
    On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan wrote:
    Here is a patch for the first part of the JSON API that was recently
    discussed. It includes the json parser hook infrastructure and functions
    for json_get and friends, plus json_keys.

    As is, this exposes the json lexer fully for use by the hook functions. But
    I could easily be persuaded that this should be an opaque structure with
    some constructor and getter functions - I don't think the hook functions
    need or should be able to set anything in the lexer.

    Work is proceeding on some of the more advanced functionality discussed.
    This seems to contain a large number of spurious whitespace changes.

    I'm glad you're looking at it :-)

    I did do a run of pgindent on the changed files before I cut the patch,
    which might have made some of those changes.

    I notice a couple of other infelicities too, which are undoubtedly my fault.

    The original prototype of this was cut against some older code, and I
    then tried to merge it with the current code base, and make sure that
    all the regression tests passed. That might well have resulted in a
    number of things that need review.
    And maybe some other spurious changes. For example, I'm not sure why
    this comment is truncated:

    Another good question.

    I'll make another pass over this and try to remove some of what's
    annoying you.

    cheers

    andrew
  • Andrew Dunstan at Jan 4, 2013 at 8:24 pm

    On 01/02/2013 05:51 PM, Andrew Dunstan wrote:
    On 01/02/2013 04:45 PM, Robert Haas wrote:
    On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan <andrew@dunslane.net>
    wrote:
    Here is a patch for the first part of the JSON API that was recently
    discussed. It includes the json parser hook infrastructure and
    functions
    for json_get and friends, plus json_keys.

    Updated patch that contains most of the functionality I'm after. One
    piece left is populate_recordset (populate a set of records from a
    single json datum which is an array of objects, in one pass). That
    requires a bit of thought.

    I hope most of the whitespace issues are fixed.

    cheers

    andrew
  • Pavel Stehule at Jan 4, 2013 at 8:36 pm
    Hello



    2013/1/4 Andrew Dunstan <andrew@dunslane.net>:
    On 01/02/2013 05:51 PM, Andrew Dunstan wrote:

    On 01/02/2013 04:45 PM, Robert Haas wrote:

    On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan <andrew@dunslane.net>
    wrote:
    Here is a patch for the first part of the JSON API that was recently
    discussed. It includes the json parser hook infrastructure and
    functions
    for json_get and friends, plus json_keys.


    Udated patch that contains most of the functionality I'm after. One piece
    left is populate_recordset (populate a set of records from a single json
    datum which is an array of objects, in one pass). That requires a bit of
    thought.

    I hope most of the whitespace issues are fixed.
    it is looking good

    I have one note - is "json_each" a good name?

    Regards

    Pavel
    cheers

    andrew


  • Andrew Dunstan at Jan 4, 2013 at 8:52 pm

    On 01/04/2013 03:36 PM, Pavel Stehule wrote:
    Hello



    2013/1/4 Andrew Dunstan <andrew@dunslane.net>:
    On 01/02/2013 05:51 PM, Andrew Dunstan wrote:
    On 01/02/2013 04:45 PM, Robert Haas wrote:
    On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan <andrew@dunslane.net>
    wrote:
    Here is a patch for the first part of the JSON API that was recently
    discussed. It includes the json parser hook infrastructure and
    functions
    for json_get and friends, plus json_keys.

    Udated patch that contains most of the functionality I'm after. One piece
    left is populate_recordset (populate a set of records from a single json
    datum which is an array of objects, in one pass). That requires a bit of
    thought.

    I hope most of the whitespace issues are fixed.
    it is looking well

    I have one note - is it "json_each" good name?

    Possibly not, although hstore has each(). json_unnest might be even less
    felicitous.
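
    (Whatever the name ends up being, here is a sketch of what it is expected
    to return, with the output columns as listed later in this thread - the
    values are illustrative, not from an actual run:)

        select * from json_each('{"a": 1, "b": "two"}');
        --  key | value
        -- -----+-------
        --  a   | 1
        --  b   | "two"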

    I'm expecting a good deal of bikeshedding - I'm trying to ignore those
    issues for the most part because the functionality seems much more
    important to me than the names.

    The more people that pound on it and try to break it the happier I'll be.

    cheers

    andrew
  • Pavel Stehule at Jan 4, 2013 at 9:04 pm
    2013/1/4 Andrew Dunstan <andrew@dunslane.net>:
    On 01/04/2013 03:36 PM, Pavel Stehule wrote:

    Hello



    2013/1/4 Andrew Dunstan <andrew@dunslane.net>:
    On 01/02/2013 05:51 PM, Andrew Dunstan wrote:

    On 01/02/2013 04:45 PM, Robert Haas wrote:

    On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan <andrew@dunslane.net>
    wrote:
    Here is a patch for the first part of the JSON API that was recently
    discussed. It includes the json parser hook infrastructure and
    functions
    for json_get and friends, plus json_keys.


    Udated patch that contains most of the functionality I'm after. One piece
    left is populate_recordset (populate a set of records from a single json
    datum which is an array of objects, in one pass). That requires a bit of
    thought.

    I hope most of the whitespace issues are fixed.
    it is looking well

    I have one note - is it "json_each" good name?

    Possibly not, although hstore has each(). json_unnest might be even less
    felicitous.
    I understand - but hstore isn't in core - so it should not be precedent

    regexp_split_to_table

    I am not native speaker, it sounds little bit strange - but maybe
    because I am not native speaker :)

    Regards

    Pavel
    I'm expecting a good deal of bikeshedding - I'm trying to ignore those
    issues for the most part because the functionality seems much more important
    to me than the names.

    The more people that pound on it and try to break it the happier I'll be.

    cheers

    andrew
  • Merlin Moncure at Jan 7, 2013 at 3:25 pm

    On Fri, Jan 4, 2013 at 3:03 PM, Pavel Stehule wrote:
    I understand - but hstore isn't in core - so it should not be precedent

    regexp_split_to_table

    I am not native speaker, it sounds little bit strange - but maybe
    because I am not native speaker :)
    it's common usage: http://api.jquery.com/jQuery.each/

    the patch looks fabulous. There are a few trivial whitespace issues
    still, and I noticed a leaked hstore comment @ 2440:
    + /*
    + * if the input hstore is empty, we can only skip the rest if we were
    + * passed in a non-null record, since otherwise there may be issues with
    + * domain nulls.
    + */

    merlin
  • Pavel Stehule at Jan 7, 2013 at 7:32 pm

    2013/1/7 Merlin Moncure <mmoncure@gmail.com>:
    On Fri, Jan 4, 2013 at 3:03 PM, Pavel Stehule wrote:
    I understand - but hstore isn't in core - so it should not be precedent

    regexp_split_to_table

    I am not native speaker, it sounds little bit strange - but maybe
    because I am not native speaker :)
    it's common usage: http://api.jquery.com/jQuery.each/
    ook

    Regards

    Pavel
    the patch looks fabulous. There are a few trivial whitespace issues
    yet and I noticed a leaked hstore comment@ 2440:
    + /*
    + * if the input hstore is empty, we can only skip the rest if we were
    + * passed in a non-null record, since otherwise there may be issues with
    + * domain nulls.
    + */

    merlin
  • Andrew Dunstan at Jan 7, 2013 at 10:16 pm

    On 01/07/2013 10:25 AM, Merlin Moncure wrote:
    the patch looks fabulous. There are a few trivial whitespace issues
    yet and I noticed a leaked hstore comment@ 2440:
    + /*
    + * if the input hstore is empty, we can only skip the rest if we were
    + * passed in a non-null record, since otherwise there may be issues with
    + * domain nulls.
    + */

    Here is a patch that has all the functionality I'm intending to provide.

    The API is improved some, with the parser now keeping track of the
    nesting levels instead of callers having to do so, and a constructor
    function provided for JsonLexContext objects.

    The processing functions have been extended to provide populate_record()
    and populate_recordset() functions. The latter in particular could be
    useful in decomposing a piece of json representing an array of flat
    objects (a fairly common pattern) into a set of Postgres records in a
    single pass.
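
    To illustrate the intended pattern, a rough sketch (the type and data here
    are invented for the example, and the output shown is illustrative rather
    than taken from an actual run):

        create type person as (name text, age int);

        select * from json_populate_recordset(null::person,
            '[{"name": "fred", "age": 40}, {"name": "jane", "age": 42}]');
        --  name | age
        -- ------+-----
        --  fred |  40
        --  jane |  42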

    The main thing I'm going to concentrate on now is making sure that this
    doesn't leak memory. I'm sure there's some tightening required in that
    area. Any eyeballs there will be greatly appreciated. There are also a
    couple of very minor things to clean up.

    You (Merlin) have kindly volunteered to work on documentation, so before
    we go too far with that any bikeshedding on names, or on the
    functionality being provided, should now take place.


    cheers

    andrew
  • James at Jan 8, 2013 at 6:45 am
    The processing functions have been extended to provide populate_record() and populate_recordset() functions.The latter in particular could be useful in decomposing a piece of json representing an array of flat objects (a fairly common pattern) into a set of Postgres records in a single pass.
    So this would allow an 'insert into ... select ... from
    <unpack-the-JSON>(...)'?

    I had been wondering how to do such an insertion efficiently in the
    context of SPI, but it seems that there is no SPI_copy equiv that would
    allow a query parse and plan to be avoided.

    Is this mechanism likely to be as fast as we can get at the moment in
    contexts where copy is not feasible?
  • Andrew Dunstan at Jan 8, 2013 at 2:59 pm

    On 01/08/2013 01:45 AM, james wrote:
    The processing functions have been extended to provide
    populate_record() and populate_recordset() functions.The latter in
    particular could be useful in decomposing a piece of json
    representing an array of flat objects (a fairly common pattern) into
    a set of Postgres records in a single pass.
    So this would allow an 'insert into ... select ... from
    <unpack-the-JSON>(...)'?

    Yes.
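
    Something along these lines (the table and json are invented purely for
    illustration):

        -- hypothetical target table
        create table t (a int, b text);

        insert into t (a, b)
        select a, b
          from json_populate_recordset(null::t,
               '[{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]');
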
    I had been wondering how to do such an insertion efficiently in the
    context of SPI, but it seems that there is no SPI_copy equiv that
    would allow a query parse and plan to be avoided.
    Your query above would need to be planned too, although the plan will be
    trivial.
    Is this mechanism likely to be as fast as we can get at the moment in
    contexts where copy is not feasible?
    You should not try to use it as a general bulk load facility. And it
    will not be as fast as COPY for several reasons, including that the Json
    parsing routines are necessarily much heavier than the COPY parse
    routines, which have in any case been optimized over quite a long
    period. Also, a single json datum is limited to no more than 1Gb. If you
    have such a datum, parsing it involves having it in memory and then
    taking a copy (I wonder if we could avoid that step - will take a look).
    Then each object is decomposed into a hash table of key value pairs,
    which is then used to construct the record datum. Each field name in
    the result record is used to look up the value in the hash table - this
    happens once in the case of populate_record() and once per object in the
    array in the case of populate_recordset(). In the latter case the
    resulting records are put into a tuplestore structure (which spills to
    disk if necessary) which is then returned to the caller when all the
    objects in the json array are processed. COPY doesn't have these sorts
    of issues. It knows without having to look things up where each datum is
    in each record, and it stashes the result straight into the target
    table. It can read and insert huge numbers of rows without significant
    memory implications.

    Both these routines and COPY in non-binary mode use the data type input
    routines to convert text values. In some cases (very notably timestamps)
    these routines can easily be shown to be fantastically expensive
    compared to binary input. This is part of what has led to the creation
    of utilities like pg_bulkload.

    Perhaps if you give us a higher level view of what you're trying to
    achieve we can help you better.

    cheers

    andrew
  • James at Jan 8, 2013 at 8:07 pm

    I had been wondering how to do such an insertion efficiently in the context of SPI, but it seems that there is no SPI_copy equiv that would allow a query parse and plan to be avoided.
    Your query above would need to be planned too, although the plan will be trivial.
    Ah yes, I meant that I had not found a way to avoid it (for multi-row
    inserts etc) from a stored proc context where I have SPI functions
    available.
    You should not try to use it as a general bulk load facility. And it will
    not be as fast as COPY for several reasons, including that the Json parsing
    routines are necessarily much heavier than the COPY parse routines, which
    have in any case been optimized over quite a long period. Also, a single
    json datum is limited to no more than 1Gb. If you have such a datum,
    parsing it involves having it in memory and then taking a copy (I wonder if
    we could avoid that step - will take a look). Then each object is
    decomposed into a hash table of key value pairs, which is then used to
    construct the record datum. Each field name in the result record is used
    to look up the value in the hash table - this happens once in the case of
    populate_record() and once per object in the array in the case of
    populate_recordset(). In the latter case the resulting records are put
    into a tuplestore structure (which spills to disk if necessary) which is
    then returned to the caller when all the objects in the json array are
    processed. COPY doesn't have these sorts of issues. It knows without
    having to look things up where each datum is in each record, and it
    stashes the result straight into the target table. It can read and insert
    huge numbers of rows without significant memory implications.

    Yes - but I don't think I can use COPY from a stored proc context, can I?
    If I could use binary COPY from a stored proc that has received a
    binary param and unpacked the data, it would be handy.

    If SPI provided a way to perform a copy to a temp table and then some
    callback on an iterator that yields rows to it, that would do the trick
    I guess.
    Perhaps if you give us a higher level view of what you're trying to achieve we can help you better.
    I had been trying to identify a way to work with record sets where the
    records might be used for insert, or for updates or deletion statements,
    preferably without forming a large custom SQL statement that must then
    be parsed and planned (and which would be a PITA if I wanted to use the
    SQL-C preprocessor or some language bindings that like to prepare a
    statement and execute with params).

    The data I work with has a master-detail structure and insertion
    performance matters, so I'm trying to limit manipulations to one
    statement per table per logical operation even where there are multiple
    detail rows.

    Sometimes the network latency can be a pain too and that also suggests
    an RPC with unpack and insert locally.

    Cheers
    James
  • Andrew Dunstan at Jan 8, 2013 at 8:26 pm

    On 01/08/2013 03:07 PM, james wrote:

    Yes - but I don't think I can use COPY from a stored proc context can
    I? If I could use binary COPY from a stored proc that has received a
    binary param and unpacked to the data, it would be handy.
    You can use COPY from a stored procedure, but only to and from files.
    If SPI provided a way to perform a copy to a temp table and then some
    callback on an iterator that yields rows to it, that would do the
    trick I guess.
    SPI is useful, but it's certainly possible to avoid its use. After all,
    that's what almost the whole backend does, including the COPY code. Of
    course, it's a lot harder to write that way, which is part of why SPI
    exists. Efficiency has its price.


    cheers

    andrew
  • James at Jan 8, 2013 at 9:31 pm
    You can use COPY from a stored procedure, but only to and from files.
    I think that's in the chocolate fireguard realm though as far as
    efficiency for this sort of scenario goes, even if it's handled by
    retaining an mmap'd file as workspace.
    If SPI provided a way to perform a copy to a temp table and then some callback on an iterator that yields rows to it, that would do the trick I guess.
    SPI is useful, but it's certainly possible to avoid its use. After all, that what almost the whole backend does, including the COPY code. Of course, it's a lot harder to write that way, which is part of why SPI exists. Efficiency has its price.
    So it is possible to use a lower level interface from a C stored proc?
    SPI is the (only) documented direct function extension API, isn't it?

    Is the issue with using the JSON data-to-record set that the parsing can
    be costly? Perhaps it can be achieved with B64 of compressed protobuf,
    or such. I don't mind if it seems a bit messy - the code can be
    generated from the table easily enough, especially if I can use C++. I
    guess an allocator that uses SPI_palloc would solve issues with memory
    management on error?
  • Andrew Dunstan at Jan 8, 2013 at 8:13 pm

    On 01/08/2013 09:58 AM, Andrew Dunstan wrote:
    If you have such a datum, parsing it involves having it in memory and
    then taking a copy (I wonder if we could avoid that step - will take a
    look).

    Here is a Proof Of Concept patch against my development tip on what's
    involved in getting the JSON lexer not to need a nul-terminated string
    to parse. This passes regression, incidentally. The downside is that
    processing is very slightly more complex, and that json_in() would need
    to call strlen() on its input. The upside would be that the processing
    routines I've been working on would no longer need to create copies of
    their json arguments using text_to_cstring() just so they can get a
    null-terminated string to process.

    Consequent changes would modify the signature of makeJsonLexContext() so
    its first argument would be a text* instead of a char* (and of course
    its logic would change accordingly).

    I could go either way. Thoughts?

    cheers

    andrew
  • Andrew Dunstan at Jan 8, 2013 at 8:22 pm

    On 01/08/2013 03:12 PM, Andrew Dunstan wrote:
    On 01/08/2013 09:58 AM, Andrew Dunstan wrote:

    If you have such a datum, parsing it involves having it in memory and
    then taking a copy (I wonder if we could avoid that step - will take
    a look).

    Here is a Proof Of Concept patch against my development tip on what's
    involved in getting the JSON lexer not to need a nul-terminated string
    to parse. This passes regression, incidentally. The downside is that
    processing is very slightly more complex, and that json_in() would
    need to call strlen() on its input. The upside would be that the
    processing routines I've been working on would no longer need to
    create copies of their json arguments using text_to_cstring() just so
    they can get a null-terminated string to process.

    Consequent changes would modify the signature of makeJsonLexContext()
    so it's first argument would be a text* instead of a char* (and of
    course its logic would change accordingly).

    I could go either way. Thoughts?
    this time with patch ...
  • Peter Eisentraut at Jan 8, 2013 at 9:18 pm

    On 1/7/13 5:15 PM, Andrew Dunstan wrote:
    You (Merlin) have kindly volunteered to work on documentation, so before
    we go too far with that any bikeshedding on names, or on the
    functionality being provided, should now take place.
    Hmm, I was going to say, this patch contains no documentation, so I have
    no idea what it is supposed to do. "Recently discussed" isn't a good
    substitute for describing what the patch is supposed to accomplish.
  • Merlin Moncure at Jan 8, 2013 at 9:32 pm

    On Tue, Jan 8, 2013 at 3:19 PM, Peter Eisentraut wrote:
    On 1/7/13 5:15 PM, Andrew Dunstan wrote:
    You (Merlin) have kindly volunteered to work on documentation, so before
    we go too far with that any bikeshedding on names, or on the
    functionality being provided, should now take place.
    Hmm, I was going to say, this patch contains no documentation, so I have
    no idea what it is supposed to do. "Recently discussed" isn't a good
    substitute for describing what the patch is supposed to accomplish.
    Why not? There are functional examples in the docs and the purpose of
    the various functions was hashed out pretty well a couple weeks back,
    deficiencies corrected, etc.

    reference: http://postgresql.1045698.n5.nabble.com/json-accessors-td5733929.html

    merlin
  • Andrew Dunstan at Jan 8, 2013 at 10:01 pm

    On 01/08/2013 04:32 PM, Merlin Moncure wrote:
    On Tue, Jan 8, 2013 at 3:19 PM, Peter Eisentraut wrote:
    On 1/7/13 5:15 PM, Andrew Dunstan wrote:
    You (Merlin) have kindly volunteered to work on documentation, so before
    we go too far with that any bikeshedding on names, or on the
    functionality being provided, should now take place.
    Hmm, I was going to say, this patch contains no documentation, so I have
    no idea what it is supposed to do. "Recently discussed" isn't a good
    substitute for describing what the patch is supposed to accomplish.
    Why not? There are functional examples in the docs and the purpose of
    the various functions was hashed out pretty well a couple weeks back,
    deficiencies corrected, etc.

    reference: http://postgresql.1045698.n5.nabble.com/json-accessors-td5733929.html
    Well, at a high level the patch is meant to do two things: provide an
    API that can be used to build JSON processing functions easily, and
    provide some basic json processing functions built on the API. Those
    functions provide similar capabilities to the accessor functions that
    hstore has.

    Perhaps also this will help. Here is the list of functions and operators
    as currently implemented. I also have working operators for the get_path
    functions which will be in a future patch.

    All these are used in the included regression tests.


                Name             | Result data type |                          Argument data types
    -----------------------------+------------------+--------------------------------------------------------------------------
     json_array_length           | integer          | json
     json_each                   | SETOF record     | from_json json, OUT key text, OUT value json
     json_each_as_text           | SETOF record     | from_json json, OUT key text, OUT value text
     json_get                    | json             | json, integer
     json_get                    | json             | json, text
     json_get_as_text            | text             | json, integer
     json_get_as_text            | text             | json, text
     json_get_path               | json             | from_json json, VARIADIC path_elems text[]
     json_get_path_as_text       | text             | from_json json, VARIADIC path_elems text[]
     json_object_keys            | SETOF text       | json
     json_populate_record        | anyelement       | anyelement, json
     json_populate_recordset     | SETOF anyelement | base anyelement, from_json json, use_json_as_text boolean DEFAULT false
     json_unnest                 | SETOF json       | from_json json, OUT value json

     Name | Left arg type | Right arg type | Result type |          Description
    ------+---------------+----------------+-------------+--------------------------------
     ->   | json          | integer        | json        | get json array element
     ->   | json          | text           | json        | get json object field
     ->>  | json          | integer        | text        | get json array element as text
     ->>  | json          | text           | text        | get json object field as text
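
    For instance (the results are illustrative of the semantics above, not
    taken from an actual run):

        select '{"f1": {"f2": [10, 20, 30]}}'::json -> 'f1';           -- {"f2": [10, 20, 30]}
        select '{"f1": {"f2": [10, 20, 30]}}'::json -> 'f1' -> 'f2';   -- [10, 20, 30]
        select '[1,2,3]'::json ->> 0;                                  -- 1  (text, zero-based)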



    cheers

    andrew
  • Merlin Moncure at Jan 10, 2013 at 8:01 pm

    On Mon, Jan 7, 2013 at 4:15 PM, Andrew Dunstan wrote:
    You (Merlin) have kindly volunteered to work on documentation, so before we
    go too far with that any bikeshedding on names, or on the functionality
    being provided, should now take place.
    Barring comment/complaint, I'm just going to roll with what we've got.
      It seems pretty good to me.

    merlin
  • Andrew Dunstan at Jan 10, 2013 at 11:42 pm

    On 01/04/2013 03:23 PM, Andrew Dunstan wrote:
    On 01/02/2013 05:51 PM, Andrew Dunstan wrote:
    On 01/02/2013 04:45 PM, Robert Haas wrote:
    On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan
    wrote:
    Here is a patch for the first part of the JSON API that was recently
    discussed. It includes the json parser hook infrastructure and
    functions
    for json_get and friends, plus json_keys.

    Udated patch that contains most of the functionality I'm after. One
    piece left is populate_recordset (populate a set of records from a
    single json datum which is an array of objects, in one pass). That
    requires a bit of thought.

    I hope most of the whitespace issues are fixed.

    This updated patch contains all the intended functionality, including
    operators for the json_get_path functions, so you can say things like

      select jsonval->array['f1','0','f2'] ...

    It also removes any requirement to copy the json value before setting up
    the lexer by removing the lexer's requirement to have a nul terminated
    string. Instead the lexer is told the input length and relies on that.
    For this reason, json_in() now calls cstring_get_text() before rather
    than after calling the validation routine, but that's really not
    something worth bothering about.

    A couple of points worth noting: it's a pity that we have to run CREATE
    OR REPLACE FUNCTION in system_views.sql in order to set up default
    values for builtin functions. That feels very kludgy. Also, making
    operators for variadic functions is a bit of a pain. I had to set up
    non-variadic version of the same functions (see json_get_path_op and
    json_get_path_as_text_op) just so I could set up the operators. Neither
    of these are exactly showstopper items, just mild annoyances.
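
    Roughly speaking, the workaround looks like this (a sketch only - the
    actual definitions in the patch may differ):

        -- an operator can only point at a fixed-arity function, hence the
        -- non-variadic json_get_path_op wrapper mentioned above
        create operator -> (
            leftarg   = json,
            rightarg  = text[],
            procedure = json_get_path_op
        );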

    I will continue hunting memory leaks, but when Merlin gets done with
    docco I think we'll be far enough advanced to add this to the commitfest.

    cheers

    andrew
  • Robert Haas at Jan 14, 2013 at 4:32 pm

    On Thu, Jan 10, 2013 at 6:42 PM, Andrew Dunstan wrote:
    Udated patch that contains most of the functionality I'm after. One piece
    left is populate_recordset (populate a set of records from a single json
    datum which is an array of objects, in one pass). That requires a bit of
    thought.

    I hope most of the whitespace issues are fixed.

    This updated patch contains all the intended functionality, including
    operators for the json_get_path functions, so you can say things like

    select jsonval->array['f1','0','f2] ...

    It also removes any requirement to copy the json value before setting up the
    lexer by removing the lexer's requirement to have a nul terminated string.
    Instead the lexer is told the input length and relies on that. For this
    reason, json_in() now calls cstring_get_text() before rather than after
    calling the validation routine, but that's really not something worth
    bothering about.

    A couple of points worth noting: it's a pity that we have to run CREATE OR
    REPLACE FUNCTION in system_views.sql in order to set up default values for
    builtin functions. That feels very kludgy. Also, making operators for
    variadic functions is a bit of a pain. I had to set up non-variadic version
    of the same functions (see json_get_path_op and json_get_path_as_text_op)
    just so I could set up the operators. Neither of these are exactly
    showstopper items, just mild annoyances.

    I will continue hunting memory leaks, but when Merlin gets done with docco I
    think we'll be far enough advanced to add this to the commitfest.
    So, how much performance does this lose on json_in() on a large
    cstring, as compared with master?

    I can't shake the feeling that this is adding a LOT of unnecessary
    data copying. For one thing, instead of copying every single lexeme
    (including the single-character ones?) out of the original object, we
    could just store a pointer to the offset where the object starts and a
    length, instead of copying it.

    This is also remarkably thin on comments.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Andrew Dunstan at Jan 14, 2013 at 5:52 pm

    On 01/14/2013 11:32 AM, Robert Haas wrote:
    So, how much performance does this lose on json_in() on a large
    cstring, as compared with master?
    That's a good question. I'll try to devise a test.
    I can't shake the feeling that this is adding a LOT of unnecessary
    data copying. For one thing, instead of copying every single lexeme
    (including the single-character ones?) out of the original object, we
    could just store a pointer to the offset where the object starts and a
    length, instead of copying it.
    In the pure parse case (json_in, json_recv) nothing extra should be
    copied. On checking this after reading the above I found that wasn't
    quite the case, and some lexemes (scalars and field names, but not
    punctuation) were being copied when not needed. I have made a fix (see
    <https://bitbucket.org/adunstan/pgdevel/commits/139043dba7e6b15f1f9f7675732bd9dae1fb6497>)
    which I will include in the next version I publish.

    In the case of string lexemes, we are passing back a de-escaped version,
    so just handing back pointers to the beginning and end in the input
    string doesn't work.
    This is also remarkably thin on comments.
    Fair criticism. I'll work on that.

    Thanks for looking at this.

    cheers

    andrew
  • Andrew Dunstan at Jan 15, 2013 at 4:03 am

    On 01/14/2013 12:52 PM, Andrew Dunstan wrote:
    On 01/14/2013 11:32 AM, Robert Haas wrote:

    So, how much performance does this lose on json_in() on a large
    cstring, as compared with master?
    That's a good question. I'll try to devise a test.
    I can't shake the feeling that this is adding a LOT of unnecessary
    data copying. For one thing, instead of copying every single lexeme
    (including the single-character ones?) out of the original object, we
    could just store a pointer to the offset where the object starts and a
    length, instead of copying it.
    In the pure pares case (json_in, json_reccv) nothing extra should be
    copied. On checking this after reading the above I found that wasn't
    quite the case, and some lexemes (scalars and field names, but not
    punctuation) were being copied when not needed. I have made a fix (see
    <https://bitbucket.org/adunstan/pgdevel/commits/139043dba7e6b15f1f9f7675732bd9dae1fb6497>)
    which I will include in the next version I publish.

    In the case of string lexemes, we are passing back a de-escaped
    version, so just handing back pointers to the beginning and end in the
    input string doesn't work.

    After a couple of iterations, some performance enhancements to the json
    parser and lexer have ended up with a net performance improvement over
    git tip. On our test rig, the json parse test runs at just over 13s per
    10000 parses on git tip and approx 12.55s per 10000 parses with the
    attached patch.

    Truth be told, I think the lexer changes have more than paid for the
    small cost of the switch to an RD parser. But since the result is a net
    performance win PLUS some enhanced functionality, I think we should be
    all good.

    cheers

    andrew
  • Andrew Dunstan at Jan 15, 2013 at 4:32 pm

    On 01/14/2013 11:02 PM, Andrew Dunstan wrote:
    On 01/14/2013 12:52 PM, Andrew Dunstan wrote:
    On 01/14/2013 11:32 AM, Robert Haas wrote:

    So, how much performance does this lose on json_in() on a large
    cstring, as compared with master?
    That's a good question. I'll try to devise a test.
    I can't shake the feeling that this is adding a LOT of unnecessary
    data copying. For one thing, instead of copying every single lexeme
    (including the single-character ones?) out of the original object, we
    could just store a pointer to the offset where the object starts and a
    length, instead of copying it.
    In the pure pares case (json_in, json_reccv) nothing extra should be
    copied. On checking this after reading the above I found that wasn't
    quite the case, and some lexemes (scalars and field names, but not
    punctuation) were being copied when not needed. I have made a fix
    (see
    <https://bitbucket.org/adunstan/pgdevel/commits/139043dba7e6b15f1f9f7675732bd9dae1fb6497>)
    which I will include in the next version I publish.

    In the case of string lexemes, we are passing back a de-escaped
    version, so just handing back pointers to the beginning and end in
    the input string doesn't work.

    After a couple of iterations, some performance enhancements to the
    json parser and lexer have ended up with a net performance improvement
    over git tip. On our test rig, the json parse test runs at just over
    13s per 10000 parses on git tip and approx 12.55s per 10000 parses
    with the attached patch.

    Truth be told, I think the lexer changes have more than paid for the
    small cost of the switch to an RD parser. But since the result is a
    net performance win PLUS some enhanced functionality, I think we
    should be all good.
    Latest version of this patch, including some documentation, mainly from
    Merlin Moncure but tweaked by me.

    cheers

    andrew
  • Andrew Dunstan at Jan 15, 2013 at 10:46 pm

    On 01/15/2013 11:31 AM, Andrew Dunstan wrote:
    On 01/14/2013 11:02 PM, Andrew Dunstan wrote:
    On 01/14/2013 12:52 PM, Andrew Dunstan wrote:
    On 01/14/2013 11:32 AM, Robert Haas wrote:

    So, how much performance does this lose on json_in() on a large
    cstring, as compared with master?
    That's a good question. I'll try to devise a test.
    I can't shake the feeling that this is adding a LOT of unnecessary
    data copying. For one thing, instead of copying every single lexeme
    (including the single-character ones?) out of the original object, we
    could just store a pointer to the offset where the object starts and a
    length, instead of copying it.
    In the pure pares case (json_in, json_reccv) nothing extra should be
    copied. On checking this after reading the above I found that wasn't
    quite the case, and some lexemes (scalars and field names, but not
    punctuation) were being copied when not needed. I have made a fix
    (see
    <https://bitbucket.org/adunstan/pgdevel/commits/139043dba7e6b15f1f9f7675732bd9dae1fb6497>)
    which I will include in the next version I publish.

    In the case of string lexemes, we are passing back a de-escaped
    version, so just handing back pointers to the beginning and end in
    the input string doesn't work.

    After a couple of iterations, some performance enhancements to the
    json parser and lexer have ended up with a net performance
    improvement over git tip. On our test rig, the json parse test runs
    at just over 13s per 10000 parses on git tip and approx 12.55s per
    10000 parses with the attached patch.

    Truth be told, I think the lexer changes have more than paid for the
    small cost of the switch to an RD parser. But since the result is a
    net performance win PLUS some enhanced functionality, I think we
    should be all good.
    Latest version of this patch, including some documentation, mainly
    from Merlin Moncure but tweaked by me.
    Now with more comments.

    cheers

    andrew
  • Andrew Dunstan at Jan 26, 2013 at 3:27 pm

    On 01/15/2013 05:45 PM, Andrew Dunstan wrote:
    Latest version of this patch, including some documentation, mainly
    from Merlin Moncure but tweaked by me.

    Now with more comments.
    New version attached. The only change is to remove some unnecessary uses
    of PG_FUNCTION_INFO_V1 that were the result of overenthusiastic copy and
    paste.

    cheers

    andrew
  • Robert Haas at Jan 16, 2013 at 3:28 pm

    On Mon, Jan 14, 2013 at 11:02 PM, Andrew Dunstan wrote:
    After a couple of iterations, some performance enhancements to the json
    parser and lexer have ended up with a net performance improvement over git
    tip. On our test rig, the json parse test runs at just over 13s per 10000
    parses on git tip and approx 12.55s per 10000 parses with the attached
    patch.

    Truth be told, I think the lexer changes have more than paid for the small
    cost of the switch to an RD parser. But since the result is a net
    performance win PLUS some enhanced functionality, I think we should be all
    good.
    Yeah, that sounds great. Thanks for putting in the effort.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Merlin Moncure at Jan 15, 2013 at 12:36 am

    On Thu, Jan 10, 2013 at 5:42 PM, Andrew Dunstan wrote:
    On 01/04/2013 03:23 PM, Andrew Dunstan wrote:

    On 01/02/2013 05:51 PM, Andrew Dunstan wrote:

    On 01/02/2013 04:45 PM, Robert Haas wrote:

    On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan <andrew@dunslane.net>
    wrote:
    Here is a patch for the first part of the JSON API that was recently
    discussed. It includes the json parser hook infrastructure and
    functions
    for json_get and friends, plus json_keys.


    Udated patch that contains most of the functionality I'm after. One piece
    left is populate_recordset (populate a set of records from a single json
    datum which is an array of objects, in one pass). That requires a bit of
    thought.

    I hope most of the whitespace issues are fixed.

    This updated patch contains all the intended functionality, including
    operators for the json_get_path functions, so you can say things like

    select jsonval->array['f1','0','f2] ...

    It also removes any requirement to copy the json value before setting up the
    lexer by removing the lexer's requirement to have a nul terminated string.
    Instead the lexer is told the input length and relies on that. For this
    reason, json_in() now calls cstring_get_text() before rather than after
    calling the validation routine, but that's really not something worth
    bothering about.

    A couple of points worth noting: it's a pity that we have to run CREATE OR
    REPLACE FUNCTION in system_views.sql in order to set up default values for
    builtin functions. That feels very kludgy. Also, making operators for
    variadic functions is a bit of a pain. I had to set up non-variadic version
    of the same functions (see json_get_path_op and json_get_path_as_text_op)
    just so I could set up the operators. Neither of these are exactly
    showstopper items, just mild annoyances.

    I will continue hunting memory leaks, but when Merlin gets done with docco I
    think we'll be far enough advanced to add this to the commitfest.
    While testing this I noticed that integer based 'get' routines are
    zero based -- was this intentional? Virtually all other aspects of
    SQL are 1 based:

    postgres=# select json_get('[1,2,3]', 1);
      json_get
    ----------
      2
    (1 row)

    postgres=# select json_get('[1,2,3]', 0);
      json_get
    ----------
      1
    (1 row)

    merlin
  • Andrew Dunstan at Jan 15, 2013 at 12:53 am

    On 01/14/2013 07:36 PM, Merlin Moncure wrote:
    While testing this I noticed that integer based 'get' routines are
    zero based -- was this intentional? Virtually all other aspects of
    SQL are 1 based:

    postgres=# select json_get('[1,2,3]', 1);
    json_get
    ----------
    2
    (1 row)

    postgres=# select json_get('[1,2,3]', 0);
    json_get
    ----------
    1
    (1 row)

    Yes, it's intentional. SQL arrays might be 1-based by default, but
    JavaScript arrays are not. JsonPath and similar gadgets treat the arrays
    as zero-based. I suspect the Json-using community would not thank us for
    being overly SQL-centric on this - and I say that as someone who has
    always thought zero based arrays were a major design mistake,
    responsible for countless off-by-one errors.

    cheers

    andrew
  • David Fetter at Jan 15, 2013 at 7:04 pm

    On Mon, Jan 14, 2013 at 07:52:56PM -0500, Andrew Dunstan wrote:
    On 01/14/2013 07:36 PM, Merlin Moncure wrote:
    While testing this I noticed that integer based 'get' routines are
    zero based -- was this intentional? Virtually all other aspects of
    SQL are 1 based:

    postgres=# select json_get('[1,2,3]', 1);
    json_get
    ----------
    2
    (1 row)

    postgres=# select json_get('[1,2,3]', 0);
    json_get
    ----------
    1
    (1 row)
    Yes. it's intentional. SQL arrays might be 1-based by default, but
    JavaScript arrays are not. JsonPath and similar gadgets treat the
    arrays as zero-based. I suspect the Json-using community would not
    thank us for being overly SQL-centric on this - and I say that as
    someone who has always thought zero based arrays were a major design
    mistake, responsible for countless off-by-one errors.
    Perhaps we could compromise by making arrays 0.5-based.

    Cheers,
    David.
    --
    David Fetter <david@fetter.org> http://fetter.org/
    Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
    Skype: davidfetter XMPP: david.fetter@gmail.com
    iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

    Remember to vote!
    Consider donating to Postgres: http://www.postgresql.org/about/donate
  • Merlin Moncure at Jan 15, 2013 at 7:47 pm

    On Tue, Jan 15, 2013 at 1:04 PM, David Fetter wrote:
    On Mon, Jan 14, 2013 at 07:52:56PM -0500, Andrew Dunstan wrote:
    On 01/14/2013 07:36 PM, Merlin Moncure wrote:
    While testing this I noticed that integer based 'get' routines are
    zero based -- was this intentional? Virtually all other aspects of
    SQL are 1 based:

    postgres=# select json_get('[1,2,3]', 1);
    json_get
    ----------
    2
    (1 row)

    postgres=# select json_get('[1,2,3]', 0);
    json_get
    ----------
    1
    (1 row)
    Yes. it's intentional. SQL arrays might be 1-based by default, but
    JavaScript arrays are not. JsonPath and similar gadgets treat the
    arrays as zero-based. I suspect the Json-using community would not
    thank us for being overly SQL-centric on this - and I say that as
    someone who has always thought zero based arrays were a major design
    mistake, responsible for countless off-by-one errors.
    Perhaps we could compromise by making arrays 0.5-based.
    Well, I'm not prepared to argue with Andrew in this one. It was
    surprising behavior to me, but that's sample size one.

    merlin
  • Andrew Dunstan at Jan 15, 2013 at 8:17 pm

    On 01/15/2013 02:47 PM, Merlin Moncure wrote:
    On Tue, Jan 15, 2013 at 1:04 PM, David Fetter wrote:
    On Mon, Jan 14, 2013 at 07:52:56PM -0500, Andrew Dunstan wrote:
    On 01/14/2013 07:36 PM, Merlin Moncure wrote:
    While testing this I noticed that integer based 'get' routines are
    zero based -- was this intentional? Virtually all other aspects of
    SQL are 1 based:

    postgres=# select json_get('[1,2,3]', 1);
    json_get
    ----------
    2
    (1 row)

    postgres=# select json_get('[1,2,3]', 0);
    json_get
    ----------
    1
    (1 row)
    Yes. it's intentional. SQL arrays might be 1-based by default, but
    JavaScript arrays are not. JsonPath and similar gadgets treat the
    arrays as zero-based. I suspect the Json-using community would not
    thank us for being overly SQL-centric on this - and I say that as
    someone who has always thought zero based arrays were a major design
    mistake, responsible for countless off-by-one errors.
    Perhaps we could compromise by making arrays 0.5-based.
    Well, I'm not prepared to argue with Andrew in this one. It was
    surprising behavior to me, but that's sample size one.

    I doubt I'm very representative either. People like David Wheeler, Taras
    Mitran, Joe Van Dyk, and the Heroku guys would be better people to ask
    than me. I'm quite prepared to change it if that's the consensus.

    cheers

    andrew
  • Daniel Farina at Jan 15, 2013 at 9:24 pm

    On Tue, Jan 15, 2013 at 12:17 PM, Andrew Dunstan wrote:
    On 01/15/2013 02:47 PM, Merlin Moncure wrote:
    On Tue, Jan 15, 2013 at 1:04 PM, David Fetter wrote:
    On Mon, Jan 14, 2013 at 07:52:56PM -0500, Andrew Dunstan wrote:
    On 01/14/2013 07:36 PM, Merlin Moncure wrote:

    While testing this I noticed that integer based 'get' routines are
    zero based -- was this intentional? Virtually all other aspects of
    SQL are 1 based:

    postgres=# select json_get('[1,2,3]', 1);
    json_get
    ----------
    2
    (1 row)

    postgres=# select json_get('[1,2,3]', 0);
    json_get
    ----------
    1
    (1 row)
    Yes. it's intentional. SQL arrays might be 1-based by default, but
    JavaScript arrays are not. JsonPath and similar gadgets treat the
    arrays as zero-based. I suspect the Json-using community would not
    thank us for being overly SQL-centric on this - and I say that as
    someone who has always thought zero based arrays were a major design
    mistake, responsible for countless off-by-one errors.
    Perhaps we could compromise by making arrays 0.5-based.
    Well, I'm not prepared to argue with Andrew in this one. It was
    surprising behavior to me, but that's sample size one.
    I doubt I'm very representative either. People like David Wheeler, Taras
    Mitran, Joe Van Dyk, and the Heroku guys would be better people to ask than
    me. I'm quite prepared to change it if that's the consensus.
    Hello.

    I'm inclined to go with the same gut feeling you had (zero-based-indexing).

    Here is the background for my reasoning:

    The downside of zero-based-indexing is that people who want to use
    multiple sequential container types will inevitably have to deal with
    detailed and not easily type-checked integer coordinates that mean
    different things in each domain that will, no doubt, lead to a number
    of off-by-one errors. Nevertheless, this cost is already paid because
    one of the first things many people will do in programs generating SQL
    queries is try to zero-index a SQL array, swear a bit after figuring
    things out (because a NULL will be generated, not an error), and then
    adjust all the offsets. So, this is not a new problem. On many
    occasions I'm sure this has caused off-by-one bugs, or the NULLs
    slipped through testing and delivered funny results, yet the world
    moves on.
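
    For example:

        select ('{1,2,3}'::int[])[0];   -- NULL, silently
        select ('{1,2,3}'::int[])[1];   -- 1; SQL arrays are 1-based by default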

    On the other hand, the downside of going down the road of 1-based
    indexing and attempting to attain relative sameness to SQL arrays is
    that it would also feel like one would be obliged to implement SQL array
    infelicities like 'out of bounds' being SQL NULL rather than an error,
    related to other spectres like non-rectangular nested arrays. SQL
    array semantics are complex and The Committee can change them or --
    slightly more likely -- add interactions, so it seems like a general
    expectation that Postgres container types that happen to have any
    reasonable ordinal addressing will implement some level of same-ness
    with SQL arrays is a very messy one. As such, if it becomes customary
    to implement one-based indexing of containers, I think such customs
    are best carefully circumscribed so that attempts to be 'like' SQL
    arrays are only as superficial as that.

    What made me come down on the side of zero-based indexing in spite of
    the weaknesses are these two reasons:

    * The number of people who use JSON and zero-based-indexing is very
       large, and furthermore, within that set the number that know that
       SQL even defines array support -- much less that Postgres implements
       it -- is much smaller. Thus, one is targeting cohesion with a fairly
       alien concept that is not understood by the audience.

    * Maintaining PL integrated code that uses both 1-based indexing in PG
       functions and 0-based indexing in embedded languages that are likely
       to be combined with JSON -- doesn't sound very palatable, and the
       use of such PLs (e.g. plv8) seems pretty likely, too. That can
       probably be a rich source of bugs and frustration.

    If one wants SQL array semantics, it seems like the right way to get
    them is coercion to a SQL array value. Then one will receive SQL
    array semantics exactly.

    --
    fdr
  • David E. Wheeler at Jan 15, 2013 at 9:36 pm

    On Jan 15, 2013, at 12:17 PM, Andrew Dunstan wrote:

    I doubt I'm very representative either. People like David Wheeler, Taras Mitran, Joe Van Dyk, and the Heroku guys would be better people to ask than me. I'm quite prepared to change it if that's the consensus.
    They’re JSON arrays, not SQL arrays, and JSON arrays are based on JavaScript, where they are 0-based.

    Best,

    David
  • Gavin Flower at Jan 15, 2013 at 9:34 pm

    On 16/01/13 08:04, David Fetter wrote:
    On Mon, Jan 14, 2013 at 07:52:56PM -0500, Andrew Dunstan wrote:
    On 01/14/2013 07:36 PM, Merlin Moncure wrote:
    While testing this I noticed that integer based 'get' routines are
    zero based -- was this intentional? Virtually all other aspects of
    SQL are 1 based:

    postgres=# select json_get('[1,2,3]', 1);
    json_get
    ----------
    2
    (1 row)

    postgres=# select json_get('[1,2,3]', 0);
    json_get
    ----------
    1
    (1 row)
    Yes. it's intentional. SQL arrays might be 1-based by default, but
    JavaScript arrays are not. JsonPath and similar gadgets treat the
    arrays as zero-based. I suspect the Json-using community would not
    thank us for being overly SQL-centric on this - and I say that as
    someone who has always thought zero based arrays were a major design
    mistake, responsible for countless off-by-one errors.
    Perhaps we could compromise by making arrays 0.5-based.

    Cheers,
    David.
    I think that is far too rational, perhaps the reciprocal of the golden
    ratio (0.618033...) would be more appropriate?

    I used to be insistent that arrays should start with 1, now I find
    starting at 0 far more natural - because every time you start an array at
    1, the computer has to subtract 1 in order to calculate the entry. Also
    both Java & C are zero based.

    I first learnt FORTRAN IV which is 1 based, had a shock when I was
    learning Algol and found it was 0 based - many moons ago...


    Cheers,
    Gavin
  • Peter Eisentraut at Jan 31, 2013 at 10:06 pm

    On 1/10/13 6:42 PM, Andrew Dunstan wrote:
    This updated patch contains all the intended functionality, including
    operators for the json_get_path functions, so you can say things like

    select jsonval->array['f1','0','f2] ...
    I would like to not create any -> operators, so that that syntax could
    be used in the future for method invocation or something similar (it's
    in the SQL standard).

    I also don't find the proposed use to be very intuitive. You invented
    lots of other function names -- why not invent a few more for this
    purpose that are clearer?
  • Andrew Dunstan at Jan 31, 2013 at 10:20 pm

    On 01/31/2013 05:06 PM, Peter Eisentraut wrote:
    On 1/10/13 6:42 PM, Andrew Dunstan wrote:
    This updated patch contains all the intended functionality, including
    operators for the json_get_path functions, so you can say things like

    select jsonval->array['f1','0','f2] ...
    I would like to not create any -> operators, so that that syntax could
    be used in the future for method invocation or something similar (it's
    in the SQL standard).

    This is the first time I have heard that we should stay away from this.
    We have operators with this name in hstore, which is why I chose it.

    Have we officially deprecated '->'? I know we deprecated "=>", but I
    simply don't recall anything about '->'.
    I also don't find the proposed use to be very intuitive. You invented
    lots of other function names -- why not invent a few more for this
    purpose that are clearer?

    I'm happy to take opinions about this, and I expected some bikeshedding,
    but your reaction is contrary to everything others have told me. Mostly
    they love the operators.

    I guess that '~>' and '~>>' would work as well as '->' and '->>'.


    cheers

    andrew
  • Merlin Moncure at Jan 31, 2013 at 11:35 pm

    On Thu, Jan 31, 2013 at 4:20 PM, Andrew Dunstan wrote:
    On 01/31/2013 05:06 PM, Peter Eisentraut wrote:
    On 1/10/13 6:42 PM, Andrew Dunstan wrote:

    This updated patch contains all the intended functionality, including
    operators for the json_get_path functions, so you can say things like

    select jsonval->array['f1','0','f2] ...
    I would like to not create any -> operators, so that that syntax could
    be used in the future for method invocation or something similar (it's
    in the SQL standard).


    This is the first time I have heard that we should stay away from this. We
    have operators with this name in hstore, which is why I chose it.

    Have we officially deprecated '->'? I know we deprecated "=>", but I simply
    don't recall anything about '->'.

    I also don't find the proposed use to be very intuitive. You invented
    lots of other function names -- why not invent a few more for this
    purpose that are clearer?

    I'm happy to take opinions about this, and I expected some bikeshedding, but
    your reaction is contrary to everything others have told me. Mostly they
    love the operators.

    I guess that '~>' and '~>>' would work as well as '->' and '->>'.
    also hstore implements ->

    quick off-topic aside: is colon (:) reserved for any purpose as an
    operator in SQL?

    merlin
  • Tom Lane at Feb 1, 2013 at 12:12 am

    Merlin Moncure writes:
    On Thu, Jan 31, 2013 at 4:20 PM, Andrew Dunstan wrote:
    On 01/31/2013 05:06 PM, Peter Eisentraut wrote:
    I would like to not create any -> operators, so that that syntax could
    be used in the future for method invocation or something similar (it's
    in the SQL standard).
    This is the first time I have heard that we should stay away from this. We
    have operators with this name in hstore, which is why I chose it.
    I'm not happy about this either. It's bad enough that we're thinking
    about taking away =>, but to disallow -> as well? My inclination is to
    just say no, we're not implementing that. Even if we remove the contrib
    operators named that way, it's insane to suppose that nobody has chosen
    these names for user-defined operators in their applications.
    quick off-topic aside: is colon (:) reserved for any purpose as an
    operator in SQL?
    We disallow it as an operator character, because of the conflict with
    parameter/variable syntax in ecpg and psql. It was allowed before
    PG 7.0, IIRC.

        regards, tom lane
  • Robert Haas at Feb 1, 2013 at 10:08 pm

    On Thu, Jan 31, 2013 at 7:12 PM, Tom Lane wrote:
    Merlin Moncure <mmoncure@gmail.com> writes:
    On Thu, Jan 31, 2013 at 4:20 PM, Andrew Dunstan wrote:
    On 01/31/2013 05:06 PM, Peter Eisentraut wrote:
    I would like to not create any -> operators, so that that syntax could
    be used in the future for method invocation or something similar (it's
    in the SQL standard).
    This is the first time I have heard that we should stay away from this. We
    have operators with this name in hstore, which is why I chose it.
    I'm not happy about this either. It's bad enough that we're thinking
    about taking away =>, but to disallow -> as well? My inclination is to
    just say no, we're not implementing that. Even if we remove the contrib
    operators named that way, it's insane to suppose that nobody has chosen
    these names for user-defined operators in their applications.
    I think it's smarter for us to ship functions, and let users wrap them
    in operators if they so choose. It's not difficult for people who
    want it to do it, and that way we dodge this whole mess.
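
    For illustration, a minimal sketch of the user-side wrapping being
    suggested; the function name and signature json_get(json, text) are an
    assumption for the example, not the patch's settled API:

        -- hypothetical: wrap a shipped getter function in a user-defined operator
        CREATE OPERATOR -> (
            LEFTARG   = json,
            RIGHTARG  = text,
            PROCEDURE = json_get
        );

        -- after that, both spellings invoke the same function
        SELECT json_get(jsonval, 'f1'), jsonval -> 'f1' FROM mytable;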

    The thing that was really awful about hstore's => operator is that it
    was =>(text, text), meaning that if somebody else invented, say,
    xstore, they could not do the same thing that hstore did without
    colliding with hstore. I think we ought to have an ironclad rule that
    any operators introduced in our core distribution for particular types
    must have at least one argument of that type. Otherwise, we'll end up
    with a free-for-all where everyone tries to grab the "good" operator
    names (of which there are only a handful) for their type, and types
    we're adding five years from now will be stuck with no operator names
    at all, or dumb stuff like ~~~~>.

    But even leaving that aside, I'm surprised to hear so many people
    dismissing SQL standards compliance so blithely. We've certainly
    spent a lot of blood, sweat, and tears on minor standards-compliance
    issues over the years - why is it OK to not care about this
    particular issue when we've spent so much effort caring about other
    ones?

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Daniel Farina at Feb 1, 2013 at 10:39 pm

    On Fri, Feb 1, 2013 at 2:08 PM, Robert Haas wrote:
    On Thu, Jan 31, 2013 at 7:12 PM, Tom Lane wrote:
    Merlin Moncure <mmoncure@gmail.com> writes:
    On Thu, Jan 31, 2013 at 4:20 PM, Andrew Dunstan wrote:
    On 01/31/2013 05:06 PM, Peter Eisentraut wrote:
    I would like to not create any -> operators, so that that syntax could
    be used in the future for method invocation or something similar (it's
    in the SQL standard).
    This is the first time I have heard that we should stay away from this. We
    have operators with this name in hstore, which is why I chose it.
    I'm not happy about this either. It's bad enough that we're thinking
    about taking away =>, but to disallow -> as well? My inclination is to
    just say no, we're not implementing that. Even if we remove the contrib
    operators named that way, it's insane to suppose that nobody has chosen
    these names for user-defined operators in their applications.
    I think it's smarter for us to ship functions, and let users wrap them
    in operators if they so choose. It's not difficult for people who
    want it to do it, and that way we dodge this whole mess.
    Normally I'd agree with you, but I think there's a complexity here
    that is very frown-inducing, although I'm not positively inclined to
    state that it's so great that your suggested solution is not the least
    of all evils:

       http://www.postgresql.org/message-id/8551.1331580169@sss.pgh.pa.us

    The problem being: even though pg_operator resolves to functions in
    pg_proc, they have distinct identities as far as the planner is
    concerned w.r.t selectivity estimation and index selection. Already
    there is a slight hazard that some ORMs that want to grow hstore
    support will spell it "fetchval" and others "->". So far, infix
    syntax seems to be the common default, but a minor disagreement among
    ORMs or different user preferences will be destructive.

    Another way to look at this is that by depriving people multiple
    choices in the default install, that hazard goes away. But it also
    means that, practically, CREATE OPERATOR is going to be hazardous to
    use because almost all software is probably not going to assume the
    existence of any non-default installed operators for JSON, and those
    that do will not receive the benefit of indexes from software using
    the other notation. So, if one takes the 'when in doubt, leave it out'
    approach you seem to be proposing, I think very little profitable use of
    CREATE OPERATOR will take place -- ORMs et al will choose the lowest
    common denominator (for good sensible reason) and flirting with CREATE
    OPERATOR will probably cause surprising plans, so I doubt it'll take hold.
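
    To make that hazard concrete, a sketch using hstore, whose -> operator is
    implemented by the fetchval function (the table and column names here are
    made up): an expression index built with one spelling is not matched by a
    query written with the other, because the planner compares expression
    trees, not the underlying function.

        -- expression index built with the operator spelling
        CREATE INDEX ON people ((attrs -> 'email'));

        -- matched: the predicate uses the same expression as the index
        SELECT * FROM people WHERE attrs -> 'email' = 'x@example.com';

        -- not matched: same function underneath, different expression tree
        SELECT * FROM people WHERE fetchval(attrs, 'email') = 'x@example.com';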

    --
    fdr
  • Tom Lane at Feb 1, 2013 at 11:03 pm

    Daniel Farina writes:
    On Fri, Feb 1, 2013 at 2:08 PM, Robert Haas wrote:
    I think it's smarter for us to ship functions, and let users wrap them
    in operators if they so choose. It's not difficult for people who
    The problem being: even though pg_operator resolves to functions in
    pg_proc, they have distinct identities as far as the planner is
    concerned w.r.t selectivity estimation and index selection.
    Yeah, this is surely not a workable policy unless we first move all
    those planner smarts to apply to functions not operators. And rewrite
    all the index AM APIs to use functions not operators, too. Now this is
    something that's been a wish-list item right along, but actually doing
    it has always looked like a great deal of work for rather small reward.

        regards, tom lane
  • Robert Haas at Feb 4, 2013 at 1:20 am

    On Fri, Feb 1, 2013 at 6:03 PM, Tom Lane wrote:
    Daniel Farina <daniel@heroku.com> writes:
    On Fri, Feb 1, 2013 at 2:08 PM, Robert Haas wrote:
    I think it's smarter for us to ship functions, and let users wrap them
    in operators if they so choose. It's not difficult for people who
    The problem being: even though pg_operator resolves to functions in
    pg_proc, they have distinct identities as far as the planner is
    concerned w.r.t selectivity estimation and index selection.
    Yeah, this is surely not a workable policy unless we first move all
    those planner smarts to apply to functions not operators. And rewrite
    all the index AM APIs to use functions not operators, too. Now this is
    something that's been a wish-list item right along, but actually doing
    it has always looked like a great deal of work for rather small reward.
    Hmm. Well, if the operators are going to be indexable, then I agree
    that's an issue, but isn't -> just a key-extraction operator? That
    wouldn't be something you could index anyway.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Andrew Dunstan at Feb 4, 2013 at 2:05 am

    On 02/03/2013 08:20 PM, Robert Haas wrote:
    On Fri, Feb 1, 2013 at 6:03 PM, Tom Lane wrote:
    Daniel Farina <daniel@heroku.com> writes:
    On Fri, Feb 1, 2013 at 2:08 PM, Robert Haas wrote:
    I think it's smarter for us to ship functions, and let users wrap them
    in operators if they so choose. It's not difficult for people who
    The problem being: even though pg_operator resolves to functions in
    pg_proc, they have distinct identities as far as the planner is
    concerned w.r.t selectivity estimation and index selection.
    Yeah, this is surely not a workable policy unless we first move all
    those planner smarts to apply to functions not operators. And rewrite
    all the index AM APIs to use functions not operators, too. Now this is
    something that's been a wish-list item right along, but actually doing
    it has always looked like a great deal of work for rather small reward.
    Hmm. Well, if the operators are going to be indexable, then I agree
    that's an issue, but isn't -> just a key-extraction operator? That
    wouldn't be something you could index anyway.

    Er, what? It gives you the value corresponding to a key (or the numbered
    array element).


    With the Json operators I provided you're more likely to use ->> in an
    index, because it returns de-escaped text rather than json, but I don't
    see any reason in principle why -> couldn't be used.
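
    For example, a minimal sketch of the sort of expression index meant here
    (the table and key are invented, and it assumes the text-returning getter
    is immutable):

        -- ->> de-escapes to text, so the expression fits under an ordinary btree
        CREATE INDEX ON docs ((jsonval ->> 'name'));

        -- a query spelled with the same expression can then use the index
        SELECT * FROM docs WHERE jsonval ->> 'name' = 'fred';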



    cheers

    andrew
  • Robert Haas at Feb 4, 2013 at 3:48 pm

    On Sun, Feb 3, 2013 at 9:05 PM, Andrew Dunstan wrote:
    Yeah, this is surely not a workable policy unless we first move all
    those planner smarts to apply to functions not operators. And rewrite
    all the index AM APIs to use functions not operators, too. Now this is
    something that's been a wish-list item right along, but actually doing
    it has always looked like a great deal of work for rather small reward.
    Hmm. Well, if the operators are going to be indexable, then I agree
    that's an issue, but isn't -> just a key-extraction operator? That
    wouldn't be something you could index anyway.
    Er, what? It gives you the value corresponding to a key (or the numbered
    array element).
    That's what I figured.
    With the Json operators I provided you're more likely to use ->> in an
    index, because it returns de-escaped text rather than json, but I don't see
    any reason in principle why -> couldn't be used.
    The point is that if you're talking about indexing something like
    myeav->'andrew' you could equally well index json_get(myeav,
    'andrew'). So there's no real need for it to be an operator rather
    than a function.

    The case in which it would matter is if it were something that could
    be used as an index predicate, like:

    Index Scan
       -> Index Cond: myeav->'andrew'

    As of now, the query planner won't consider such a plan if it's only a
    function and not an operator. So if we had a case like that, the use
    of operator notation could be justified on performance grounds. If we
    don't, I argue that it's better to stick with functional notation,
    because the number of sensible function names is much larger than the
    number of sensible operator names, and if we start using operator
    notation every time someone thinks it will look nicer that way, we
    will very quickly either run out of nice-looking operator names or
    start overloading them heavily.

    The SQL standards considerations seem worth thinking about, too.
    We've certainly gone through a lot of pain working toward eliminating
    => as an operator name, and if the SQL standard has commandeered ->
    for some purpose or other, I'd really rather not add to the headaches
    involved should we ever decide to reclaim it.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Andrew Dunstan at Feb 4, 2013 at 4:11 pm

    On 02/04/2013 10:47 AM, Robert Haas wrote:
    The SQL standards considerations seem worth thinking about, too.
    We've certainly gone through a lot of pain working toward eliminating
    => as an operator name, and if the SQL standard has commandeered ->
    for some purpose or other, I'd really rather not add to the headaches
    involved should we ever decide to reclaim it.

    OK, but I'd like to know what is going to be safe. There's no way to
    future-proof the language. I'm quite prepared to replace -> with
    something else, and if I do then ->> will need to be adjusted
    accordingly, I think.

    My suggestion would be ~> and ~>>. I know David Wheeler didn't like that
    on the ground that some fonts elevate ~ rather than aligning it in the
    middle as most monospaced fonts do, but I'm tempted just to say "then
    use a different font." Other possibilities that come to mind are +> and
    +>>, although I think they're less attractive. But I'll be guided by the
    consensus, assuming there is one ;-)

    cheers

    andrew
  • Benedikt Grundmann at Feb 4, 2013 at 4:19 pm
    On Mon, Feb 4, 2013 at 4:10 PM, Andrew Dunstan wrote:
    On 02/04/2013 10:47 AM, Robert Haas wrote:


    The SQL standards considerations seem worth thinking about, too.
    We've certainly gone through a lot of pain working toward eliminating
    => as an operator name, and if the SQL standard has commandeered ->
    for some purpose or other, I'd really rather not add to the headaches
    involved should we ever decide to reclaim it.

    OK, but I'd like to know what is going to be safe. There's no way to
    future-proof the language. I'm quite prepared to replace -> with something
    else, and if I do then ->> will need to be adjusted accordingly, I think.

    My suggestion would be ~> and ~>>. I know David Wheeler didn't like that
    on the ground that some fonts elevate ~ rather than aligning it in the
    middle as most monospaced fonts do, but I'm tempted just to say "then use a
    different font." Other possibilities that come to mind are +> and +>>,
    although I think they're less attractive. But I'll be guided by the
    consensus, assuming there is one ;-)

    As a user I would be much in favor of just functions and no additional
    operators if the sole difference is syntactical. I think custom operators
    are much harder to remember than function names (assuming reasonably well
    chosen function names).

    Now Robert seems to suggest that there will also be a speed / planner
    difference, which seems sad (I would have expected operators to be just
    syntactic sugar for specially named functions, and once we are past the
    parser there should be no difference).
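
    As an aside, the catalogs do record that mapping: each pg_operator row
    points at the pg_proc entry that implements it via oprcode, even though
    the planner's index and selectivity machinery keys off the operator, as
    discussed upthread. For instance:

        -- show which function implements each '->' operator
        SELECT oprname, oprleft::regtype, oprright::regtype, oprcode
        FROM pg_operator
        WHERE oprname = '->';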
  • Merlin Moncure at Feb 4, 2013 at 5:57 pm

    On Mon, Feb 4, 2013 at 10:18 AM, Benedikt Grundmann wrote:


    On Mon, Feb 4, 2013 at 4:10 PM, Andrew Dunstan wrote:

    On 02/04/2013 10:47 AM, Robert Haas wrote:


    The SQL standards considerations seem worth thinking about, too.
    We've certainly gone through a lot of pain working toward eliminating
    => as an operator name, and if the SQL standard has commandeered ->
    for some purpose or other, I'd really rather not add to the headaches
    involved should we ever decide to reclaim it.


    OK, but I'd like to know what is going to be safe. There's no way to
    future-proof the language. I'm quite prepared to replace -> with something
    else, and if I do then ->> will need to be adjusted accordingly, I think.

    My suggestion would be ~> and ~>>. I know David Wheeler didn't like that
    on the ground that some fonts elevate ~ rather than aligning it in the
    middle as most monospaced fonts do, but I'm tempted just to say "then use a
    different font." Other possibilities that come to mind are +> and +>>,
    although I think they're less attractive. But I'll be guided by the
    consensus, assuming there is one ;-)
    As a user I would be much in favor of just functions and no additional
    operators if the sole difference is syntactical. I think custom operators
    are much harder to remember than function names (assuming reasonably well
    chosen function names).
    couple quick observations:

    *) just about all postgres extension types expose operators -- problem
    is not specific to json (and therefore IMO irrelevant to 9.3 release
    of enhancements)

    *) hstore exposes ->. I use it all over the place. I find operator
    to be terse and readable -- much more so than function definition.
    Ok, operator such as "@-@" is pretty silly, but "->" for get is
    natural. The cat is out of the bag, so removing -> for 9.3 for
    production seems pretty fruitless.

    *) Removing -> (breaking all my and countless others' hstore dependent
    code) should not happen until there is a very good reason. This was
    extensively discussed in development of hstore. Breaking
    compatibility sucks -- my company is just wrapping up a 12 month code
    overhaul so we could get off 8.1.

    *) it's bad enough that we drift from sql naming conventions and prefix
    all type manipulation functions (except in hstore) with the type name,
    json_get etc. -- at least using operators allows avoiding some of that
    unnecessary verbosity. what's next: text_concatenate('foo', 'bar')?

    merlin
  • Andrew Dunstan at Feb 4, 2013 at 6:07 pm

    On 02/04/2013 12:57 PM, Merlin Moncure wrote:

    *) it's bad enough that we drift from sql naming conventions and all
    type manipulation functions (except in hstore) with type name.
    json_get etc. at least using operators allow avoiding some of that
    unnecessary verbosity. what's next: text_concatenate('foo', 'bar')?
    These names aren't set in stone either. I've been expecting some
    bikeshedding there, and I'm surprised there hasn't been more.

    cheers

    andrew
