Hi,

Back in 2002 these were proposed, what happened to them?

http://archives.postgresql.org/pgsql-sql/2002-09/msg00406.php


Also I note:

colin@ruby:~/workspace/eyedb$ psql
psql (8.4.4)
Type "help" for help.

colin=> select to_date('731332', 'YYMMDD');
to_date
------------
1974-02-01
(1 row)

colin=>


The fact that this wraps would seem to me to make the implementation of
is_date() difficult.


I'm trying to query character strings for valid dates but can't see how to
do this quickly... but for that discussion I will move to pgsql-general :-)

Cheers,

Colin


  • Andrew Dunstan at Sep 20, 2010 at 2:54 pm

    On 09/20/2010 10:29 AM, Colin 't Hart wrote:
    Hi,

    Back in 2002 these were proposed, what happened to them?

    http://archives.postgresql.org/pgsql-sql/2002-09/msg00406.php

    2002 is a looooooooong time ago.

    Also I note:

    colin@ruby:~/workspace/eyedb$ psql
    psql (8.4.4)
    Type "help" for help.

    colin=> select to_date('731332', 'YYMMDD');
    to_date
    ------------
    1974-02-01
    (1 row)

    colin=>


    The fact that this wraps would seem to me to make the implementation
    of is_date() difficult.

    I think to_date is the wrong gadget to use here. You should probably be
    using the date input routine and trapping any data exception. e.g.:

    test_date := date_in(textout(some_text));

    In plpgsql you'd put that inside a begin/exception/end block that traps
    SQLSTATE '22000' which is the class covering data exceptions.
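
    A minimal PL/pgSQL sketch of that approach might look like the following
    (the name is_date is purely illustrative, not an existing built-in; it
    just exercises the date type's input routine via a cast):

    create or replace function is_date(p text) returns boolean
    language plpgsql as $$
    begin
        perform p::date;              -- runs the date input routine
        return true;
    exception
        when sqlstate '22000' then    -- class 22: data exceptions
            return false;
    end;
    $$;

    -- select is_date('2010-09-20');  -- true
    -- select is_date('1973-13-32');  -- false: field value out of range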

    cheers

    andrew
  • Colin 't Hart at Sep 20, 2010 at 3:31 pm

    On 20 September 2010 16:54, Andrew Dunstan wrote:
    On 09/20/2010 10:29 AM, Colin 't Hart wrote:

    Hi,

    Back in 2002 these were proposed, what happened to them?

    http://archives.postgresql.org/pgsql-sql/2002-09/msg00406.php

    2002 is a looooooooong time ago. <snip>
    I think to_date is the wrong gadget to use here. You should probably be using the date input routine and trapping any data exception. e.g.:

    test_date := date_in(textout(some_text));

    In plpgsql you'd put that inside a begin/exception/end block that traps SQLSTATE '22000' which is the class covering data exceptions.
    So it's not possible using pure SQL unless one writes a function?

    Are the is_<type> family of functions still desired?

    Also, where are the to_<type> conversions done?

    Thanks,

    Colin
  • Robert Haas at Sep 21, 2010 at 3:03 pm

    On Mon, Sep 20, 2010 at 11:31 AM, Colin 't Hart wrote:
    I think to_date is the wrong gadget to use here. You should probably be using the date input routine and trapping any data exception. e.g.:

    test_date := date_in(textout(some_text));

    In plpgsql you'd put that inside a begin/exception/end block that traps SQLSTATE '22000' which is the class covering data exceptions.
    So it's not possible using pure SQL unless one writes a function?
    I think that is true.
    Are the is_<type> family of functions still desired?
    I think it would be useful to have a way of testing whether a cast to
    a given type will succeed. The biggest problem with the
    exception-catching method is not that it requires writing a function
    (which, IMHO, is no big deal) but that exception handling is pretty
    slow and inefficient. You end up doing things like... write a regexp
    to see whether the data is in approximately the right format and then
    if it is try the cast inside an exception block. Yuck.

    (On the other hand, whether the work that was done in 2002 is still
    relevant to today's code is questionable. Things have changed a lot.)
    Also, where are the to_<type> conversions done?
    I think maybe you are looking for the type input functions?

    select typname, typinput::regprocedure from pg_type;

    There are also some functions with names of the form to_<type>. You
    can get a list of those with the following psql command:

    \dfS to_*
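
    For the date type, for instance, that looks roughly like this (output
    abbreviated; the exact display can vary by version):

    select typname, typinput::regprocedure from pg_type where typname = 'date';
    --  typname | typinput
    -- ---------+------------------
    --  date    | date_in(cstring)

    -- and an input function can be called directly on a literal:
    select date_in('2010-09-20');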

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Tom Lane at Sep 21, 2010 at 3:50 pm

    Robert Haas writes:
    I think it would be useful to have a way of testing whether a cast to
    a given type will succeed. The biggest problem with the
    exception-catching method is not that it requires writing a function
    (which, IMHO, is no big deal) but that exception handling is pretty
    slow and inefficient. You end up doing things like... write a regexp
    to see whether the data is in approximately the right format and then
    if it is try the cast inside an exception block. Yuck.
    The problem here is that putting the exception handling in C doesn't
    make things any better: it's still slow and inefficient. And in the
    general case the only way to be sure that a string will be accepted by
    the input function is to try it.

    regards, tom lane
  • Robert Haas at Sep 21, 2010 at 3:56 pm

    On Tue, Sep 21, 2010 at 11:49 AM, Tom Lane wrote:
    Robert Haas <robertmhaas@gmail.com> writes:
    I think it would be useful to have a way of testing whether a cast to
    a given type will succeed.  The biggest problem with the
    exception-catching method is not that it requires writing a function
    (which, IMHO, is no big deal) but that exception handling is pretty
    slow and inefficient.  You end up doing things like... write a regexp
    to see whether the data is in approximately the right format and then
    if it is try the cast inside an exception block.  Yuck.
    The problem here is that putting the exception handling in C doesn't
    make things any better: it's still slow and inefficient.  And in the
    general case the only way to be sure that a string will be accepted by
    the input function is to try it.
    Given the current API, that is true.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Alvaro Herrera at Sep 21, 2010 at 5:02 pm

    Excerpts from Robert Haas's message of Tue Sep 21 11:56:51 -0400 2010:
    On Tue, Sep 21, 2010 at 11:49 AM, Tom Lane wrote:

    The problem here is that putting the exception handling in C doesn't
    make things any better: it's still slow and inefficient.  And in the
    general case the only way to be sure that a string will be accepted by
    the input function is to try it.
    Given the current API, that is true.
    So we could refactor the input functions so that there's an internal
    function that returns the accepted datum in the OK case and an ErrorData
    for the failure case. The regular input function would just throw the
    error data in the latter case; but this would allow another function to
    just return whether it worked or not.

    --
    Álvaro Herrera <alvherre@commandprompt.com>
    The PostgreSQL Company - Command Prompt, Inc.
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
  • Tom Lane at Sep 21, 2010 at 5:42 pm

    Alvaro Herrera writes:
    On Tue, Sep 21, 2010 at 11:49 AM, Tom Lane wrote:
    The problem here is that putting the exception handling in C doesn't
    make things any better:
    So we could refactor the input functions so that there's an internal
    function that returns the accepted datum in the OK case and an ErrorData
    for the failure case.
    This makes the untenable assumption that there are no elog(ERROR)s in
    the "internal" input function *or anything it calls*. Short of truly
    massive restructuring, including uglifying many internal APIs to have
    error return codes instead of allowing elog within the callee, you will
    never make this work for anything more complicated than say float8in().

    regards, tom lane
  • Alvaro Herrera at Sep 21, 2010 at 6:09 pm

    Excerpts from Tom Lane's message of Tue Sep 21 13:41:32 -0400 2010:
    Alvaro Herrera <alvherre@commandprompt.com> writes:
    On Tue, Sep 21, 2010 at 11:49 AM, Tom Lane wrote:
    The problem here is that putting the exception handling in C doesn't
    make things any better:
    So we could refactor the input functions so that there's an internal
    function that returns the accepted datum in the OK case and an ErrorData
    for the failure case.
    This makes the untenable assumption that there are no elog(ERROR)s in
    the "internal" input function *or anything it calls*. Short of truly
    massive restructuring, including uglifying many internal APIs to have
    error return codes instead of allowing elog within the callee, you will
    never make this work for anything more complicated than say float8in().
    ... which is what people want anyway. I mean, the day someone requests
    is_sthcomplex, we could happily tell them that they need to use the
    expensive workaround involving savepoints. I don't think we really need
    to support the ones that would require truly expensive refactoring; the
    simple ones would cover 99% of the use cases.

    --
    Álvaro Herrera <alvherre@commandprompt.com>
    The PostgreSQL Company - Command Prompt, Inc.
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
  • Tom Lane at Sep 21, 2010 at 11:10 pm

    Alvaro Herrera writes:
    Excerpts from Tom Lane's message of Tue Sep 21 13:41:32 -0400 2010:
    ...never make this work for anything more complicated than say float8in().
    ... which is what people want anyway. I mean, the day someone requests
    is_sthcomplex, we could happily tell them that they need to use the
    expensive workaround involving savepoints. I don't think we really need
    to support the ones that would require truly expensive refactoring; the
    simple ones would cover 99% of the use cases.
    Robert was complaining about COPY in particular. It's hard to see how
    you make any progress on that if you don't have pretty near 100%
    coverage of datatypes. I don't object if someone puts in is_integer,
    is_float, etc; but that's nowhere near a general purpose solution.

    regards, tom lane
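
    For what it's worth, the workflow such helpers would feed is the usual
    text-staging workaround, roughly (a sketch only; the table and column
    names are invented, and is_date is the kind of helper sketched earlier
    in the thread):

    create temp table staging (raw_date text);
    -- \copy staging from 'data.csv'

    -- keep only the rows whose date column actually parses
    insert into target_table (d)
    select raw_date::date
    from staging
    where is_date(raw_date);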
  • Greg Stark at Sep 21, 2010 at 5:45 pm

    On Tue, Sep 21, 2010 at 6:02 PM, Alvaro Herrera wrote:
    So we could refactor the input functions so that there's an internal
    function that returns the accepted datum in the OK case and an ErrorData
    for the failure case.  The regular input function would just throw the
    error data in the latter case; but this would allow another function to
    just return whether it worked or not.
    You're assuming the input function won't have any work it has to undo
    which it would need the savepoint for anyways. For most of the
    built-in datatypes -- all of the ones intended for holding real data
    -- that's true. But for things like regclass or regtype it might not
    be and for user-defined data types.... who knows?

    Of course all people really want is to test whether something is a
    valid integer, floating point value, etc.

    --
    greg
  • Robert Haas at Sep 21, 2010 at 5:56 pm

    On Tue, Sep 21, 2010 at 1:45 PM, Greg Stark wrote:
    On Tue, Sep 21, 2010 at 6:02 PM, Alvaro Herrera
    wrote:
    So we could refactor the input functions so that there's an internal
    function that returns the accepted datum in the OK case and an ErrorData
    for the failure case.  The regular input function would just throw the
    error data in the latter case; but this would allow another function to
    just return whether it worked or not.
    You're assuming the input function won't have any work it has to undo
    which it would need the savepoint for anyways. For most of the
    built-in datatypes -- all of the ones intended for holding real data
    -- that's true. But for things like regclass or regtype it might not
    be and for user-defined data types.... who knows?

    Of course all people really want is to test whether something is a
    valid integer, floating point value, etc.
    Right. Or a date - that's a case that comes up for me pretty
    frequently. It's not too hard to write a regular expression to test
    whether something is an integer -- although there is the question of
    whether it will overflow, which is sometimes relevant -- but a date or
    timestamp field is a bit harder.
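
    A rough illustration of that kind of pre-check, assuming plain integer
    literals (and, as noted, it says nothing about overflow):

    select '12345'       ~ '^[+-]?[0-9]+$';  -- true: looks like an integer
    select '12345678901' ~ '^[+-]?[0-9]+$';  -- also true, yet overflows int4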

    I don't understand the argument that we need type input functions to
    be protected by a savepoint. That seems crazy to me. We're taking a
    huge performance penalty here to protect against something that seems
    insane to me in the first instance. Not to mention cutting ourselves
    off from really important features, like the ability to recover from
    errors during COPY. I don't understand why we can't just make some
    rules about what type input functions are allowed to do. And if you
    break those rules then you get to keep both pieces. Why is this
    unreasonable? A savepoint can hardly protect you against damage
    inflicted by the execution of arbitrary code; IOW, we're already
    relying on the user to follow some rules.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Tom Lane at Sep 21, 2010 at 11:07 pm

    Robert Haas writes:
    I don't understand the argument that we need type input functions to
    be protected by a savepoint. That seems crazy to me. We're taking a
    huge performance penalty here to protect against something that seems
    insane to me in the first instance. Not to mention cutting ourselves
    off from really important features, like the ability to recover from
    errors during COPY. I don't understand why we can't just make some
    rules about what type input functions are allowed to do.
    There are many rules that you could possibly make for type input
    functions. But "you cannot throw an error" is not one of them ---
    or at least, not one that you can usefully expect to be followed
    for anything more than trivial straightline code.

    The poster child for this is of course domain_in(). But even without
    that, I don't think you can realistically legislate that no errors be
    thrown by something of the complexity of, say, the timestamp input
    functions. Just for starters, what of a palloc() failure?

    regards, tom lane
  • Andres Freund at Sep 21, 2010 at 11:09 pm

    On Wednesday 22 September 2010 01:05:39 Tom Lane wrote:
    Robert Haas <robertmhaas@gmail.com> writes:
    I don't understand the argument that we need type input functions to
    be protected by a savepoint. That seems crazy to me. We're taking a
    huge performance penalty here to protect against something that seems
    insane to me in the first instance. Not to mention cutting ourselves
    off from really important features, like the ability to recover from
    errors during COPY. I don't understand why we can't just make some
    rules about what type input functions are allowed to do.
    There are many rules that you could possibly make for type input
    functions. But "you cannot throw an error" is not one of them ---
    or at least, not one that you can usefully expect to be followed
    for anything more than trivial straightline code.

    The poster child for this is of course domain_in(). But even without
    that, I don't think you can realistically legislate that no errors be
    thrown by something of the complexity of, say, the timestamp input
    functions. Just for starters, what of a palloc() failure?
    Uhm. Isn't a palloc failure a really, really bad example because it will kill
    the session anyway? FATAL+ is not relevant in that context, right?

    Andres
  • Tom Lane at Sep 21, 2010 at 11:35 pm

    Andres Freund writes:
    On Wednesday 22 September 2010 01:05:39 Tom Lane wrote:
    Just for starters, what of a palloc() failure?
    Uhm. Isn't a palloc failure a really, really bad example because it will kill
    the session anyway? FATAL+ is not relevant in that context, right?
    Huh? It's not fatal, just elog(ERROR, "out of memory").

    regards, tom lane
  • Robert Haas at Sep 21, 2010 at 11:55 pm

    On Tue, Sep 21, 2010 at 7:05 PM, Tom Lane wrote:
    Robert Haas <robertmhaas@gmail.com> writes:
    I don't understand the argument that we need type input functions to
    be protected by a savepoint.  That seems crazy to me.  We're taking a
    huge performance penalty here to protect against something that seems
    insane to me in the first instance.  Not to mention cutting ourselves
    off from really important features, like the ability to recover from
    errors during COPY.  I don't understand why we can't just make some
    rules about what type input functions are allowed to do.
    There are many rules that you could possibly make for type input
    functions.  But "you cannot throw an error" is not one of them ---
    or at least, not one that you can usefully expect to be followed
    for anything more than trivial straightline code.
    OK. This is one of the things I don't understand. Why does throwing
    an error imply that we need to abort the current transaction? Why
    can't we just catch the longjmp() and trundle onwards? Obviously,
    that's unsafe in a pretty wide variety of cases, but if you're just
    scrutinizing the input string (even with a little bit of read-only
    database access) it's not obvious to me what can go wrong. (I assume
    there is something, but I don't know what it is...)

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Bruce Momjian at Sep 21, 2010 at 11:57 pm

    Robert Haas wrote:
    On Tue, Sep 21, 2010 at 7:05 PM, Tom Lane wrote:
    Robert Haas <robertmhaas@gmail.com> writes:
    I don't understand the argument that we need type input functions to
    be protected by a savepoint.  That seems crazy to me.  We're taking a
    huge performance penalty here to protect against something that seems
    insane to me in the first instance.  Not to mention cutting ourselves
    off from really important features, like the ability to recover from
    errors during COPY.  I don't understand why we can't just make some
    rules about what type input functions are allowed to do.
    There are many rules that you could possibly make for type input
    functions.  But "you cannot throw an error" is not one of them ---
    or at least, not one that you can usefully expect to be followed
    for anything more than trivial straightline code.
    OK. This is one of the things I don't understand. Why does throwing
    an error imply that we need to abort the current transaction? Why
    can't we just catch the longjmp() and trundle onwards? Obviously,
    that's unsafe in a pretty wide variety of cases, but if you're just
    scrutinizing the input string (even with a little bit of read-only
    database access) it's not obvious to me what can go wrong. (I assume
    there is something, but I don't know what it is...)
    That would be interesting. You would need to flag that this was not a
    longjump requiring cleanup, but the harder part would be getting back to
    where the error occurred. I guess you could rerun the query. :-|

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Jeff Davis at Sep 22, 2010 at 1:54 am

    On Tue, 2010-09-21 at 19:55 -0400, Robert Haas wrote:
    OK. This is one of the things I don't understand. Why does throwing
    an error imply that we need to abort the current transaction? Why
    can't we just catch the longjmp() and trundle onwards? Obviously,
    that's unsafe in a pretty wide variety of cases, but if you're just
    scrutinizing the input string (even with a little bit of read-only
    database access) it's not obvious to me what can go wrong. (I assume
    there is something, but I don't know what it is...)
    The worry (from me) would be the "little bit of read-only database
    access". If you SPI_connect() without an SPI_finish(), that sounds like
    a potential problem (as would anything else that requires cleanup).

    Regards,
    Jeff Davis
  • Tom Lane at Sep 24, 2010 at 7:42 pm

    Robert Haas writes:
    On Tue, Sep 21, 2010 at 7:05 PM, Tom Lane wrote:
    There are many rules that you could possibly make for type input
    functions.  But "you cannot throw an error" is not one of them ---
    or at least, not one that you can usefully expect to be followed
    for anything more than trivial straightline code.
    OK. This is one of the things I don't understand. Why does throwing
    an error imply that we need to abort the current transaction? Why
    can't we just catch the longjmp() and trundle onwards? Obviously,
    that's unsafe in a pretty wide variety of cases, but if you're just
    scrutinizing the input string (even with a little bit of read-only
    database access) it's not obvious to me what can go wrong.
    The problem is to know that "all you did" was scrutinize the input
    string. If it's simple straightline code (even with some C library
    calls) then you can know that, but then you can write such code without
    including any elog(ERROR) in it in the first place. If you are trapping
    longjmps then what you'd need to assert is that no error thrown from
    anywhere in any of the code reachable from that place represents a
    problem that requires transaction abort to clean up after. This gets
    unmaintainable remarkably quickly, especially if you invoke anything
    as complicated as database access. And then there are asynchronous
    error reasons (query cancel) which you shouldn't trap in any case.

    regards, tom lane
  • Robert Haas at Sep 25, 2010 at 3:15 am

    On Fri, Sep 24, 2010 at 3:41 PM, Tom Lane wrote:
    Robert Haas <robertmhaas@gmail.com> writes:
    On Tue, Sep 21, 2010 at 7:05 PM, Tom Lane wrote:
    There are many rules that you could possibly make for type input
    functions.  But "you cannot throw an error" is not one of them ---
    or at least, not one that you can usefully expect to be followed
    for anything more than trivial straightline code.
    OK.  This is one of the things I don't understand.  Why does throwing
    an error imply that we need to abort the current transaction?  Why
    can't we just catch the longjmp() and trundle onwards?  Obviously,
    that's unsafe in a pretty wide variety of cases, but if you're just
    scrutinizing the input string (even with a little bit of read-only
    database access) it's not obvious to me what can go wrong.
    The problem is to know that "all you did" was scrutinize the input
    string.  If it's simple straightline code (even with some C library
    calls) then you can know that, but then you can write such code without
    including any elog(ERROR) in it in the first place.  If you are trapping
    longjmps then what you'd need to assert is that no error thrown from
    anywhere in any of the code reachable from that place represents a
    problem that requires transaction abort to clean up after.  This gets
    unmaintainable remarkably quickly, especially if you invoke anything
    as complicated as database access.  And then there are asynchronous
    error reasons (query cancel) which you shouldn't trap in any case.
    Hmm. So the problem is that we don't want to accidentally catch an
    error that isn't actually safe to catch. We could probably mitigate
    this problem to a considerable degree by throwing data validation
    errors using some special flag that says "this is a recoverable error".
    And if that flag isn't set then we abort the whole transaction, but
    if it is then we continue on. It's still possible for the person
    writing the typinput function to set that flag when they should not,
    but at least it's less likely to happen by accident. Another
    alternative would be to create some kind of explicit way for the
    function to RETURN an error instead of throwing it.

    But neither of these things is totally bullet-proof, because you could
    still do something that requires clean-up and then lie about it. To
    protect against that, you'd presumably need to set some kind of a flag
    whenever, say, a heap tuple gets modified, and then you could assert
    said flag false. What, other than writing to the database, requires
    subtransaction cleanup?

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Gurjeet Singh at Sep 25, 2010 at 8:36 am

    On Fri, Sep 24, 2010 at 11:15 PM, Robert Haas wrote:
    On Fri, Sep 24, 2010 at 3:41 PM, Tom Lane wrote:
    Robert Haas <robertmhaas@gmail.com> writes:
    On Tue, Sep 21, 2010 at 7:05 PM, Tom Lane wrote:
    There are many rules that you could possibly make for type input
    functions. But "you cannot throw an error" is not one of them ---
    or at least, not one that you can usefully expect to be followed
    for anything more than trivial straightline code.
    OK. This is one of the things I don't understand. Why does throwing
    an error imply that we need to abort the current transaction? Why
    can't we just catch the longjmp() and trundle onwards? Obviously,
    that's unsafe in a pretty wide variety of cases, but if you're just
    scrutinizing the input string (even with a little bit of read-only
    database access) it's not obvious to me what can go wrong.
    The problem is to know that "all you did" was scrutinize the input
    string. If it's simple straightline code (even with some C library
    calls) then you can know that, but then you can write such code without
    including any elog(ERROR) in it in the first place. If you are trapping
    longjmps then what you'd need to assert is that no error thrown from
    anywhere in any of the code reachable from that place represents a
    problem that requires transaction abort to clean up after. This gets
    unmaintainable remarkably quickly, especially if you invoke anything
    as complicated as database access. And then there are asynchronous
    error reasons (query cancel) which you shouldn't trap in any case.
    Hmm. So the problem is that we don't want to accidentally catch an
    error that isn't actually safe to catch. We could probably mitigate
    this problem to a considerable degree by throwing data validation
    errors using some special flag that says "this is a recoverable error".
    And if that flag isn't set then we abort the whole transaction, but
    if it is then we continue on. It's still possible for the person
    writing the typinput function to set that flag when they should not,
    but at least it's less likely to happen by accident. Another
    alternative would be to create some kind of explicit way for the
    function to RETURN an error instead of throwing it.

    But neither of these things is totally bullet-proof, because you could
    still do something that requires clean-up and then lie about it. To
    protect against that, you'd presumably need to set some kind of a flag
    whenever, say, a heap tuple gets modified, and then you could assert
    said flag false. What, other than writing to the database, requires
    subtransaction cleanup?
    Andrew suggested upthread:

    <snip>
    test_date := date_in(textout(some_text));

    In plpgsql you'd put that inside a begin/exception/end block that traps
    SQLSTATE '22000' which is the class covering data exceptions.
    </snip>


    In the context of COPY, can we check for this SQLSTATE in PG_CATCH() and
    avoid PG_RE_THROW()? This would require that all type input functions call
    errcode() with the proper MAKE_SQLSTATE() as part of ereport()/elog().

    Regards,
    --
    gurjeet.singh
    @ EnterpriseDB - The Enterprise Postgres Company
    http://www.EnterpriseDB.com

    singh.gurjeet@{ gmail | yahoo }.com
    Twitter/Skype: singh_gurjeet

    Mail sent from my BlackLaptop device
  • Tom Lane at Sep 25, 2010 at 2:39 pm

    Gurjeet Singh writes:
    On Fri, Sep 24, 2010 at 11:15 PM, Robert Haas wrote:
    Hmm. So the problem is that we don't want to accidentally catch an
    error that isn't actually safe to catch. We could probably mitigate
    this problem to a considerable degree by throwing data validation
    errors using some special flag that says "this is a recoverable error".
    Andrew suggested upthread:
    <snip>
    test_date := date_in(textout(some_text));
    In plpgsql you'd put that inside a begin/exception/end block that traps
    SQLSTATE '22000' which is the class covering data exceptions.
    </snip>
    In the context of COPY, can we check for this SQLSTATE in PG_CATCH() and
    avoid PG_RE_THROW()? This would require that all type input functions call
    errcode() with the proper MAKE_SQLSTATE() as part of ereport()/elog().
    This is all pretty much a dead end, because it offers no confidence
    whatsoever. Suppose that COPY calls type X's input function, which
    calls function Y, which calls function Z. Z doesn't like what it sees
    so it throws an error, which it marks "recoverable" since Z hasn't
    done anything dangerous. Unfortunately, Y *did* do something that
    requires cleanup. If COPY catches the longjmp and decides that it
    can skip doing a transaction abort, you're screwed.

    Again, the problem doesn't really arise as long as you're thinking about
    simple code. You can fairly easily verify that no unsafe situation
    arises so long as there's only one or two layers of functions involved.
    But this approach just doesn't scale to anything complicated.


    What I'm wondering is whether we can fix this by reducing the overhead
    of subtransactions, enough so that we can afford to run each row's input
    function calls within a subxact. In the past that was dismissed because
    you'd run out of subxact XIDs at 4G rows. But we have "lazy" assignment
    of XIDs now, so a subxact that didn't actually try to modify the
    database shouldn't need to consume any permanent resources. Then we're
    just looking at the time needed to call all the per-module subxact start
    and subxact cleanup functions, which seems like something that might be
    optimizable for the typical case where nothing actually needs to be
    done.

    regards, tom lane
  • Robert Haas at Sep 26, 2010 at 3:03 am

    On Sat, Sep 25, 2010 at 10:34 AM, Tom Lane wrote:
    This is all pretty much a dead end, because it offers no confidence
    whatsoever.  Suppose that COPY calls type X's input function, which
    calls function Y, which calls function Z.  Z doesn't like what it sees
    so it throws an error, which it marks "recoverable" since Z hasn't
    done anything dangerous.  Unfortunately, Y *did* do something that
    requires cleanup.  If COPY catches the longjmp and decides that it
    can skip doing a transaction abort, you're screwed.
    Yep. Although it seems a bit pathological for Z to do that, because
    if Y is doing something like taking an LWLock then Z is some low-level
    internals function that is not in a good position to judge whether
    error recovery is feasible. The point is not to recover from as many
    errors as possible, but to recover specifically from *data validation*
    errors, which I would not expect to be the sort of thing thrown from
    someplace deep down in the call stack where we're deep in the middle
    of things. The toplevel typinput function is in a pretty good position to
    know whether it's done anything shady.
    What I'm wondering is whether we can fix this by reducing the overhead
    of subtransactions, enough so that we can afford to run each row's input
    function calls within a subxact.  In the past that was dismissed because
    you'd run out of subxact XIDs at 4G rows.  But we have "lazy" assignment
    of XIDs now, so a subxact that didn't actually try to modify the
    database shouldn't need to consume any permanent resources.  Then we're
    just looking at the time needed to call all the per-module subxact start
    and subxact cleanup functions, which seems like something that might be
    optimizable for the typical case where nothing actually needs to be
    done.
    Well, reducing the overhead of subtransaction cleanup would certainly
    be VERY nice, as it would benefit a FAR broader set of use cases than
    just typinput functions. It seems a bit tricky though, because
    AbortSubTransaction() calls a whole LOT of cleanup functions, and many
    of them already have fast-paths. Where do you anticipate getting a
    further large speed-up out of that? The problem seems particularly
    tricky because those functions are cleaning up different subsystems.
    Maybe you could group them in some way and figure out some method of
    skipping entire groups with some kind of super-duper fast path, but
    it's not obvious to me how to make that work. And I think you'd need
    a pretty considerable speed-up, too. My gut says that even knocking
    50% off, while it might be really nice for other reasons, is not going
    to be enough to make sticking it inside COPY workable. I bet you need
    an order-of-magnitude speed-up, maybe more.

    It seems like a good slice of the problem here comes from the
    difficulties of being certain what the state is after a longjmp. It
    seems like you could get around all of these difficulties almost
    completely if the type input function were empowered to return either
    (1) a Datum which is the result of the conversion or (2) an SQLSTATE
    and error message indicating what went wrong. We're already willing
    to believe that cleanup isn't required when the function returns
    successfully, so we ought to also believe it when the function returns
    a failure result (as opposed to throwing an error indicating a
    failure). The conditions that require cleanup here are probably
    transient: take an LWLock, do something, release the LWLock. As long
    as you know that you haven't stopped somewhere in the middle of that
    sequence, it seems like it should be reasonably safe.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Darren Duncan at Sep 25, 2010 at 3:51 am

    Colin 't Hart wrote:
    The fact that this wraps would seem to me to make the implementation of
    is_date() difficult.
    Having separate is_foo() syntax per type is a bad design idea, same as having a
    different equality test like eq_int() or assignment syntax like assign_str() per
    type.

    There should just be a single syntax that works for all types, in the general
    case, for testing whether a value is a member of that type, or alternately
    whether a value can be cast to a particular type.

    For example, one could say "is_type( <value>, <type-name> )", or it could be
    spelled "isa()", or, if you wanted to be more ambitious, it could be an infix op,
    like "<value> isa <type-name>", to test whether a value is already of a type.

    Pg already gets it right in this regard by having a single general syntax for
    type casting, "<value>::<type-name>"; testing value membership of a type should
    be likewise.

    Maybe to test if a value can be cast as a type, you can continue the ::
    mnemonic, say adding a "?" for yes and a "!" for no.

    For example, "<value>?::<type-name>" tests if the value can be cast as the type
    and "<value>!::<type-name>" or "not <value>?::<type-name>" tests the opposite.
    An expression like this results in a boolean.

    -- Darren Duncan
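
    At the SQL level, something close to that generic test can already be
    sketched with the exception-trapping trick from earlier in the thread
    (names are illustrative, and it inherits all the subtransaction cost
    concerns discussed above):

    create or replace function is_type(p text, t regtype) returns boolean
    language plpgsql as $$
    begin
        execute 'select ' || quote_literal(p) || '::' || t::text;
        return true;
    exception
        when sqlstate '22000' then    -- class 22: data exceptions only
            return false;
    end;
    $$;

    -- select is_type('2010-09-20', 'date');    -- true
    -- select is_type('12345678901', 'int4');   -- false: out of range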
  • Craig Ringer at Sep 25, 2010 at 7:01 am

    On 25/09/2010 11:51 AM, Darren Duncan wrote:
    Colin 't Hart wrote:
    The fact that this wraps would seem to me to make the implementation
    of is_date() difficult.
    Having separate is_foo() syntax per type is a bad design idea, same as
    having a different equality test like eq_int() or assignment syntax like
    assign_str() per type.

    There should just be a single syntax that works for all types, in the
    general case, for testing whether a value is a member of that type, or
    alternately whether a value can be cast to a particular type.
    Good point. That also permits a general-case implementation that catches
    any exceptions thrown, with optimized exception-free cases for
    int/date/string etc behind the scenes.

    That'd do a good job of protecting the SQL programmer from having to
    deal with which types had tests and which they had to use exception
    handling for. It'd have to be documented due to the performance
    differences, but it'd be great to be able to do this in a general
    purpose way at the SQL level.
    Pg already gets it right in this regard by having a single general
    syntax for type casting, "<value>::<type-name>"; testing value membership
    of a type should be likewise.
    or the standard:

    CAST(value AS typename)
    Maybe to test if a value can be cast as a type, you can continue the ::
    mnemonic, say adding a "?" for yes and a "!" for no.

    For example, "<value>?::<type-name>" tests if the value can be cast as
    the type and "<value>!::<type-name>" or "not <value>?::<type-name>"
    tests the opposite. An expression like this results in a boolean.
    Personal opinion here: Blech, if I wanted to use Perl6 I'd do so ;-)

    Good shorthand, I guess, but a CAST syntax extension or alternate CAST
    version would be a bonus for readability.

    --
    Craig Ringer

    Tech-related writing at http://soapyfrogs.blogspot.com/
  • Darren Duncan at Sep 25, 2010 at 8:05 am

    Craig Ringer wrote:
    On 25/09/2010 11:51 AM, Darren Duncan wrote:
    There should just be a single syntax that works for all types, in the
    general case, for testing whether a value is a member of that type, or
    alternately whether a value can be cast to a particular type.
    <snip>
    Pg already gets it right in this regard by having a single general
    syntax for type casting, "<value>::<type-name>"; testing value membership
    of a type should be likewise.
    or the standard:

    CAST(value AS typename)
    Indeed. The exact syntax doesn't matter to me but the point is that the type
    name is its own lexical element, conceptually a function argument, rather than
    being a substring of another one.
    Maybe to test if a value can be cast as a type, you can continue the ::
    mnemonic, say adding a "?" for yes and a "!" for no.

    For example, "<value>?::<type-name>" tests if the value can be cast as
    the type and "<value>!::<type-name>" or "not <value>?::<type-name>"
    tests the opposite. An expression like this results in a boolean.
    Personal opinion here: Blech, if I wanted to use Perl6 I'd do so ;-)
    I see that someone has been paying attention.

    Yes, the idea of using ? or ! to derive a boolean expression from some other
    expression did indeed come from Perl 6. The ? means "is so", ! means "is not".
    A very useful mnemonic in general.
    Good shorthand, I guess, but a CAST syntax extension or alternate CAST
    version would be a bonus for readability.
    Well, sure. But something consistent with cast syntax that Pg provides.

    -- Darren Duncan
