FAQ
The following bug has been logged online:

Bug reference: 5490
Logged by: Farid Zidan
Email address: farid@zidsoft.com
PostgreSQL version: 8.4.1
Operating system: Windows XP 32-bit
Description: Using distinct for select list causes insert of
timestamp string literal to fail
Details:

If you use keyword 'distinct' for the source select of the insert statement
the insert fails. Insert succeeds if 'distinct' is not used in select list.
Example,

create table test_insert (
col1 char(8) not null,
col2 TIMESTAMP not null default CURRENT_TIMESTAMP);

insert into
test_insert
(col1, col2) values
('a', '2010-04-30 00:00:00');

insert into test_insert
(col1, col2)
select distinct
'b',
'2010-04-30 00:00:00'


ERROR: column "col2" is of type timestamp without time zone but expression
is of type text
LINE 16: '2010-04-30 00:00:00'
^
HINT: You will need to rewrite or cast the expression.

********** Error **********

ERROR: column "col2" is of type timestamp without time zone but expression
is of type text
SQL state: 42804
Hint: You will need to rewrite or cast the expression.
Character: 282

Search Discussions

  • Tom Lane at Jun 4, 2010 at 3:53 pm

    "Farid Zidan" <farid@zidsoft.com> writes:
    If you use keyword 'distinct' for the source select of the insert statement
    the insert fails. Insert succeeds if 'distinct' is not used in select list.
    This isn't a bug, it's a consequence of the fact that you're not
    specifying the types of the literal constants. DISTINCT forces
    the parser to assign a data type to the constants (otherwise there
    is no way to understand what duplicate-elimination means) and what
    it will fall back to is "text". Try attaching an explicit cast,
    eg
    '2010-04-30 00:00:00'::timestamp

    regards, tom lane
  • Kevin Grittner at Jun 4, 2010 at 3:57 pm

    "Farid Zidan" wrote:

    insert into test_insert
    (col1, col2)
    select distinct
    'b',
    '2010-04-30 00:00:00'


    ERROR: column "col2" is of type timestamp without time zone but
    expression is of type text
    LINE 16: '2010-04-30 00:00:00'
    ^
    HINT: You will need to rewrite or cast the expression.
    Try using a timestamp literal instead of a bare literal:

    insert into test_insert
    (col1, col2)
    select distinct
    'b',
    timestamp '2010-04-30 00:00:00'

    This is actually working as intended in all the cases you showed, so
    it isn't a bug. If we were strictly complying with the SQL
    standard, your first example would also fail, but we are more
    lenient than the standard where we can be, to allow an unadorned
    literal to be an UNKNOWN type until something causes it to be
    resolved, to allow people to omit the type decoration in many cases.
    To determine that something is a distinct value, you have to
    determine a type for it (otherwise you won't know if '2010-04-30
    00:00:00' is the same as '2010-04-30 00:00:00.0', for example), so
    if you don't tell it otherwise, it will assume text -- leading to
    the behavior you saw.

    -Kevin
  • Farid Zidan at Jun 4, 2010 at 4:54 pm
    On 6/4/2010 11:53 AM, Tom Lane wrote: DISTINCT forces the parser to assign a data type to the
    constants (otherwise there is no way to understand what duplicate-elimination means) and
    what it will fall back to is "text" I am including the column list for the insert, so parser knows
    col2 data type is TIMESTAMP and it has to convert from text to timestamp to do the insert.

    It should be able to do that without generating an error. It is the same select list, the same
    data types, nothing has changed except using the 'distinct' keyword to eliminate
    duplicates. The parse behavior after duplicates have been eliminated should be the same as
    when 'distinct' is not used.

    Whether 'distinct' is used or not should not affect the semantics of the insert statement (it
    should only remove duplicate rows).

    I have used this statement in Firebrid, MS SQL Server, Oracle, MySQL, SQLAnywhere, DB2,
    Derby, Informix, etc, and all of them do not generate an error because I need to use 'distinct'
    to eliminate duplicates from being inserted.

    If we were strictly complying with the SQL standard, Considering the statement works in all
    the 9 DBMS systems+ that I have tested so far as mentioned above, I would say PostgreSQL is not
    compliant with SQL standard in this regard.

    I guess, what I am saying, is that what the parser is doing is not the desired behavior. I
    understand there are technical things going on behind the scene, but that's what needs to be
    fixed to ensure PostgreSQL compatibility with SQL standard and interoperability with
    generic sql statements.

    best regards,
    Farid

    On 6/4/2010 11:57 AM, Kevin Grittner wrote: "Farid Zidan" wrote:
    insert into test_insert (col1, col2) select distinct 'b', '2010-04-30 00:00:00' ERROR:
    column "col2" is of type timestamp without time zone but expression is of type text LINE 16:
    '2010-04-30 00:00:00' ^ HINT: You will need to rewrite or cast the expression. Try using a
    timestamp literal instead of a bare literal: insert into test_insert (col1, col2) select
    distinct 'b', timestamp '2010-04-30 00:00:00' This is actually working as intended in all
    the cases you showed, so it isn't a bug. If we were strictly complying with the SQL standard,
    your first example would also fail, but we are more lenient than the standard where we can be, to
    allow an unadorned literal to be an UNKNOWN type until something causes it to be resolved, to
    allow people to omit the type decoration in many cases. To determine that something is a
    distinct value, you have to determine a type for it (otherwise you won't know if '2010-04-30
    00:00:00' is the same as '2010-04-30 00:00:00.0', for example), so if you don't tell it
    otherwise, it will assume text -- leading to the behavior you saw. -Kevin
    --
    Signature www.zidsoft.com CompareData: compare and synchronize SQL DBMS data visually
    between two databases using ODBC drivers
  • Kevin Grittner at Jun 4, 2010 at 4:52 pm
    Farid Zidan wrote:
    If we were strictly complying with the SQL standard,
    Considering the statement works in all the 9 DBMS systems+ that I
    have tested so far as mentioned above, I would say PostgreSQL is
    not compliant with SQL standard in this regard.
    The SQL standard is a document published by the International
    Standards Organization (ISO) and also adopted by the American
    National Standards Institute (ANSI). Those documents don't require
    a query in either of the forms you presented to work. Because of
    the convenience factor, most database products have non-standard
    extensions to omit type specification in some places. PostgreSQL's
    extensions are oriented more toward user-installable data types
    (such as geometric shapes or global coordinates), so the particulars
    of our non-standard extensions differ so that use of those features
    is as easy as practicable. That does result in some non-standard
    extensions which work in other products not working in PostgreSQL.

    I think you'll find that the syntax I suggested (using the standard
    timestamp literal instead of a bare character string literal) will
    work in all of the databases you mentioned; if you want portable
    code, it is best to follow the standard rather than some inferred
    popular convention.

    I hope this helps.

    -Kevin
  • Farid Zidan at Jun 4, 2010 at 5:09 pm
    Hello Kevin,

    My bug report is about using 'distinct' in the select list which is causing a side-effect.
    That's why I classify this as a bug. Distinct should not have unintended side-effects.

    This side-effect is implementation-dependent and is manifested in the current PostgreSQL
    query processing but can be eliminated by appropriately handling the distinct keyword and
    does not have to occur.

    The ISO-datetime string literal format I am using the most general/standard for
    datetime/timestamp and is not the issue here. The 'distinct' keyword is causing the error.

    Farid

    On 6/4/2010 12:52 PM, Kevin Grittner wrote: Farid Zidan wrote: If we
    were strictly complying with the SQL standard, Considering the statement works in all the 9
    DBMS systems+ that I have tested so far as mentioned above, I would say PostgreSQL is not
    compliant with SQL standard in this regard. The SQL standard is a document published by the
    International Standards Organization (ISO) and also adopted by the American National
    Standards Institute (ANSI). Those documents don't require a query in either of the forms you
    presented to work. Because of the convenience factor, most database products have
    non-standard extensions to omit type specification in some places. PostgreSQL's
    extensions are oriented more toward user-installable data types (such as geometric shapes
    or global coordinates), so the particulars of our non-standard extensions differ so that use
    of those features is as easy as practicable. That does result in some non-standard extensions
    which work in other products not working in PostgreSQL. I think you'll find that the syntax I
    suggested (using the standard timestamp literal instead of a bare character string literal)
    will work in all of the databases you mentioned; if you want portable code, it is best to follow
    the standard rather than some inferred popular convention. I hope this helps. -Kevin
    --
    Signature www.zidsoft.com CompareData: compare and synchronize SQL DBMS data visually
    between two databases using ODBC drivers
  • Kevin Grittner at Jun 4, 2010 at 5:36 pm

    Farid Zidan wrote:
    can be eliminated by appropriately handling the distinct keyword
    and does not have to occur.
    Based on previous discussions around our approaching data types, I
    don't think any of the regular PostgreSQL developers are likely to
    agree with you; but if you see a way to make it work, feel free to
    submit a patch. See this page for the process:

    http://wiki.postgresql.org/wiki/Submitting_a_Patch
    The ISO-datetime string literal format I am using the most
    general/standard for datetime/timestamp and is not the issue here.
    The format in your string literal is the portable one; however, a
    timestamp literal requires the TIMESTAMP keyword ahead of the string
    literal, which you have chosen to omit. Did you try the query with
    a proper timestamp literal, as I suggested, against all these
    databases? If using standard syntax works, why not use it?
    The 'distinct' keyword is causing the error.
    No, non-standard syntax is causing the error in the case of
    DISTINCT, because our extension to the standard does not cover that
    case, even though it covers the other. There are good reasons for
    that, which you'll probably discover in short order if you work on a
    patch for the issue.

    -Kevin
  • Farid Zidan at Jun 4, 2010 at 6:18 pm
    Hello Kevin,

    I strongly disagree with your analysis of this issue. Like I said, this syntax works with 9
    different databases, so obviously whatever PosgreSQL query procesor is doing in this case is
    not the desired behavior.

    To ensure PosgreSQL success, the query processor must behave in a compliant manner with
    established standards whether those standards are set by SQL ISO specs or are de facto
    standards.

    It is too much asking developers to change their sql to overcome implementation-dependent
    side-effects of PostgreSQL query processor. If a simple SQL statement works on 9+ different
    databases, then it should also work in PostreSQL with no need for developers to special-code
    for PostgreSQL. Very basic feature is converting a string literal to a datetime/timestamp
    value and developers should not do any special coding to accomplish this simple conversion.
    '2010-04-30 00:00:00' should convert to timestamp in PostgreSQL with no other flags or
    syntax decoration (it already does except when 'distinct' is used).

    Compatibility is very high on desired features for a DBMS and is a requirement for smooth
    porting of applications from other databases to PostreSQL and cross-dbms applications. It
    really boils down to making it work, technical details are what developers love and I am sure
    PostgreSQL developers can make this simple sql insert work on PostreSQL just like all the
    other developers have done for the other DBMSs.

    Anyway, I have reported this issue because I encountered it and it negatively impacts my
    project. I don't expect it to be fixed right now, that's something that PostgreSQL developers
    can debate and prioritize. I only ask that this issue is identified, since it does not work in my
    case when the target dbms is PostgreSQL and I am sure it can impact other developers projects
    and it would need to be addressed at some point in the future with a solution where it just work
    like it does in all the other DBMSs.

    Farid

    On 6/4/2010 1:36 PM, Kevin Grittner wrote: Farid Zidan wrote: can be
    eliminated by appropriately handling the distinct keyword and does not have to occur. Based
    on previous discussions around our approaching data types, I don't think any of the regular
    PostgreSQL developers are likely to agree with you; but if you see a way to make it work, feel
    free to submit a patch. See this page for the process:
    http://wiki.postgresql.org/wiki/Submitting_a_Patch The ISO-datetime string literal
    format I am using the most general/standard for datetime/timestamp and is not the issue here.
    The format in your string literal is the portable one; however, a timestamp literal requires
    the TIMESTAMP keyword ahead of the string literal, which you have chosen to omit. Did you try
    the query with a proper timestamp literal, as I suggested, against all these databases? If
    using standard syntax works, why not use it? The 'distinct' keyword is causing the error. No,
    non-standard syntax is causing the error in the case of DISTINCT, because our extension to the
    standard does not cover that case, even though it covers the other. There are good reasons for
    that, which you'll probably discover in short order if you work on a patch for the issue. -Kevin
    --
    Signature www.zidsoft.com CompareData: compare and synchronize SQL DBMS data visually
    between two databases using ODBC drivers
  • Greg Stark at Jun 4, 2010 at 8:18 pm

    On Fri, Jun 4, 2010 at 7:18 PM, Farid Zidan wrote:
    If a simple SQL statement works on 9+ different databases
    For what it's worth are you sure it works as you expect in these other
    databases?

    I suspect what they're doing is doing a DISTINCT of the text values
    and then converting the results. That's not the same as what you're
    suggesting it do (insert distinct timestamp values) since different
    text values can represent the same timestamp. For example look at what
    this does:

    select cast(x as timestamp with time zone) from (select distinct x
    from (values ('2010-01-01 12:00PM UTC'), ('2010-01-01 7:00AM EST')) as
    x(x)) as y;

    If you inserted those values into a table with a timestamp with time
    zone column you would get duplicate values even with the distinct.

    This is the problem with depending on non-standard extensions. You're
    never really sure that they're working. They be working on some
    systems but doing something unexpected on other systems.

    --
    greg
  • Farid Zidan at Jun 4, 2010 at 9:04 pm
    Hello Greg,

    I suspect what they're doing is doing a DISTINCT of the text values and then converting the
    results. That's not the same as what you're suggesting it do (insert distinct timestamp
    values) since different text values can represent the same timestamp. For example look at
    what That's a good point. I think you are correct. When the query parser is in the nested
    subselect it only sees string literals for the timestamp column values (does not know it is a
    timestamp yet). However, when it gets to do the insert it then must convert the string literals
    to timestamp values because at that point it knows that the string literal is to be inserted
    into a timestamp column.

    Since I am using a constant string literal for the timestamp it really does not matter when the
    conversion takes place.

    select distinct
    <col1>,
    <col2>,
    ..
    '2010-04-30 00:00:00'
    from
    ....

    the timestamp string literal is a constant and really does not affect the distinct resultset
    in anyway. I do need to stamp all the inserts with a specific timestamp value and that's why I am
    using a constant string literal. If I used an expression such as current_timestamp/(ODBC {fn
    now()}, then that would factor into the distinct clause and pollute the distinctness of
    subquery reulsultset.

    Here is actual statements I am running and like I said they work for all 9+ DBMSs (I use ODBC
    and{fn user()} is the ODBC cross-dbms syntax for the current user ID):
    insert into in_sync_node_toolbar (node_no, sync_cd, toolbar_cd, ctrl_ins_dtm,
    ctrl_upd_dtm, ctrl_usr_id) select distinct isnr.node_no, case when isr.rs_type_cd =
    'TABLELS' then 'CMPTS' when isr.rs_type_cd = 'PROCLS' then 'CMPPROCS' when isr.rs_type_cd
    = 'SEQLS' then 'CMPSEQS' else null end, 'TBCSCPT1', '2010-04-30 00:00:00', '2010-04-30
    00:00:00', {fn user() } from in_sync_node_resultset isnr, in_sync_object_pattern isop,
    in_sync_resultset isr where (isnr.rs_oid = isr.rs_oid or isnr.rs_oid_other = isr.rs_oid)
    and isr.rs_oid = isop.rs_oid and isr.rs_type_cd in ('TABLELS', 'PROCLS', 'SEQLS'); insert
    into in_sync_node_toolbar (node_no, sync_cd, toolbar_cd, ctrl_ins_dtm, ctrl_upd_dtm,
    ctrl_usr_id) select distinct isnr.node_no, case when isr.rs_type_cd = 'TBLVIEW' then
    'CMPTABLE' when isr.rs_type_cd = 'PROC' then 'CMPPROC' when isr.rs_type_cd = 'SEQ' then
    'CMPSEQ' else null end, 'TBCSCPT1', '2010-04-30 00:00:00', '2010-04-30 00:00:00', {fn
    user() } from in_sync_node_resultset isnr, in_sync_object iso, in_sync_resultset isr
    where (isnr.rs_oid = isr.rs_oid or isnr.rs_oid_other = isr.rs_oid) and isr.rs_oid =
    iso.rs_oid and isr.rs_type_cd in ('TBLVIEW', 'PROC', 'SEQ'); This is the problem with
    depending on non-standard extensions. You're never really sure that they're working. They
    be working on some systems but doing something unexpected on other systems. All the other
    DBMSs doing is a select distinct on the subquery that has the constant timestamp string
    literals. There is nothing non-standard or ambiguous there. As far as the DBMS is concerned
    the constant string expression is just a string literal and can represent anything.

    Now the issue is that when the other DBMSs get to do the insert part they are able, as one would
    expect, to convert the subquery resultset string literal column to a timestamp column. I
    think PostreSQL is doing the first part (subquery with distinct clause correctly), but when
    it gets to use the resultset of the subquery in the insert it "forgets" how to convert
    '2010-04-30 00:00:00' to timestamp value (but forgets only when 'distinct' is used in the
    subquery!)

    Farid

    On 6/4/2010 4:18 PM, Greg Stark wrote: On Fri, Jun 4, 2010 at 7:18 PM, Farid Zidan
    wrote: If a simple SQL statement works on 9+ different databases For
    what it's worth are you sure it works as you expect in these other databases? I suspect what
    they're doing is doing a DISTINCT of the text values and then converting the results. That's
    not the same as what you're suggesting it do (insert distinct timestamp values) since
    different text values can represent the same timestamp. For example look at what this does:
    select cast(x as timestamp with time zone) from (select distinct x from (values ('2010-01-01
    12:00PM UTC'), ('2010-01-01 7:00AM EST')) as x(x)) as y; If you inserted those values into a
    table with a timestamp with time zone column you would get duplicate values even with the
    distinct. This is the problem with depending on non-standard extensions. You're never
    really sure that they're working. They be working on some systems but doing something
    unexpected on other systems.
    --
    Signature www.zidsoft.com CompareData: compare and synchronize SQL DBMS data visually
    between two databases using ODBC drivers
  • Kevin Grittner at Jun 4, 2010 at 9:40 pm

    Farid Zidan wrote:
    but when it gets to use the resultset of the subquery in the
    insert it "forgets" how to convert '2010-04-30 00:00:00' to
    timestamp value
    Not really. In versions prior to 8.3 it did automagically convert
    like that. PostgreSQL has some pretty fancy features involving
    custom data types where this magic caused problems, so a deliberate
    decision was taken to no longer provide automatic casts from text to
    other data types.
    (but forgets only when 'distinct' is used in the subquery!)
    That is because (as I tried to explain earlier, but apparently
    didn't do a good job of communicating), an unadorned literal in
    single quotes is *not* taken to be a character string in PostgreSQL.
    Its type is held as "unknown" until it is forced to be resolved in
    some operation. This allows easier coding of custom data types, but
    does create a few deviations from standard behavior in corner cases,
    and breaks from the non-standard "conventional" behavior of many
    other databases. Because of this design choice, for example, the
    FAA can more easily write the code they use to map their runways and
    other airport facilities.

    The cost is that in situations such as you describe, you need to
    force the type before it is used in the comparisons necessary to
    determine a distinct value. The only way to get the behavior you
    want without breaking a great many useful cases, would be to
    determine where the result was going to be later used, and use that
    information to force the type to something other than text (the
    default, when no other information is available). That would be a
    *major* and destabilizing change.

    For those reasons, the chance of getting *anybody* here to consider
    this a bug are close to nil. The choice to more conveniently handle
    advanced cases at the expense of occasionally needing to specify a
    type is unlikely to be reversed, to put it mildly.

    I can't help but wonder why you resist using the standard syntax.
    The reason the standard exists is to help those trying to write
    portable code, so they don't have to count on the vagaries of
    "parallel evolution."

    -Kevin
  • Farid Zidan at Jun 4, 2010 at 10:15 pm
    Hello Kevin,

    I can't help but wonder why you resist using the standard syntax. I am using the standard
    syntax. Single quote in sql denotes a string. so '2010-04-30 00:00:00' is string literal.
    That's universal. Now you want me to use PG-specific timestamps and that's like I said is not
    standard/cross-dbms.

    I have just finished testing with Ingre 9.2 and it works there too. That's 10 DBMSs systems that
    use single quotes to denote a string literal and can covert ISO-standard datetime string
    literal to timestamp.

    You can't not interpret string literals one way in one statement and just because user uses the
    word 'distinct' decide to switch paradigms. That's not good design or planning. Of course you
    can decide to do whatever you want, just do not expect developers to start special-coding just
    for PostreSQL because you decide to cast correctly or not correctly depending on whim.

    Let me reiterate the example, maybe it was too terse and you did not read it carefully,

    create table test_insert ( col1 char(8) not null, col2 TIMESTAMP not null default
    CURRENT_TIMESTAMP); >create the test table. No issue.

    insert into test_insert (col1, col2) values ('a', '2010-04-30 00:00:00'); >Works like
    expected, PG correctly converts standard ISO-datetime string literal to timestamp. No
    issue.

    insert into test_insert (col1, col2) select 'b', '2010-04-30 00:00:00' >That works too. No
    issue.

    insert into test_insert (col1, col2) select distinct 'b', '2010-04-30 00:00:00' >Does not
    work. That's a bug.

    Now this not rocket science, it's simple insert statement where we do not want duplicates
    inserted. Works on 10 other DBMSs.

    FAA stuff and other is not related to this bug. I would think the FAA and other organizations
    want a standard-compliant DBMS system that knows how to convert a simple ISO-formatted valid
    string literal to a timestamp value in more than one variation of sql statement.

    You can ignore this bug report and do whatever you want, just do not say this is an accepted,
    standard or desired behavior of the server or is by design. It's not by design that the error
    happens it is by faulty handling of the distinct keyword.

    I think you have all the information you need to debate and resolve this issue. If you need any
    other information you can contact me and I will be happy to oblige.

    Farid

    On 6/4/2010 5:40 PM, Kevin Grittner wrote: I can't help but wonder why you resist using the
    standard syntax. The reason the standard exists is to help those trying to write portable
    code, so they don't have to count on the vagaries of "parallel evolution."
    --
    Signature www.zidsoft.com CompareData: compare and synchronize SQL DBMS data visually
    between two databases using ODBC drivers
  • Greg Stark at Jun 5, 2010 at 2:42 am

    On Fri, Jun 4, 2010 at 11:15 PM, Farid Zidan wrote:
    Now this not rocket science, it's simple insert statement where we do not
    want duplicates inserted. Works on 10 other DBMSs.
    I find usually when one person is arguing something is complex and
    someone else is arguing it's simple it's the person who's claiming
    it's simple who is wrong.

    The other databases are not, I believe, preventing duplicates from
    being inserted as you describe. They are removing duplicates from the
    string constants and then silently converting to a different datatype
    before inserting. When postgres removed these default casts to text it
    turned up many instances where users had buggy code and Postgres had
    been hiding from them by silently using string operators which was not
    what users were expecting. In other words, while it might not matter
    in this case, in general if you code in this style your code is buggy
    and these other database implementations are not doing you any favours
    by making it appear to work correctly most of the time.

    --
    greg
  • Farid Zidan at Jun 5, 2010 at 3:27 am
    Greg,

    Obviously I do not agree. When 14 different databases by 14 different DBMS vendors from the
    largest to the smallest in the market can do a simple thing as a using a subquery that has
    distinct keyword and your DBMS can't, I would say your DBMS is at fault and is not better, rather
    is lacking in this respect. I am not expecting favors from the DBMS by its doing what I expect it
    to do.

    I do not want to beat an already dead horse, but if you review my example, you will see that it is
    very simple, PG already does conversion correctly from ISO string to timestamp column for
    inserting so you can't say we removed all conversions and that is a good thing, it is not. Basic
    feature of DBMS is allowing data entry into different data type columns using plain string
    literals. PG already does that and all other DBMS do that as well. For reference, although ODBC
    is not a DBMS, ODBC specification requires that an ODBC driver can convert all source DBMS data
    types from/to chars. This is not by accident, it is a necessity and is by design. I can
    understand that having multiple data formats for conversion to native data types from text
    can cause bugs and that's why we have established standards such as ISO for
    datetime/timestamp string formats and PG supports the conversion already.

    The issue is the PG is not doing it correctly when 'distinct' keyword is used in the select
    statement. There is nothing buggy with using ISO datetime string literals to insert into a
    table timestamp column. There is no behind the scene magic going on.

    1 Execute subquery: string literals are just that can be 'aa', 'bb', '2010-04-30 00:00:00',
    whatever, it does not matter what the string literal is.

    2 Eliminate duplicates

    3 Now a string literal is being inserted into a timestamp column, you have a string literal and
    you are asked to insert into a timestamp colum -> convert string literal to timestamp and do the
    insert

    As you can see there is nothing buggy or heinous here, just simple select with distinct keyword
    in step 1, 2 and conversion from string literal to timestamp value in step 3

    There is no ambiguity or magic to happen. Obviously in PG case there is some design or fault
    somewhere in this use-case when distinct keyword is used and is processed in step 2, that's
    all.

    Farid

    On 6/4/2010 10:41 PM, Greg Stark wrote: On Fri, Jun 4, 2010 at 11:15 PM, Farid Zidan
    wrote: Now this not rocket science, it's simple insert statement where
    we do not want duplicates inserted. Works on 10 other DBMSs. I find usually when one person is
    arguing something is complex and someone else is arguing it's simple it's the person who's
    claiming it's simple who is wrong. The other databases are not, I believe, preventing
    duplicates from being inserted as you describe. They are removing duplicates from the string
    constants and then silently converting to a different datatype before inserting. When
    postgres removed these default casts to text it turned up many instances where users had buggy
    code and Postgres had been hiding from them by silently using string operators which was not
    what users were expecting. In other words, while it might not matter in this case, in general if
    you code in this style your code is buggy and these other database implementations are not
    doing you any favours by making it appear to work correctly most of the time.
    --
    Signature www.zidsoft.com CompareData: compare and synchronize SQL DBMS data visually
    between two databases using ODBC drivers
  • Tomas at Jun 5, 2010 at 4:25 am
    On Fri, Jun 04, 2010 at 06:15:09PM -0400, Farid Zidan wrote:

    [...]
    Let me reiterate the example, maybe it was too terse and you did not
    read it carefully,<br>
    No. I think most readers here have understood your problem perfectly.
    Don't underestimate the folks here.

    [...]
    Now this not rocket science, it's simple insert statement where we do
    not want duplicates inserted. Works on 10 other DBMSs.<br>
    Except on those "other 10 DBMSs" you are most probably getting
    (silently!) something different as you'd expect (DSTINCT interpreted as
    text, whereas you are "seeing" timestamps). How is that better?
    FAA stuff and other is not related to this bug. I would think the FAA
    and other organizations want a standard-compliant DBMS system that
    knows how to convert a simple ISO-formatted valid string literal to a
    timestamp value in more than one variation of sql statement.<br>
    Except that the behaviour of those "other 10 DBMSs" is *beyond
    standard*, the standard just rules the case where you state explicitly
    the type of the constant. You will find multitude of cases where DMBSs
    differ on those cases beyond standard -- that's due to different design
    decisions.

    What Kevin was trying to convey is that PostgreSQL's design decisions
    allow its users to do things other DBMSs can't -- and that's the price
    we'll have to pay. Note that behaviour is still within the standard
    (and not, as you seem to suggest), so not really a problem: you can
    write the query in a way which will work on "all those 11 DBMSs": just
    stick to the standard.
    You can ignore this bug report and do whatever you want, just do not
    say this is an accepted, standard or desired behavior of the server or
    is by design. It's not by design that the error happens it is by faulty
    handling of the distinct keyword.<br>
    Accepted -- by whom? Standard -- which standard? (because it is not
    required by ISO/ANSI, and there is no other "SQL standard" that I'm
    aware of).

    Regards
    - -- tomás
  • Craig Ringer at Jun 5, 2010 at 7:30 am

    On 05/06/10 06:15, Farid Zidan wrote:
    insert into test_insert
    (col1, col2)
    select *distinct*
    'b',
    '2010-04-30 00:00:00'
    Does not work. That's a bug.
    Not really.

    select distinct * from (VALUES
    ('b','2010-04-30 00:00:00'),
    ('b','2010-04-30 00:00:00'),
    ('b','20100430 000000')
    ) AS x(a,b);

    Does that produce the result you expected? It certainly didn't
    deduplicate the timestamps, yet it's doing exactly the correct thing.

    So this won't work:

    create table test_insert (
    col1 char(8) not null,
    col2 TIMESTAMP not null default CURRENT_TIMESTAMP,
    UNIQUE(col2)
    );

    insert into test_insert
    (col1, col2)
    select a, b::timestamp from (
    select distinct * from (VALUES
    ('b','2010-04-30 00:00:00'),
    ('b','2010-04-30 00:00:00'),
    ('b','20100430 000000')
    ) AS x(a,b)) AS y;


    ... which is why your example is unsafe, and even if it appears to work
    on other databases it is buggy. Instead, write:

    insert into test_insert
    (col1, col2)
    select distinct
    'b',
    CAST('2010-04-30 00:00:00' AS timestamp);

    ... which will be safe on any database, is (AFAIK) perfectly standard,
    and is fuss free.

    --
    Craig Ringer

    Tech-related writing: http://soapyfrogs.blogspot.com/
  • Farid Zidan at Jun 5, 2010 at 1:40 pm
    Craig

    I am not asking you to re-write my sql so the bug will not show. I am presenting you with sql that
    fails and shows the bug. If every time someone reported a bug you ask them to re-write their sql
    so the bug is not hit, that would not eliminate the bug.

    Also, you are using different timestamp string literals in your subquery. I am using the same
    constant datetime string literal in my example that the query processor does not need to cast
    to timestamp or anything to do the distinct part and eliminate duplicates.
    insert into test_insert (col1, col2) select distinct 'b', cast('2010-04-30 00:00:00' as
    timestamp) >This works as expected. However is not an option because it is not generic sql. In
    PG timestamp data type is called 'timestamp' but in another DBMS it may be called 'datetime',
    etc.

    ... which is why your example is unsafe, and even if it appears to work on other databases it is
    buggy. Instead, write: My example is safe and is cross-dbms. I am not doing anything
    extra-ordinary just select distinct where a constant string expression is used in the select
    list.
    select distinct 'b', '2010-04-30 00:00:00' Why is the sql above unsafe? It is not. It is simple
    select statement with two constant string expressions and distinct keyword. Now use the
    result of the sql above as source for inserting into test_table (col1, col2):
    insert into test_insert (col1, col2) select distinct 'b', '2010-04-30 00:00:00' There is
    nothing unsafe here. You have a resultset that has one row with the values 'b',
    '2010-04-30 00:00:00' being used to insert int col1, col2. Why would you say that's unsafe?
    '2010-04-30 00:00:00' is an ISO string literal being inserted into col2 whose data type is
    timestamp, perfectly safe.

    Farid

    On 6/5/2010 3:26 AM, Craig Ringer wrote: On 05/06/10 06:15, Farid Zidan wrote: insert into
    test_insert (col1, col2) select *distinct* 'b', '2010-04-30 00:00:00' Does not work.
    That's a bug. Not really. select distinct * from (VALUES ('b','2010-04-30 00:00:00'),
    ('b','2010-04-30 00:00:00'), ('b','20100430 000000') ) AS x(a,b); Does that produce the
    result you expected? It certainly didn't deduplicate the timestamps, yet it's doing exactly
    the correct thing. So this won't work: create table test_insert ( col1 char(8) not null, col2
    TIMESTAMP not null default CURRENT_TIMESTAMP, UNIQUE(col2) ); insert into test_insert
    (col1, col2) select a, b::timestamp from ( select distinct * from (VALUES ('b','2010-04-30
    00:00:00'), ('b','2010-04-30 00:00:00'), ('b','20100430 000000') ) AS x(a,b)) AS y; ...
    which is why your example is unsafe, and even if it appears to work on other databases it is
    buggy. Instead, write: insert into test_insert (col1, col2) select distinct 'b',
    CAST('2010-04-30 00:00:00' AS timestamp); ... which will be safe on any database, is (AFAIK)
    perfectly standard, and is fuss free.
    --
    Signature www.zidsoft.com CompareData: compare and synchronize SQL DBMS data visually
    between two databases using ODBC drivers
  • Dimitri Fontaine at Jun 5, 2010 at 8:17 pm

    Farid Zidan writes:
    I am not asking you to re-write my sql so the bug will not show. I am
    presenting you with sql that fails and shows the bug. If every time
    someone reported a bug you ask them to re-write their sql so the bug
    is not hit, that would not eliminate the bug.
    You're not reading us.
    Your example makes no sense at all, and hides the problem you have. You
    are failing to understand the situation because of this.

    If you want to understand the problem you have, please go test your 14
    databases with the meaningful tests that have been offered to you.

    Until and unless you do so, I don't think posting here again on this
    issue has any value, for you nor for us. You're abusing the time of lots
    of people who are both busy and helpful, so please try to understand the
    advices and analysis they've been offering to you.

    Regards,
    --
    dim
  • Farid Zidan at Jun 6, 2010 at 1:12 am
    Dimitri

    This is the last you will hear from me about this issue. I would have
    stopped repsonding long time ago, but I kept getting responses that
    required reply. Do not knock on the door if you don't want someone to
    answer the door.

    There is no point in spending more of my time explaining to you simple
    things you seem incapable of understanding. You make no technical point,
    you make a personal opinion. You are entitled to your opinion. Software
    is not built on personal opinions, software is built on good
    technical/logical analysis of issues of which you offer none.

    I really did not expect a debate. I reported an issue that is clearly a
    bug. I expected "yes, we see, we will address this at some point, thank
    you." I really don't mind explaining the issue, but what I do mind is
    people like you who don't have a technical understanding of the issue
    and nonetheless appoint themselves to speak for PostgreSQL/MySQL or any
    organization about things they do not understand.

    Dmitri, it is good you don't work for me, for if you did you would not
    have a job. The plight of the software industry is people who when
    confronted with a logical argument revert to ad hominem response because
    they have no valid technical response.

    And of course you should speak only for yourself because you are not
    PostgreSQL and you don't not represent PostgreSQL or anbody else in the
    collaborative software development spirit. The arrogance you show is
    appalling.
    Until and unless you do so, I don't think posting here again on this
    issue has any value, for you nor for us. You're abusing the time of lots
    of people who are both busy and helpful, so please try to understand the
    advices and analysis they've been offering to you
    I see no technical analysis in your response. If you too busy to engage
    in logical debate, you should remove yourself from the bug list.
    Software is built on logical analysis. You are too busy, do not participate.

    Farid
    On 6/5/2010 4:16 PM, Dimitri Fontaine wrote:
    Farid Zidan<farid@zidsoft.com> writes:
    I am not asking you to re-write my sql so the bug will not show. I am
    presenting you with sql that fails and shows the bug. If every time
    someone reported a bug you ask them to re-write their sql so the bug
    is not hit, that would not eliminate the bug.
    You're not reading us.
    Your example makes no sense at all, and hides the problem you have. You
    are failing to understand the situation because of this.

    If you want to understand the problem you have, please go test your 14
    databases with the meaningful tests that have been offered to you.

    Until and unless you do so, I don't think posting here again on this
    issue has any value, for you nor for us. You're abusing the time of lots
    of people who are both busy and helpful, so please try to understand the
    advices and analysis they've been offering to you.

    Regards,
    --
    www.zidsoft.com <http://www.zidsoft.com/> CompareData: compare and
    synchronize SQL DBMS data visually between two databases using ODBC drivers
  • Dimitri Fontaine at Jun 6, 2010 at 7:51 pm

    Farid Zidan writes:
    I see no technical analysis in your response.
    I made a clear reference to what others have been saying, and that you
    have been refusing to read. I don't see any point in getting technical
    again, as I'm sure you will refuse to understand what is happening to
    you here.

    But as you insist, let's try once more:

    - your consider a bug where PostgreSQL wants to know more about the
    data type you're using in the query in order to be able to enforce a
    DISTINCT restriction

    - other products are happy to solve the DISTINCT restriction without
    any hint as far as what the datatype really is

    - the error message is perfectly clear about what PostgreSQL needs from
    you

    - the reason why PostgreSQL wants you to give it details is clear to:
    what means DISTINCT depends on the datatype, you can easily have two
    different text representations of the same timestamptz, for example

    - it could be considered a possible area of improvement in the system
    that the SELECT part of the INSERT INTO ... SELECT ... could
    determine the SELECT output columns type by looking at the INSERT
    target table definition

    - it would be asking for a new feature to be considered, not for a bug
    to be solved: the system currently works exactly as designed.

    Now if you keep wanting not to understand how the system is currently
    designed, that makes it your problem, no one else.

    Regards,
    --
    dim
  • Craig Ringer at Jun 7, 2010 at 2:57 am

    On 7/06/2010 3:51 AM, Dimitri Fontaine wrote:

    - other products are happy to solve the DISTINCT restriction without
    any hint as far as what the datatype really is
    ... and so is Pg. That's not the problem; Pg complains after resolving
    the SELECT DISTINCT, when it finds that it's trying to insert values of
    type 'text' into a column with type 'timestamp'.

    You'll get exactly the same error if you replace the OP's SELECT
    DISTINCT subquery with a VALUES list that explicitly specifies TEXT type.

    The other clue as to what's happening is that if you run the SELECT part
    of the query standalone, it executes fine, treating the passed values as
    'text'.

    Personally, I do think this is a bit of a wart. I know why the explicit
    casts around text were removed, but in the case of INSERT I'm not sure
    the current behaviour is desirable.

    I initially thought the OP was asking for Pg to infer the type of the
    timestamp literals from the surrounding INSERT, and for that reason was
    very much against the idea. After realizing that what they really expect
    is for the SELECT to interpret the literals as 'text' (just as it does)
    then Pg to implicitly cast the 'text' query results to 'timestamp', I
    can see why they want it and why they're frustrated with the current
    behaviour.
    - the error message is perfectly clear about what PostgreSQL needs from
    you
    Apparently not, as you seem to have misunderstood it ;-)
    - the reason why PostgreSQL wants you to give it details is clear to:
    what means DISTINCT depends on the datatype, you can easily have two
    different text representations of the same timestamptz, for example
    That's not why Pg reports an error. If it was, then the following query
    would not be legal:

    SELECT DISTINCT x.* FROM (VALUES ('a'),('b'),('c')) AS x;

    ... since there's no explicit type info provided.

    Pg follows the SQL rules and interprets literals as text if there's no
    explicit type info provided and no immediate context that requires a
    particular type. So the above work fine, treating 'a', 'b', and 'c' as
    if they were written:

    (TEXT 'a'), (TEXT 'b'), (TEXT 'c')
    - it could be considered a possible area of improvement in the system
    that the SELECT part of the INSERT INTO ... SELECT ... could
    determine the SELECT output columns type by looking at the INSERT
    target table definition
    I'd say that'd be a ghastly "solution". It'd change the meaning of the
    SELECT based on the surrounding INSERT. Imagine trying to figure out
    what was going on with a query that wasn't doing what you wanted when
    you couldn't run it standalone and know the results were the same!

    --
    Craig Ringer
  • Dimitri Fontaine at Jun 7, 2010 at 8:19 am

    Craig Ringer writes:
    - other products are happy to solve the DISTINCT restriction without
    any hint as far as what the datatype really is
    ... and so is Pg. That's not the problem; Pg complains after resolving the
    SELECT DISTINCT, when it finds that it's trying to insert values of type
    text' into a column with type 'timestamp'.
    Ah yes. I've been paying more attention to the energy people have been
    willing to put into helping the OP than into what the problem really
    is. Thanks for clarifying.

    --
    dim
  • Craig Ringer at Jun 6, 2010 at 3:09 am

    On 5/06/2010 9:39 PM, Farid Zidan wrote:
    Craig

    I am not asking you to re-write my sql so the bug will not show. I am
    presenting you with sql that fails and shows the bug.
    Before writing this off completely, please read on. I suspect I may've
    misunderstood your argument, and on further examination think there's
    something that bears design consideration here. It's not a bug, as it's
    working exactly as designed, but it _is_ something where the design
    might benefit from a tweak.

    In the mean time, you have a system that doesn't work how you want it do
    - design choice or bug, workaround or bug-hiding, the effect is the
    same. You must already be handing differing names of timestamp/datetime
    types in your DDL, so why not use the same mapping in your DML?

    As for the root of the issue: Here's why implicitly casting those
    literals to timestamps would be a bad idea for solid, technical reasons:

    - The semantics of a nested/sub query should not be affected by the
    calling context, ie the surrounding query. To have a function affected
    by what the caller is doing would be bizarre and confusing; so it is
    with a subquery. You cannot test functional units or rely on any kind of
    consistent behaviour if calling context changes callee behaviour.

    - A SELECT as a value-supplier to an INSERT is a type of subquery

    - If your SELECT ran differently in the context of the INSERT to how it
    ran standalone, that would not only be incredibly confusing but also
    clearly a bug.


    ... so we clearly can't use type information from the surrounding INSERT
    to determine the data type of the literals used in the SELECT. With no
    type information to the contrary they must be interpreted as text. So,
    after SELECT evaluation your query goes from:

    insert into test_insert
    (col1, col2)
    select distinct
    'b',
    '2010-04-30 00:00:00';

    to effectively:

    insert into test_insert
    (col1, col2)
    values ('b'::text, '2010-04-30 00:00:00'::text);

    Now, when executed with Pg both those queries result in:

    ERROR: column "col2" is of type timestamp without time zone but
    expression is of type text

    showing that your issue isn't actually with DISTINCT at all, but with
    Pg's unwillingness to *implicitly* cast a value of explict text type to
    another type.

    This is the real core of your complaint. You want Pg to cast from 'text'
    to 'timestamp' on INSERT. Pg doesn't. In the general case (not specific
    to INSERT) there are some pretty good reasons for that, but for
    INSERTs... should it? In the context of an INSERT, where the desired
    data type is explicitly specified and obvious, I'm not sure, but I
    suspect that it should.

    Anyone familiar with the details of the implicit-text-cast bugs want to
    pipe up with whether it'd be safe to force a cast in the context of an
    INSERT?


    ( As for why I adjusted your example: Your example query was
    unrealistically simplistic and clearly couldn't be what you were using
    in your app. Using 'DISTINCT' with one value is pointless. So, my
    example added some dummy values to illustrate why it might be unsafe to
    use it how you're trying to. As it turns out, in your app's case you can
    guarantee input formatting consistency, so it's safe for you so long as
    you stick very closely to timestamp formatting specifics, but I'm sure
    you can see that a database's behaviour depending on the formatting of
    timestamps is probably not something that most people with most apps
    would be happy with. Even if INSERT did convert the results of the
    SELECT DISTINCT subquery to timestamps, it'd still be pretty unsafe.)

    --
    Craig Ringer
  • Kris Jurka at Jun 5, 2010 at 1:42 am

    On Fri, 4 Jun 2010, Farid Zidan wrote:

    Here is actual statements I am running and like I said they work for all 9+
    DBMSs (I use ODBC and{fn user()} is the ODBC cross-dbms syntax for the
    current user ID):

    '2010-04-30 00:00:00',
    '2010-04-30 00:00:00',
    {fn user() }
    If you're into using standard ODBC escapes for portability, shouldn't you
    be using {ts '2010-04-30 00:00:00'}?

    http://msdn.microsoft.com/en-us/library/ms712360%28VS.85%29.aspx

    Kris Jurka
  • Farid Zidan at Jun 5, 2010 at 2:54 am
    I only use ODBC escape sequences when necessary. Obviously I want to use standard sql syntax as
    much as possible. {fn user() } is handy because it works in all the databases that I work with and
    there is no substitute standard sql function for getting current userid that is cross-dbms.

    I also use {fn now()} which works across most ODBC drivers, but I can't in this case because I
    need to use a constant timestamp value so as not to change distinctness of the subquery that is
    the source for the insert.

    The datetime ISO-standard string format I am using works in all the databases I use 14+
    (including PG), except in this case where 'distinct' is used with subquery in PG.

    Also not all PG clients use ODBC, so other PG clients will encounter this issue using standard
    ISO datetime string format when not using ODBC. I don't want to limit users to using ODBC for
    loading/updating the database by running sql scripts (which is what the sql for this issue is
    used for) so almost all of the database update/load scripts use generic sql where
    timestamp/datetime values are are written as ISO datetime format strings same format as
    '2010-04-30 00:00:00'

    BTW, I have also tested the sql in question with SQLite, MS Access, MS Excel and Sybase Adaptive
    Server 15 and it works with no error, so now that's 14 different DBMSs that have no issue with the
    ISO standard string format and distinct keyword.

    I guess I can find some workaround for this to work with ODBC just for the specific sql
    statements causing errors with PG, but that does not resolve the issue for PG clients not using
    ODBC.

    Like I said, I am reporting this issue so it can be identified and hopefully addressed at some
    point in the future, it is not critical for me for it to work right now, but that would be nice
    otherwise user will see a bunch of one-time errors and lose some ease of use but otherwise will
    not be too badly affected.

    Farid

    On 6/4/2010 9:42 PM, Kris Jurka wrote:

    On Fri, 4 Jun 2010, Farid Zidan wrote:

    Here is actual statements I am running and like I said they work for all 9+
    DBMSs (I use ODBC and{fn user()} is the ODBC cross-dbms syntax for the
    current user ID):

    '2010-04-30 00:00:00',
    '2010-04-30 00:00:00',
    {fn user() }

    If you're into using standard ODBC escapes for portability, shouldn't you be using {ts
    '2010-04-30 00:00:00'}?

    http://msdn.microsoft.com/en-us/library/ms712360%28VS.85%29.aspx

    Kris Jurka

    --
    Signature www.zidsoft.com CompareData: compare and synchronize SQL DBMS data visually
    between two databases using ODBC drivers
  • Craig Ringer at Jun 7, 2010 at 3:17 am
    I know it's not what you want (Pg behaving how you expect out of the
    box) but creating implicit casts to the desired types will resolve your
    immediate issue. You still have to run some Pg-specific code, but it can
    be restricted to your DDL where there's (presumably) already plenty.

    See:

    http://www.depesz.com/index.php/2008/05/05/error-operator-does-not-exist-integer-text-how-to-fix-it/

    http://petereisentraut.blogspot.com/2008/03/readding-implicit-casts-in-postgresql.html

    http://wiki.postgresql.org/images/d/d1/Pg83-implicit-casts.sql

    --
    Craig Ringer
  • Farid Zidan at Jun 7, 2010 at 11:32 am
    Craig,
    I know it's not what you want (Pg behaving how you expect out of the
    box) but creating implicit casts to the desired types will resolve
    your immediate issue. You still have to run some Pg-specific code, but
    it can be restricted to your DDL where there's (presumably) already
    plenty.
    My main concern is identifying this issue by PosgtreSQL developers so it
    can be addressed as some point in the future. I believe that is done now
    and the issue is identified.

    As it happens the sql in question is part of a common generic sql script
    file where the same script file is run against 10+ different DBMSs to
    update the database data from a previous release (all 10+ DBMS schemas
    have the same table/column names, logical structure, logical data types,
    etc). Having said that, I can introduce a PG-only data update file as a
    workaround for PG in this case but not without complicating things quite
    a bit. However, due to limited time and schedule, I will be just
    accepting the error (based on cost-analysis of the error) in this case.

    Best regards,
    Farid

    On 6/6/2010 11:17 PM, Craig Ringer wrote:
    I know it's not what you want (Pg behaving how you expect out of the
    box) but creating implicit casts to the desired types will resolve
    your immediate issue. You still have to run some Pg-specific code, but
    it can be restricted to your DDL where there's (presumably) already
    plenty.

    See:

    http://www.depesz.com/index.php/2008/05/05/error-operator-does-not-exist-integer-text-how-to-fix-it/


    http://petereisentraut.blogspot.com/2008/03/readding-implicit-casts-in-postgresql.html


    http://wiki.postgresql.org/images/d/d1/Pg83-implicit-casts.sql

    --
    Craig Ringer
    --
    www.zidsoft.com <http://www.zidsoft.com/> CompareData: compare and
    synchronize SQL DBMS data visually between two databases using ODBC drivers

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-bugs @
categoriespostgresql
postedJun 4, '10 at 2:32p
activeJun 7, '10 at 11:32a
posts27
users8
websitepostgresql.org
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase