The following used to work in 6.5, works in Oracle, and is
very useful:

donb=# create table foo(c varchar);
CREATE
donb=# insert into foo values('abc');
INSERT 72649 1

donb=# select distinct c from foo order by upper(c);
ERROR: For SELECT DISTINCT, ORDER BY expressions must appear in target list
donb=#

In other words, we want to order ignoring case - in this case, users
within the Ars Digita Community system. We want don baccus to appear
next to Joe Blow rather than following Xena Xenophoba.

Is this now refused because it is non-standard? It seems a pity...

Of course, one can do "select distinct c, upper(c) as ignore ..."

but that forces the return of more data, so is slower, etc...

BTW the very fact that my testing of our partial port of this web
toolkit under V7 pre-beta has gotten this far is a very good sign.

Among other things, it makes heavy (if simple) use of referential
integrity, which has already uncovered two bugs in the port that
I've fixed.



- Don Baccus, Portland OR <dhogaza@pacifier.com>
Nature photos, on-line guides, Pacific Northwest
Rare Bird Alert Service and other goodies at
http://donb.photo.net.

Search Discussions

  • Tom Lane at Feb 7, 2000 at 5:26 am

    Don Baccus writes:
    The following used to work in 6.5, works in Oracle, and is
    very useful:
    donb=# select distinct c from foo order by upper(c);
    ERROR: For SELECT DISTINCT, ORDER BY expressions must appear in target list
    Well, it's not a bug --- it was an entirely deliberate change. It
    might be a misfeature though. The case we were concerned about was

    select distinct x from foo order by y;

    which produces ill-defined results. If I recall the thread correctly,
    Oracle and a number of other DBMSs reject this. I think your point is
    that

    select distinct x from foo order by f(x);

    *is* well-defined, and useful. I think you are right, but how
    far should we go in detecting common subexpressions? You might
    want to contemplate the difference in these examples:

    select distinct sin(x) from foo order by abs(sin(x));

    select distinct random(x) from foo order by abs(random(x));

    It would be interesting to poke at Oracle to find out just what they
    consider a legitimate ORDER BY expression for a SELECT DISTINCT.

    regards, tom lane
  • Taral at Feb 7, 2000 at 6:12 am

    On Mon, 7 Feb 2000, Tom Lane wrote:

    Well, it's not a bug --- it was an entirely deliberate change. It
    might be a misfeature though. The case we were concerned about was

    select distinct x from foo order by y;

    which produces ill-defined results.
    Okay, I can understand this...
    select distinct sin(x) from foo order by abs(sin(x));

    select distinct random(x) from foo order by abs(random(x));
    The thing here is that random() is not deterministic on its inputs,
    whereas sin() is. Perhaps we should only allow fully deterministic ORDER
    BY? (Ugh, another flag for functions...)

    Taral
  • Don Baccus at Feb 7, 2000 at 6:20 am

    At 12:12 AM 2/7/00 -0600, Taral wrote:
    The thing here is that random() is not deterministic on its inputs,
    whereas sin() is. Perhaps we should only allow fully deterministic ORDER
    BY? (Ugh, another flag for functions...)
    Which, by it's nature is probably a misnomer, because I imagine that
    PL/pgSQL functions would always have to be non deterministic whatever
    their inputs? Given that unrecognized syntax is just tossed the
    query executor. Thus calling any 'ole function without PL/pgSQL
    really knowing what's going on?

    So you probably end up with a LIST of functions by name that are built-in
    and deterministic.

    Or ... you simply say that results are really weird if the function has
    undeterministic behavior and document it.

    Tom's on the right path asking what the standard might say and what
    delphic, incomprehensible answer the Oracle might have for us.

    (the more I learn about the SQL standard, the more I appreciate the irony
    of Oracle's corporate name!)



    - Don Baccus, Portland OR <dhogaza@pacifier.com>
    Nature photos, on-line guides, Pacific Northwest
    Rare Bird Alert Service and other goodies at
    http://donb.photo.net.
  • Don Baccus at Feb 7, 2000 at 6:15 am

    At 12:26 AM 2/7/00 -0500, Tom Lane wrote:
    Well, it's not a bug --- it was an entirely deliberate change. It
    might be a misfeature though.
    Ahhh...getting subtle, are we? :)
    The case we were concerned about was

    select distinct x from foo order by y;
    Yes...I remember some discussion regarding this.
    which produces ill-defined results. If I recall the thread correctly,
    Oracle and a number of other DBMSs reject this. I think your point is
    that
    select distinct x from foo order by f(x);
    *is* well-defined, and useful. I think you are right, but how
    far should we go in detecting common subexpressions?
    Not sure...having not been into that part of the code (and busy at
    the moment testing my rewrites of small portions of RI trigger
    code I rewrote at Jan's request, after our "dispute" [which was more
    or less "I'm 50% certain you're right!" "No! I'm 50% you're right!"
    until I found the paragraph in Date's book which proved we were both
    just about 50% right]) I can't really say.

    I was hoping the standard might give some guidance?
    You might
    want to contemplate the difference in these examples:

    select distinct sin(x) from foo order by abs(sin(x));
    I'm not sure I see a problem here. My (brief) reading of the
    standard tells me that "order by" follows everything else,
    in other words, you get

    select ... arbitrary complexity, with group by and all sorts of
    cruft ...

    then you take that result and apply the "order by" clause.

    You'd get all the negative values followed by the positive
    values, but you'd also get -1.0 and 1.0 if the database had
    those values. Because they're distinct, and therefore live to
    be ordered.

    But I'm not sure about it...if you push me, I'll probably go dig
    into the standard again (I was so successful with referential
    "NO ACTION" last time, yeah, right, I sleep with Date's book under
    my pillow at the moment!)
    select distinct random(x) from foo order by abs(random(x));
    Of course, real compiler systems (like I've spent my life working
    on) have heuristic or, more modernly, other ways of deciding if a
    function returns different values depending on when it is called.
    In such systems, you only have to guarantee the correct answer, so
    choosing wrong simply means the code runs slower.

    "upper(column_value)" does not within a specific select. Column
    value won't change. I can think of rules to think of but the
    simplest might be that internal functions that are invariant when
    their parameters are unchanged might be considered safe. Others,
    not.

    Also, the standard might simply say the result is implementation
    dependent or (slightly worse) defined if the function returns
    different values for a call with the same parameter list in a
    single query. I don't know...it's an interesting question.

    The other approach is to simply state that the function has one
    and only one value during statement (SQL-statement, in this case)
    execution, and yank the sucker out of there, execute it, and stuff
    it in a temp variable. But that's probably too naive. Still, the
    standard might say it is implementation defined as to whether or
    not the function will be called once or more than once. The standard
    only cares about embedded SQL but it might give guidance...
    It would be interesting to poke at Oracle to find out just what they
    consider a legitimate ORDER BY expression for a SELECT DISTINCT.
    I have full-time access to an Oracle installation, so fire away
    regarding examples and questions.

    Not just on this narrow subject, but in general. I'm probably not the
    ONLY person here with Oracle access, but I do have it, and my poking
    at it won't hurt anything but Oracle's pride...



    - Don Baccus, Portland OR <dhogaza@pacifier.com>
    Nature photos, on-line guides, Pacific Northwest
    Rare Bird Alert Service and other goodies at
    http://donb.photo.net.
  • Tom Lane at Feb 7, 2000 at 6:37 am

    Don Baccus writes:
    At 12:26 AM 2/7/00 -0500, Tom Lane wrote:
    It would be interesting to poke at Oracle to find out just what they
    consider a legitimate ORDER BY expression for a SELECT DISTINCT.
    I have full-time access to an Oracle installation, so fire away
    regarding examples and questions.
    Well, try these on for size:

    select distinct x from foo order by x+1;

    select distinct x+1 from foo order by x+1;

    select distinct x+1 from foo order by x;

    select distinct x+1 from foo order by x+2;

    select distinct x+y from foo order by x+y;

    select distinct x,y from foo order by x+y;

    select distinct x+y from foo order by x,y;

    select distinct x+y from foo order by x-y;

    A human can easily see that all but the last two are well-defined,
    but I'll be a little surprised if Oracle knows it...

    regards, tom lane
  • Don Baccus at Feb 7, 2000 at 3:20 pm

    At 01:36 AM 2/7/00 -0500, Tom Lane wrote:
    Don Baccus <dhogaza@pacifier.com> writes:
    At 12:26 AM 2/7/00 -0500, Tom Lane wrote:
    It would be interesting to poke at Oracle to find out just what they
    consider a legitimate ORDER BY expression for a SELECT DISTINCT.
    I have full-time access to an Oracle installation, so fire away
    regarding examples and questions.
    Well, try these on for size:
    Here's what the Oracle proclaims:

    select distinct x from foo order by x+1;
    no rows selected

    select distinct x+1 from foo order by x+1;
    no rows selected

    select distinct x+1 from foo order by x;
    SQL> select distinct x+1 from foo order by x
    *
    ERROR at line 1:
    ORA-01791: not a SELECTed expression

    select distinct x+1 from foo order by x+2;
    SQL> select distinct x+1 from foo order by x+2
    *
    ERROR at line 1:
    ORA-01791: not a SELECTed expression

    select distinct x+y from foo order by x+y;
    SQL>
    no rows selected

    I also tried: select distinct x+y from foo order by y+x,
    which fails.

    select distinct x,y from foo order by x+y;
    SQL>
    no rows selected

    select distinct x+y from foo order by x,y;
    SQL> select distinct x+y from foo order by x,y
    *
    ERROR at line 1:
    ORA-01791: not a SELECTed expression

    select distinct x+y from foo order by x-y;
    SQL> select distinct x+y from foo order by x-y
    *
    ERROR at line 1:
    ORA-01791: not a SELECTed expression

    My first thought is that it is following a simple rule:

    For arithmetic "order by" expressions, either:

    1. The exact expression must also appear in the "select" list,
    and it must be exact, not just an expression that computes
    the same value as the "order by" expression

    or

    2. all of the variables used by the expression must be listed
    in the "select" list as simple column names, not as part of
    an expression.

    Must be true.

    At least, the rule is simple if you can compare expression trees.

    At this point I still am clueless regarding the standard, I think I'll
    make Date my morning coffee date again.



    - Don Baccus, Portland OR <dhogaza@pacifier.com>
    Nature photos, on-line guides, Pacific Northwest
    Rare Bird Alert Service and other goodies at
    http://donb.photo.net.
  • Tom Lane at Feb 7, 2000 at 4:03 pm

    Don Baccus writes:
    My first thought is that it is following a simple rule:
    For arithmetic "order by" expressions, either:
    1. The exact expression must also appear in the "select" list,
    and it must be exact, not just an expression that computes
    the same value as the "order by" expression
    or
    2. all of the variables used by the expression must be listed
    in the "select" list as simple column names, not as part of
    an expression.
    Could be. How about cases like

    select distinct x,y+1 from foo order by x+y+1;
    At least, the rule is simple if you can compare expression trees.
    I think we have something pretty similar for GROUP BY, actually,
    so it may not be hard to make this work.

    regards, tom lane
  • Don Baccus at Feb 7, 2000 at 4:27 pm

    At 11:03 AM 2/7/00 -0500, Tom Lane wrote:
    Could be. How about cases like

    select distinct x,y+1 from foo order by x+y+1;

    SQL> select distinct x,y+1 from foo order by x+y+1
    *
    ERROR at line 1:
    ORA-01791: not a SELECTed expression
    At least, the rule is simple if you can compare expression trees.
    I think we have something pretty similar for GROUP BY, actually,
    so it may not be hard to make this work.
    Actually, yes, you're probably right...



    - Don Baccus, Portland OR <dhogaza@pacifier.com>
    Nature photos, on-line guides, Pacific Northwest
    Rare Bird Alert Service and other goodies at
    http://donb.photo.net.
  • Tom Lane at Feb 7, 2000 at 5:11 pm
    Don Baccus writes:
    SQL> select distinct x,y+1 from foo order by x+y+1
    *
    ERROR at line 1:
    ORA-01791: not a SELECTed expression
    Actually, that was a little unfair, since their parser no doubt parsed
    "x+y+1" as (x+y)+1, leaving no common subexpression visible. Do they
    accept

    select distinct x,y+1 from foo order by x+(y+1)
    At least, the rule is simple if you can compare expression trees.
    I think we have something pretty similar for GROUP BY, actually,
    so it may not be hard to make this work.
    On further thought, I think the real implementation issue is that
    doing SELECT DISTINCT ORDER BY requires either two sorting steps
    (sort by DISTINCT fields, "uniq" filter, sort again by ORDER BY fields)
    or else some very hairy logic to figure out that ORDER BY x+1
    "implies" ORDER BY x. In fact I'm not sure it does imply it
    in the general case. In your original example, the requested sort
    was ORDER BY upper(x), but that doesn't guarantee that the tuples
    will be ordered adequately for duplicate-x elimination. For example,
    that ORDER BY might yield

    Ansel Adams
    Don Baccus
    DON BACCUS
    Don Baccus
    Joe Blow
    ...

    which is a valid sort by upper(x), but a uniq filter on plain x
    will fail to get rid of the second occurrence of "Don Baccus" as
    it should.

    Possibly we could make this work by implicitly expanding the ORDER BY
    to "ORDER BY upper(x), x" which would ensure that the duplicate x's
    are brought together. I am not sure this will give the right results
    always, but it seems promising. We are assuming here that upper(x)
    gives equal outputs for equal inputs, so it would fall down on random(x)
    --- I suppose we could refuse to do this if we see a function that is
    marked non-constant-foldable in pg_proc...

    regards, tom lane
  • Don Baccus at Feb 7, 2000 at 6:53 pm

    At 12:10 PM 2/7/00 -0500, Tom Lane wrote:
    Don Baccus <dhogaza@pacifier.com> writes:
    SQL> select distinct x,y+1 from foo order by x+y+1
    *
    ERROR at line 1:
    ORA-01791: not a SELECTed expression
    Actually, that was a little unfair, since their parser no doubt parsed
    "x+y+1" as (x+y)+1, leaving no common subexpression visible. Do they
    accept

    select distinct x,y+1 from foo order by x+(y+1)
    Yes, it does. So, they must be doing some level of common expression
    analysis, for real.
    At least, the rule is simple if you can compare expression trees.
    I think we have something pretty similar for GROUP BY, actually,
    so it may not be hard to make this work.
    On further thought, I think the real implementation issue is that
    doing SELECT DISTINCT ORDER BY requires either two sorting steps
    (sort by DISTINCT fields, "uniq" filter, sort again by ORDER BY fields) Yes.
    or else some very hairy logic to figure out that ORDER BY x+1
    "implies" ORDER BY x. In fact I'm not sure it does imply it
    in the general case. In your original example, the requested sort
    was ORDER BY upper(x), but that doesn't guarantee that the tuples
    will be ordered adequately for duplicate-x elimination.
    I realize that. I would assume that a double-sort penalty might
    be incurred, i.e. the select distinct ... is executed followed by
    the order by.
    Possibly we could make this work by implicitly expanding the ORDER BY
    to "ORDER BY upper(x), x" which would ensure that the duplicate x's
    are brought together.
    That would be another approach, too, if it works for all cases...
    I am not sure this will give the right results
    always, but it seems promising. We are assuming here that upper(x)
    gives equal outputs for equal inputs, so it would fall down on random(x)
    --- I suppose we could refuse to do this if we see a function that is
    marked non-constant-foldable in pg_proc...
    Something like that, yes.

    I just checked Date while off having coffee, and it is clear that the
    SQL standard specifies that ORDER BY operates on COLUMNS, not expressions.
    So the restriction that's now imposed is indeed standard compliant. However,
    some level of extension in this area would be very useful, and my guess is
    that examples like the one that started this discussion are very common.



    - Don Baccus, Portland OR <dhogaza@pacifier.com>
    Nature photos, on-line guides, Pacific Northwest
    Rare Bird Alert Service and other goodies at
    http://donb.photo.net.
  • Chris at Feb 7, 2000 at 8:32 am

    select distinct x from foo order by y;

    which produces ill-defined results.
    Why is this ill-defined? If y is in x then it is also distinct and
    there's no logic problem sorting on it.
  • Tom Lane at Feb 7, 2000 at 3:57 pm

    Chris writes:
    select distinct x from foo order by y;

    which produces ill-defined results.
    Why is this ill-defined? If y is in x then it is also distinct
    Huh? The query specifies distinct values of x, and only x.
    Consider
    x y

    1 1
    1 10
    2 0
    2 11

    "select distinct x" ought to produce one row with x=1, and one row with
    x=2, and nothing else. If it implicitly did the distinct on y as well,
    you'd get four rows with two x=1 and two x=2, which is not my idea of
    "distinct x". But if you don't have four rows out, then there's no
    meaningful way to order by y.

    6.5.3 in fact produces four rows from this query, which is generally
    conceded to be broken behavior.

    regards, tom lane

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-hackers @
categoriespostgresql
postedFeb 7, '00 at 3:06a
activeFeb 7, '00 at 6:53p
posts13
users4
websitepostgresql.org...
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase