Hi all,
I have a view v_table defined as following:

select a,b,c,d,e,f
from t_table
sort by a,b,c;

the usage pattern of this view is the following:

select distinct(a) from v_table;
select distinct(b) from v_table where a = "XXX";
select distinct(c) from v_table where a = "XXX" and b = "YYYY";

because of that sort in the view definition the first query above
takes not less than 3 seconds. I have solved this performance issue
removing the sort from the view definition and putting it in the
select reducing the time from > 3secons to < 150ms.

Can not the optimizer take rid of that useless sort on those
kind of queries ?


Regards
Gaetano Mendola

Search Discussions

  • Tom Lane at Sep 13, 2010 at 2:45 pm

    Gaetano Mendola writes:
    because of that sort in the view definition the first query above
    takes not less than 3 seconds. I have solved this performance issue
    removing the sort from the view definition and putting it in the
    select reducing the time from > 3secons to < 150ms.
    Can not the optimizer take rid of that useless sort on those
    kind of queries ?
    It is not the optimizer's job to second-guess the user on whether a sort
    is really needed there. If we did make it throw away non-top-level
    sorts, we'd have hundreds of users screaming loudly.

    regards, tom lane
  • Gaetano Mendola at Sep 13, 2010 at 4:24 pm

    On 09/13/2010 04:44 PM, Tom Lane wrote:
    Gaetano Mendola <mendola@gmail.com> writes:
    because of that sort in the view definition the first query above
    takes not less than 3 seconds. I have solved this performance issue
    removing the sort from the view definition and putting it in the
    select reducing the time from > 3secons to < 150ms.
    Can not the optimizer take rid of that useless sort on those
    kind of queries ?
    It is not the optimizer's job to second-guess the user on whether a sort
    is really needed there. If we did make it throw away non-top-level
    sorts, we'd have hundreds of users screaming loudly.
    Of course I'm not suggesting to take away the "sort by" and give the user
    an unsorted result, I'm asking why the the optimizer in cases like:

    select unique(a) from v_table_with_order_by;

    doesn't takes away the "order by" inside the view and puts it back "rewriting the
    query like this:

    select unique(a) from v_table_without_order_by
    order by a;

    then the user will not know about it. The result is the same but 30 times
    faster (in my case).

    Regards
    Gaetano Mendola
  • Tom Lane at Sep 13, 2010 at 4:48 pm

    Gaetano Mendola writes:
    Of course I'm not suggesting to take away the "sort by" and give the user
    an unsorted result, I'm asking why the the optimizer in cases like:
    select unique(a) from v_table_with_order_by;
    doesn't takes away the "order by" inside the view and puts it back "rewriting the
    query like this:
    select unique(a) from v_table_without_order_by
    order by a;
    That changes the order in which the rows are fed to unique(a). The
    principal real-world use for a non-top-level ORDER BY is exactly to
    determine the order in which rows are fed to a function, so we will
    have a revolt on our hands if we break that.

    regards, tom lane
  • Gaetano Mendola at Sep 13, 2010 at 5:09 pm

    On Mon, Sep 13, 2010 at 6:48 PM, Tom Lane wrote:
    Gaetano Mendola <mendola@gmail.com> writes:
    Of course I'm not suggesting to take away the "sort by" and give the user
    an unsorted result, I'm asking why the the optimizer in cases like:
    select unique(a) from v_table_with_order_by;
    doesn't takes away the "order by" inside the view and puts it back "rewriting the
    query like this:
    select unique(a) from v_table_without_order_by
    order by a;
    That changes the order in which the rows are fed to unique(a).  The
    principal real-world use for a non-top-level ORDER BY is exactly to
    determine the order in which rows are fed to a function, so we will
    have a revolt on our hands if we break that.
    I see your point, but some functions like: unique, count are not affected
    by the order of values fed, and I don't think either that unique has to
    give out the unique values in the same fed order.


    Regards
    Gaetano Mendola

    --
    cpp-today.blogspot.com
  • Maciek Sakrejda at Sep 13, 2010 at 7:02 pm

    I see your point, but some functions like: unique, count are not affected
    by the order of values fed, and I don't think either that unique has to
    give out the unique values in the same fed order.
    Sure. You'd need additional metadata about which aggregates care about
    sort order and which don't. Our system is more sensitive to this sort
    of thing and so we've actually implemented this, but in the absence of
    this "order-sensitive" flag, you have to assume sorts matter (or
    you're leaving a *lot* of room for shooting yourself in the foot).

    Even with this, it seems a little dodgy to mess up sort order in a
    top-level query. Relational databases are ostensibly relational, but I
    imagine in practice, it may be a toss-up in the trade-off between the
    performance benefits of what you are suggesting and the breaking of
    implicit non-relational behaviors that users have been taking for
    granted.

    ---
    Maciek Sakrejda | System Architect | Truviso

    1065 E. Hillsdale Blvd., Suite 215
    Foster City, CA 94404
    (650) 242-3500 Main
    www.truviso.com
  • Scott Marlowe at Sep 13, 2010 at 7:08 pm

    On Mon, Sep 13, 2010 at 11:09 AM, Gaetano Mendola wrote:
    On Mon, Sep 13, 2010 at 6:48 PM, Tom Lane wrote:
    Gaetano Mendola <mendola@gmail.com> writes:
    Of course I'm not suggesting to take away the "sort by" and give the user
    an unsorted result, I'm asking why the the optimizer in cases like:
    select unique(a) from v_table_with_order_by;
    doesn't takes away the "order by" inside the view and puts it back "rewriting the
    query like this:
    select unique(a) from v_table_without_order_by
    order by a;
    That changes the order in which the rows are fed to unique(a).  The
    principal real-world use for a non-top-level ORDER BY is exactly to
    determine the order in which rows are fed to a function, so we will
    have a revolt on our hands if we break that.
    I see your point, but some functions like:  unique, count are not affected
    by the order of values fed, and I don't think either that unique has to
    give out the unique values in the same fed order.
    First off, having a top level order by in a view is considered poor
    practice. It adds an overhead you may or may not need each time the
    view is accessed, and there's no simple way to avoid it once it's in
    there.

    On top of that you'd be adding complexity to the planner that would
    make it slower and more likely to make mistakes, all to fix a problem
    that I and most others don't have.

    --
    To understand recursion, one must first understand recursion.
  • Robert Haas at Sep 23, 2010 at 12:54 am

    On Mon, Sep 13, 2010 at 1:09 PM, Gaetano Mendola wrote:
    I see your point, but some functions like:  unique, count are not affected
    by the order of values fed, and I don't think either that unique has to
    give out the unique values in the same fed order.
    Gee, I'd sure expect it to.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Gnuoytr at Sep 23, 2010 at 3:05 am
    ---- Original message ----
    Date: Wed, 22 Sep 2010 20:54:22 -0400
    From: pgsql-performance-owner@postgresql.org (on behalf of Robert Haas <robertmhaas@gmail.com>)
    Subject: Re: [PERFORM] Useless sort by
    To: Gaetano Mendola <mendola@gmail.com>
    Cc: Tom Lane <tgl@sss.pgh.pa.us>,pgsql-performance@postgresql.org
    On Mon, Sep 13, 2010 at 1:09 PM, Gaetano Mendola wrote:
    I see your point, but some functions like:  unique, count are not affected
    by the order of values fed, and I don't think either that unique has to
    give out the unique values in the same fed order.
    Gee, I'd sure expect it to.
    Spoken like a dyed in the wool COBOL coder. The RM has no need for order; it's set based. I've dabbled in PG for some time, and my sense is increasingly that PG developers are truly code oriented, not database (set) oriented.

    robert
    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company

    --
    Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-performance
  • Greg Smith at Sep 23, 2010 at 4:01 am

    gnuoytr@rcn.com wrote:
    Spoken like a dyed in the wool COBOL coder. The RM has no need for order; it's set based. I've dabbled in PG for some time, and my sense is increasingly that PG developers are truly code oriented, not database (set) oriented.
    I can't tell if you meant for this to be insulting or my reading it that
    way is wrong, but it certainly wasn't put in a helpful tone. Let me
    summarize for you. You've been told that putting ORDER BY into a view
    is a generally poor idea anyway, that it's better to find ways avoid
    this class of concern altogether. There are significant non-obvious
    technical challenges behind actually implementing the behavior you'd
    like to see; the concerns raised by Tom and Maciek make your idea
    impractical even if it were desired. And for every person like yourself
    who'd see the benefit you're looking for, there are far more that would
    find a change in this area a major problem. The concerns around
    breakage due to assumed but not required aspects of the relational model
    are the ones the users of the software will be confused by, not the
    developers of it. You have the classification wrong; the feedback
    you've gotten here is from the developers being user oriented, not
    theory oriented or code oriented.

    --
    Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services and Support www.2ndQuadrant.us
    Author, "PostgreSQL 9.0 High Performance" Pre-ordering at:
    https://www.packtpub.com/postgresql-9-0-high-performance/book
  • Scott Marlowe at Sep 23, 2010 at 4:18 am

    On Wed, Sep 22, 2010 at 10:01 PM, Greg Smith wrote:
    gnuoytr@rcn.com wrote:
    Spoken like a dyed in the wool COBOL coder.  The RM has no need for order;
    it's set based.  I've dabbled in PG for some time, and my sense is
    increasingly that PG developers are truly code oriented, not database (set)
    oriented.
    I can't tell if you meant for this to be insulting or my reading it that way
    is wrong, but it certainly wasn't put in a helpful tone.  Let me summarize
    for you.  You've been told that putting ORDER BY into a view is a generally
    poor idea anyway, that it's better to find ways avoid this class of concern
    altogether.
    It's been a few years since I've read the SQL spec, but doesn't it
    actually forbid order by in views but pgsql allows it anyway?

    Like you said, order by in a view is a bad practice to get into, and
    it's definitely not what a "set oriented" person would do. it's what
    a code oriented person would do.
  • Robert Haas at Sep 23, 2010 at 4:46 am

    On Wed, Sep 22, 2010 at 11:05 PM, wrote:
    Spoken like a dyed in the wool COBOL coder.  The RM has no need for order; it's set based.  I've dabbled in PG for some time, and my sense is increasingly that PG developers are truly code oriented, not database (set) oriented.
    I'm struggling to think of an adequate response to this. I think I'm
    going to go with: huh?

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Gaetano Mendola at Oct 25, 2010 at 10:24 am

    On 09/23/2010 05:05 AM, gnuoytr@rcn.com wrote:
    Spoken like a dyed in the wool COBOL coder. The RM has no need for order; it's set based. I've dabbled in PG for some time, and my sense is increasingly that PG developers are truly code oriented, not database (set) oriented.
    That's a bit harsh. Your sense if fooling you.

    Regards
    Gaetano Mendola
  • Heikki Linnakangas at Sep 14, 2010 at 7:10 am

    On 13/09/10 19:48, Tom Lane wrote:
    Gaetano Mendola<mendola@gmail.com> writes:
    Of course I'm not suggesting to take away the "sort by" and give the user
    an unsorted result, I'm asking why the the optimizer in cases like:
    select unique(a) from v_table_with_order_by;
    doesn't takes away the "order by" inside the view and puts it back "rewriting the
    query like this:
    select unique(a) from v_table_without_order_by
    order by a;
    That changes the order in which the rows are fed to unique(a). The
    principal real-world use for a non-top-level ORDER BY is exactly to
    determine the order in which rows are fed to a function, so we will
    have a revolt on our hands if we break that.
    You could check for volatile functions. I think this could be done
    safely. However, it doesn't seem worthwhile, it would be a fair amount
    of code, and it's not usually a good idea to put an ORDER BY in a view
    or subquery anyway unless you also have volatile functions in there, or
    you want to coerce the optimizer to choose a certain plan.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Maciek Sakrejda at Sep 14, 2010 at 3:09 pm
    You could check for volatile functions. I think this could be done safely.
    I don't think that's enough. A UDA like last() could have an immutable
    sfunc, but still be sensitive to the sort order. I think you'd need
    something like a special order-sensitive aggregate definition flag.

    ---
    Maciek Sakrejda | System Architect | Truviso

    1065 E. Hillsdale Blvd., Suite 215
    Foster City, CA 94404
    (650) 242-3500 Main
    www.truviso.com
  • Dave Crooke at Sep 14, 2010 at 4:15 pm
    I presume there is more usage of this view than just those 3 queries
    (otherwise, for a start there would be no need for d, e, f in the view
    definition)

    Why not just rewrite these 3 queries to go directly off the main table? Or,
    create a different view without the sort_by in its definition?

    Or, if these are used very frequently and performance is critical, consider
    (i) caching these results in the application layer, with logic to understand
    when they need to be updated, or (b) maintaining extra tables that just
    contain (a) (a,b) and (a,b,c)

    Objectively, it's always better to optimize the SQL and application level
    for the specific needs of the situation before concluding that the
    underlying database engine should do these optimizations automatically, and
    it seems like there are a number of options you could explore here.

    Cheers
    Dave
    On Mon, Sep 13, 2010 at 4:47 AM, Gaetano Mendola wrote:

    Hi all,
    I have a view v_table defined as following:

    select a,b,c,d,e,f
    from t_table
    sort by a,b,c;

    the usage pattern of this view is the following:

    select distinct(a) from v_table;
    select distinct(b) from v_table where a = "XXX";
    select distinct(c) from v_table where a = "XXX" and b = "YYYY";

    because of that sort in the view definition the first query above
    takes not less than 3 seconds. I have solved this performance issue
    removing the sort from the view definition and putting it in the
    select reducing the time from > 3secons to < 150ms.

    Can not the optimizer take rid of that useless sort on those
    kind of queries ?


    Regards
    Gaetano Mendola




    --
    Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-performance
  • Gaetano Mendola at Sep 14, 2010 at 5:07 pm

    On Tue, Sep 14, 2010 at 6:15 PM, Dave Crooke wrote:
    I presume there is more usage of this view than just those 3 queries
    (otherwise, for a start there would be no need for d, e, f in the view
    definition)

    Why not just rewrite these 3 queries to go directly off the main table? Or,
    create a different view without the sort_by in its definition?

    Or, if these are used very frequently and performance is critical, consider
    (i) caching these results in the application layer, with logic to understand
    when they need to be updated, or (b) maintaining extra tables that just
    contain (a) (a,b) and (a,b,c)

    Objectively, it's always better to optimize the SQL and application level
    for the specific needs of the situation before concluding that the
    underlying database engine should do these optimizations automatically, and
    it seems like there are a number of options you could explore here.
    Question here is not how to do it right, but how to make the optimizer smarter
    than it is now, taking rid of work not needed.

    Regards
    Gaetano Mendola

    --
    cpp-today.blogspot.com
  • Gnuoytr at Sep 23, 2010 at 1:51 pm
    I can't tell if you meant for this to be insulting or my reading it that way is wrong, but it certainly wasn't put in a helpful tone. Let me summarize for you. You've been told that putting ORDER BY into a view is a generally poor idea anyway, that it's better to find ways avoid this class of concern altogether. There are significant non-obvious technical challenges behind actually implementing the behavior you'd like to see; the concerns raised by Tom and Maciek make your idea impractical even if it were desired. And for every person like yourself who'd see the benefit you're looking for, there are far more that would find a change in this area a major problem. The concerns around breakage due to assumed but not required aspects of the relational model are the ones the users of the software will be confused by, not the developers of it. You have the classification wrong; the feedback you've gotten here is from the developers being user oriented, not theory oriented or
    c!
    ode oriented.
    --
    Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services and Support www.2ndQuadrant.us
    Author, "PostgreSQL 9.0 High Performance" Pre-ordering at:
    https://www.packtpub.com/postgresql-9-0-high-performance/book


    Not insulting, just amused bemusement. PG portrays itself as the best OS database, which it may well be. But it does so by stressing the row-by-agonizing-row approach to data. In other words, as just a record paradigm filestore for COBOL/java/C coders. I was expecting more Relational oomph. As Dr. Codd says: "A Relational Model of Data for Large Shared Data Banks". Less code, more data.

    robert
  • Scott Marlowe at Sep 23, 2010 at 5:11 pm

    On Thu, Sep 23, 2010 at 7:51 AM, wrote:
    Not insulting, just amused bemusement.  PG portrays itself as the best OS database, which it may well be.  But it does so by stressing the row-by-agonizing-row approach to data.  In other words, as just a record paradigm filestore for COBOL/java/C coders.  I was expecting more Relational oomph.  As Dr. Codd says:  "A Relational Model of Data for Large Shared Data Banks".  Less code, more data.
    So what, exactly, would give pgsql more relationally "oomph"?

    Your assertion feels pretty hand wavy right now.

    --
    To understand recursion, one must first understand recursion.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-performance @
categoriespostgresql
postedSep 13, '10 at 9:53a
activeOct 25, '10 at 10:24a
posts19
users9
websitepostgresql.org
irc#postgresql

People

Translate

site design / logo © 2021 Grokbase