FAQ
Hi,

in postgresql you have several possibilites to get the rank of items. A thread
earlier this year shows correlated subqueries (not very performant) and other
tricks and techniques to solve the ranking problem:

http://archives.postgresql.org/pgsql-general/2005-05/msg00157.php

The possibility to use a SHARED variable in plperl can be another nice way to
get a rank of items. ( good example showing SHARED in use is at
http://www.varlena.com/varlena/GeneralBits/114.php)

So i wrote two functions for my purpose:

CREATE OR REPLACE function ranking(int4) RETURNS int4 LANGUAGE 'plperl' AS $$
my %this;

$this{'punkte'} = shift;
$this{'ranking'} = $_SHARED{'prev'}{'ranking'}; # defaults to 0
$this{'count'} = $_SHARED{'prev'}{'count'} + 1; # defaults to 1

$this{'ranking'} = $this{'count'} unless
$this{'punkte'} == $_SHARED{'prev'}{'punkte'};

$_SHARED{'prev'} = \%this;
return $this{'ranking'};
$$;

CREATE OR REPLACE FUNCTION reset_ranking() RETURNS void LANGUAGE 'plperl' AS
$$
$_SHARED{'prev'} = undef;
$$;


Nice Thing: the function drops rankings which other ranking solutions in the
given thread can't. Like this:

rank | points
-------------
1 | 10
2 | 9
2 | 9
4 | 8
5 | 7

It drops rank 3 because we have to entries for second rank.

It would be even nice if you can write a genric ranking() function which takes
anyarray as an argument, but as far as i know you can't pass an "anyarray" to
a plperl function, right?


Now i can do the following in plpsql Function which updates a caching table
for me and it works fine:

PERFORM reset_ranking();

CREATE TEMP TABLE ranking AS
SELECT
*,
ranking(r1.gc_gesamtpunkte) AS rank
FROM (
SELECT
mg_name,
gc_gesamtpunkte
FROM temp_gc
ORDER BY gc_gesamtpunkte DESC, mg_name ASC
) AS r1
;

EXECUTE '
UPDATE temp_gc
SET gc_rank = ranking.rank
FROM ranking
WHERE temp_gc.mg_name = ranking.mg_name;
';

DROP TABLE ranking;


Problems arrise when you try to do the select and update step together without
any temporary table in between:

PERFORM reset_ranking();

UPDATE temp_gc SET gc_rank = ranking.rank
FROM (
SELECT
*,
ranking(r1.gc_gesamtpunkte) AS rank
FROM (
SELECT
mg_name,
gc_gesamtpunkte
FROM temp_gc
ORDER BY gc_gesamtpunkte DESC, mg_name ASC
) AS r1
) AS ranking
WHERE temp_gc.mg_name = ranking.mg_name;
';

I have a guess, what happens here: The order of the subselect statement is
dropped by the optimizer because the optimizer doesn't see the "side-effect"
of the ranking function. that's ok because using such functions isn't SQLish,
i guess.

Is there a way to FORCE the optimizer to keep things orders like the sql
statement author wanted it?

kind regards,
janning

Search Discussions

  • Tom Lane at Jul 12, 2005 at 10:03 pm

    Janning Vygen writes:
    I have a guess, what happens here: The order of the subselect statement is
    dropped by the optimizer because the optimizer doesn't see the "side-effect"
    of the ranking function.
    That guess is wrong.

    I think the problem is that you are trying to update multiple rows in
    the same statement, which would require a "reset ranking" between each
    row, which this approach doesn't provide for.

    The whole thing looks mighty fragile in other ways; anything involving a
    single global variable isn't going to work nicely in very many cases.
    Consider casting your solution as an aggregate instead...

    regards, tom lane
  • Tony Smith at Jul 12, 2005 at 10:40 pm
    When I was trying to connect my databse with jdbc, I
    got the following error message:

    org.postgresql.util.PSQLException: Connection
    rejected: FATAL: no pg_hba.conf entry for host
    "mydomain", user "xxxx", database "myDB", SSL off.

    When I run in dos console "psql myDB..." it works
    fine.

    My jdbc code is at a different at different domain
    from the database machine.

    Thanks,




    __________________________________
    Do you Yahoo!?
    Yahoo! Mail - Find what you need with new enhanced search.
    http://info.mail.yahoo.com/mail_250
  • Joshua D. Drake at Jul 12, 2005 at 10:52 pm

    Tony Smith wrote:
    When I was trying to connect my databse with jdbc, I
    got the following error message:

    org.postgresql.util.PSQLException: Connection
    rejected: FATAL: no pg_hba.conf entry for host
    "mydomain", user "xxxx", database "myDB", SSL off.
    You need to setup your pg_hba.conf to allow remote connections to
    the database for the user.

    http://www.postgresql.org/docs/7.4/static/client-authentication.html


    Sincerely,

    Joshua D. Drake


    When I run in dos console "psql myDB..." it works
    fine.

    My jdbc code is at a different at different domain
    from the database machine.

    Thanks,




    __________________________________
    Do you Yahoo!?
    Yahoo! Mail - Find what you need with new enhanced search.
    http://info.mail.yahoo.com/mail_250

    ---------------------------(end of broadcast)---------------------------
    TIP 4: Have you searched our list archives?

    http://archives.postgresql.org

    --
    Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
    PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
    Managed Services, Shared and Dedicated Hosting
    Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
  • Janning Vygen at Jul 13, 2005 at 1:18 pm
    Hi,

    Am Mittwoch, 13. Juli 2005 00:03 schrieb Tom Lane:
    Janning Vygen <vygen@gmx.de> writes:
    I have a guess, what happens here: The order of the subselect statement
    is dropped by the optimizer because the optimizer doesn't see the
    "side-effect" of the ranking function.
    That guess is wrong.
    ah, and i already thought to be already on a higher level of understanding
    postgresql...
    I think the problem is that you are trying to update multiple rows in
    the same statement, which would require a "reset ranking" between each
    row, which this approach doesn't provide for.
    no thats not the point, i guess (which might be wrong again)

    but i still don't understand why it doesn't work:

    this is my important query named *Q* :=

    SELECT
    *,
    ranking(r1.gc_gesamtpunkte, r1.gc_gesamtsiege) AS rank
    FROM (
    SELECT
    mg_name,
    gc_gesamtpunkte,
    gc_gesamtsiege
    FROM temp_gc
    ORDER BY gc_gesamtpunkte DESC, gc_gesamtsiege DESC, mg_name ASC
    ) AS r1

    this way it works:

    CREATE TEMP TABLE ranking AS *Q*;
    EXECUTE 'UPDATE temp_gc SET gc_rank = ranking.rank
    FROM ranking WHERE temp_gc.mg_name = ranking.mg_name;';

    and this way it doesn't:

    UPDATE temp_gc
    SET gc_rank = ranking.rank
    FROM (*Q*)
    ranking
    WHERE temp_gc.mg_name = ranking.mg_name;
    ;

    i want to update multiple rows, but the all data in table temp_gc doesnt need
    a reset of the ranking.
    The whole thing looks mighty fragile in other ways; anything involving a
    single global variable isn't going to work nicely in very many cases.
    Consider casting your solution as an aggregate instead...
    I know that this is not the best solution but it is the fastest. A corrolated
    subquery with aggregates takes ages in opposite to the ranking function
    solution.

    But by the time of writing i have a new problem with my solution posted today
    with subject "strange error with temp table: pg_type_typname_nsp_index"

    kind regards,
    janning
  • Tom Lane at Jul 13, 2005 at 1:36 pm

    Janning Vygen writes:
    this way it works:
    CREATE TEMP TABLE ranking AS *Q*;
    EXECUTE 'UPDATE temp_gc SET gc_rank = ranking.rank
    FROM ranking WHERE temp_gc.mg_name = ranking.mg_name;';
    and this way it doesn't:
    UPDATE temp_gc
    SET gc_rank = ranking.rank
    FROM (*Q*)
    ranking
    WHERE temp_gc.mg_name = ranking.mg_name;
    It's difficult to be sure without looking at EXPLAIN output, but I would
    guess that the second query is being done with a plan that involves
    multiple scans of "*Q*", and that's confusing your function.

    regards, tom lane
  • Janning Vygen at Jul 14, 2005 at 4:16 pm

    Am Mittwoch, 13. Juli 2005 15:35 schrieb Tom Lane:
    Janning Vygen <vygen@gmx.de> writes:
    this way it works:

    CREATE TEMP TABLE ranking AS *Q*;
    EXECUTE 'UPDATE temp_gc SET gc_rank = ranking.rank
    FROM ranking WHERE temp_gc.mg_name = ranking.mg_name;';

    and this way it doesn't:

    UPDATE temp_gc
    SET gc_rank = ranking.rank
    FROM (*Q*)
    ranking
    WHERE temp_gc.mg_name = ranking.mg_name;
    It's difficult to be sure without looking at EXPLAIN output, but I would
    guess that the second query is being done with a plan that involves
    multiple scans of "*Q*", and that's confusing your function.

    regards, tom lane
    here you are. both versions with explain output
    first version creates temp table (explain no 1) and updates afterwards
    (explain no. 2).
    second version combines both (explain no.3 )

    [whats the best way to post explain output? My mailclient wraps the output. i
    hope it is still readable]

    no 1 ***** first create temp table *****
    QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------
    Subquery Scan r1 (cost=58.54..70.99 rows=830 width=44) (actual
    time=0.186..0.403 rows=7 loops=1)
    -> Sort (cost=58.54..60.62 rows=830 width=44) (actual time=0.106..0.137
    rows=7 loops=1)
    Sort Key: gc_gesamtpunkte, gc_gesamtsiege, mg_name
    -> Seq Scan on temp_gc (cost=0.00..18.30 rows=830 width=44) (actual
    time=0.015..0.052 rows=7 loops=1)
    Total runtime: 0.470 ms
    (5 Zeilen)

    no 2 **** update statement references temp table ****
    QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------
    Merge Join (cost=127.70..193.49 rows=4109 width=70) (actual
    time=0.221..0.404 rows=7 loops=1)
    Merge Cond: ("outer".mg_name = "inner".mg_name)
    -> Sort (cost=58.54..60.62 rows=830 width=66) (actual time=0.110..0.142
    rows=7 loops=1)
    Sort Key: temp_gc.mg_name
    -> Seq Scan on temp_gc (cost=0.00..18.30 rows=830 width=66) (actual
    time=0.013..0.055 rows=7 loops=1)
    -> Sort (cost=69.16..71.63 rows=990 width=36) (actual time=0.089..0.119
    rows=7 loops=1)
    Sort Key: ranking.mg_name
    -> Seq Scan on ranking (cost=0.00..19.90 rows=990 width=36) (actual
    time=0.006..0.042 rows=7 loops=1)
    Total runtime: 0.525 ms
    (9 Zeilen)

    no 3 **** combined update statement ****
    QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------------------
    Merge Join (cost=167.70..232.14 rows=3445 width=78) (actual
    time=0.455..0.774 rows=7 loops=1)
    Merge Cond: ("outer".mg_name = "inner".mg_name)
    -> Sort (cost=58.54..60.62 rows=830 width=66) (actual time=0.111..0.142
    rows=7 loops=1)
    Sort Key: temp_gc.mg_name
    -> Seq Scan on temp_gc (cost=0.00..18.30 rows=830 width=66) (actual
    time=0.016..0.057 rows=7 loops=1)
    -> Sort (cost=109.16..111.23 rows=830 width=44) (actual time=0.248..0.281
    rows=7 loops=1)
    Sort Key: r1.mg_name
    -> Subquery Scan r1 (cost=58.54..68.92 rows=830 width=44) (actual
    time=0.102..0.201 rows=7 loops=1)
    -> Sort (cost=58.54..60.62 rows=830 width=44) (actual
    time=0.092..0.125 rows=7 loops=1)
    Sort Key: gc_gesamtpunkte, gc_gesamtsiege, mg_name
    -> Seq Scan on temp_gc (cost=0.00..18.30 rows=830
    width=44) (actual time=0.008..0.045 rows=7 loops=1)
    Total runtime: 0.886 ms
    (12 Zeilen)

    and this is the combined statement:

    UPDATE temp_gc
    SET gc_rank = ranking.rank
    FROM (

    SELECT
    *,
    ranking(r1.gc_gesamtpunkte, r1.gc_gesamtsiege) AS rank
    FROM (
    SELECT
    mg_name,
    gc_gesamtpunkte,
    gc_gesamtsiege
    FROM temp_gc
    ORDER BY gc_gesamtpunkte DESC, gc_gesamtsiege DESC, mg_name ASC
    ) AS r1
    ) AS ranking
    WHERE temp_gc.mg_name = ranking.mg_name;


    to me it looks like the call to ranking() is moved from the subquery plan
    "upwards". but i really can't interpret this explain output regarding to
    "where" the ranking funcion is called.

    my problem is that getting a rank out of items is very expensive with
    aggregate functions, so i try to do a trick here which is not very relational
    indeed and this of course can lead to trouble.

    Maybe i have to rethink the whole stuff. (because i get some temp table errors
    anyway as mentioned in the other thread)

    Kind regards,
    Janning Vygen
  • Tom Lane at Jul 14, 2005 at 4:54 pm

    Janning Vygen writes:
    and this is the combined statement:
    UPDATE temp_gc
    SET gc_rank = ranking.rank
    FROM (
    SELECT
    *,
    ranking(r1.gc_gesamtpunkte, r1.gc_gesamtsiege) AS rank
    FROM (
    SELECT
    mg_name,
    gc_gesamtpunkte,
    gc_gesamtsiege
    FROM temp_gc
    ORDER BY gc_gesamtpunkte DESC, gc_gesamtsiege DESC, mg_name ASC
    ) AS r1
    ) AS ranking
    WHERE temp_gc.mg_name = ranking.mg_name;
    to me it looks like the call to ranking() is moved from the subquery plan
    "upwards". but i really can't interpret this explain output regarding to
    "where" the ranking funcion is called.
    Yeah, I think you are exactly right. The EXPLAIN output doesn't show
    targetlist contents (EXPLAIN VERBOSE would, but not very readably :-(),
    but the small difference in the estimated costs seems to indicate that
    the function evaluation is not happening at the level of the "subquery
    scan" plan node in the combined plan. This is consistent with the
    planner's behavior in general. It will honor the "order by" in the
    sense that the output of the "r1" subquery is delivered to the upper
    plan level in that order, but it feels no compunction about flattening
    the "ranking" subquery into the outer query, and after that you cannot
    be certain about the order in which the executions of the ranking()
    function happen.

    What you need is to prevent the flattening of the intermediate subquery.
    The current favorite technique is to insert a dummy OFFSET clause:

    UPDATE temp_gc
    SET gc_rank = ranking.rank
    FROM (

    SELECT
    *,
    ranking(r1.gc_gesamtpunkte, r1.gc_gesamtsiege) AS rank
    FROM (
    SELECT
    mg_name,
    gc_gesamtpunkte,
    gc_gesamtsiege
    FROM temp_gc
    ORDER BY gc_gesamtpunkte DESC, gc_gesamtsiege DESC, mg_name ASC
    ) AS r1
    OFFSET 0
    ) AS ranking
    WHERE temp_gc.mg_name = ranking.mg_name;

    although you could also do it by inserting an ORDER BY at that level.

    BTW, there's been some discussion of preventing flattening of subqueries
    whose SELECT lists contain volatile functions. If we did that then
    declaring ranking() as volatile would be enough to avoid the problem.
    I've been hesitant to make the change because I'm concerned about the
    probable loss of optimization in cases where the function is labeled
    volatile by default, merely because the author didn't think about how
    to mark it. And in any case "volatile" doesn't really describe the
    issue with your function...
    my problem is that getting a rank out of items is very expensive with
    aggregate functions,
    I'm unconvinced --- I don't see a reason that you can't implement it
    exactly the same way as an aggregate. The only difference is that the
    state value is an aggregate state value instead of a global variable.
    There might be some extra palloc overhead, but nothing worse.

    regards, tom lane

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-general @
categoriespostgresql
postedJul 12, '05 at 9:08p
activeJul 14, '05 at 4:54p
posts8
users5
websitepostgresql.org
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase