Hello,

I have a typical many-to-many join table; in this instance it captures the
multiplicity described as "one person can have many rooms and one room can
have many persons". The join also records where in the room the person is
sitting: a seat number. I am writing a function to abstract this away: if
there is no record with the same person and room, insert one; otherwise
update the existing record with the new seat value.

create table person_room (
    id serial,
    person_id int,
    room_id int,
    seat varchar(255),
    unique (person_id, room_id)
);

-- version 1
create or replace function add_person_to_room(person int, room int, s varchar(255))
returns void as $$
begin
    insert into person_room (person_id, room_id, seat)
    values (person, room, s);
exception when unique_violation then
    update person_room set seat = s
    where person_id = person and room_id = room;
end;
$$ language plpgsql;

-- version 2
create or replace function add_person_to_room(person int, room int, s varchar(255))
returns void as $$
declare
    i int;
begin
    select into i id from person_room
    where person_id = person and room_id = room;
    if not found then
        insert into person_room (person_id, room_id, seat)
        values (person, room, s);
    else
        update person_room set seat = s
        where person_id = person and room_id = room;
    end if;
end;
$$ language plpgsql;
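For reference, once created the function would be called like this (the ids and the seat value here are made up for illustration):

```sql
select add_person_to_room(1, 101, '12A');
```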


Which version is faster?
Does the exception mechanism add any overhead?
Which is cleaner?

-ravi


  • Stephen Frost at Apr 1, 2008 at 12:52 am

    * Ravi Chemudugunta (chemuduguntar@gmail.com) wrote:
    Which version is faster?
    In general I would recommend that you benchmark them using
    as-close-to-real load as possible against as-real-as-possible data.
    Does the exception mechanism add any overhead?
    Yes, using exceptions adds a fair bit of overhead. Quote from the
    documentation found here:
    http://www.postgresql.org/docs/8.3/static/plpgsql-control-structures.html

    Tip: A block containing an EXCEPTION clause is significantly more
    expensive to enter and exit than a block without one. Therefore, don't
    use EXCEPTION without need.
    Which is cleaner?
    That would be in the eye of the beholder, generally. Given the lack of
    complexity, I don't think 'cleanness' in this case really matters all
    that much.

    Enjoy,

    Stephen
  • Stephen Denne at Apr 1, 2008 at 12:56 am
    Stephen Frost wrote
    Given the lack of complexity, I don't think 'cleanness' in this case
    really matters all that much.
    A third option is to update, if not found, insert.

    Regards,
    Stephen Denne.

    Disclaimer:
    At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any attachments is confidential and may be subject to legal privilege. If it is not intended for you please advise by reply immediately, destroy it and do not copy, disclose or use it in any way.
    __________________________________________________________________
    This email has been scanned by the DMZGlobal Business Quality
    Electronic Messaging Suite.
    Please see http://www.dmzglobal.com/dmzmessaging.htm for details.
    __________________________________________________________________
  • Robins Tharakan at Apr 1, 2008 at 2:26 am
    Coming to think of it.

    Would it be fine to consider that an UPDATE query that found no records to
    update is (performance-wise) the same as a SELECT query with the same WHERE
    clause?

    As in, does an UPDATE query incur additional overhead even before it finds
    the record to work on?

    *Robins*

    On Tue, Apr 1, 2008 at 7:53 AM, Robins Tharakan wrote:

    I get into these situations quite often and use exactly what Stephen
    pointed out.

    Do an update, but if not found, do an insert. It's (by and large) better
    than your version 2, since here you may skip running the second query (if the
    record exists), but in version 2 two queries are *always* run. And
    considering that the exception mechanism is heavy, this approach may be
    worth a try as well.

    update person_room set seat = s
    where person_id = person and room_id = room;
    if not found then
        insert into person_room (person_id, room_id, seat)
        values (person, room, s);
    end if;

    Robins



    On Tue, Apr 1, 2008 at 6:26 AM, Stephen Denne
    <Stephen.Denne@datamail.co.nz> wrote:
    A third option is to update, if not found, insert.
  • Tom Lane at Apr 1, 2008 at 4:29 am

    "Robins Tharakan" <tharakan@gmail.com> writes:
    Would it be fine to consider that an UPDATE query that found no records to
    update is (performance wise) the same as a SELECT query with the same WHERE
    clause ?
    As in, does an UPDATE query perform additional overhead even before it finds
    the record to work on ?
    The UPDATE would fire BEFORE STATEMENT and AFTER STATEMENT triggers, if
    there are any. Also, it would take a slightly stronger lock on the
    table, which might result in blocking either the UPDATE itself or some
    concurrent query where a plain SELECT would not've.

    There might be some other corner cases I've forgotten. But in the basic
    case I think your assumption is correct.

    regards, tom lane
  • James Mansion at Apr 2, 2008 at 7:16 pm

    Stephen Denne wrote:
    A third option is to update, if not found, insert.
    I find myself having to do this in Sybase, but it sucks because there's
    a race - if there's no row updated then there's no lock and you race
    another thread doing the same thing. So you grab a row lock on a
    sacrificial row used as a mutex, or just a table lock. Or you just
    accept that sometimes you have to detect the insert fail and retry the
    whole transaction. Which is sucky however you look at it.

    I think the 'update or insert' or 'merge' extensions make a degree
    of sense. At least in PostgreSQL one can use the lightweight lock manager.
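    The retry pattern James describes can be sketched in PL/pgSQL for the
    person_room table from the start of the thread; this mirrors the looping
    merge example in the PostgreSQL documentation rather than code posted in
    this thread:

    ```sql
    create or replace function upsert_person_room(person int, room int, s varchar)
    returns void as $$
    begin
        loop
            -- try the update first
            update person_room set seat = s
            where person_id = person and room_id = room;
            if found then
                return;
            end if;
            -- row not there: try to insert; if another transaction inserts
            -- concurrently, catch the unique violation and loop to update
            begin
                insert into person_room (person_id, room_id, seat)
                values (person, room, s);
                return;
            exception when unique_violation then
                -- lost the race; loop back and try the update again
            end;
        end loop;
    end;
    $$ language plpgsql;
    ```

    The loop terminates because each iteration either updates, inserts, or
    observes that a competing insert has just made the update possible.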
  • Chemuduguntar at Apr 3, 2008 at 9:24 am

    I find myself having to do this in Sybase, but it sucks because there's
    a race - if there's no row updated then there's no lock and you race
    another thread doing the same thing. So you grab a row lock on a
    sacrificial row used as a mutex, or just a table lock. Or you just
    accept that sometimes you have to detect the insert fail and retry the
    whole transaction. Which is sucky however you look at it.
    Hmm, should I be worried?

    I am doing an 'update, if not found insert'; in some cases I have found
    that I need to select anyway, e.g. 'take away 20 dollars from this
    person':

    (all variables prefixed with _ are local variables)

    select into _money money from person_money where person_id = _person;
    if not found then
        insert into person_money (person_id, money)
        values (_person, -_requested_amount);
    else
        update person_money set money = money - _requested_amount
        where person_id = _person;
        -- return new quantity
        return _money - _requested_amount; -- <- I need the quantity, so I have to select here
    end if;

    If I am not mistaken, you are saying that between the select and the
    if (not found) then ... end if; block, another concurrent process
    could be executing the same thing and insert, while in the first
    thread found is still 'false', so it ends up inserting and
    overwriting / causing a unique violation of some kind?

    BTW, I did a benchmark with and without exceptions; the exceptions
    version was very slow, so slow that I ended up killing it ... I am
    sure it would have taken at least 5 hours (it was already 3 hours in) ...
    versus 25 mins! I guess the trouble was that I was using exceptions
    to handle 'normal' flow; i.e. 'update if exists, else insert' is not
    an exceptional circumstance, so exceptions are a bad choice.

    It would be interesting to see how much overhead exception-containing
    functions add when they do not throw any exceptions, from throwing
    never, to every few records, to all the time ... maybe I will try it
    with my parsing functions (which catch exceptions thrown by substring()).
  • Robins Tharakan at Apr 3, 2008 at 1:28 pm
    I think James was talking about Sybase. PostgreSQL, on the other hand, has a
    slightly better way to do this.

    SELECT ... FOR UPDATE allows you to lock a given row (based on the SELECT
    ... WHERE clause) and update it... without worrying about a concurrent
    modification. Of course, if the SELECT ... WHERE didn't bring up any rows,
    you would need to do an INSERT anyway.

    Read more about SELECT ... FOR UPDATE here:
    http://www.postgresql.org/docs/8.3/static/sql-select.html#SQL-FOR-UPDATE-SHARE
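    For the person_money example above, a hypothetical transaction using this
    technique might look like the following (the person id 42 and the amount
    are made up for illustration):

    ```sql
    begin;
    -- blocks concurrent writers on this row until commit;
    -- returns no rows if the person does not exist yet
    select money from person_money where person_id = 42 for update;
    -- if a row came back, apply the change while holding the lock
    update person_money set money = money - 20 where person_id = 42;
    commit;
    ```

    As noted, if the SELECT returns no rows there is nothing to lock, so the
    missing-row case still needs an INSERT path.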

    *Robins*
    On Thu, Apr 3, 2008 at 2:48 PM, chemuduguntar@gmail.com wrote:
    Hmm, should I be worried?
  • James Mansion at Apr 6, 2008 at 8:17 pm

    Robins Tharakan wrote:
    I think James was talking about Sybase. PostgreSQL on the other hand
    has a slightly better way to do this.

    SELECT ... FOR UPDATE allows you to lock a given row (based on the
    SELECT ... WHERE clause) and update it... without worrying about a
    concurrent modification. Of course, if the SELECT ... WHERE didn't
    bring up any rows, you would need to do an INSERT anyway.
    How does that help?

    If the matching row doesn't exist at that point - what is there to get
    locked?

    The problem is that you need to effectively assert a lock on the primary
    key, so that you can update the row (if it exists) or insert a row with
    that key (if it doesn't), without checking, then inserting, and then
    finding that some other guy you were racing performed the insert first
    and you get a duplicate key error.

    How does PostgreSQL protect against this?

    James
  • Ravi Chemudugunta at Apr 1, 2008 at 1:23 am
    Hi, thanks for the quick reply.
    In general I would recommend that you benchmark them using
    as-close-to-real load as possible again as-real-as-possible data.
    I am running a benchmark with around 900,000-odd records (real load on
    the live machine :o ) ... it should hopefully show some good benchmarking
    results for the two methods.
    That would be in the eye of the beholder, generally. Given the lack of
    complexity, I don't think 'cleanness' in this case really matters all
    that much.
    One comment: the only downside I saw of using the exception approach was
    that if for some reason someone forgot to add the unique constraint to
    the table, it would be a bit of a nightmare. (I am porting some code into
    a server where the schema does not have these constraints set up, only
    the devel database does.)

    Will reply back with my conclusions, I am expecting a large difference.

    Cheers,

    ravi
  • Decibel! at Apr 9, 2008 at 8:00 pm

    On Mar 31, 2008, at 8:23 PM, Ravi Chemudugunta wrote:
    In general I would recommend that you benchmark them using
    as-close-to-real load as possible again as-real-as-possible data.
    I am running a benchmark with around 900,000 odd records (real-load on
    the live machine :o ) ... should show hopefully some good benchmarking
    results for the two methods.

    Please do, and please share. I know the docs say that exception
    blocks make things "significantly" more expensive, but I think that
    the community also sometimes loses the forest for the trees. Setting
    up a savepoint (AFAIK that's the actual expense in the exception
    block) is fairly CPU-intensive, but it's not common for a database
    server to be CPU-bound, even for OLTP. You're usually still waiting
    on disk.
    --
    Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org
    Give your computer some brain candy! www.distributed.net Team #1828

Discussion Overview
group: pgsql-performance @ postgresql
posted: Apr 1, '08 at 12:23a
active: Apr 9, '08 at 8:00p
posts: 11
users: 7
website: postgresql.org
irc: #postgresql
