Dear All,

I have just generated a new patch for the MERGE command.

One main change in this version is the removal of the RAISE ERROR action from
MERGE, because its semantics are not yet well defined.

I also rewrote the regression test file merge.sql, to make it consistent with
the examples I used on my wiki page.

Some small bugs (errors and warnings) are also fixed.

In this patch, the main features of MERGE (sublinks, EXPLAIN, rules and
triggers, inheritance) are unchanged, as is the DO NOTHING action.
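
For reference, here is a rough sketch of the kind of statement merge.sql now
exercises (the Stock/Buy tables and column names follow the examples on the
wiki page; the exact DO NOTHING spelling below is my reading of that syntax,
not a quote from the patch):

MERGE INTO Stock t
USING Buy s
ON s.item_id = t.item_id
WHEN MATCHED AND s.volume = 0 THEN DO NOTHING
WHEN MATCHED THEN UPDATE SET balance = t.balance + s.volume
WHEN NOT MATCHED THEN INSERT VALUES (s.item_id, s.volume);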

I do hope the MERGE command can be added to PostgreSQL 9.1, and I wonder what
improvements should be made to the current version.

Could you please review this patch, if you have the time and interest?

Your feedback will be highly appreciated.

Thanks

Yours Boxuan


  • Thom Brown at Sep 23, 2010 at 11:00 am

    On 23 September 2010 11:31, Boxuan Zhai wrote:
    Dear All,
    I have just generated a new patch for the MERGE command.
    One main change in this version is the removal of the RAISE ERROR action from
    MERGE, because its semantics are not yet well defined.
    I also rewrote the regression test file merge.sql, to make it consistent with
    the examples I used on my wiki page.
    Some small bugs (errors and warnings) are also fixed.
    In this patch, the main features of MERGE (sublinks, EXPLAIN, rules and
    triggers, inheritance) are unchanged, as is the DO NOTHING action.
    I do hope the MERGE command can be added to PostgreSQL 9.1, and I wonder what
    improvements should be made to the current version.
    Could you please review this patch, if you have the time and interest?
    Your feedback will be highly appreciated.
    Thanks

    Yours Boxuan
    A few corrections:

    in src/backend/executor/nodeModifyTable.c:
    s/excute/execute/
    s/orignial/original/

    in src/backend/optimizer/plan/planner.c
    s/expreesions/expressions/
    s/So,we/So, we/
    s/comand/command/
    s/fileds/fields/

    in src/backend/optimizer/prep/preptlist.c:
    s/aggresive/aggressive/

    in src/backend/optimizer/util/var.c
    s/targe/target/ -- this appears twice
    s/sourse/source/

    in src/backend/parser/analyze.c
    s/takend/taken/
    s/relaion/relation/
    s/targe/target/ -- this appears twice
    s/consitency/consistency/
    s/commond/command/
    s/seperate/separate/

    in src/backend/rewrite/rewriteHandler.c
    s/acton/action/

    in src/include/nodes/execnodes.h
    s/meger/merge/

    in src/include/nodes/parsenodes.h
    s/proecess/process/
    s/allwo/allow/
    s/elments/elements/

    in src/test/regress/expected/merge.out
    s/qulifications/qualifications/ -- this appears twice
    s/suceeds/succeeds/ -- this appears twice

    --
    Thom Brown
    Twitter: @darkixion
    IRC (freenode): dark_ixion
    Registered Linux user: #516935
  • Marko Tiikkaja at Sep 23, 2010 at 11:55 am

    On 2010-09-23 1:31 PM +0300, Boxuan Zhai wrote:
    I have just generated a new patch for the MERGE command.
    I haven't followed the discussion very closely, but this part in the
    regression tests caught my attention:

    +-- we now have a duplicate key in Buy, so when we join to
    +-- Stock we will generate 2 matching rows, not one.
    +-- According to standard this command should fail.
    +-- But it suceeds in PostgreSQL implementation by simply ignoring the
    second

    It doesn't seem like a very good idea to go against the standard here.
    The "second" row is not well defined in this case so the results are
    unpredictable.


    The patch is also missing a (trivial) change to explain.c.


    Regards,
    Marko Tiikkaja
  • Boxuan Zhai at Sep 23, 2010 at 1:49 pm

    On Thu, Sep 23, 2010 at 7:55 PM, Marko Tiikkaja wrote:
    On 2010-09-23 1:31 PM +0300, Boxuan Zhai wrote:

    I have just generated a new patch for the MERGE command.
    I haven't followed the discussion very closely, but this part in the
    regression tests caught my attention:

    +-- we now have a duplicate key in Buy, so when we join to
    +-- Stock we will generate 2 matching rows, not one.
    +-- According to standard this command should fail.
    +-- But it suceeds in PostgreSQL implementation by simply ignoring the
    second

    It doesn't seem like a very good idea to go against the standard here. The
    "second" row is not well defined in this case so the results are
    unpredictable.
    Yes, the result is uncertain. It depends on which row is scanned first,
    which is largely outside the user's control.

    But in Postgres, this is what the system does for UPDATE.

    For example, consider a simple update query like the following:

    CREATE TABLE target (id int, val int);
    INSERT INTO target VALUES (1, 10);

    CREATE TABLE source (id int, add int);
    INSERT INTO source VALUES (1, 100);
    INSERT INTO source VALUES (1, 100000);

    -- Do the UPDATE with a source table that has multiple matching rows

    UPDATE target SET val = val + add FROM source
    WHERE source.id = target.id;

    t=# SELECT * FROM target;
    id | val
    ----+-----
    1 | 110
    (1 row)

    The target tuple has two matching source tuples, but it is only updated once.
    Yet this query is not forbidden by Postgres, and its result is also
    uncertain.
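
    To make the analogy with MERGE concrete, here is a sketch of the situation
    the regression test comment describes (Stock and Buy as in merge.sql; the
    exact column names here are an assumption):

    CREATE TABLE Buy (item_id int, volume int);
    INSERT INTO Buy VALUES (10, 100);
    INSERT INTO Buy VALUES (10, 1000);   -- duplicate key in the source

    MERGE INTO Stock t
    USING Buy s
    ON s.item_id = t.item_id
    WHEN MATCHED THEN UPDATE SET balance = t.balance + s.volume
    WHEN NOT MATCHED THEN INSERT VALUES (s.item_id, s.volume);

    -- Per the standard this should fail, because the Stock row with item_id 10
    -- is matched by two source rows. The patch instead behaves like the UPDATE
    -- above: only one of the two matches is applied, and which one depends on
    -- the scan order.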

    The patch is also missing a (trivial) change to explain.c.
    Sorry, I messed up the files. Here is the new patch file, with the EXPLAIN
    change in it.

    Regards,
    Marko Tiikkaja
  • Greg Smith at Sep 29, 2010 at 6:44 am
    Started looking at the latest MERGE patch from Boxuan here tonight. The
    regression tests pass for me here, which is a good starting sign. I expect to
    move on to trying to break it more creatively next, then on to performance
    tests. Nothing much more exciting than that to report so far.

    It had suffered some bit rot, I think because of the security label
    changes. Attached is a rebased version against the new git HEAD so
    nobody else has to duplicate that to apply the patch. Also, to provide
    an alternate interface for anyone who wants to do testing/browsing of
    this patch, I've made a Github fork with a merge branch in it. I plan to
    commit intermediate stuff to there that keeps up to date with review
    changes: http://github.com/greg2ndQuadrant/postgres/tree/merge

    Probably easier to read
    http://github.com/greg2ndQuadrant/postgres/compare/merge than most local
    patch viewers, so I gzip'ed the attached updated patch to save some bytes.

    One compiler warning I noticed that needs to get resolved:

    src/backend/commands/explain.c:

    explain.c: In function ‘ExplainMergeActions’:
    explain.c:1528: warning: comparison of distinct pointer types lacks a cast

    That is complaining about this section:

    if (mt_planstate->operation != CMD_MERGE ||
        mt_planstate->mt_mergeActPstates == NIL)
        return;

    So presumably that comparison with NIL needs a cast. Once I get more
    familiar with the code I'll fix that myself if Boxuan doesn't offer a
    suggestion first.

    The rest of the compiler warnings I saw didn't look related to his code,
    maybe stuff my picky Ubuntu compiler is noticing that was done recently
    to HEAD. I haven't checked HEAD without this patch yet to confirm, and
    am done for the night now. Here's the list if anyone is interested:

    Warning in src/backend/parser/scan.c:

    gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
    -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
    -fwrapv -g -I../../../src/include -D_GNU_SOURCE -c -o index.o index.c
    -MMD -MP -MF .deps/index.Po
    In file included from gram.y:12172:
    scan.c: In function ‘yy_try_NUL_trans’:
    scan.c:16256: warning: unused variable ‘yyg’

    Warning in src/backend/utils/error/elog.c:

    gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
    -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
    -fwrapv -g -I../../../../src/include -D_GNU_SOURCE -c -o ts_cache.o
    ts_cache.c -MMD -MP -MF .deps/ts_cache.Po
    elog.c: In function ‘write_console’:
    elog.c:1698: warning: ignoring return value of ‘write’, declared with
    attribute warn_unused_result
    elog.c: In function ‘write_pipe_chunks’:
    elog.c:2388: warning: ignoring return value of ‘write’, declared with
    attribute warn_unused_result
    elog.c:2397: warning: ignoring return value of ‘write’, declared with
    attribute warn_unused_result

    Warning in src/bin/scripts/common.c:

    gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
    -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
    -fwrapv -g -I. -I. -I../../../src/interfaces/libpq
    -I../../../src/bin/pg_dump -I../../../src/include -D_GNU_SOURCE -c -o
    input.o input.c -MMD -MP -MF .deps/input.Po
    common.c: In function ‘handle_sigint’:
    common.c:247: warning: ignoring return value of ‘write’, declared with
    attribute warn_unused_result
    common.c:250: warning: ignoring return value of ‘write’, declared with
    attribute warn_unused_result
    common.c:251: warning: ignoring return value of ‘write’, declared with
    attribute warn_unused_result

    --
    Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services and Support www.2ndQuadrant.us
    Author, "PostgreSQL 9.0 High Performance" Pre-ordering at:
    https://www.packtpub.com/postgresql-9-0-high-performance/book
  • Josh Kupershmidt at Sep 29, 2010 at 1:13 pm

    On Wed, Sep 29, 2010 at 2:44 AM, Greg Smith wrote:

    The rest of the compiler warnings I saw didn't look related to his code,
    maybe stuff my picky Ubuntu compiler is noticing that was done recently to
    HEAD. I haven't checked HEAD without this patch yet to confirm, and am done
    for the night now. Here's the list if anyone is interested:

    Warning in src/backend/parser/scan.c:

    gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
    -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g
    -I../../../src/include -D_GNU_SOURCE -c -o index.o index.c -MMD -MP -MF
    .deps/index.Po
    In file included from gram.y:12172:
    scan.c: In function ‘yy_try_NUL_trans’:
    scan.c:16256: warning: unused variable ‘yyg’
    Known problem: http://archives.postgresql.org/pgsql-hackers/2009-07/msg00657.php

    I'm pretty sure I've seen the warn_unused_result warnings on HEAD as well.

    Josh
  • Robert Haas at Sep 29, 2010 at 1:16 pm

    On Wed, Sep 29, 2010 at 2:44 AM, Greg Smith wrote:
    One compiler warning I noticed that needs to get resolved:

    src/backend/commands/explain.c:

    explain.c: In function ‘ExplainMergeActions’:
    explain.c:1528: warning: comparison of distinct pointer types lacks a cast

    That is complaining about this section:

    if (mt_planstate->operation != CMD_MERGE ||
        mt_planstate->mt_mergeActPstates == NIL)
        return;

    So presumably that comparison with NIL needs a cast. Once I get more
    familiar with the code I'll fix that myself if Boxuan doesn't offer a
    suggestion first.
    Possibly NULL was meant instead of NIL. NIL is specifically for a List.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Boxuan Zhai at Sep 29, 2010 at 1:29 pm

    On Wed, Sep 29, 2010 at 9:16 PM, Robert Haas wrote:
    On Wed, Sep 29, 2010 at 2:44 AM, Greg Smith wrote:
    One compiler warning I noticed that needs to get resolved:

    src/backend/commands/explain.c:

    explain.c: In function ‘ExplainMergeActions’:
    explain.c:1528: warning: comparison of distinct pointer types lacks a cast
    That is complaining about this section:

    if (mt_planstate->operation != CMD_MERGE ||
        mt_planstate->mt_mergeActPstates == NIL)
        return;

    So presumably that comparison with NIL needs a cast. Once I get more
    familiar with the code I'll fix that myself if Boxuan doesn't offer a
    suggestion first.
    Possibly NULL was meant instead of NIL. NIL is specifically for a List.
    Yes, it should be NULL instead of NIL.

    At first, I designed this field as a List, but I changed it into an array in
    later revisions. That is why there is a leftover comparison with NIL here.
    Sorry for that.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Greg Smith at Oct 18, 2010 at 12:20 am

    Boxuan Zhai wrote:
    Yes, it should be NULL instead of NIL.
    OK, I applied that patch to my local copy and pushed it out to github:

    http://github.com/greg2ndQuadrant/postgres/commit/9013ba9e81490e3623add1b029760817021297c0

    That represents what I tested against. However, when I tried to merge
    against HEAD once I was finished, I discovered this patch has bit-rotted
    significantly. If you have a local copy that works for you, I would not
    recommend pulling in the PostgreSQL repo updates done in the last couple
    of weeks yet. Friday's "Allow WITH clauses to be attached to INSERT,
    UPDATE, DELETE statements" commit in particular conflicts quite a bit
    with your changes. Attached is a rebased patch that applies to HEAD now
    after a round of fixes to resolve those. But it doesn't work yet,
    because of recent changes to the ExecUpdate and ExecDelete functions
    you're calling from src/backend/executor/nodeModifyTable.c inside
    ExecMerge. If you can continue working on the patch without merging
    recent repo work, I can hack away at fixing that once I figure out what
    got changed there recently. It's taken some painful git work to sort
    out what I've done so far, there's more left to do, and I know that's
    not an area you specialize in.

    Back to the feature review...I dove into how I expected this to work,
    relative to what it actually does at the moment. That didn't really go
    too well so far, but I don't know that this represents any fundamental
    issue with the patch. Just situations the existing code didn't really
    anticipate we have to flush out. As a general community FYI here, while
    it's taken me a while to get up to speed on this whole feature, I expect
    to keep chugging away on this regardless of the CommitFest boundaries.
    This feature is both too big and too important to just stop working on
    it because a date has passed.

    Onto the test cases. The examples that Boxuan has been working with,
    and that the regression tests included with the patch exercise, all
    involve two tables being joined together using MERGE. The use case I
    decided to test instead was when you're trying to simulate an UPSERT
    where only a single table is involved. I couldn't get to this to work
    correctly. Maybe I'm just using MERGE wrong here, but I tried so many
    different variations without success (including one that's basically
    copied from Simon's original regression test set suggestions) that I
    suspect there may be a subtle problem with the implementation instead.

    To replicate the most straightforward variations of what I ran into, you
    can start with the same data that's populated by the regression test set:

    CREATE TABLE Stock(item_id int UNIQUE, balance int);
    INSERT INTO Stock VALUES (10, 2200);
    INSERT INTO Stock VALUES (20, 1900);
    SELECT * FROM Stock;

    item_id | balance
    ---------+---------
    10 | 2200
    20 | 1900

    If you now execute the following:

    MERGE INTO Stock t
    USING (SELECT * FROM Stock WHERE item_id=10) AS s
    ON s.item_id=t.item_id
    WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
    WHEN NOT MATCHED THEN INSERT VALUES (10,1)
    ;

    This works fine, and updates the matching row:

    item_id | balance
    ---------+---------
    20 | 1900
    10 | 2201


    But if I give it a key that doesn't exist instead:

    MERGE INTO Stock t
    USING (SELECT * FROM Stock WHERE item_id=30) AS s
    ON s.item_id=t.item_id
    WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
    WHEN NOT MATCHED THEN INSERT VALUES (30,1)
    ;

    This doesn't execute the NOT MATCHED case and INSERT the way I expected
    it to. It just gives back "MERGE 0".

    Since I wasn't sure if the whole "subquery in the USING clause" case was
    really implemented fully, I then tried to do this with something more
    like the working regression test examples. I expected this to do the
    same thing as the first example:

    MERGE INTO Stock t
    USING Stock s
    ON s.item_id=10 AND s.item_id=t.item_id
    WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
    WHEN NOT MATCHED THEN INSERT VALUES (10,1)
    ;

    But it gives back this:

    ERROR: duplicate key value violates unique constraint "stock_item_id_key"
    DETAIL: Key (item_id)=(10) already exists.

    Can't tell from that whether it's hitting the MATCHED or NOT MATCHED
    side of things to generate that. But it doesn't work any better if you
    give it an example that doesn't exist:

    MERGE INTO Stock t
    USING Stock s
    ON s.item_id=30 AND s.item_id=t.item_id
    WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
    WHEN NOT MATCHED THEN INSERT VALUES (30,1)
    ;

    ERROR: duplicate key value violates unique constraint "stock_item_id_key"
    DETAIL: Key (item_id)=(30) already exists.

    Now that one is really weird, because that key value certainly doesn't
    exist yet in the table. There seem to be a couple of issues in the area
    of joining a table with itself here. Feel free to tell me I just don't
    know how to use MERGE if that's the case in any of these.

    The other thing I noticed that may take some work to sort out is that I
    haven't had any luck getting MERGE to execute from within a plpgsql
    function. I was hoping I could use this to update the pgbench tables:

    CREATE OR REPLACE FUNCTION merge_account_balance(key INT, delta NUMERIC)
    RETURNS VOID AS
    $$
    BEGIN
      MERGE INTO pgbench_accounts t
      USING (SELECT * FROM pgbench_accounts WHERE aid = key) AS s
      ON t.aid=s.aid
      WHEN MATCHED THEN UPDATE SET abalance = s.abalance + delta
      WHEN NOT MATCHED THEN INSERT VALUES (key, 1+(key / 100000)::integer, delta, '');
    END;
    $$
    LANGUAGE plpgsql;

    But I just get this instead:

    ERROR: "pgbench_accounts" is not a known variable
    LINE 4: MERGE INTO pgbench_accounts t USING (SELECT * FROM p...

    The other way I wrote the MERGE statement above (not using a subquery)
    does the same thing. I know that error message is coming from the
    changes introduced in
    http://archives.postgresql.org/pgsql-committers/2010-01/msg00161.php but
    I'm not familiar enough with the whole grammar implementation to know
    what that means yet.
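
    One possible (untested) workaround would be to push the statement through
    EXECUTE, so plpgsql treats it as dynamic SQL rather than trying to parse it
    itself; a sketch, using the same hypothetical function as above:

    CREATE OR REPLACE FUNCTION merge_account_balance(key INT, delta NUMERIC)
    RETURNS VOID AS
    $func$
    BEGIN
      EXECUTE $merge$
        MERGE INTO pgbench_accounts t
        USING (SELECT * FROM pgbench_accounts WHERE aid = $1) AS s
        ON t.aid = s.aid
        WHEN MATCHED THEN UPDATE SET abalance = s.abalance + $2
        WHEN NOT MATCHED THEN INSERT VALUES ($1, 1 + ($1 / 100000)::integer, $2, '')
      $merge$
      USING key, delta;
    END;
    $func$
    LANGUAGE plpgsql;

    Whether the patch accepts a parameterized MERGE through SPI that way is
    something I haven't verified.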

    That's what I found so far in my second pass over this. I've already
    worked out how to test the performance of the code using pgbench once the
    first problem here is sorted out. I have all the scripts ready to go
    once the correct MERGE statement is plugged into them; I just ran into
    this same class of problems when I tried them. So far I was only able
    to see how fast the UPDATE path worked, which isn't very helpful
    yet. My hope here is to test the MERGE implementation vs. the classic
    pl/pgsql implementation of UPSERT, calling both within a function so
    it's a fair comparison, and see how that goes. This may flush out
    concurrency bugs that are in the MERGE code as well.

    --
    Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services and Support www.2ndQuadrant.us
  • Robert Haas at Oct 18, 2010 at 1:54 pm
    I think that MERGE is supposed to trigger one rule for each row in the
    source data. So:
    On Sun, Oct 17, 2010 at 8:20 PM, Greg Smith wrote:
    MERGE INTO Stock t
    USING (SELECT * FROM Stock WHERE item_id=10) AS s
    ON s.item_id=t.item_id
    WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
    WHEN NOT MATCHED THEN INSERT VALUES (10,1)
    ;

    This works fine, and updates the matching row:

    item_id | balance
    ---------+---------
    20 |    1900
    10 |    2201
    Here you have one row of source data, and you got one action (the WHEN
    MATCHED case).
    But if I give it a key that doesn't exist instead:

    MERGE INTO Stock t
    USING (SELECT * FROM Stock WHERE item_id=30) AS s
    ON s.item_id=t.item_id
    WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
    WHEN NOT MATCHED THEN INSERT VALUES (30,1)
    ;

    This doesn't execute the NOT MATCHED case and INSERT the way I expected it
    to.  It just gives back "MERGE 0".
    Here you have no rows of source data (the USING (SELECT ...) doesn't
    return anything, since no rows exist) so nothing happens.
    Since I wasn't sure if the whole "subquery in the USING clause" case was
    really implemented fully, I then tried to do this with something more like
    the working regression test examples.  I expected this to do the same thing
    as the first example:

    MERGE INTO Stock t
    USING Stock s
    ON s.item_id=10 AND s.item_id=t.item_id
    WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
    WHEN NOT MATCHED THEN INSERT VALUES (10,1)
    ;

    But it gives back this:

    ERROR:  duplicate key value violates unique constraint "stock_item_id_key"
    DETAIL:  Key (item_id)=(10) already exists.
    Here you have two rows of source data. The ON clause represents the
    join condition. The item_id=10 row matches - so you get an update,
    presumably, though we can't see that as things turn out - and the
    item_id=20 row doesn't match - so you try to insert (10, 1), which is
    a duplicate key, thus the error.
    Can't tell from that whether it's hitting the MATCHED or NOT MATCHED side of
    things to generate that.  But it doesn't work any better if you give it an
    example that doesn't exist:

    MERGE INTO Stock t
    USING Stock s
    ON s.item_id=30 AND s.item_id=t.item_id
    WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
    WHEN NOT MATCHED THEN INSERT VALUES (30,1)
    ;

    ERROR:  duplicate key value violates unique constraint "stock_item_id_key"
    DETAIL:  Key (item_id)=(30) already exists.
    In this case neither row of the source data matches the join condition
    (s.item_id=30 might as well be constant FALSE as far as the test data
    is concerned) so you attempt to execute the NOT MATCHED side twice.
    So this one also looks correct to me.
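
    A plain SELECT that mirrors the MERGE's left join makes this easier to see
    (my own illustration, not from the patch's tests):

    SELECT s.item_id AS source_id, t.item_id AS target_id
    FROM Stock s
    LEFT JOIN Stock t ON s.item_id=30 AND s.item_id=t.item_id;

    This returns two rows, both with a NULL target_id: neither source row finds
    a matching target row, so both fire the NOT MATCHED rule, both try to
    INSERT (30,1), and the second INSERT is the unique violation you saw.
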
    The other thing I noticed that may take some work to sort out is that I
    haven't had any luck getting MERGE to execute from within a plpgsql
    function.  I was hoping I could use this to update the pgbench tables:
    Good catch. Considering the size of this patch, I have no problem
    leaving this to the eventual committer to fix, or to a subsequent
    commit.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Boxuan Zhai at Oct 18, 2010 at 2:09 pm

    On Mon, Oct 18, 2010 at 9:54 PM, Robert Haas wrote:

    I think that MERGE is supposed to trigger one rule for each row in the
    source data. So:
    On Sun, Oct 17, 2010 at 8:20 PM, Greg Smith wrote:
    MERGE INTO Stock t
    USING (SELECT * FROM Stock WHERE item_id=10) AS s
    ON s.item_id=t.item_id
    WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
    WHEN NOT MATCHED THEN INSERT VALUES (10,1)
    ;

    This works fine, and updates the matching row:

    item_id | balance
    ---------+---------
    20 | 1900
    10 | 2201
    Here you have one row of source data, and you got one action (the WHEN
    MATCHED case).
    But if I give it a key that doesn't exist instead:

    MERGE INTO Stock t
    USING (SELECT * FROM Stock WHERE item_id=30) AS s
    ON s.item_id=t.item_id
    WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
    WHEN NOT MATCHED THEN INSERT VALUES (30,1)
    ;

    This doesn't execute the NOT MATCHED case and INSERT the way I expected it
    to. It just gives back "MERGE 0".
    Here you have no rows of source data (the USING (SELECT ...) doesn't
    return anything, since no rows exist) so nothing happens.
    Yes.
    The MERGE process is based on a left join between the source table and the
    target table.
    Since the source table here is empty, no join is performed, and thus no MERGE
    action is taken.

    But is it logically correct? I mean, should we insert some rows in the
    above example rather than do nothing?

    Since I wasn't sure if the whole "subquery in the USING clause" case was
    really implemented fully, I then tried to do this with something more like
    the working regression test examples. I expected this to do the same thing
    as the first example:

    MERGE INTO Stock t
    USING Stock s
    ON s.item_id=10 AND s.item_id=t.item_id
    WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
    WHEN NOT MATCHED THEN INSERT VALUES (10,1)
    ;

    But it gives back this:

    ERROR: duplicate key value violates unique constraint
    "stock_item_id_key"
    DETAIL: Key (item_id)=(10) already exists.
    Here you have two rows of source data. The ON clause represents the
    join condition. The item_id=10 row matches - so you get an update,
    presumably, though we can't see that as things turn out - and the
    item_id=20 row doesn't match - so you try to insert (10, 1), which is
    a duplicate key, thus the error.
    Can't tell from that whether it's hitting the MATCHED or NOT MATCHED side of
    things to generate that. But it doesn't work any better if you give it an
    example that doesn't exist:

    MERGE INTO Stock t
    USING Stock s
    ON s.item_id=30 AND s.item_id=t.item_id
    WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
    WHEN NOT MATCHED THEN INSERT VALUES (30,1)
    ;

    ERROR: duplicate key value violates unique constraint
    "stock_item_id_key"
    DETAIL: Key (item_id)=(30) already exists.
    In this case neither row of the source data matches the join condition
    (s.item_id=30 might as well be constant FALSE as far as the test data
    is concerned) so you attempt to execute the NOT MATCHED side twice.
    So this one also looks correct to me.

    Yes, that is what happened in the above two examples.
    The other thing I noticed that may take some work to sort out is that I
    haven't had any luck getting MERGE to execute from within a plpgsql
    function. I was hoping I could use this to update the pgbench tables:
    Good catch. Considering the size of this patch, I have no problem
    leaving this to the eventual committer to fix, or to a subsequent
    commit.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Robert Haas at Oct 18, 2010 at 2:17 pm

    On Mon, Oct 18, 2010 at 10:09 AM, Boxuan Zhai wrote:
    On Mon, Oct 18, 2010 at 9:54 PM, Robert Haas wrote:

    I think that MERGE is supposed to trigger one rule for each row in the
    source data.  So:
    On Sun, Oct 17, 2010 at 8:20 PM, Greg Smith wrote:
    MERGE INTO Stock t
    USING (SELECT * FROM Stock WHERE item_id=10) AS s
    ON s.item_id=t.item_id
    WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
    WHEN NOT MATCHED THEN INSERT VALUES (10,1)
    ;

    This works fine, and updates the matching row:

    item_id | balance
    ---------+---------
    20 |    1900
    10 |    2201
    Here you have one row of source data, and you got one action (the WHEN
    MATCHED case).
    But if I give it a key that doesn't exist instead:

    MERGE INTO Stock t
    USING (SELECT * FROM Stock WHERE item_id=30) AS s
    ON s.item_id=t.item_id
    WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
    WHEN NOT MATCHED THEN INSERT VALUES (30,1)
    ;

    This doesn't execute the NOT MATCHED case and INSERT the way I expected
    it
    to.  It just gives back "MERGE 0".
    Here you have no rows of source data (the USING (SELECT ...) doesn't
    return anything, since no rows exist) so nothing happens.
    Yes.
    The MERGE process is based on a left join between the source table and
    target table.
    Since here the source table is empty, no join is carried, and thus no MERGE
    action is taken.
    But is it logically correct? I mean, should we insert some rows in the
    above example rather than do nothing?
    I don't think so. I think the right way to write UPSERT is something
    along the lines of:

    MERGE INTO Stock t USING (VALUES (10, 1)) s(item_id, balance) ON
    s.item_id = t.item_id ...

    (untested)
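
    Spelled out in full (equally untested), that would be something along the
    lines of:

    MERGE INTO Stock t
    USING (VALUES (10, 1)) AS s(item_id, balance)
    ON s.item_id = t.item_id
    WHEN MATCHED THEN UPDATE SET balance = t.balance + s.balance
    WHEN NOT MATCHED THEN INSERT VALUES (s.item_id, s.balance);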

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Boxuan Zhai at Oct 20, 2010 at 1:20 am
    Hi,

    I considered the empty source table situation again. I think it is correct
    to upsert nothing in this case.

    Back to the original logic of the MERGE command: its main purpose is to add
    the supplementary data from the source table into the target table. Thus, an
    empty source table means no input data is available, so no upsert is needed
    on the target table.


    Regards,
    Boxuan
  • Greg Smith at Oct 21, 2010 at 6:36 pm

    Robert Haas wrote:
    I think the right way to write UPSERT is something
    along the lines of:

    MERGE INTO Stock t USING (VALUES (10, 1)) s(item_id, balance) ON
    s.item_id = t.item_id ...
    That led in the right direction; after a bit more fiddling I was finally
    able to get something that does what I wanted: a single-table UPSERT
    implemented with this MERGE implementation. Here's a log of a test
    session, suitable for eventual inclusion in the regression tests:

    CREATE TABLE Stock(item_id int UNIQUE, balance int);
    INSERT INTO Stock VALUES (10, 2200);
    INSERT INTO Stock VALUES (20, 1900);
    SELECT * FROM Stock ORDER BY item_id;

    item_id | balance
    ---------+---------
    10 | 2200
    20 | 1900

    MERGE INTO Stock t
    USING (VALUES(10,100)) AS s(item_id,balance)
    ON s.item_id=t.item_id
    WHEN MATCHED THEN UPDATE SET balance=t.balance + s.balance
    WHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance)
    ;

    MERGE 1

    SELECT * FROM Stock ORDER BY item_id;
    item_id | balance
    ---------+---------
    10 | 2300
    20 | 1900

    MERGE INTO Stock t
    USING (VALUES(30,2000)) AS s(item_id,balance)
    ON s.item_id=t.item_id
    WHEN MATCHED THEN UPDATE SET balance=t.balance + s.balance
    WHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance)
    ;

    MERGE 1
    SELECT * FROM Stock ORDER BY item_id;
    item_id | balance
    ---------+---------
    10 | 2300
    20 | 1900
    30 | 2000

    I'm still a little uncertain as to whether any of my other examples
    should have worked under the spec but just didn't work here, but I'll
    worry about that later.

    Here's what the query plan looks like on a MATCH:

    Merge (cost=0.00..8.29 rows=1 width=22) (actual time=0.166..0.166
    rows=0 loops=1)
    Action 1: Update When Matched
    Action 2: Insert When Not Mactched
    MainPlan:
    -> Nested Loop Left Join (cost=0.00..8.29 rows=1 width=22) (actual
    time=0.050..0.061 rows=1 loops=1)
    -> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=1 width=8)
    (actual time=0.009..0.010 rows=1 loops=1)
    -> Index Scan using stock_item_id_key on stock t
    (cost=0.00..8.27 rows=1 width=14) (actual time=0.026..0.030 rows=1 loops=1)
    Index Cond: ("*VALUES*".column1 = item_id)
    Total runtime: 0.370 ms


    And here's a miss:

    Merge (cost=0.00..8.29 rows=1 width=22) (actual time=0.145..0.145
    rows=0 loops=1)
    Action 1: Update When Matched
    Action 2: Insert When Not Mactched
    MainPlan:
    -> Nested Loop Left Join (cost=0.00..8.29 rows=1 width=22) (actual
    time=0.028..0.033 rows=1 loops=1)
    -> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=1 width=8)
    (actual time=0.004..0.005 rows=1 loops=1)
    -> Index Scan using stock_item_id_key on stock t
    (cost=0.00..8.27 rows=1 width=14) (actual time=0.015..0.015 rows=0 loops=1)
    Index Cond: ("*VALUES*".column1 = item_id)
    Total runtime: 0.255 ms

    Next steps here:
    1) Performance/concurrency tests against trigger-based UPSERT approach.
    2) Finish bit rot cleanup against HEAD.
    3) Work out more complicated test cases to try and find more unexpected
    behavior edge cases and general bugs.

    --
    Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services and Support www.2ndQuadrant.us
  • Stefan Kaltenbrunner at Oct 22, 2010 at 3:35 pm

    On 10/21/2010 08:36 PM, Greg Smith wrote:
    Robert Haas wrote:
    I think the right way to write UPSERT is something
    along the lines of:

    MERGE INTO Stock t USING (VALUES (10, 1)) s(item_id, balance) ON
    s.item_id = t.item_id ...
    [...]
    Here's what the query plan looks like on a MATCH:

    Merge (cost=0.00..8.29 rows=1 width=22) (actual time=0.166..0.166 rows=0
    loops=1)
    Action 1: Update When Matched
    Action 2: Insert When Not Mactched
    "Mactched"? - is this a c&p error or the actual output of EXPLAIN? :)


    lg


    Stefan
  • Greg Smith at Oct 23, 2010 at 5:35 am
    There are now two branches of MERGE code in review progress available.
    http://github.com/greg2ndQuadrant/postgres/tree/merge-unstable has the
    bit-rotted version that doesn't quite work against HEAD yet, while
    http://github.com/greg2ndQuadrant/postgres/tree/merge is what I'm still
    testing against until I get that sorted out.

    Attached is a tar file containing an initial concurrent MERGE test
    case. What it does is create a pgbench database with a scale of 1.
    Then it runs a custom test that does an UPSERT using MERGE against
    pgbench_accounts, telling pgbench the database scale is actually 2.
    This means that approximately half of the statements executed will hit
    the MATCHED side of the merge and update existing rows, while half will
    hit the NOT MATCHED one and create new account records. I did some small
    tests using pgbench's debug mode, where you see all the statements it
    executes, and all the output looked correct. Successive test runs are
    not deterministically perfect performance comparisons, but with enough
    transactions I hope that averages out.
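
    For readers without the attachment, the statement in such a custom pgbench
    script is roughly the following (:aid and :delta are pgbench variables set
    earlier in the script; treat this as an approximation of the attached
    script, not a copy of it):

    MERGE INTO pgbench_accounts t
    USING (VALUES (:aid, 1 + :aid / 100000, :delta, '')) AS s(aid, bid, abalance, filler)
    ON t.aid = s.aid
    WHEN MATCHED THEN UPDATE SET abalance = t.abalance + s.abalance
    WHEN NOT MATCHED THEN INSERT VALUES (s.aid, s.bid, s.abalance, s.filler);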

    For comparison's sake, there's an almost identical test case in there that
    does the same thing using the pl/pgsql example UPSERT function from the
    PostgreSQL manual instead. You just run test-merge.sh
    or test-function.sh and it runs the whole test, presuming you built and
    installed pgbench and don't mind that it will blow away any database
    named pgbench you already have.
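
    That manual example is, roughly, the following retry loop against a toy
    table db(a int primary key, b text); the function used in the test case
    follows the same pattern, adapted to pgbench_accounts:

    CREATE FUNCTION merge_db(key INT, data TEXT) RETURNS VOID AS
    $$
    BEGIN
        LOOP
            -- first try to update the key
            UPDATE db SET b = data WHERE a = key;
            IF found THEN
                RETURN;
            END IF;
            -- not there, so try to insert the key; if someone else inserts
            -- the same key concurrently, we get a unique-key failure
            BEGIN
                INSERT INTO db(a,b) VALUES (key, data);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- do nothing, and loop to try the UPDATE again
            END;
        END LOOP;
    END;
    $$
    LANGUAGE plpgsql;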

    Since the current MERGE implementation is known not to be optimized for
    concurrency, my main goal here wasn't to see how fast it was. That
    number is interesting though. With the sole postgresql.conf change of:

    checkpoint_segments=32

    With a debug/assertion build and 8 concurrent clients, I got 1607 TPS
    of UPSERT out of the trigger approach @ 6MB/s of writes to disk, while
    the MERGE one delivered 988 TPS @ 4.5MB/s of writes. Will explore this
    more later.

    This did seem to find a bug in the implementation after running for a while:

    TRAP: FailedAssertion("!(epqstate->origslot != ((void *)0))", File:
    "execMain.c", Line: 1732)

    Line number there is relative to what you can see at
    http://github.com/greg2ndQuadrant/postgres/blob/merge/src/backend/executor/execMain.c
    and that's a null pointer check at the beginning of
    EvalPlanQualFetchRowMarks. Haven't explored why this happened or how
    repeatable this is, but since Boxuan was looking for some bugs to chase
    I wanted to deliver him one to chew on.

    While the performance doesn't need to be great in V1, there needs to be
    at least some minimal protection against concurrency issues before this
    is commit quality. Will continue to shake this code out looking for
    them now that I have some basic testing that works for doing so.

    --
    Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services and Support www.2ndQuadrant.us
  • Marko Tiikkaja at Oct 23, 2010 at 12:50 pm

    On 2010-10-23 8:34 AM +0300, Greg Smith wrote:
    While the performance doesn't need to be great in V1, there needs to be
    at least some minimal protection against concurrency issues before this
    is commit quality.
    What's been bothering me is that so far there has not been an agreement
    on whether we need to protect against concurrency issues or not. In
    fact, there has been almost no discussion about the concurrency issues
    which AIUI have been the biggest single reason we don't have MERGE
    already. Right now, this patch fails in even the simplest scenario:

    =# create table foo(a int unique);
    NOTICE: CREATE TABLE / UNIQUE will create implicit index "foo_a_key"
    for table "foo"
    CREATE TABLE

    T1=# begin;
    BEGIN
    T1=# merge into foo using (values (1)) s(i) on s.i = foo.a when matched
    then update set a=a+1 when not matched then insert values (s.i);
    MERGE 1

    T2=# merge into foo using (values (1)) s(i) on s.i = foo.a when matched
    then update set a=a+1 when not matched then insert values (s.i);
    -- blocks

    T1=# commit;
    COMMIT

    .. and T2 gets a unique constraint violation.


    As far as I'm concerned, the reason people want MERGE is to avoid these
    problems; the nicer syntax is just a bonus. Having to LOCK the target
    table makes this feature a lot less useful, even though there are a few
    use cases where locking the table would be OK.
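
    For completeness, the table-locking workaround I mean is something like
    this, which just serializes the writers up front:

    BEGIN;
    LOCK TABLE foo IN SHARE ROW EXCLUSIVE MODE;
    MERGE INTO foo USING (VALUES (1)) s(i) ON s.i = foo.a
    WHEN MATCHED THEN UPDATE SET a = a + 1
    WHEN NOT MATCHED THEN INSERT VALUES (s.i);
    COMMIT;

    SHARE ROW EXCLUSIVE conflicts with itself, so the second MERGE only starts
    after the first has committed and can then see its row. That works, but it
    gives up concurrency entirely, which is exactly the problem.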


    Regards,
    Marko Tiikkaja
  • Greg Smith at Oct 23, 2010 at 2:31 pm

    Marko Tiikkaja wrote:
    What's been bothering me is that so far there has not been an
    agreement on whether we need to protect against concurrency issues or
    not. In fact, there has been almost no discussion about the
    concurrency issues which AIUI have been the biggest single reason we
    don't have MERGE already.
    Sure there were, they just happened a long time ago. I tried to
    summarize the previous discussion at
    http://wiki.postgresql.org/wiki/SQL_MERGE with the following:

    "PostgreSQL doesn't have a good way to lock access to a key value that
    doesn't exist yet--what other databases call key range
    locking...Improvements to the index implementation are needed to allow
    this feature."

    You can sift through the other archive links in there, the discussion
    leading up to that conclusion is in there somewhere. And we had a
    discussion of this at the last developer's meeting where this was
    identified as a key issue to sort out before this was done; see the
    table at the end of
    http://wiki.postgresql.org/wiki/PgCon_2010_Developer_Meeting and note
    the warning associated with this feature.

    The deadlock on developing this feature has been that there's little
    benefit to building key range locking on its own, or even a good way to
    prove that it works, without a visible feature that uses it. It's much
    easier to justify development time on if the rest of MERGE is known to
    work, and just needs that plugged in to improve concurrency. And since
    Boxuan appeared at the right time to do the syntax and implementation
    part first as well, that's the order it's ended up happening in. We're
    only making the concurrency part a second priority right now in order to
    break the development into manageable chunks.

    --
    Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services and Support www.2ndQuadrant.us
  • Tom Lane at Oct 23, 2010 at 4:14 pm

    Greg Smith writes:
    Marko Tiikkaja wrote:
    What's been bothering me is that so far there has not been an
    agreement on whether we need to protect against concurrency issues or
    not. In fact, there has been almost no discussion about the
    concurrency issues which AIUI have been the biggest single reason we
    don't have MERGE already.
    Sure there were, they just happened a long time ago. I tried to
    summarize the previous discussion at
    http://wiki.postgresql.org/wiki/SQL_MERGE with the following:
    "PostgreSQL doesn't have a good way to lock access to a key value that
    doesn't exist yet--what other databases call key range
    locking...Improvements to the index implementation are needed to allow
    this feature."
    This seems to be presuming there's a consensus that that's the way to
    implement it, which is news to me. Why wouldn't we try the INSERT
    first, and if we get a unique-key failure, do UPDATE instead? I am not
    at all in agreement that we ought to build key range locking, nor that
    it'd be particularly helpful for this problem even if we had it.

    regards, tom lane
  • Greg Stark at Oct 23, 2010 at 10:59 pm

    On Sat, Oct 23, 2010 at 9:12 AM, Tom Lane wrote:
    "PostgreSQL doesn't have a good way to lock access to a key value that
    doesn't exist yet--what other databases call key range
    locking...Improvements to the index implementation are needed to allow
    this feature."
    This seems to be presuming there's a consensus that that's the way to
    implement it, which is news to me.  Why wouldn't we try the INSERT
    first, and if we get a unique-key failure, do UPDATE instead?  I am not
    at all in agreement that we ought to build key range locking, nor that
    it'd be particularly helpful for this problem even if we had it.
    Except we *do* have range locking for the special case of inserting
    equal values into a unique btree index. In that case the btree insert
    locks the leaf page containing the conflicting key value --
    effectively locking out anyone else from inserting the same key value.
    That lock is held until the index entry pointing to the newly
    inserted value is finished. That's what prevents two inserters from
    inserting the same key value.

    So the trick for MERGE on equality would be to refactor the btree api
    so that you could find the matching leaf page and *keep* that page
    locked. Then do an update against the conflicting row found there if
    any without ever releasing that page lock. Someone else can come along
    and delete the row (or have already deleted it) before the update
    locks it but if they do you can insert your row without worrying about
    anyone else inserting a conflicting entry since you're still holding a
    lock on the page they'll need to insert the btree entry on and have
    been holding it continuously for the whole operation.

    I'm not sure how the new exclusion constraints stuff affects this. I
    think they would work the same way except instead of holding a page
    lock on the leaf page they would store the value being edited in the
    in-memory array -- effectively another form of key range locking.

    I think the blocker with MERGE has always been that the standard is
    *so* general that it could apply to conditions that *aren't* covered
    by a unique constraint or exclusion constraint. Personally, I'm fine with
    saying that those cases will perform poorly, lock the table, or simply
    aren't allowed and print a warning explaining exactly what exclusion
    constraint or btree unique index would be necessary to allow it. I
    think virtually every case where people will want to use it will be on
    a primary key anyway.

    --
    greg
  • Josh Berkus at Oct 23, 2010 at 11:03 pm

    So the trick for MERGE on equality would be to refactor the btree api
    so that you could find the matching leaf page and *keep* that page
    locked. Then do an update against the conflicting row found there if
    any without ever releasing that page lock. Someone else can come along
    and delete the row (or have already deleted it) before the update
    locks it but if they do you can insert your row without worrying about
    anyone else inserting a conflicting entry since you're still holding a
    lock on the page they'll need to insert the btree entry on and have
    been holding it continuously for the whole operation.
    I think that such a lock would also be useful for improving the FK
    deadlock issues we have.

    --
    -- Josh Berkus
    PostgreSQL Experts Inc.
    http://www.pgexperts.com
  • Greg Stark at Oct 24, 2010 at 7:32 am

    On Sat, Oct 23, 2010 at 4:03 PM, Josh Berkus wrote:
    I think that such a lock would also be useful for improving the FK deadlock
    issues we have.
    I don't see how. I think the problem you're referring to occurs when
    different plans update rows in different orders and the resulting
    locks on foreign key targets are taken in different orders. In which
    case the problem isn't that we're unable to lock the resources --
    they're locked using regular row locks -- but rather that there's
    nothing controlling the ordering of the locks.

    I don't think it would be acceptable to hold low level btree page
    locks across multiple independent row operations on different rows and
    I don't see how they would be any better than the row locks we have
    now. Worse, the resulting deadlock would no longer be detectable (one
    of the reasons it wouldn't be acceptable to hold the lock for so
    long).

    That does point out a problem with the logic I sketched. If you go to
    do an update and find there's an uncommitted update pending you have
    to wait on it. You can't do that while holding the index page lock. I
    assume then you release it and wait on the uncommitted transaction and
    when it's finished you start over doing the btree lookup and
    reacquiring the lock. I haven't thought it through in detail though.

    --
    greg
  • Martijn van Oosterhout at Oct 24, 2010 at 9:50 am

    On Sat, Oct 23, 2010 at 03:59:13PM -0700, Greg Stark wrote:
    I think the blocker with MERGE has always been that the standard is
    *so* general that it could apply to conditions that *aren't* covered
    by a unique constraint or exclusion constraint. Personally, I'm fine with
    saying that those cases will perform poorly, lock the table, or simply
    aren't allowed and print a warning explaining exactly what exclusion
    constraint or btree unique index would be necessary to allow it. I
    think virtually every case where people will want to use it will be on
    a primary key anyway.
    Can we please not get MERGE hung up on trying to make it atomic. The
    standard requires no such thing, and there are plenty of uses of MERGE
    that don't involve high-concurrency updates of the same row by
    different processes. If we want an atomic UPSERT, make that a separate
    command; MERGE without atomicity is useful enough already.

    Simply: if the row is visible in the current snapshot you do an update,
    otherwise an insert. If you get an error, raise it. Icing can be added
    later.

    Have a nice day,
    --
    Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
    Patriotism is when love of your own people comes first; nationalism,
    when hate for people other than your own comes first.
    - Charles de Gaulle
  • Robert Haas at Oct 24, 2010 at 12:05 pm

    On Sun, Oct 24, 2010 at 5:50 AM, Martijn van Oosterhout wrote:
    On Sat, Oct 23, 2010 at 03:59:13PM -0700, Greg Stark wrote:
    I think the blocker with MERGE has always been that the standard is
    *so* general that it could apply to conditions that *aren't* covered
    by a unique constraint or exclusion constraint. Personally, I'm fine with
    saying that those cases will perform poorly, lock the table, or simply
    aren't allowed and print a warning explaining exactly what exclusion
    constraint or btree unique index would be necessary to allow it. I
    think virtually every case where people will want to use it will be on
    a primary key anyway.
    Can we please not get MERGE hung up on trying to make it atomic. The
    standard requires no such thing, and there are plenty of uses of MERGE
    that don't involve high-concurrency updates of the same row by
    different processes. If we want an atomic UPSERT, make that a separate
    command; MERGE without atomicity is useful enough already.

    Simply: if the row is visible in the current snapshot you do an update,
    otherwise an insert. If you get an error, raise it. Icing can be added
    later.
    Long term, I'm wondering if the permanent fix for this problem might
    involve something similar to what EXCLUDE does - use a dirty snapshot
    for the scan of the target table, and block on in-doubt tuples.

    Yeah, I couldn't have said it better. It's an annoying implementation
    restriction but only that. I think it's a perfectly acceptable
    restriction for an initial commit. It's no secret that our process
    has trouble absorbing large patches.

    I am also wondering exactly what the semantics are supposed to be
    under concurrency. We can't assume that it's OK to treat WHEN NOT
    MATCHED as WHEN MATCHED if a unique-key violation is encountered. The
    join condition could be something else entirely. I guess we could
    scan the target table with a dirty snapshot and block on any in-doubt
    tuples, similar to what EXCLUDE constraints do internally, but
    throwing MVCC out the window doesn't seem right either.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Greg Smith at Oct 24, 2010 at 4:17 pm

    Robert Haas wrote:
    I am also wondering exactly what the semantics are supposed to be
    under concurrency. We can't assume that it's OK to treat WHEN NOT
    MATCHED as WHEN MATCHED if a unique-key violation is encountered. The
    join condition could be something else entirely. I guess we could
    scan the target table with a dirty snapshot and block on any in-doubt
    tuples, similar to what EXCLUDE constraints do internally, but
    throwing MVCC out the window doesn't seem right either.
    You've answered Tom's question of why not to just INSERT and trap the
    error here. Presuming that a unique-key violation is what will kick back in
    this situation, and simply following the other branch, doesn't cover all the
    ways this can get used.

    I'm thinking of this problem now as being like the one that happens when
    a concurrent UPDATE happens. To quote from the docs here:

    "a target row might have already been updated (or deleted or locked) by
    another concurrent transaction by the time it is found. In this case,
    the would-be updater will wait for the first updating transaction to
    commit or roll back (if it is still in progress). If the first updater
    rolls back, then its effects are negated and the second updater can
    proceed with updating the originally found row. If the first updater
    commits, the second updater will ignore the row if the first updater
    deleted it, otherwise it will attempt to apply its operation to the
    updated version of the row. The search condition of the command (the
    WHERE clause) is re-evaluated to see if the updated version of the row
    still matches the search condition. If so, the second updater proceeds
    with its operation using the updated version of the row."

    What I think we can do with adding a key reservation is apply the same
    sort of logic to INSERTs too, given a way to lock an index entry before
    the row itself is complete. If MERGE hits a WHERE condition that finds
    a key lock entry in the index, it has to sit and wait for that other
    command to finish. There isn't any other sensible behavior I can see in
    that situation, just like there isn't one for UPDATE right now.

    If the transaction that was locking the index entry commits, it has to
    start over with re-evaluating the WHEN MATCHED part (the equivalent of
    the WHERE in the UPDATE case). But it shouldn't need to do a new JOIN
    again, because that condition was already proven to be met by the lock
    squatting on that index key, without even having the rest of the row
    there yet to check its data. If the other entry commits, it would then
    follow the WHEN MATCHED path and in the UPSERT case do an UPDATE. If
    the transaction that had the key reservation rolls back, the
    re-evaluation of the MERGE WHEN MATCHED will fail, at which point it
    follows the WHEN NOT MATCHED INSERT path. As it will at that point be
    the locker of the key reservation it had been waiting on earlier, it
    will be free to do the INSERT with no concerns about race conditions or
    retries. In the SERIALIZABLE case, again just try to follow the same
    slightly different rules that exist for concurrent UPDATE commands.

    There may be a tricky point here: we will need to warn people that
    UPSERT implementations must make sure they only reference the columns
    of the primary key in the MERGE conditions for this to work as expected,
    because otherwise they might get back a unique key violation error
    anyway. I don't know how other databases deal with that particular
    problem. I have considered following the somewhat distasteful path of
    installing SQL Server or Oracle here just to see how they handle some of
    the pathological test cases I'm now thinking about for this command.
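
    As a hypothetical example of what I mean, an UPSERT whose ON clause drags
    in a non-key column:

    MERGE INTO Stock t
    USING (VALUES (10, 100)) AS s(item_id, balance)
    ON s.item_id = t.item_id AND t.balance > 0
    WHEN MATCHED THEN UPDATE SET balance = t.balance + s.balance
    WHEN NOT MATCHED THEN INSERT VALUES (s.item_id, s.balance);

    If item_id 10 exists but with a non-positive balance, the join finds no
    match, the NOT MATCHED branch runs, and the INSERT hits stock_item_id_key
    even though no concurrency was involved at all.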

    This particular weak spot in MVCC is already there for UPDATE, and this
    approach seems to me to be the most logical way to extend the same basic
    implementation to also eliminate some concurrent MERGE race conditions,
    including the most common one the UPSERT case is stuck with. But I have
    no intention of halting work on the basic MERGE to wait for this problem
    to be solved even if I'm completely wrong about what I've outlined
    above. That style of thinking--don't even start until a perfect
    solution for every possible problem is known--is something I consider a
    problem in this community's development model, one that has contributed
    to why no work has been done on this feature until now. This one splits
    nicely into "get the implementation working" and "improve the
    concurrency" parts, and I haven't heard a good reason yet why those need
    to be coupled together.

    --
    Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services and Support www.2ndQuadrant.us
  • Robert Haas at Oct 24, 2010 at 5:43 pm

    On Sun, Oct 24, 2010 at 12:15 PM, Greg Smith wrote:
    Robert Haas wrote:
    I am also wondering exactly what the semantics are supposed to be
    under concurrency.  We can't assume that it's OK to treat WHEN NOT
    MATCHED as WHEN MATCHED if a unique-key violation is encountered.  The
    join condition could be something else entirely.  I guess we could
    scan the target table with a dirty snapshot and block on any in-doubt
    tuples, similar to what EXCLUDE constraints do internally, but
    throwing MVCC out the window doesn't seem right either.
    [discussion of EPQ behavior for UPDATE statements]

    What I think we can do with adding a key reservation is apply the same sort
    of logic to INSERTs too, given a way to lock an index entry before the row
    itself is complete.  If MERGE hits a WHERE condition that finds a key lock
    entry in the index, it has to sit and wait for that other command to finish.
    There isn't any other sensible behavior I can see in that situation, just
    like there isn't one for UPDATE right now.
    Well, there's no guarantee that any index at all exists on the target
    table, so any solution based on index locks can't be fully general.

    But let's back up and talk about MVCC for a minute. Suppose we have
    three source tuples, (1), (2), and (3); and the target table contains
    tuples (1) and (2), of which only (1) is visible to our MVCC snapshot;
    suppose also an equijoin. Clearly, source tuple (1) should fire the
    MATCHED rule and source tuple (3) should fire the NOT MATCHED rule,
    but what in the world should source tuple (2) do? AFAICS, the only
    sensible behavior is to throw a serialization error, because no matter
    what you do the results won't be equivalent to a serial execution of
    the transaction that committed target tuple (2) and the transaction
    that contains the MERGE.
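
    A concrete (hypothetical) way to set that scenario up, with two sessions
    S1 and S2 and a single-column table standing in for the equijoin key:

    -- S1:
    CREATE TABLE target (id int UNIQUE);
    INSERT INTO target VALUES (1);
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT * FROM target;            -- snapshot taken here, sees only (1)
    -- S2:
    INSERT INTO target VALUES (2);   -- commits, invisible to S1's snapshot
    -- S1:
    MERGE INTO target t
    USING (VALUES (1), (2), (3)) AS s(id)
    ON s.id = t.id
    WHEN MATCHED THEN UPDATE SET id = t.id
    WHEN NOT MATCHED THEN INSERT VALUES (s.id);

    Source row (2) sees no visible match, so its NOT MATCHED INSERT collides
    with the committed-but-invisible (2); no serial ordering of the two
    transactions produces that outcome, hence the serialization error argument.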

    Even with predicate locks, it's not obvious to me how to solve this
    problem. Target tuple (2) may already be there, and its transaction
    already committed, by the time the MERGE statement gets around to
    looking at the source data. In fact, even taking an
    AccessExclusiveLock on the target table doesn't fix this, because the
    snapshot would be taken before the lock.
    been done on this feature until now. This one splits nicely into "get the
    implementation working" and "improve the concurrency" parts, and I haven't
    heard a good reason yet why those need to be coupled together.
    Sounds like we're all on the same page.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Greg Smith at Oct 24, 2010 at 9:40 pm

    Robert Haas wrote:
    Well, there's no guarantee that any index at all exists on the target
    table, so any solution based on index locks can't be fully general.
    Sure, but in the most common use case I think we're targeting at first,
    no indexes means there are also no unique constraints. I don't
    think people can reasonably expect MERGE or anything else to
    guarantee they won't accidentally create two rows that conflict via the
    terms of some non-database-enforced key.

    I am too brain-dead this particular Sunday to think of exactly how to
    deal with the 3-tuple case you described, except to say that I think
    it's OK for complicated situations to give up and throw a serialization
    error. I'm already collecting a list of pathological tests and will try
    to add something based on your problem case, then see what we can do
    with it later.

    --
    Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services and Support www.2ndQuadrant.us
  • Greg Stark at Oct 24, 2010 at 11:17 pm

    On Sun, Oct 24, 2010 at 2:39 PM, Greg Smith wrote:
    Sure, but in the most common use case I think we're targeting at first, no
    indexes means there are also no unique constraints.  I don't think
    people can reasonably expect MERGE or anything else to guarantee they
    won't accidentally create two rows that conflict via the terms of some
    non-database-enforced key.
    I'm fine with either a) requiring the merge condition to match exactly
    an exclusion constraint or unique index, and otherwise throw an error,
    or b) locking the table if you perform a merge against a table on a
    non-indexed condition, the way foreign keys do when you're missing the
    relevant index.

    Either way I expect this to be a rare case, where users are either
    doing data loads, so locking the table against concurrent updates is
    fine, or they will immediately realize the error of their ways and
    create the corresponding unique index or exclusion constraint.

    Other than bulk data loads I don't see any useful use case that
    doesn't have the corresponding exclusion constraint or unique index as
    a hard requirement anyway. It'll be necessary for both correctness
    and performance.

    --
    greg
  • Greg Stark at Oct 25, 2010 at 5:43 pm

    On Sun, Oct 24, 2010 at 10:43 AM, Robert Haas wrote:
    But let's back up and talk about MVCC for a minute.  Suppose we have
    three source tuples, (1), (2), and (3); and the target table contains
    tuples (1) and (2), of which only (1) is visible to our MVCC snapshot;
    suppose also an equijoin.  Clearly, source tuple (1) should fire the
    MATCHED rule and source tuple (3) should fire the NOT MATCHED rule,
    but what in the world should source tuple (2) do?  AFAICS, the only
    sensible behavior is to throw a serialization error, because no matter
    what you do the results won't be equivalent to a serial execution of
    the transaction that committed target tuple (2) and the transaction
    that contains the MERGE.
    So the behaviour we get with UPDATE in this situation is that we
    update (2) so I would expect this to execute the MATCHED rule. The key
    distinction is that since we're not returning data for the user to see,
    we want to update the most recent version, and it's "almost"
    as if we ran "after" all the other transactions. It's not really
    serializable, and I think in serializable mode we throw a serialization
    failure instead, but in most usage patterns it's precisely what the
    user wants.

    Here "bbb" contained two records when we began with values "1" and "2"
    but the "2" was inserted in a transaction which hadn't committed yet.
    It committed after the update.

    postgres=> begin;
    BEGIN
    postgres=> select * from bbb;
    i
    ---
    1
    (1 row)

    postgres=> update bbb set i = i+1;
    UPDATE 2
    postgres=> commit;
    COMMIT
    postgres=> select * from bbb;
    i
    ---
    2
    3
    (2 rows)



    --
    greg
  • Kevin Grittner at Oct 25, 2010 at 6:08 pm

    Greg Stark wrote:
    Robert Haas wrote:
    But let's back up and talk about MVCC for a minute. Suppose we
    have three source tuples, (1), (2), and (3); and the target table
    contains tuples (1) and (2), of which only (1) is visible to our
    MVCC snapshot; suppose also an equijoin. Clearly, source tuple
    (1) should fire the MATCHED rule and source tuple (3) should fire
    the NOT MATCHED rule, but what in the world should source tuple
    (2) do? AFAICS, the only sensible behavior is to throw a
    serialization error, because no matter what you do the results
    won't be equivalent to a serial execution of the transaction that
    committed target tuple (2) and the transaction that contains the
    MERGE.
    So the behaviour we get with UPDATE in this situation is that we
    update (2) so I would expect this to execute the MATCHED rule.
    Certainly that would be consistent with the behavior of READ
    COMMITTED -- wait for commit or rollback of the concurrent
    transaction, and then proceed with whatever data is there after
    completion of the other transaction. With REPEATABLE READ or
    SERIALIZABLE you would block until commit of the other transaction
    and terminate with a write conflict -- a form of serialization
    failure. If the other transaction rolls back you INSERT.

    At least, that would be the least surprising behavior to me.
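
    For comparison, the existing UPDATE behavior that this would parallel
    looks something like the following sketch, assuming a one-row table
    "t" with columns k and v (made-up names):

    -- session 1
    test=# begin isolation level repeatable read;
    BEGIN
    test=# select * from t where k = 1;   -- snapshot is taken here
     k | v
    ---+---
     1 | 1
    (1 row)

    -- session 2
    test=# begin;
    BEGIN
    test=# update t set v = v + 1 where k = 1;
    UPDATE 1

    -- session 1
    test=# update t set v = v + 1 where k = 1;
    <blocks>

    -- session 2
    test=# commit;
    COMMIT

    -- session 1 now reports:
    ERROR:  could not serialize access due to concurrent update

    In READ COMMITTED, the blocked UPDATE would instead proceed against the
    newly committed row version once the other session finishes.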

    -Kevin
  • Robert Haas at Oct 25, 2010 at 6:10 pm

    On Mon, Oct 25, 2010 at 1:42 PM, Greg Stark wrote:
    On Sun, Oct 24, 2010 at 10:43 AM, Robert Haas wrote:
    But let's back up and talk about MVCC for a minute.  Suppose we have
    three source tuples, (1), (2), and (3); and the target table contains
    tuples (1) and (2), of which only (1) is visible to our MVCC snapshot;
    suppose also an equijoin.  Clearly, source tuple (1) should fire the
    MATCHED rule and source tuple (3) should fire the NOT MATCHED rule,
    but what in the world should source tuple (2) do?  AFAICS, the only
    sensible behavior is to throw a serialization error, because no matter
    what you do the results won't be equivalent to a serial execution of
    the transaction that committed target tuple (2) and the transaction
    that contains the MERGE.
    So the behaviour we get with UPDATE in this situation is that we
    update (2) so I would expect this to execute the MATCHED rule.
    Not exactly. Consider this example:

    rhaas=# create table concurrent (x integer primary key);
    NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
    "concurrent_pkey" for table "concurrent"
    CREATE TABLE
    rhaas=# insert into concurrent values (1);
    rhaas=# begin;
    BEGIN
    rhaas=# insert into concurrent values (2);
    INSERT 0 1

    <switch to a different window>

    rhaas=# update concurrent set x=x where x=2;
    UPDATE 0
    The key
    distinction is that since we're not returning data for the user to see,
    we want to update the most recent version, and it's "almost"
    as if we ran "after" all the other transactions. It's not really
    serializable, and I think in serializable mode we throw a serialization
    failure instead, but in most usage patterns it's precisely what the
    user wants.
    I think it would be perfectly reasonable to have a transaction
    isolation level that does not use a snapshot at all and instead runs
    everything relative to SnapshotNow, and people could use it with MERGE
    if they were so inclined. I think this would correspond more or less
    to the READ COMMITTED isolation level specified in the standard; what
    we now call READ COMMITTED is actually better than READ COMMITTED but
    not quite as good as REPEATABLE READ. That, combined with an
    exclusive lock on the table (or, perhaps, some kind of predicate
    locking mechanism) would be sufficient to prevent serialization
    anomalies.

    However, I don't think that implementing those semantics for just this
    one command (or part of it) makes a whole lot of sense. The EPQ
    behavior of our current default isolation level is really pretty
    strange, and adding a random wart that the target table (but not the
    source table) in a MERGE query gets scanned using SnapshotNow would be
    one more piece of strangeness atop the strangeness we already have.
    And, as we just saw with the enum stuff, SnapshotNow can lead to some
    awfully strange behavior - you could end up processing half of the
    data from a concurrent transaction and missing the other half.
    Here "bbb" contained two records when we began with values "1" and "2"
    but the "2" was inserted in a transaction which hadn't committed yet.
    It committed after the update.

    postgres=> begin;
    BEGIN
    postgres=> select * from bbb;
    i
    ---
    1
    (1 row)

    postgres=> update bbb set i = i+1;
    UPDATE 2
    postgres=> commit;
    COMMIT
    postgres=> select * from bbb;
    i
    ---
    2
    3
    (2 rows)
    Well, at least on my system, if the transaction inserting (2) hasn't
    committed yet, that UPDATE statement will block until it does, because
    trying to change i from 1 to 2 causes the update of the unique index
    to block, since there's an in-doubt tuple with (2) already. Then it
    will continue on as you've shown here, due to EPQ. But if you do the
    same statement with i = i + 10 instead of + 1, then it doesn't block,
    and only updates the one row that's visible.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Kevin Grittner at Oct 25, 2010 at 7:15 pm

    Robert Haas wrote:

    rhaas=# create table concurrent (x integer primary key);
    NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
    "concurrent_pkey" for table "concurrent"
    CREATE TABLE
    rhaas=# insert into concurrent values (1);
    rhaas=# begin;
    BEGIN
    rhaas=# insert into concurrent values (2);
    INSERT 0 1

    <switch to a different window>

    rhaas=# update concurrent set x=x where x=2;
    UPDATE 0
    That surprised me. I would have thought that the INSERT would have
    created an "in doubt" tuple which would block the UPDATE. What is
    the reason for not doing so?

    FWIW I did a quick test and REPEATABLE READ also lets this pass but
    with the SSI patch SERIALIZABLE seems to cover this correctly,
    generating a serialization failure where such access is involved in
    write skew:

    test=# create table concurrent (x integer primary key);
    NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
    "concurrent_pkey" for table "concurrent"
    CREATE TABLE
    test=# insert into concurrent select generate_series(1, 20000);
    INSERT 0 20000
    test=# begin isolation level serializable;
    BEGIN
    test=# insert into concurrent values (0);
    INSERT 0 1
    test=# update concurrent set x = 30001 where x = 30000;
    UPDATE 0

    <different session>

    test=# begin isolation level serializable;
    BEGIN
    test=# insert into concurrent values (30000);
    INSERT 0 1
    test=# update concurrent set x = -1 where x = 0;
    UPDATE 0
    test=# commit;
    ERROR: could not serialize access due to read/write dependencies
    among transactions
    HINT: The transaction might succeed if retried.

    I'll need to add a test to cover this, because it might have broken
    with one of the optimizations on my list, had you not pointed out this
    behavior.

    On the other hand:

    <session 1>

    test=# drop table concurrent ;
    DROP TABLE
    test=# create table concurrent (x integer primary key);
    NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
    "concurrent_pkey" for table "concurrent"
    CREATE TABLE
    test=# insert into concurrent select generate_series(1, 20000);
    INSERT 0 20000
    test=# begin isolation level serializable;
    BEGIN
    test=# insert into concurrent values (0);
    INSERT 0 1

    <session 2>

    test=# begin isolation level serializable;
    BEGIN
    test=# select * from concurrent where x = 0;
    x
    ---
    (0 rows)

    test=# insert into concurrent values (0);
    <blocks>

    <session 1>

    test=# commit;
    COMMIT

    <session 2>

    ERROR: duplicate key value violates unique constraint
    "concurrent_pkey"
    DETAIL: Key (x)=(0) already exists.

    Anyway, I thought this might be of interest in terms of the MERGE
    patch concurrency issues, since the SSI patch has been mentioned.

    -Kevin
  • Robert Haas at Oct 25, 2010 at 7:40 pm

    On Mon, Oct 25, 2010 at 3:15 PM, Kevin Grittner wrote:
    Robert Haas wrote:
    rhaas=# create table concurrent (x integer primary key);
    NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
    "concurrent_pkey" for table "concurrent"
    CREATE TABLE
    rhaas=# insert into concurrent values (1);
    rhaas=# begin;
    BEGIN
    rhaas=# insert into concurrent values (2);
    INSERT 0 1

    <switch to a different window>

    rhaas=# update concurrent set x=x where x=2;
    UPDATE 0
    That surprised me.  I would have thought that the INSERT would have
    created an "in doubt" tuple which would block the UPDATE.  What is
    the reason for not doing so?
    This is just standard MVCC - readers don't block writers, nor writers
    readers. You might also think about what would happen if the UPDATE
    were run before the INSERT of (2). There's no serialization anomaly
    here, because either concurrent case is equivalent to the serial
    schedule where the update precedes the insert.

    In the case of a MERGE that matches a just-inserted invisible tuple
    but no visible tuple, things are a bit stickier. Let's suppose we're
    trying to use MERGE to get UPSERT semantics. If the MERGE command has
    the obvious behavior of ignoring the invisible tuple just as UPDATE or
    DELETE would do, then clearly any equivalent serial schedule must run
    the MERGE before the INSERT (because if it were run after the INSERT,
    it would fire the MATCHED action rather than the NOT MATCHED action).
    But if the merge were run before the INSERT, then the INSERT would
    have failed with a unique key violation; instead, the merge fails with
    a unique key violation. On the other hand, if the MERGE sees the
    invisible tuple, essentially using SnapshotNow semantics, as Greg
    Stark proposed, you get a different (and probably worse) class of
    serialization anomalies. For example, suppose the table has rows
    1-100 and you do an update adding 1000 to each value concurrently with
    merging in the values 51-100. You might get something like this:

    - MERGE scans rows 1-75, firing MATCHED for rows 51-75.
    - UPDATE commits.
    - MERGE scans rows 76-100, firing NOT MATCHED for each.

    Now, as Greg says, that might be what some people want, but it's
    certainly monumentally unserializable. In a serial execution
    schedule, the MERGE will either run before the UPDATE, in which case
    MATCHED will fire for rows 51-100, or else the UPDATE will run before
    the MERGE, in which case NOT MATCHED will fire for rows 51-100. No
    serial schedule is going to fire MATCHED for some rows and NOT MATCHED
    for others.
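
    For concreteness, the two commands in that scenario would look roughly
    like this, assuming a table target(k, v) holding k = 1..100 (hypothetical
    names, and again the MERGE syntax may differ in detail from the patch):

    -- session 1:
    UPDATE target SET k = k + 1000;

    -- session 2, concurrently, merging in the values 51..100:
    MERGE INTO target t
    USING (SELECT generate_series(51, 100) AS k) s
    ON (t.k = s.k)
    WHEN MATCHED THEN UPDATE SET v = 'merged'
    WHEN NOT MATCHED THEN INSERT VALUES (s.k, 'merged');

    -- if the target scan uses SnapshotNow, part of the table can be seen
    -- before the UPDATE commits and part after it, so MATCHED fires for
    -- some of 51..100 and NOT MATCHED for the rest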

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Greg Stark at Oct 25, 2010 at 8:13 pm

    On Mon, Oct 25, 2010 at 12:40 PM, Robert Haas wrote:
    Now, as Greg says, that might be what some people want, but it's
    certainly monumentally unserializable.
    To be clear, when I said it's what people want, what I meant was that in
    the common cases it's doing exactly what people want, as opposed to
    getting closer to what people want in general but not quite hitting
    the mark in the common cases.

    Just as an example I think it's important that in the simplest case,
    upsert of a single record, it be 100% guaranteed to do the naive
    upsert. If two users are doing the merge of a single key at the same
    time one of them had better insert and one of them had better update
    or else users are going to be monumentally surprised.

    I guess I hadn't considered all the cases, and I agree it's important
    that our behaviour make some kind of sense and be consistent with how
    we handle updates of existing in-doubt tuples. I wasn't trying to
    introduce a whole new mode of operation, just to work by analogy with
    the way update works. It's clear that even with our existing semantics
    there are strange corner cases once you get to multiple updates
    happening in a single transaction. But we get the simple cases right
    and even in the more complex cases, while it's not truly serializable
    we should be able to come up with some basic smell tests that we pass.

    My understanding is that currently we generally treat DML in one of
    two ways depending on whether it's returning data to the user or
    updating data in the table (including SELECT FOR SHARE). If it's
    returning data to the user we use a snapshot to give the user a
    consistent view of the database. If it's altering data in the database
    we use the snapshot to get a consistent set of records and then apply
    the updates to the most recent version.

    The anomaly you showed with update and the problem with MERGE are both
    because the operation was simultaneously doing a "read" -- the WHERE
    clause and the uniqueness check in the MERGE -- and a write. This is
    already the kind of case where we do weird things -- what kind of
    behaviour would be consistent with our existing, somewhat weird,
    behaviour?


    --
    greg
  • Robert Haas at Oct 25, 2010 at 8:58 pm

    On Mon, Oct 25, 2010 at 4:10 PM, Greg Stark wrote:
    On Mon, Oct 25, 2010 at 12:40 PM, Robert Haas wrote:
    Now, as Greg says, that might be what some people want, but it's
    certainly monumentally unserializable.
    To be clear, when I said it's what people want, what I meant was that in
    the common cases it's doing exactly what people want, as opposed to
    getting closer to what people want in general but not quite hitting
    the mark in the common cases.

    Just as an example I think it's important that in the simplest case,
    upsert of a single record, it be 100% guaranteed to do the naive
    upsert. If two users are doing the merge of a single key at the same
    time one of them had better insert and one of them had better update
    or else users are going to be monumentally surprised.
    Hmm, so let's think about that case.

    The first merge comes along and finds no match so it fires the NOT
    MATCHED rule, which inserts a tuple. The second merge comes along and
    finds no match, so it also fires the NOT MATCHED rule and tries to
    insert a tuple. But upon consulting the PRIMARY KEY index it finds
    that an in-doubt tuple exists so it goes to sleep waiting for the
    first transaction to commit or abort. If the first transaction
    commits it then decides that the jig is up and fails. We could
    (maybe) fix this by doing something similar to what EPQ does for
    updates: when the first transaction commits, instead of redoing the
    insert, we back up and recheck whether the new tuple would have
    matched the join clause and, if so, we instead fire the MATCHED action
    on the updated tuple. If not, we fire NOT MATCHED anyway. I'm not
    sure how hard that would be, or whether it would introduce any other
    nasty anomalies in more complex cases.

    Alternatively, we could introduce an UPSERT or REPLACE statement
    intended to handle exactly this case and leave MERGE for more complex
    situations. It's pretty easy to imagine what the coding of that
    should look like: if we encounter an in-doubt tuple, we wait on its
    xmin. If the transaction aborts, we insert. If it commits, and we're
    in READ COMMITTED mode, we update it; but if we're in REPEATABLE READ
    or SERIALIZABLE mode, we abort with a serialization error. That's a
    lot simpler to understand and reason about than MERGE in its full
    generality.

    I think it's pretty much hopeless to think that MERGE is going to work
    in complex concurrent scenarios without creating serialization
    anomalies, or at least rollbacks. I think that's baked into the
    nature of what the statement does. To simulate MERGE, you need to
    read from the target table and then do writes that depend on what you
    read. If you do that with the commands that are available today,
    you're going to get serialization anomalies and/or rollbacks under
    concurrency. The mere fact of that logic being inside the database
    rather than outside isn't going to make that go away. Now sometimes,
    as with exclusion constraints, you can play games with dirty snapshots
    to get the semantics you want, but whether that's possible in a
    particular case depends on the details of the operation being
    performed, and here I think it can't be done. Some operations are
    *fundamentally* unserializable.

    A very simple example of this is a sequence that is guaranteed not to
    have gaps (a feature we've occasionally been requested to provide).
    If N processes request a sequence number simultaneously, you have to
    hand out a value to the first guy and wait and see whether he commits
    or aborts before deciding what number to give the second guy. That
    sucks, so usually we just design our applications not to require that
    sequences be gap-free. Similarly here.
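
    The usual gap-free workaround makes the serialization explicit: keep the
    counter in an ordinary row and let the row lock do the queueing (a
    sketch, with made-up names):

    CREATE TABLE invoice_number (last_value bigint NOT NULL);
    INSERT INTO invoice_number VALUES (0);

    -- inside each transaction that needs a number:
    UPDATE invoice_number SET last_value = last_value + 1
        RETURNING last_value;
    -- the row lock taken here is held until commit, so every concurrent
    -- caller waits for the previous one to commit or abort, which is
    -- exactly the cost of guaranteeing no gaps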

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Simon Riggs at Oct 26, 2010 at 8:01 pm

    On Mon, 2010-10-25 at 16:58 -0400, Robert Haas wrote:
    On Mon, Oct 25, 2010 at 4:10 PM, Greg Stark wrote:
    On Mon, Oct 25, 2010 at 12:40 PM, Robert Haas wrote:
    Now, as Greg says, that might be what some people want, but it's
    certainly monumentally unserializable.
    To be clear, when I said it's what people want, what I meant was that in
    the common cases it's doing exactly what people want, as opposed to
    getting closer to what people want in general but not quite hitting
    the mark in the common cases.

    Just as an example I think it's important that in the simplest case,
    upsert of a single record, it be 100% guaranteed to do the naive
    upsert. If two users are doing the merge of a single key at the same
    time one of them had better insert and one of them had better update
    or else users are going to be monumentally surprised.
    Hmm, so let's think about that case.

    The first merge comes along and finds no match so it fires the NOT
    MATCHED rule, which inserts a tuple. The second merge comes along and
    finds no match, so it also fires the NOT MATCHED rule and tries to
    insert a tuple. But upon consulting the PRIMARY KEY index it finds
    that an in-doubt tuple exists so it goes to sleep waiting for the
    first transaction to commit or abort. If the first transaction
    commits it then decides that the jig is up and fails. We could
    (maybe) fix this by doing something similar to what EPQ does for
    updates: when the first transaction commits, instead of redoing the
    insert, we back up and recheck whether the new tuple would have
    matched the join clause and, if so, we instead fire the MATCHED action
    on the updated tuple. If not, we fire NOT MATCHED anyway. I'm not
    sure how hard that would be, or whether it would introduce any other
    nasty anomalies in more complex cases.

    Alternatively, we could introduce an UPSERT or REPLACE statement
    intended to handle exactly this case and leave MERGE for more complex
    situations. It's pretty easy to imagine what the coding of that
    should look like: if we encounter an in-doubt tuple, we wait on its
    xmin. If the transaction aborts, we insert. If it commits, and we're
    in READ COMMITTED mode, we update it; but if we're in REPEATABLE READ
    or SERIALIZABLE mode, we abort with a serialization error. That's a
    lot simpler to understand and reason about than MERGE in its full
    generality.

    I think it's pretty much hopeless to think that MERGE is going to work
    in complex concurrent scenarios without creating serialization
    anomalies, or at least rollbacks. I think that's baked into the
    nature of what the statement does. To simulate MERGE, you need to
    read from the target table and then do writes that depend on what you
    read. If you do that with the commands that are available today,
    you're going to get serialization anomalies and/or rollbacks under
    concurrency. The mere fact of that logic being inside the database
    rather than outside isn't going to make that go away. Now sometimes,
    as with exclusion constraints, you can play games with dirty snapshots
    to get the semantics you want, but whether that's possible in a
    particular case depends on the details of the operation being
    performed, and here I think it can't be done. Some operations are
    *fundamentally* unserializable.
    I agree with your analysis. Let me review...

    There is a case that locking alone won't resolve, no matter how that
    locking works. The transaction history for that is:

    T1: takes snapshot
    T2: INSERT row1
    T2: COMMIT;
    T1: attempts to determine if MATCHED or NOT MATCHED.

    The answer is neither of those two. If we say it is NOT MATCHED
    then we will just fail on any INSERT, while if we say it is MATCHED then
    technically we can't see the row, so the UPDATE should fail. The COMMIT
    of T2 releases any locks that would have helped resolve this, and note
    that even if T1 attempts to lock that row as early as possible, only a
    table level lock would prevent T2 from doing that action.

    Two options for resolution are

    1) Throw SERIALIZABLE error

    2) Implement something similar to EvalPlanQual
    As you say, we already resolve this situation for concurrent updates by
    following the update chain from a row that is visible to the latest row.
    For MERGE the semantics need to be subtly different here: we need to
    follow the update chain to the latest row, but from a row that we
    *can't* see.

    MERGE is still very useful without the need for 2), though it fails in some
    cases for concurrent use. The failure rate would increase with the number
    of concurrent MERGE commands and/or the number of rows in the source table.
    Those errors are no more serious than are possible now.

    So IMHO we should implement 1) now and come back later to implement 2).
    Given that right now there may be other issues with 2), it seems unsafe to
    rely on the assumption that we'll fix them by the end of the release.

    --
    Simon Riggs http://www.2ndQuadrant.com/books/
    PostgreSQL Development, 24x7 Support, Training and Services
  • Robert Haas at Oct 26, 2010 at 8:08 pm

    On Tue, Oct 26, 2010 at 8:10 AM, Simon Riggs wrote:
    I agree with your analysis. Let me review...
    [review]
    Sounds like we're on the same page.
    Two options for resolution are

    1) Throw SERIALIZABLE error

    2) Implement something similar to EvalPlanQual
    As you say, we already resolve this situation for concurrent updates by
    following the update chain from a row that is visible to the latest row.
    For MERGE the semantics need to be subtly different here: we need to
    follow the update chain to the latest row, but from a row that we
    *can't* see.

    MERGE is still very useful without the need for 2), though it fails in some
    cases for concurrent use. The failure rate would increase with the number
    of concurrent MERGE commands and/or the number of rows in the source table.
    Those errors are no more serious than are possible now.

    So IMHO we should implement 1) now and come back later to implement 2).
    Given that right now there may be other issues with 2), it seems unsafe to
    rely on the assumption that we'll fix them by the end of the release.
    Yeah. In fact, I'm not sure we're ever going to want to implement #2
    - I think that needs more study to determine whether there's even
    something there that makes sense to implement at all. But certainly I
    wouldn't count on it happening for 9.1.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Simon Riggs at Oct 26, 2010 at 9:01 pm

    On Tue, 2010-10-26 at 16:08 -0400, Robert Haas wrote:
    On Tue, Oct 26, 2010 at 8:10 AM, Simon Riggs wrote:
    I agree with your analysis. Let me review...
    [review]
    Sounds like we're on the same page.
    Two options for resolution are

    1) Throw SERIALIZABLE error

    2) Implement something similar to EvalPlanQual
    As you say, we already resolve this situation for concurrent updates by
    following the update chain from a row that is visible to the latest row.
    For MERGE the semantics need to be subtly different here: we need to
    follow the update chain to the latest row, but from a row that we
    *can't* see.

    MERGE is still very useful without the need for 2), though it fails in some
    cases for concurrent use. The failure rate would increase with the number
    of concurrent MERGE commands and/or the number of rows in the source table.
    Those errors are no more serious than are possible now.

    So IMHO we should implement 1) now and come back later to implement 2).
    Given that right now there may be other issues with 2), it seems unsafe to
    rely on the assumption that we'll fix them by the end of the release.
    Yeah. In fact, I'm not sure we're ever going to want to implement #2
    - I think that needs more study to determine whether there's even
    something there that makes sense to implement at all. But certainly I
    wouldn't count on it happening for 9.1.
    2) sounds weird, until you realise it is exactly how you would need to
    code a PL/pgSQL procedure to do the equivalent of MERGE. Not doing it
    just makes no sense in the longer term. I agree it will take a while to
    think it through in sufficient detail.
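
    Roughly, the PL/pgSQL shape of that logic is the familiar retry loop
    (a sketch assuming a table kv(key, val); the names are made up):

    CREATE FUNCTION merge_kv(k integer, v text) RETURNS void AS $$
    BEGIN
        LOOP
            -- the MATCHED case: update the latest version if we can see one
            UPDATE kv SET val = v WHERE key = k;
            IF found THEN
                RETURN;
            END IF;
            -- the NOT MATCHED case: try to insert; if someone else inserted
            -- the key concurrently, loop back and update the row we could
            -- not see, which is what 2) would do inside the executor
            BEGIN
                INSERT INTO kv (key, val) VALUES (k, v);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                NULL;   -- fall through and retry the UPDATE
            END;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;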

    In the meantime it's a good argument for the ELSE construct at the end
    of the WHEN clauses, so we can do something other than skip a row or
    throw an ERROR.

    --
    Simon Riggs http://www.2ndQuadrant.com/books/
    PostgreSQL Development, 24x7 Support, Training and Services
  • Kevin Grittner at Oct 25, 2010 at 8:23 pm

    Robert Haas wrote:
    Kevin Grittner wrote:
    I would have thought that the INSERT would have
    created an "in doubt" tuple which would block the UPDATE.
    This is just standard MVCC - readers don't block writers, nor
    writers readers.
    Sure, but I tend to think of both INSERT and UPDATE as writes. ;-)
    You might also think about what would happen if the UPDATE
    were run before the INSERT
    either concurrent case is equivalent to the serial
    schedule where the update precedes the insert.
    I guess that's persuasive enough. It feels funny, but the argument
    looks sound, so I guess it's just a case of my intuition being
    faulty.
    In the case of a MERGE that matches a just-inserted invisible
    tuple but no visible tuple, things are a bit stickier.
    Well, more generally it can lead to anomalies in a more complex
    combination of actions, since it creates, as you imply above, a
    rw-dependency from the transaction doing the UPDATE to the
    transaction doing the INSERT, so the combination can form part of a
    cycle in apparent order of execution which can produce an anomaly.
    The more I look at it, the clearer it becomes that the current behavior
    is correct, and what its implications are. I had just missed that detail
    until now, wrongly assuming that it would be a write conflict.
    [example of MERGE which cannot serialize with a concurrent
    transaction, and possible outcomes if there is no serialization
    failure]
    Now, as Greg says, that might be what some people want, but it's
    certainly monumentally unserializable.
    Yeah. MERGE should probably be sensitive to the transaction
    isolation level, at least to the extent that MERGE in a SERIALIZABLE
    transaction plays nice with other SERIALIZABLE transactions. That
    would be necessary to allow business rules enforced through triggers
    to be able to guarantee data integrity. It would mean that a MERGE
    involving tables which were the target of modifications from
    concurrent SERIALIZABLE transactions would be likely to fail and/or
    to cause other transactions to fail.

    -Kevin
  • Bruce Momjian at Nov 12, 2010 at 6:05 pm

    Kevin Grittner wrote:
    Robert Haas wrote:
    rhaas=# create table concurrent (x integer primary key);
    NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
    "concurrent_pkey" for table "concurrent"
    CREATE TABLE
    rhaas=# insert into concurrent values (1);
    rhaas=# begin;
    BEGIN
    rhaas=# insert into concurrent values (2);
    INSERT 0 1

    <switch to a different window>

    rhaas=# update concurrent set x=x where x=2;
    UPDATE 0
    That surprised me. I would have thought that the INSERT would have
    created an "in doubt" tuple which would block the UPDATE. What is
    the reason for not doing so?
    When Kevin gets surprised, I get worried. LOL

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://enterprisedb.com

    + It's impossible for everything to be true. +
  • Greg Stark at Oct 24, 2010 at 11:12 pm

    On Sun, Oct 24, 2010 at 2:50 AM, Martijn van Oosterhout wrote:
    Can we please not get MERGE hung up on trying to make it atomic. The
    standard requires no such thing and there are plenty of uses of MERGE
    that don't involve high concurrency updates of the same row by
    different processes. If we want an atomic UPSERT, make that a separate
    command; MERGE without atomicity is useful enough already.
    Really? I don't really see any point in a non-atomic MERGE. Nor in a
    non-atomic UPDATE or INSERT or any other operation. The A in ACID is
    as important as any other letter.

    For that matter, if you don't care about atomicity then this is a
    problem already easily solvable in the application. Why
    bother providing a special command for it? The whole reason to want a
    special command is precisely because the database can implement it
    atomically more easily and more efficiently than any application
    implementation.

    Moreover having a case which is non-atomic and allows inconsistent
    results or errors in the face of concurrent updates is a foot-gun.
    Someone will come along and try to use it and it will appear to work
    in their application but introduce nasty hidden race conditions.

    --
    greg
  • Robert Haas at Oct 25, 2010 at 3:19 am

    On Sun, Oct 24, 2010 at 7:11 PM, Greg Stark wrote:
    On Sun, Oct 24, 2010 at 2:50 AM, Martijn van Oosterhout
    wrote:
    Can we please not get MERGE hung up on trying to make it atomic. The
    standard requires no such thing and there are plenty of uses of MERGE
    that don't involve high concurrency updates of the same row by
    different processes. If we want an atomic UPSERT, make that a separate
    command; MERGE without atomicity is useful enough already.
    Really? I don't really see any point in a non-atomic MERGE. Nor in a
    non-atomic UPDATE or INSERT or any other operation. The A in ACID is
    as important as any other letter.
    I think we're really discussing the "I" in ACID, not the "A". There's
    no question that the MERGE transaction will either commit or fail.
    What we're talking about is what happens when there are concurrent
    table modifications in progress; and the answer is that you might get
    serialization anomalies. But we have serialization anomalies without
    MERGE, too - see the discussions around Kevin Grittner's SSI patch
    which, come to think of it, might be useful for this case, too.

    I posted an example upthread which I think demonstrates very clearly
    that MERGE will result in unavoidable serialization failures - so if
    the standard is that we mustn't have any serialization failures then
    the standard can never be met. The best we can hope for is that we'll
    detect them and roll back if they occur, rather than going on to
    commit but perhaps with some anomaly. And I'm pretty sure that's what
    KG's SSI patch is going to give us. So I'm not sure there's really
    anything to get worked up about here in terms of concurrency issues.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Boxuan Zhai at Oct 23, 2010 at 8:30 pm


    This did seem to find a bug in the implementation after running for a
    while:

    TRAP: FailedAssertion("!(epqstate->origslot != ((void *)0))", File:
    "execMain.c", Line: 1732)

    Line number there is relative to what you can see at
    http://github.com/greg2ndQuadrant/postgres/blob/merge/src/backend/executor/execMain.c
    and that's a null pointer check at the beginning of
    EvalPlanQualFetchRowMarks. Haven't explored why this happened or how
    repeatable this is, but since Boxuan was looking for some bugs to chase I
    wanted to deliver him one to chew on.
    I am not sure how this bug arises. I will try to find out the reason for it.
  • Robert Haas at Oct 14, 2010 at 10:25 am

    On Wed, Sep 29, 2010 at 2:44 AM, Greg Smith wrote:
    Started looking at the latest MERGE patch from Boxuan here tonight. The
    regression tests pass for me here, a good starting sign. I expect to move on
    to trying to break it more creatively next, then on to performance tests.
    Nothing much more exciting than that to report so far.
    Greg, are you still working on a review of this patch?

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Greg Smith at Oct 14, 2010 at 2:29 pm

    Robert Haas wrote:
    Greg, are you still working on a review of this patch?
    Yes, I just had more distractions while coming up to speed on this area
    than I'd hoped. I'll get a second round of looking at this done by the
    weekend.

    --
    Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services and Support www.2ndQuadrant.us
  • Greg Smith at Sep 24, 2010 at 8:13 am
    Finding time for a review as large as this one is a bit tough, but I've
    managed to set aside a couple of days for it over the next week. I'm
    delivering a large project tonight and intend to start in on the review
    work tomorrow once that's cleared up. If you're ever not sure who is
    working on your patch and what state they feel it's in, check
    https://commitfest.postgresql.org/action/commitfest_view?id=7 for an
    update; that's where we keep track of all that information.

    Did you ever end up keeping a current version of this patch in an
    alternate repository location, such as github? I thought I saw a
    suggestion from you about that, but after looking through the history
    here all I see are the diff patches you've been sending to the list.
    That's fine, just trying to confirm where everything is at.

    --
    Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
    PostgreSQL Training, Services and Support www.2ndQuadrant.us
    Author, "PostgreSQL 9.0 High Performance" Pre-ordering at:
    https://www.packtpub.com/postgresql-9-0-high-performance/book
  • Kevin Grittner at Oct 25, 2010 at 12:12 pm

    Robert Haas wrote:

    What we're talking about is what happens when there are concurrent
    table modifications in progress; and the answer is that you might
    get serialization anomalies. But we have serialization anomalies
    without MERGE, too - see the discussions around Kevin Grittner's
    SSI patch which, come to think of it, might be useful for this
    case, too.
    I've been thinking about that as I read the discussion. If both
    transactions are serializable, there would tend to be a default
    behavior of preventing anomalies. ("Tend to be" because it might
    actually require the addition of a few calls to predicate locking
    functions from new MERGE code to be consistent with other code under
    the patch.)

    On the other hand, where there is a more targeted way of protecting
    integrity, I've tried to keep SSI out of the way and let it function
    as it has. For example, foreign key enforcement already manages
    this, so the SSI implementation intentionally ignores those reads and
    writes. From the discussion on MERGE I've been thinking that where
    there is an appropriate unique index the SSI code might want to stay
    out of the way, similar to foreign keys; but it might be used to
    handle the cases where there is no appropriate index. Or possibly
    the predicate locking portion of it could be used in a novel way by
    MERGE code to implement the MERGE logic. The API for SSI basically
    involves three types of functions:
    - Acquire a predicate lock on an object.
    - Check whether a given object is predicate locked.
    - Check for rw-conflict.
    To be useful for MERGE, that second category would probably need to
    be expanded, and we might need to upgrade btree index locking to
    support range locking rather than stopping at index page locks. Dan
    is planning to try this once he has sufficient benchmarks as a base
    to confirm the performance impact.
    I posted an example upthread which I think demonstrates very
    clearly that MERGE will result in unavoidable serialization
    failures - so if the standard is that we mustn't have any
    serialization failures then the standard can never be met. The best
    we can hope for is that we'll detect them and roll back if they
    occur, rather than going on to commit but perhaps with some
    anomaly. And I'm pretty sure that's what KG's SSI patch is going to
    give us. So I'm not sure there's really anything to get worked up
    about here in terms of concurrency issues.
    Given the standard's emphasis on data integrity and the known
    concurrency issues with relational theory, I find it hard to believe
    that the standard requires "no serialization failures during merge" --
    but I haven't had time to dig into the standard's specifications yet.

    -Kevin
