Hello.
This is a proposal patch for SQL/MED for 9.1.

At the prev. CF, this patch had so many features to make it hard
to review all of them. So I devided it into smaller parts:
(1) foreign table DDL support (this proposal):
- support for foreign table DDL syntax (CREATE/ALTER FOREIGN TABLE)
- Definition of FDW routine interface and system catalogs for it.
(2) SELECT support for external PostgreSQL tables.


First of all, I'd like to discuss and make agreement on the basic design
and its implementation for FDW interface.
Comments and ideas would be very appriciated, especially
on how system catalog stores the information about
foreign tables, server, and user mappings.

Detail information are described in the wiki:
http://wiki.postgresql.org/wiki/SQL/MED

I'll add this item to the CF site.
And WIP patch for (2) will be available on the page for how (1) is utilized.

- Open Issue:
- fdw connection params issue:
In this implimentation, foreign table connection params(including
passwords) are visible to all users. To avoid it, store them in the
user_mapping object, not in foreign server object.
# It should be improved, though.
- fdw security issue:
It depends on how indivisual fdw extensions are designed.
I'll focus on it after basic agreement on fdw interface.

- prev. CF page:
https://commitfest.postgresql.org/action/patch_view?id=326

- code repository:
http://repo.or.cz/w/pgsql-fdw.git (branch: foreign_table).

repository branches structures is as follows:
master
+foreign_table: (active) branch for this proposal
- foreign_scan: (active) WIP branch for select support
+fdw_ddl: (will be obsolute) branch for last CF
- dblink
- copy_from

Thanks in advance,

SAKAMOTO Masahiko
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
e-mail: sakamoto.masahiko@oss.ntt.co.jp

Search Discussions

  • Heikki Linnakangas at Sep 15, 2010 at 7:02 am

    On 15/09/10 08:46, SAKAMOTO Masahiko wrote:
    This is a proposal patch for SQL/MED for 9.1.

    At the prev. CF, this patch had so many features to make it hard
    to review all of them. So I devided it into smaller parts:
    (1) foreign table DDL support (this proposal):
    - support for foreign table DDL syntax (CREATE/ALTER FOREIGN TABLE)
    - Definition of FDW routine interface and system catalogs for it.
    (2) SELECT support for external PostgreSQL tables.


    First of all, I'd like to discuss and make agreement on the basic design
    and its implementation for FDW interface.
    Comments and ideas would be very appriciated, especially
    on how system catalog stores the information about
    foreign tables, server, and user mappings.
    In my mind the key question is: what does the API for writing foreign
    data wrappers look like? I couldn't figure that out by looking at the patch.

    The API needs to be simple and version-independent, so that you can
    write simple wrappers like the flat file wrapper easily. At the same
    time, it needs to be very flexible, so that it allows safely pushing
    down all kinds constructs like functions, aggregates and joins. The
    planner needs to know which constructs the wrapper can handle and get
    cost estimates for the foreign scans. Those two goals are at odds.

    I've been thinking about that for some time. Here's one sketch I made a
    while ago:
    http://archives.postgresql.org/pgsql-hackers/2008-07/msg00395.php.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Robert Haas at Sep 15, 2010 at 6:36 pm

    2010/9/15 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:
    In my mind the key question is: what does the API for writing foreign
    data wrappers look like? I couldn't figure that out by looking at the patch.

    The API needs to be simple and version-independent, so that you can
    write simple wrappers like the flat file wrapper easily. At the same
    time, it needs to be very flexible, so that it allows safely pushing
    down all kinds constructs like functions, aggregates and joins. The
    planner needs to know which constructs the wrapper can handle and get
    cost estimates for the foreign scans. Those two goals are at odds.

    I've been thinking about that for some time. Here's one sketch I made a
    while ago:
    http://archives.postgresql.org/pgsql-hackers/2008-07/msg00395.php.
    I wonder if we might not think of the foreign data wrapper problem as
    an instance of an even more general problem. Right now, we have
    semi-pluggable index access methods - it's probably not quite possible
    to implement one as a standalone module because of XLOG, and maybe
    other reasons, but at least there's some abstraction layer there.
    Perhaps we should be thinking about a similar facility of table-access
    methods. What if someone wants to implement column-oriented storage,
    or index-organized tables, or or tables that are really slow under
    heavy write loads but can instantaneously compute SELECT COUNT(*) FROM
    table, or mauve-colored tables with magic pixie dust? I don't want to
    raise the bar for this project to the point where we can never get it
    off the ground, but if there's a way to avoid baking in the assumption
    that only foreign tables can ever have special capabilities, that
    might be valuable.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Heikki Linnakangas at Sep 15, 2010 at 6:38 pm

    On 15/09/10 21:21, Robert Haas wrote:
    I wonder if we might not think of the foreign data wrapper problem as
    an instance of an even more general problem. Right now, we have
    semi-pluggable index access methods - it's probably not quite possible
    to implement one as a standalone module because of XLOG, and maybe
    other reasons, but at least there's some abstraction layer there.
    Perhaps we should be thinking about a similar facility of table-access
    methods. What if someone wants to implement column-oriented storage,
    or index-organized tables, or or tables that are really slow under
    heavy write loads but can instantaneously compute SELECT COUNT(*) FROM
    table, or mauve-colored tables with magic pixie dust? I don't want to
    raise the bar for this project to the point where we can never get it
    off the ground, but if there's a way to avoid baking in the assumption
    that only foreign tables can ever have special capabilities, that
    might be valuable.
    Well, you could implement all that as a foreign data wrappers. Tables
    made out of pixie dust feels pretty foreign to me ;-).

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Robert Haas at Sep 15, 2010 at 6:40 pm

    On Wed, Sep 15, 2010 at 2:28 PM, Heikki Linnakangas wrote:
    On 15/09/10 21:21, Robert Haas wrote:

    I wonder if we might not think of the foreign data wrapper problem as
    an instance of an even more general problem.  Right now, we have
    semi-pluggable index access methods - it's probably not quite possible
    to implement one as a standalone module because of XLOG, and maybe
    other reasons, but at least there's some abstraction layer there.
    Perhaps we should be thinking about a similar facility of table-access
    methods.  What if someone wants to implement column-oriented storage,
    or index-organized tables, or or tables that are really slow under
    heavy write loads but can instantaneously compute SELECT COUNT(*) FROM
    table, or mauve-colored tables with magic pixie dust?  I don't want to
    raise the bar for this project to the point where we can never get it
    off the ground, but if there's a way to avoid baking in the assumption
    that only foreign tables can ever have special capabilities, that
    might be valuable.
    Well, you could implement all that as a foreign data wrappers. Tables made
    out of pixie dust feels pretty foreign to me ;-).
    Eh, maybe. It doesn't seem like the best name, if we're actually
    managing the underlying data blocks with our smgr layer, etc.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Tom Lane at Sep 15, 2010 at 7:01 pm

    Robert Haas writes:
    2010/9/15 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:
    I've been thinking about that for some time. Here's one sketch I made a
    while ago:
    http://archives.postgresql.org/pgsql-hackers/2008-07/msg00395.php.
    I wonder if we might not think of the foreign data wrapper problem as
    an instance of an even more general problem. Right now, we have
    semi-pluggable index access methods - it's probably not quite possible
    to implement one as a standalone module because of XLOG, and maybe
    other reasons, but at least there's some abstraction layer there.
    Perhaps we should be thinking about a similar facility of table-access
    methods. What if someone wants to implement column-oriented storage,
    or index-organized tables, or or tables that are really slow under
    heavy write loads but can instantaneously compute SELECT COUNT(*) FROM
    table, or mauve-colored tables with magic pixie dust? I don't want to
    raise the bar for this project to the point where we can never get it
    off the ground, but if there's a way to avoid baking in the assumption
    that only foreign tables can ever have special capabilities, that
    might be valuable.
    Well, the problem is to not draw the abstraction boundary so high that
    your plugins have to reimplement the world to get anything done.
    mysql got this wrong IMO, and are still paying the price in the form of
    bizarre functional incompatibilities between their different storage
    engines.

    As an example, I don't think there is any sane way to provide
    column-oriented storage as a plugin. The entire executor is based
    around the assumption that table scans return a row at a time; in
    consequence, the entire planner is too. You can't have a plugin that
    replaces all of that. You could probably build a plugin that allows
    columnar storage but reconstructs rows to return to the executor ... but
    having to do that would largely destroy any advantages of a columnar DB,
    I fear.

    Yet there are other cases that probably *could* work well based on a
    storage-level abstraction boundary; index-organized tables for instance.
    So I think we need to have some realistic idea of what we want to
    support and design an API accordingly, not hope that if we don't
    know what we want we will somehow manage to pick an API that makes
    all things possible.

    I'm personally more worried about whether Heikki's sketch has the
    boundary too high-level than too low-level. It might work all right
    for handing off to a full-fledged remote database, particularly if
    the other DB is also Postgres; but otherwise it's leaving a lot of
    work to be done by the plugin author. And at the same time I don't
    think it's exposing enough information to let the local planner do
    anything intelligent in terms of trading off remote vs. local work.

    regards, tom lane
  • Robert Haas at Sep 15, 2010 at 8:13 pm

    On Wed, Sep 15, 2010 at 3:00 PM, Tom Lane wrote:
    Well, the problem is to not draw the abstraction boundary so high that
    your plugins have to reimplement the world to get anything done.
    mysql got this wrong IMO, and are still paying the price in the form of
    bizarre functional incompatibilities between their different storage
    engines.
    Yeah, as far as I can tell there is pretty much universal consensus
    that they got that wrong. Actually, I have no personal opinion on the
    topic, having no familiarity with the innards of MySQL: but that is
    what people keep telling me.
    As an example, I don't think there is any sane way to provide
    column-oriented storage as a plugin.  The entire executor is based
    around the assumption that table scans return a row at a time; in
    consequence, the entire planner is too.  You can't have a plugin that
    replaces all of that.  You could probably build a plugin that allows
    columnar storage but reconstructs rows to return to the executor ... but
    having to do that would largely destroy any advantages of a columnar DB,
    I fear.
    Yeah, I don't know. A columnar DB is a bit like making "SELECT * FROM
    table" really mean some kind of join between table_part1, table_part2,
    and table_part3 (which could then perhaps be reordered, a candidate
    for join removal, etc.). But I have no position on whether whatever
    infrastructure we'd need to support that is any way related to the
    problem du jour. It's worth noting, however, that even if we give up
    on a column-oriented storage within PG, we might easily be talking to
    a column-oriented DB on the other end of an SQL/MED connection; and
    we'd like to be able to handle that sanely.
    Yet there are other cases that probably *could* work well based on a
    storage-level abstraction boundary; index-organized tables for instance.
    So I think we need to have some realistic idea of what we want to
    support and design an API accordingly, not hope that if we don't
    know what we want we will somehow manage to pick an API that makes
    all things possible.
    Agreed. Random ideas: index-organized tables, tables that use a
    rollback log rather than VACUUM, tables that use strict two-phase
    locking rather than MVCC, tables that have no concurrency control at
    all and you get dirty reads (could be useful for logging tables),
    write-once read-many tables, compressed tables, encrypted tables,
    tables in formats used by previous versions of PostgreSQL, tables that
    store data in a round-robin fashion (like MRTG rrdtool). Within the
    general orbit of index-organized tables, you can wonder about
    different kinds of indices: btree, hash, and gist all seem promising.
    You can even imagine a GIST-like structure that does something like
    maintain running totals for certain columns on each non-leaf page, to
    speed up SUM operations. Feel free to ignore whatever of that seems
    irrelevant.
    I'm personally more worried about whether Heikki's sketch has the
    boundary too high-level than too low-level.  It might work all right
    for handing off to a full-fledged remote database, particularly if
    the other DB is also Postgres; but otherwise it's leaving a lot of
    work to be done by the plugin author.  And at the same time I don't
    think it's exposing enough information to let the local planner do
    anything intelligent in terms of trading off remote vs. local work.
    Yeah, I think the API for exposing cost information needs a lot of thought.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Hitoshi Harada at Sep 16, 2010 at 1:01 am

    2010/9/16 Robert Haas <robertmhaas@gmail.com>:
    On Wed, Sep 15, 2010 at 3:00 PM, Tom Lane wrote:
    Yet there are other cases that probably *could* work well based on a
    storage-level abstraction boundary; index-organized tables for instance.
    So I think we need to have some realistic idea of what we want to
    support and design an API accordingly, not hope that if we don't
    know what we want we will somehow manage to pick an API that makes
    all things possible.
    Agreed.  Random ideas: index-organized tables...
    I'd love to see a table that is based on one of the existing KVSs.


    --
    Hitoshi Harada
  • Robert Haas at Sep 16, 2010 at 12:57 am

    On Wed, Sep 15, 2010 at 8:53 PM, Hitoshi Harada wrote:
    2010/9/16 Robert Haas <robertmhaas@gmail.com>:
    On Wed, Sep 15, 2010 at 3:00 PM, Tom Lane wrote:
    Yet there are other cases that probably *could* work well based on a
    storage-level abstraction boundary; index-organized tables for instance.
    So I think we need to have some realistic idea of what we want to
    support and design an API accordingly, not hope that if we don't
    know what we want we will somehow manage to pick an API that makes
    all things possible.
    Agreed.  Random ideas: index-organized tables...
    I'd love to see a table that is based on one of the existing KVSs.
    I'm not familiar with the term KVS?

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Robert Haas at Sep 16, 2010 at 1:05 am

    On Wed, Sep 15, 2010 at 8:57 PM, Robert Haas wrote:
    On Wed, Sep 15, 2010 at 8:53 PM, Hitoshi Harada wrote:
    2010/9/16 Robert Haas <robertmhaas@gmail.com>:
    On Wed, Sep 15, 2010 at 3:00 PM, Tom Lane wrote:
    Yet there are other cases that probably *could* work well based on a
    storage-level abstraction boundary; index-organized tables for instance.
    So I think we need to have some realistic idea of what we want to
    support and design an API accordingly, not hope that if we don't
    know what we want we will somehow manage to pick an API that makes
    all things possible.
    Agreed.  Random ideas: index-organized tables...
    I'd love to see a table that is based on one of the existing KVSs.
    I'm not familiar with the term KVS?
    Oh, key-value store, I bet. Yeah, that would be cool.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Hitoshi Harada at Sep 16, 2010 at 1:06 am

    2010/9/16 Robert Haas <robertmhaas@gmail.com>:
    On Wed, Sep 15, 2010 at 8:57 PM, Robert Haas wrote:
    On Wed, Sep 15, 2010 at 8:53 PM, Hitoshi Harada wrote:
    2010/9/16 Robert Haas <robertmhaas@gmail.com>:
    On Wed, Sep 15, 2010 at 3:00 PM, Tom Lane wrote:
    Yet there are other cases that probably *could* work well based on a
    storage-level abstraction boundary; index-organized tables for instance.
    So I think we need to have some realistic idea of what we want to
    support and design an API accordingly, not hope that if we don't
    know what we want we will somehow manage to pick an API that makes
    all things possible.
    Agreed.  Random ideas: index-organized tables...
    I'd love to see a table that is based on one of the existing KVSs.
    I'm not familiar with the term KVS?
    Oh, key-value store, I bet.  Yeah, that would be cool.
    That's it. Like Redis, Tokyo Cabinet, or something.



    --
    Hitoshi Harada
  • Tom Lane at Sep 16, 2010 at 1:22 am

    Hitoshi Harada writes:
    2010/9/16 Robert Haas <robertmhaas@gmail.com>:
    Oh, key-value store, I bet.  Yeah, that would be cool.
    That's it. Like Redis, Tokyo Cabinet, or something.
    What exactly do those get you that an ordinary index, or at worst an
    index-organized table, doesn't get you?

    regards, tom lane
  • Mark Kirkwood at Sep 16, 2010 at 1:31 am

    On 16/09/10 13:22, Tom Lane wrote:
    Hitoshi Harada<umi.tanuki@gmail.com> writes:
    2010/9/16 Robert Haas<robertmhaas@gmail.com>:
    Oh, key-value store, I bet. Yeah, that would be cool.
    That's it. Like Redis, Tokyo Cabinet, or something.
    What exactly do those get you that an ordinary index, or at worst an
    index-organized table, doesn't get you?

    regards, tom lane
    It is pretty rare to see key value stores vs relational engines
    discussed without a descent into total foolishiness, but this Wikipedia
    page looks like a reasonable summary:

    http://en.wikipedia.org/wiki/NoSQL
  • Tom Lane at Sep 16, 2010 at 2:05 am

    Mark Kirkwood writes:
    On 16/09/10 13:22, Tom Lane wrote:
    What exactly do those get you that an ordinary index, or at worst an
    index-organized table, doesn't get you?
    It is pretty rare to see key value stores vs relational engines
    discussed without a descent into total foolishiness, but this Wikipedia
    page looks like a reasonable summary:
    http://en.wikipedia.org/wiki/NoSQL
    That doesn't do anything at all to answer my question. I don't want
    to debate NoSQL versus traditional RDBMS here. What I asked was:
    given that PG is a traditional RDBMS, what exactly are you hoping
    to accomplish by putting a key-value storage mechanism in it? And
    if you did, how would that be different from an index-organized table?

    regards, tom lane
  • Garick Hamlin at Sep 16, 2010 at 2:04 pm

    On Wed, Sep 15, 2010 at 10:05:00PM -0400, Tom Lane wrote:
    Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
    On 16/09/10 13:22, Tom Lane wrote:
    What exactly do those get you that an ordinary index, or at worst an
    index-organized table, doesn't get you?
    It is pretty rare to see key value stores vs relational engines
    discussed without a descent into total foolishiness, but this Wikipedia
    page looks like a reasonable summary:
    http://en.wikipedia.org/wiki/NoSQL
    That doesn't do anything at all to answer my question. I don't want
    to debate NoSQL versus traditional RDBMS here. What I asked was:
    given that PG is a traditional RDBMS, what exactly are you hoping
    to accomplish by putting a key-value storage mechanism in it? And
    if you did, how would that be different from an index-organized table?
    One thing it would get is integration with existing infrastructure that
    makes up many critical apps. Many horizontal apps use things like memcached
    or redis to provide a distributed data layer for developing their applications.
    Sometimes that becomes the middle layer for an enterprise. Being able to hook
    into that middle layer is very handy. Shared login or session information is
    a good example of data that one might want put in a KVS. Also, many enterprises
    are full of different departments, orgs, teams, systems, etc ... KVS are simple
    and limited enough they might make a good choice for standardizing on how to share
    data in some places.

    Isn't this what SQL/Med is about?

    Garick
    regards, tom lane

    --
    Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
    To make changes to your subscription:
    http://www.postgresql.org/mailpref/pgsql-hackers
  • Robert Haas at Sep 16, 2010 at 2:05 am

    On Wed, Sep 15, 2010 at 9:22 PM, Tom Lane wrote:
    Hitoshi Harada <umi.tanuki@gmail.com> writes:
    2010/9/16 Robert Haas <robertmhaas@gmail.com>:
    Oh, key-value store, I bet.  Yeah, that would be cool.
    That's it. Like Redis, Tokyo Cabinet, or something.
    What exactly do those get you that an ordinary index, or at worst an
    index-organized table, doesn't get you?
    Speed, hopefully. Because otherwise there's little reason to think
    that abandoning transactional semantics, durability, and strong typing
    are good ideas.

    We spend a lot of time focusing on people who need things to be MORE
    ACID (sync rep and true serializability are just two examples) and
    that's really great stuff but on occasion it can be useful to turn the
    dial in the other direction. There are an awful lot of solutions out
    there that are based on eventually consistent replication, in-memory
    databases with periodic snapshots to disk, etc. and, while I sure as
    hell wouldn't advocate using those solutions to target nuclear
    weapons, they are proving to be pretty useful for things like social
    networking, storing the results of giant web crawls, etc., because
    they reduce the amount of hardware that you need to buy to process a
    given volume of data.

    The work I'm doing on unlogged tables is a good example of how we can
    provide users with the flexibility they need. I just did my first
    test (sorry, it's not in good enough shape to post for the CF, more's
    the pity) and a bulk insert into an unlogged table was ~3x faster than
    the same thing on a regular table. There are a lot of people who can
    benefit from that sort of thing. For example, you can imagine that if
    you have a "sessions" table where you store a record for each
    currently-logged-in user, an unlogged table would be fine. If the
    database crashes and comes back up again, everyone has to log in
    again, but that's a rare event and not a disaster if it happens.

    Another avenue that I think we should explore in this regard is tables
    that have some sort of weakened MVCC properties. The most obvious
    example is a table where all tuples are visible to everyone, period.
    You can imagine using this as a logging table, for example., in lieu
    of autonomous transactions. There might be other cases that are
    useful, too: what about a table where only committed tuples are
    visible, but we ignore the effect of snapshots (i.e. if the inserting
    or deleting transaction is committed, then we assume that the
    XMIN/XMAX is visible to the current snapshot without checking)? That
    might seem like playing with firecrackers, but suppose the table has
    very low concurrency: e.g. overnight you update data in bulk, then
    after that's done you run daily reporting queries, then during the day
    users run small read-only queries interactively. You pretty much know
    that you're never going to have a tuple that's visible to some
    transactions but not others, so why pay the overhead of checking every
    time?

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Mark Kirkwood at Sep 16, 2010 at 2:27 am

    On 16/09/10 14:05, Robert Haas wrote:
    On Wed, Sep 15, 2010 at 9:22 PM, Tom Lanewrote:
    Hitoshi Harada<umi.tanuki@gmail.com> writes:
    2010/9/16 Robert Haas<robertmhaas@gmail.com>:
    Oh, key-value store, I bet. Yeah, that would be cool.
    That's it. Like Redis, Tokyo Cabinet, or something.
    What exactly do those get you that an ordinary index, or at worst an
    index-organized table, doesn't get you?
    For example, you can imagine that if
    you have a "sessions" table where you store a record for each
    currently-logged-in user, an unlogged table would be fine. If the
    database crashes and comes back up again, everyone has to log in
    again, but that's a rare event and not a disaster if it happens.
    Or perhaps even a "sessions" type table where the rows are overwritten
    in place in some manner, to avoid bloat.

    regards

    Mark
  • Hitoshi Harada at Sep 16, 2010 at 5:35 am

    2010/9/16 Mark Kirkwood <mark.kirkwood@catalyst.net.nz>:
    On 16/09/10 14:05, Robert Haas wrote:

    On Wed, Sep 15, 2010 at 9:22 PM, Tom Lanewrote:
    Hitoshi Harada<umi.tanuki@gmail.com>  writes:
    2010/9/16 Robert Haas<robertmhaas@gmail.com>:
    Oh, key-value store, I bet.  Yeah, that would be cool.
    That's it. Like Redis, Tokyo Cabinet, or something.
    What exactly do those get you that an ordinary index, or at worst an
    index-organized table, doesn't get you?
    For example, you can imagine that if
    you have a "sessions" table where you store a record for each
    currently-logged-in user, an unlogged table would be fine.  If the
    database crashes and comes back up again, everyone has to log in
    again, but that's a rare event and not a disaster if it happens.
    Or perhaps even a "sessions" type table where the rows are overwritten in
    place in some manner, to avoid bloat.
    My answer is "variety". If an index-organized table was the one best
    solution, there would not been so many KVSes these days.

    Regards,



    --
    Hitoshi Harada
  • Itagaki Takahiro at Sep 16, 2010 at 2:20 am

    2010/9/15 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:
    The API needs to be simple and version-independent, so that you can
    write simple wrappers like the flat file wrapper easily. At the same
    time, it needs to be very flexible, so that it allows safely pushing
    down all kinds constructs like functions, aggregates and joins. The
    planner needs to know which constructs the wrapper can handle and get
    cost estimates for the foreign scans. Those two goals are at odds.
    The patch didn't say nothing about the API design, no?
    It just implement FOREIGN TABLE commands in the SQL standard,
    and we need the part anyway if we want to support the standard.

    --
    Itagaki Takahiro
  • SAKAMOTO Masahiko at Sep 16, 2010 at 6:16 am

    (2010/09/16 11:20), Itagaki Takahiro wrote:
    2010/9/15 Heikki Linnakangas<heikki.linnakangas@enterprisedb.com>:
    The API needs to be simple and version-independent, so that you can
    write simple wrappers like the flat file wrapper easily. At the same
    time, it needs to be very flexible, so that it allows safely pushing
    down all kinds constructs like functions, aggregates and joins. The
    planner needs to know which constructs the wrapper can handle and get
    cost estimates for the foreign scans. Those two goals are at odds.
    The patch didn't say nothing about the API design, no?
    It just implement FOREIGN TABLE commands in the SQL standard,
    and we need the part anyway if we want to support the standard.
    you are right, sorry. Attached includes
    - FDW routine interface and
    - FDW implementation of FDW for external PostgreSQL.
    This should be patched after the previous patch.

    This is designed to be used in the executor module.
    The API has some simple interfaces, such as FreeConnection,
    Open, Iterate, Close, ReOpen, and OnError.

    In this current FDW implementation for external PG,
    It can push-down WHERE-clause by reconstructing query
    that shouled be sent to the foreign server.

    I think this FDW routine interface is simple, but
    extension develpers can add intelligent features to some extent.

    FDW routine interface design and what this fdw-for-pg can do
    is summarized in:
    http://wiki.postgresql.org/wiki/SQL/MED#PostgreSQL

    Regards,

    SAKAMOTO Masahiko
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    e-mail: sakamoto.masahiko@oss.ntt.co.jp
  • SAKAMOTO Masahiko at Sep 16, 2010 at 6:30 am
    And this is sample script to play the feature of this patch.

    It includes:
    - create foreign data wrapper and foreign server(PG at localhost)
    - select foreign PostgreSQL table.

    *Important*
    This script initializes your database cluster(specified by $PGDATA).
    Run with care....

    Regards,

    SAKAMOTO Masahiko
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    e-mail: sakamoto.masahiko@oss.ntt.co.jp
  • Itagaki Takahiro at Sep 16, 2010 at 6:58 am

    On Thu, Sep 16, 2010 at 3:15 PM, SAKAMOTO Masahiko wrote:
    This is designed to be used in the executor module.
    The API has some simple interfaces, such as FreeConnection,
    Open, Iterate, Close, ReOpen, and OnError.
    Of course they are similar APIs mentioned in the SQL standard, but
    I'm not sure they are enough for all purposes. For example, they
    don't have planner hooks, but we will need them eventually to push
    down ORDER BY and LIMIT into FDW.

    We could learn from MySQL's storage engine interface, because
    they are designed for flat file wrapper at first, but enhanced
    with pains for condition push-down.

    --
    Itagaki Takahiro
  • Itagaki Takahiro at Sep 17, 2010 at 2:29 am

    2010/9/15 SAKAMOTO Masahiko <sakamoto.masahiko@oss.ntt.co.jp>:
    This is a proposal patch for SQL/MED for 9.1.
    (1) foreign table DDL support (this proposal):
    - support for foreign table DDL syntax (CREATE/ALTER FOREIGN TABLE)
    - Definition of FDW routine interface and system catalogs for it.
    I checked the patch. It includes changes for DDL, system catalogs,
    information schema, and basic psql support. The patch itself have no useful
    works, but we need the parts anyway to support the SQL standard.

    I have a couples of comments:

    * There are unused types in the patch. They might be used by additional
    patches based on the patch, but should be removed for now.
    - enum GenericOptionFlags.ForeignTableOpt
    - typedef struct FSConnection FSConnection;
    - typedef struct FdwRoutine FdwRoutine;
    - typedef struct FdwReply FdwReply;

    * Needs an error check to SELECT FROM foreign table. It might be replaced
    to actual FDW routines soon, but the current error message is not ideal.
    postgres=# SELECT * FROM ft1;
    ERROR: could not open file "base/11908/16391": No such file or directory

    * Type checks between TABLE and FOREIGN TABLE are a bit inconsistent.
    For example, "ALTER TABLE ADD COLUMN" can add a column to a foreign tables
    but "DROP TABLE" cannot remove foreign tables.
    IMHO, however, we can allow such looseness because operations actually
    forbidden will end with ERRORs without problems.

    --
    Itagaki Takahiro
  • SAKAMOTO Masahiko at Sep 17, 2010 at 10:47 am
    Thanks for your comment. I've updated the patches.
    I checked the patch. It includes changes for DDL, system catalogs,
    information schema, and basic psql support. The patch itself have no useful
    works, but we need the parts anyway to support the SQL standard.
    As you mentioned, this patch provides
    CREATE/ALTER/DROP FOREIGNTABLE functionality only.
    Please review this patch from the viewpoints:
    - confirming level to the SQL standard
    - features which should be supported on foreign table
    - WITH OIDS
    - DEFAULT
    - constraints (currently NOT NULL and CHECK are supported)
    - etc.

    SELECTing on foreign tables functionality will be provided
    by the fdw_select patch, as attached.
    * There are unused types in the patch.
    Fixed (definitions are moved to the other patch).
    * Needs an error check to SELECT FROM foreign table.
    Fixed. I've added relkind check in set_plain_rel_pathlist(),
    which aborts query with elog(ERROR) if the relation was a foreign table.
    This fix also checks for DELETE, UPDATE and EXPLAIN SELECT.
    * Type checks between TABLE and FOREIGN TABLE are a bit inconsistent.
    For example, "ALTER TABLE ADD COLUMN" can add a column to a
    foreign tables
    but "DROP TABLE" cannot remove foreign tables.
    IMHO, however, we can allow such looseness because operations actually
    forbidden will end with ERRORs without problems.
    In current implementation, we allow some kind of operations to
    ALTER TABLE which can be done on views on foreign tables (i.e.
    ALTER TABLE RENAME TO).
    But as Itagaki-san says, the inconsistency will not cause serious
    problem.

    Regards,

    (2010/09/17 11:29), Itagaki Takahiro wrote:it.
    I checked the patch. It includes changes for DDL, system catalogs,
    information schema, and basic psql support. The patch itself have no useful
    works, but we need the parts anyway to support the SQL standard.

    I have a couples of comments:

    * There are unused types in the patch. They might be used by additional
    patches based on the patch, but should be removed for now.
    - enum GenericOptionFlags.ForeignTableOpt
    - typedef struct FSConnection FSConnection;
    - typedef struct FdwRoutine FdwRoutine;
    - typedef struct FdwReply FdwReply;

    * Needs an error check to SELECT FROM foreign table. It might be replaced
    to actual FDW routines soon, but the current error message is not ideal.
    postgres=# SELECT * FROM ft1;
    ERROR: could not open file "base/11908/16391": No such file or directory

    * Type checks between TABLE and FOREIGN TABLE are a bit inconsistent.
    For example, "ALTER TABLE ADD COLUMN" can add a column to a foreign tables
    but "DROP TABLE" cannot remove foreign tables.
    IMHO, however, we can allow such looseness because operations actually
    forbidden will end with ERRORs without problems.
  • Itagaki Takahiro at Sep 24, 2010 at 2:55 am

    On Fri, Sep 17, 2010 at 7:47 PM, SAKAMOTO Masahiko wrote:
    Thanks for your comment. I've updated the patches.
    I reviewed fdw_table20100917.patch.gz. It adds FOREIGN TABLE in the
    SQL standard,
    including CREATE/ALTER/DROP FOREIGN TABLE, information_schema (foreign_table
    and foreign_table_options), and permission controls. But SELECT from
    foreign tables
    is not supported yet -- so, SELECT permission won't work in anyway.

    In addition to the standard, it also supports some additional operations.
    * COMMENT ON FOREIGN TABLE
    * WITH OIDS -- reasonable because it is just same as adding a column.
    * RULE and DEFAULT -- useful to define updatable foreign table.
    * INHERIT, NOT NULL, and CHECK -- they will be used by inter-server
    partitioning in the future.

    DEFAULT is not used by INSERT INTO foreign table (not supported),
    but is used by INSERT RULE. So, it's reasonable for us to have the feature.
    Foreign tables are read-only in the standard just like a VIEW.
    * Needs an error check to SELECT FROM foreign table.
    Fixed. I've added relkind check in set_plain_rel_pathlist(),
    Not that if we create a foreign table inherits a normal table, we cannot
    SELECT FROM the normal table because planner expands a query to retrieve
    data from the foreign table, and the query fails with an error.
    ERROR: foreign table scan not implemented.
    But I think it's an acceptable limitation for now.


    I found some trivial mistakes in the patch:

    * src/bin/psql/describe.c : listForeignTables()
    + if (pset.sversion < 80500)
    We are in 90100 now ;-)

    * SGML doc for CREATE FOREIGN TABLE contains "xxx (integer)".
    Is it a typo? All of the Storage Parameters section will be removed.

    * SGML doc for DROP FOREIGN TABLE has "a<productname>PostgreSQL</>"
    A separator is required between a and <productname>.

    For the documentation, we could rewrite some duplicated descriptions
    in CREATE/ALTER FOREIGN TABLE into links to CREATE/ALTER TABLE.
    They have many copy-and-pasted texts. Do we need to adjust them?


    Overall, I think the FOREIGN TABLE DDL part is almost ready to commit if we
    accept the proposal itself. Of course we need more discussions about FDW
    Routines, but it must be developed based on the infrastructure.

    --
    Itagaki Takahiro
  • Robert Haas at Sep 24, 2010 at 3:08 am

    On Thu, Sep 23, 2010 at 10:55 PM, Itagaki Takahiro wrote:
    Overall, I think the FOREIGN TABLE DDL part is almost ready to commit if we
    accept the proposal itself. Of course we need more discussions about FDW
    Routines, but it must be developed based on the infrastructure.
    I think we need to further discuss how this is eventually going to get
    integrated with the query planner and the executor before we commit
    anything. The syntax support by itself is quite trivial.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Itagaki Takahiro at Sep 24, 2010 at 3:26 am

    On Fri, Sep 24, 2010 at 12:08 PM, Robert Haas wrote:
    I think we need to further discuss how this is eventually going to get
    integrated with the query planner and the executor before we commit
    anything.  The syntax support by itself is quite trivial.
    There are no active discussions :-( I think the author tried his best, so if
    other developers think it's a bad design, alternate plan must be proposed.

    Also, if the syntax change is trivial, that's why we merge it at
    earlier commitfests. I saw many patch went into "No patch reviewers
    because of too large size" syndrome before. DDL changes are 5K
    lines of diff -c patch, and "select" part is additional 6K lines.

    --
    Itagaki Takahiro
  • Heikki Linnakangas at Sep 24, 2010 at 9:12 am

    On 24/09/10 06:26, Itagaki Takahiro wrote:
    On Fri, Sep 24, 2010 at 12:08 PM, Robert Haaswrote:
    I think we need to further discuss how this is eventually going to get
    integrated with the query planner and the executor before we commit
    anything. The syntax support by itself is quite trivial.
    There are no active discussions :-( I think the author tried his best, so if
    other developers think it's a bad design, alternate plan must be proposed.
    It's not that the design is bad, it's that it's non-existent. I haven't
    seen any design on how this integrates with the planner.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Itagaki Takahiro at Sep 24, 2010 at 9:56 am

    On Fri, Sep 24, 2010 at 6:12 PM, Heikki Linnakangas wrote:
    It's not that the design is bad, it's that it's non-existent. I haven't seen
    any design on how this integrates with the planner.
    In my understanding, the DDL part is independent from planner integration,
    and that's why the author extract DDL changes from the composite patch.
    (The proposal name is "SQL/MED(FDW) DDL".)


    I think there are two type of FDWs. One is a simple flat file wrapper
    used by COPY FROM now, that doesn't require any planner hooks.
    Another is a connector to an external database, like as dblink, that
    should be integrated with the planner.

    The former is almost same as SRF functions, but it will skip unneeded
    tuplestore caching. Even the level is useful enough because we can
    support "SELECT * FROM csvfile" without any intermediate tables.
    Could we set the first goal to the level?

    Of course we need more discussion for the latter case. The current proposed
    patch ("select" part") supports executor integration -- WHERE clause
    push-down, so it can use indexes in external servers. On the other hand,
    unsupported ORDER BY, LIMIT, and JOIN push-down require planner integration.
    More works will be required for fdw_select20100917.patch.gz.

    --
    Itagaki Takahiro
  • Robert Haas at Sep 24, 2010 at 12:14 pm

    On Fri, Sep 24, 2010 at 5:56 AM, Itagaki Takahiro wrote:
    I think there are two type of FDWs. One is a simple flat file wrapper
    used by COPY FROM now, that doesn't require any planner hooks.
    Another is a connector to an external database, like as dblink, that
    should be integrated with the planner.
    This is a good point. On the one hand, I do agree that the API for
    simple things like processing CSV files shouldn't be overly complex.
    So perhaps we could start with a simple API and extend it later. On
    the other hand, understanding how some of the more complex cases ought
    to work provides insight into handling the simpler cases. So I think
    we should aim to have at least a sketch of a design for the whole
    feature, and then if in phase 1 we want to implement only the easier
    parts, that's OK.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • SAKAMOTO Masahiko at Sep 27, 2010 at 6:50 am
    sorry for my slow response.

    (2010/09/24 21:13), Robert Haas wrote:
    This is a good point. On the one hand, I do agree that the API for
    simple things like processing CSV files shouldn't be overly complex.
    So perhaps we could start with a simple API and extend it later. On
    the other hand, understanding how some of the more complex cases ought
    to work provides insight into handling the simpler cases. So I think
    we should aim to have at least a sketch of a design for the whole
    feature, and then if in phase 1 we want to implement only the easier
    parts, that's OK.
    Right. In any case, I should clearify what this API could cover
    by this patch and what could not.
    # And also how far I and my collaborator can implement..

    As Itagaki points out, we have two types of FDW implementations in
    progress: flat file wrapper and postgresql wrapper. these items are
    described in wiki:
    http://wiki.postgresql.org/wiki/SQL/MED

    But it may be hard to get what this fdw routines 'could' provide,
    so I and my collaborator is now summarizing these items to discuss and
    ask for help. Anyway I think these items are needed to discuss further.
    - API design that is used in executor for a single external table.
    - simple FDW implements to clearfy what this API could provide.

    These items will be shown by my collaborator soon.
    thanks in advance,

    SAKAMOTO Masahiko
    NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    NTT Open Source Software Center
    e-mail: sakamoto.masahiko@oss.ntt.co.jp
  • Shigeru HANADA at Sep 27, 2010 at 2:29 pm
    Hi hackers,

    On Mon, 27 Sep 2010 15:50:34 +0900
    SAKAMOTO Masahiko wrote:
    Right. In any case, I should clearify what this API could cover
    by this patch and what could not.
    # And also how far I and my collaborator can implement..

    As Itagaki points out, we have two types of FDW implementations in
    progress: flat file wrapper and postgresql wrapper. these items are
    described in wiki:
    http://wiki.postgresql.org/wiki/SQL/MED

    But it may be hard to get what this fdw routines 'could' provide,
    so I and my collaborator is now summarizing these items to discuss and
    ask for help. Anyway I think these items are needed to discuss further.
    - API design that is used in executor for a single external table.
    - simple FDW implements to clearfy what this API could provide.
    To discuss how the SQL/MED patches should be, we made summary of
    current SQL/MED proposal.

    * What the current SQL/MED patches provide

    Recent patch fdw_select20100917 consists of some parts: I/F of
    FDW, new executor node ForeignScan, connection caching, and FDW
    for PostgreSQL. FDW for flat file is not included. All of them
    are based on DDLs and catalogs which are provided by
    fdw_table20100917 patch.

    Itagaki has summarized the details of fdw_table20100917 patch
    well. Thanks for review.
    http://archives.postgresql.org/pgsql-hackers/2010-09/msg01653.php

    The I/F of FDW is defined as struct FdwRoutine, set of API
    functions which are called from backend modules. The APIs are
    designed to support only scanning a foreign at once. Path/Plan
    optimizations like JOIN/UNION push-down are out of scope. Such
    optimizations require planner hook as mentioned by Itagaki before.
    In short, our current goal is to implement SeqScan for foreign
    tables.

    ForeignScan node is an executor node which is like SeqScan node
    for local table. Optimizer generates T_ForeignScan path instead
    of T_SeqScan path for a foreign table in set_plain_rel_pathlist()
    if the RangeTblEntry was a foreign table. Index paths and
    tidscan paths are never generated for foreign tables.

    ForeignScanState and FdwReply are introduced to represent the
    status of a foreign scan. ForeignScanState is a subclass of
    ScanState, and FdwReply is a abstract type which is used to pass
    FDW-specific data between API calls.

    * Details of FDW API

    FDWs should implement HANDLER function which returns a pointer to
    a FdwRoutine instance which has pointers to actual functions.

    struct FdRoutine {

    /*
    * ConnectServer() will be called from ExecInitForeignScan()
    * if the backend has no connection which can be used to
    * execute the foreign query for the foreign table.
    * FDW should establish a connection between foreign server
    * and return it with casting to pointer to FSConnection
    * (abstract connection type). If the FDW doesn't need any
    * connection, returning NULL is OK.
    * The arguments, server and user, can be used to extract
    * connection information.
    */
    FSConnection* (*ConnectServer)(ForeignServer *server,
    UserMapping *user);

    /*
    * FreeFSConnection() will be called when backend dies or
    * DISCARD ALL command was executed.
    * FDW should close connection gracefully and free resources
    * if any.
    */
    void (*FreeFSConnection)(FSConnection *conn);

    /*
    * Open() will be called from ExecInitForeignScan().
    * FDW should initialize ForeignScanState, internal state of
    * a foreign scan, and ready to return tuple in next Iterate()
    * call.
    * For instance, FDW for PostgreSQL only generate SQL from
    * ScanState. And we implemented WHERE clause push-down here.
    */
    void (*Open)(ForeignScanState *scanstate);

    /*
    * Iterate() will be called from ExecForeignScan() when the
    * executor requests next tuple.
    * For instance, FDW for PostgreSQL executes foreign query at
    * first call and stores all results into TupleStore, and
    * returns each tuple for each Iterate() call.
    */
    void (*Iterate)(ForeignScanState *scanstate);

    /*
    * Close() will be called from ExecEndForeignScan().
    * FDW should free resources for FdwReply if any.
    */
    void (*Close)(ForeignScanState *scanstate);

    /*
    * ReOpen() will be called from ExecForeignReScan() when the
    * foreign scan should be reseted to scan whole data from the
    * head again.
    * For instance, FDW for PostgreSQL frees current result set
    * to make next Iterate() call to execute foreign query again.
    */
    void (*ReOpen)(ForeignScanState *scanstate);

    };

    Maybe FdwRoutine should have more APIs to support essential
    features.

    The startup/total cost of scanning a foreign table are fixed to
    10000.0 in current patch. They are groundless values, they just
    say that scanning foreign table costs more than scanning local
    table. The cost should be estimated from statistics in
    pg_statistic and pg_class. This function has not been
    implemented yet.

    /*
    * UpdateStats() will be called from ANALYZE routine when
    * statistics of a foreign table should be updated.
    *
    * Some more parameters may be needed.
    *
    * For instance, FDW for PostgreSQL will retrieve
    * pg_statistic and pg_class from foreign server and store
    * them in local catalog. How to represent overhead of
    * foreign query is still issue.
    */
    void (*UpdateStats)(Oid relid);

    To support path/plan optimization, some APIs should be added, but
    we think that simple FDWs can be implemented with APIs above.

    * Next action

    To focus on API discussion, we are going to simplify and
    re-submit patches.

    * Fix some mistakes pointed out by Itagaki.
    * Make FDW for PostgreSQL independent contrib module.
    Currently it shares connection management codes with
    contrib/dblink.
    * Add simple FDW for flat file (maybe CSV ?) into core to
    implement regression tests for FOREIGN TABLE DDLs and foreign
    table query. In the future, the FDW for flat file can be
    integrated with COPY FROM.

    Any comments/questions are welcome.

    Regards,
    --
    Shigeru Hanada
  • Robert Haas at Sep 28, 2010 at 2:27 pm

    On Mon, Sep 27, 2010 at 2:50 AM, SAKAMOTO Masahiko wrote:
    http://wiki.postgresql.org/wiki/SQL/MED
    With regard to what is written here, it strikes me that it would be an
    extremely bad idea to try to mix reloptions or attoptions with
    fdwoptions. fdwoptions are options to be passed transparently to the
    fdw to handle as it likes; rel/attoptions affect the behavior of PG.

    I think the section about WHERE clause push-down is way off base.
    First, it seems totally wrong to assume that the same functions and
    operators will be defined on the remote side as you have locally;
    indeed, for CSV files, you won't have anything defined on the remote
    side at all. You need some kind of a discovery mechanism here to
    figure out which quals are push-downable. And it should probably be
    something generic, not a bunch of hard-wired rules that may or may not
    be correct in any particular case. What if the remote side is a
    competing database product that doesn't understand X = ANY(Y)?
    Second, even if a functions or operators does exist on both sides of
    the link, how do you know whether they have compatible semantics?
    Short of solving the entscheidungsproblem, you're not going to be able
    to determine that algorithmically, so you need some kind of mechanism
    for controlling what assumptions get made. Otherwise, you'll end up
    with queries that don't work and no way for the user to fix it.

    It seems to me that the API should allow PG to ask the FDW questions like this:

    - How many tuples are there on the remote side?
    - Here is a qual. Are you able to evaluate this qual remotely?
    - What are the startup and total costs of a sequential scan of the
    remote side with the following set of remotely executable quals?
    - Are there any indices available on the remote side, and if so what
    are there names and which columns do they index in which order
    (asc/desc, nulls first/last)?
    - What are the startup and total costs of an index scan of the remote
    side using the index called $NAME given the following set of remotely
    executable quals?

    and, as you mentIon:

    - Please update pg_statistic for this foreign table, if you have that
    capability.

    Then:

    - Begin a sequential scan with the following set of quals.
    - Begin an index scan using the index called X with the following set of quals.
    - Fetch next tuple.
    - End scan.

    Maybe that's too much for a first version but if we're not going to
    deal with the problems in a general way, then we ought to not deal
    with them at all, rather than having hacky rules that will work if
    your environment is set up in exactly the way the code expects and
    otherwise break horribly.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Alvaro Herrera at Sep 29, 2010 at 2:10 pm

    Excerpts from Robert Haas's message of mar sep 28 10:26:54 -0400 2010:

    Then:

    - Begin a sequential scan with the following set of quals.
    - Begin an index scan using the index called X with the following set of quals.
    - Fetch next tuple.
    - End scan.
    I'm not sure that it's a good idea to embed into the FDW API the set of
    operations known to the executor. For example your proposal fails to
    consider bitmap scans. Seems simpler and more general to hand the quals
    over saying "I need to scan this relation with these quals", and have it
    return an opaque iterable object; the remote executor would be in charge
    of determining their usage for indexscans; or even for filtering tuples
    with quals that cannot be part of the index condition.

    There doesn't to be much point in knowing the names of remote indexes
    either (if it came to referencing them, better use OIDs)

    --
    Álvaro Herrera <alvherre@commandprompt.com>
    The PostgreSQL Company - Command Prompt, Inc.
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
  • Itagaki Takahiro at Sep 30, 2010 at 1:30 am

    On Wed, Sep 29, 2010 at 11:09 PM, Alvaro Herrera wrote:
    I'm not sure that it's a good idea to embed into the FDW API the set of
    operations known to the executor.  For example your proposal fails to
    consider bitmap scans.  Seems simpler and more general to hand the quals
    over saying "I need to scan this relation with these quals", and have it
    return an opaque iterable object;
    Agreed. If possible, we will avoid dedicated interfaces for seqscans and
    index scans. However, bitmap scan is difficult anyway because foreign tables
    might not have ctid columns. It's a challenging task to identify individual
    tuples in foreign tables. It will be also used for UPDATE and DELETE.
    There doesn't to be much point in knowing the names of remote indexes
    either (if it came to referencing them, better use OIDs)
    FYI, HiRDB, that implements FDW routines, has CREATE FOREIGN INDEX.
    I think it is a little ugly and won't work in some cases -- for example,
    index organized tables -- but evidently it's a realistic solution.

    --
    Itagaki Takahiro
  • Heikki Linnakangas at Sep 30, 2010 at 6:27 am

    On 30.09.2010 04:30, Itagaki Takahiro wrote:
    On Wed, Sep 29, 2010 at 11:09 PM, Alvaro Herrera
    wrote:
    I'm not sure that it's a good idea to embed into the FDW API the set of
    operations known to the executor. For example your proposal fails to
    consider bitmap scans. Seems simpler and more general to hand the quals
    over saying "I need to scan this relation with these quals", and have it
    return an opaque iterable object;
    Agreed. If possible, we will avoid dedicated interfaces for seqscans and
    index scans. However, bitmap scan is difficult anyway because foreign tables
    might not have ctid columns. It's a challenging task to identify individual
    tuples in foreign tables. It will be also used for UPDATE and DELETE.
    There doesn't to be much point in knowing the names of remote indexes
    either (if it came to referencing them, better use OIDs)
    FYI, HiRDB, that implements FDW routines, has CREATE FOREIGN INDEX.
    I think it is a little ugly and won't work in some cases -- for example,
    index organized tables -- but evidently it's a realistic solution.
    A long time ago I used DB2's federated database feature, which is at
    least close to SQL/MED if not fully compatible. When you create a
    "federated index" there. it's just a planner hint to the local database,
    so that it knows how expensive it is to evaluate a qual remotely vs.
    locally. It shouldn't matter what technology the remote index uses in
    that case, as long as the cost model is roughly the same as a b-tree.

    I don't think we want to go down that path though, it's better to leave
    the cost estimation altogether to the wrapper. It has much better
    knowledge of expensive various quals are.

    However, the wrapper will likely need some local storage for indexes and
    like to do the cost estimation. Or maybe it can just keep the
    information in cache, loading it on first use from the remote database.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Shigeru HANADA at Sep 30, 2010 at 7:48 am

    On Thu, 30 Sep 2010 09:26:54 +0300 Heikki Linnakangas wrote:
    FYI, HiRDB, that implements FDW routines, has CREATE FOREIGN INDEX.
    I think it is a little ugly and won't work in some cases -- for example,
    index organized tables -- but evidently it's a realistic solution.
    A long time ago I used DB2's federated database feature, which is at
    least close to SQL/MED if not fully compatible. When you create a
    "federated index" there. it's just a planner hint to the local database,
    so that it knows how expensive it is to evaluate a qual remotely vs.
    locally. It shouldn't matter what technology the remote index uses in
    that case, as long as the cost model is roughly the same as a b-tree.

    I don't think we want to go down that path though, it's better to leave
    the cost estimation altogether to the wrapper. It has much better
    knowledge of expensive various quals are.

    However, the wrapper will likely need some local storage for indexes and
    like to do the cost estimation. Or maybe it can just keep the
    information in cache, loading it on first use from the remote database.
    How about having cost hints in generic option of the foreign table or
    its columns? Generic options are storage for wrappers, not for
    PostgreSQL core modules. Wrappers can use their own format to
    represent various information, and use the hints to estimate costs of
    a path.

    In addition, I think that the generic option of a server could be used
    to store cost hints which depend on each server, such as network
    transfer overhead for dbms wrappers, or disk I/O for file-wrappers.

    Regards,
    --
    Shigeru Hanada
  • Robert Haas at Oct 4, 2010 at 11:32 pm

    On Thu, Sep 30, 2010 at 3:48 AM, Shigeru HANADA wrote:
    On Thu, 30 Sep 2010 09:26:54 +0300
    Heikki Linnakangas wrote:
    FYI, HiRDB, that implements FDW routines, has CREATE FOREIGN INDEX.
    I think it is a little ugly and won't work in some cases -- for example,
    index organized tables -- but evidently it's a realistic solution.
    A long time ago I used DB2's federated database feature, which is at
    least close to SQL/MED if not fully compatible. When you create a
    "federated index" there. it's just a planner hint to the local database,
    so that it knows how expensive it is to evaluate a qual remotely vs.
    locally. It shouldn't matter what technology the remote index uses in
    that case, as long as the cost model is roughly the same as a b-tree.

    I don't think we want to go down that path though, it's better to leave
    the cost estimation altogether to the wrapper. It has much better
    knowledge of expensive various quals are.

    However, the wrapper will likely need some local storage for indexes and
    like to do the cost estimation. Or maybe it can just keep the
    information in cache, loading it on first use from the remote database.
    How about having cost hints in generic option of the foreign table or
    its columns?  Generic options are storage for wrappers, not for
    PostgreSQL core modules.  Wrappers can use their own format to
    represent various information, and use the hints to estimate costs of
    a path.
    I do think we're going to need some kind of local caching of relevant
    information from the foreign side. Really, I doubt that fdwoptions
    are the place for that, though: that's data for the user to input, not
    a place for the wrapper to scribble on internally. The trick is that
    there's a lot of stuff you might want to cache, and we don't really
    know anything about what the format of it is - for example, you might
    have foreign-side statistics that need to get cached locally, but they
    needn't be in the same format we use for pg_statistic. Perhaps part
    of setting up an FDW should be creating tables with prespecified
    definitions and passing the table names to CREATE FOREIGN DATA WRAPPER
    as options. Maybe we could even arrange to set up the dependencies
    appropriately...

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Shigeru HANADA at Oct 5, 2010 at 12:10 pm

    On Mon, 4 Oct 2010 19:31:52 -0400 Robert Haas wrote:
    On Thu, Sep 30, 2010 at 3:48 AM, Shigeru HANADA
    wrote:
    How about having cost hints in generic option of the foreign table or
    its columns? ?Generic options are storage for wrappers, not for
    PostgreSQL core modules. ?Wrappers can use their own format to
    represent various information, and use the hints to estimate costs of
    a path.
    I do think we're going to need some kind of local caching of relevant
    information from the foreign side. Really, I doubt that fdwoptions
    are the place for that, though: that's data for the user to input, not
    a place for the wrapper to scribble on internally. The trick is that
    there's a lot of stuff you might want to cache, and we don't really
    know anything about what the format of it is - for example, you might
    have foreign-side statistics that need to get cached locally, but they
    needn't be in the same format we use for pg_statistic. Perhaps part
    of setting up an FDW should be creating tables with prespecified
    definitions and passing the table names to CREATE FOREIGN DATA WRAPPER
    as options. Maybe we could even arrange to set up the dependencies
    appropriately...
    Agreed. I withdraw the idea to store foreign-side statistics into
    generic options.

    Can we treat statistics of a foreign table separately?

    1. Same as local tables (maybe required)
    (pg_statistic.*, pg_class.reltuples/relpages)
    They will be used by planner/optimizer to estimate basic costs based
    on tuple selectivity, result row count and so on. Such statistics
    could be generated by ANALYZE module if the FDW can supply all tuples
    from foreign side. The basic costs should be able to correct by FDW
    via another API, because foreign queries might have some overheads,
    such as connection and transfer.
    ISTM that it is very difficult for non-PG FDW to generate PG-style
    statistics correctly.

    2. depend on FDW (optional)
    (in various, arbitrary format)
    They will be used by FDW to optimize query to be executed on
    foreign-side in their own way. As you say, new table(s) to store such
    statistics can be created during creation of new FOREIGN DATA WRAPPER
    or installation of new fdw_handler module. Maybe ANALYZE should call
    another API which collect these kind of statistics.

    I think that(1) is necessary in the first step, but (2) is not.

    Regards,
    --
    Shigeru Hanada
  • Tom Lane at Oct 5, 2010 at 2:25 pm

    Shigeru HANADA writes:
    Can we treat statistics of a foreign table separately?
    1. Same as local tables (maybe required)
    (pg_statistic.*, pg_class.reltuples/relpages)
    This whole discussion seems to me to be about trying to do things outside
    the FDW that should properly be left inside the FDW. Who's to say that
    the remote side even *has* statistics of the sort that PG creates?

    We should provide an API that lets the FDW return a cost estimate for a
    proposed access path. Where it gets the cost estimate from is not
    something that should be presupposed.

    regards, tom lane
  • Robert Haas at Oct 5, 2010 at 2:27 pm

    On Tue, Oct 5, 2010 at 10:25 AM, Tom Lane wrote:
    Shigeru HANADA <hanada@metrosystems.co.jp> writes:
    Can we treat statistics of a foreign table separately?
    1. Same as local tables (maybe required)
    (pg_statistic.*, pg_class.reltuples/relpages)
    This whole discussion seems to me to be about trying to do things outside
    the FDW that should properly be left inside the FDW.  Who's to say that
    the remote side even *has* statistics of the sort that PG creates?

    We should provide an API that lets the FDW return a cost estimate for a
    proposed access path.  Where it gets the cost estimate from is not
    something that should be presupposed.
    Unless there's some way for the FDW to have local tables for caching
    its statistics, the chances of this having decent performance seem to
    be near-zero.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Tom Lane at Oct 5, 2010 at 2:42 pm

    Robert Haas writes:
    On Tue, Oct 5, 2010 at 10:25 AM, Tom Lane wrote:
    This whole discussion seems to me to be about trying to do things outside
    the FDW that should properly be left inside the FDW.  Who's to say that
    the remote side even *has* statistics of the sort that PG creates?

    We should provide an API that lets the FDW return a cost estimate for a
    proposed access path.  Where it gets the cost estimate from is not
    something that should be presupposed.
    Unless there's some way for the FDW to have local tables for caching
    its statistics, the chances of this having decent performance seem to
    be near-zero.
    Perhaps, but that would be the FDW's problem to implement. Trying to
    design such tables in advance of actually writing an FDW seems like a
    completely backwards design process.

    (I'd also say that your performance estimate is miles in advance of any
    facts; but even if it's true, the caching ought to be inside the FDW,
    because we have no clear idea of what it will need to cache.)

    regards, tom lane
  • Robert Haas at Oct 5, 2010 at 2:56 pm

    On Tue, Oct 5, 2010 at 10:41 AM, Tom Lane wrote:
    Robert Haas <robertmhaas@gmail.com> writes:
    On Tue, Oct 5, 2010 at 10:25 AM, Tom Lane wrote:
    This whole discussion seems to me to be about trying to do things outside
    the FDW that should properly be left inside the FDW.  Who's to say that
    the remote side even *has* statistics of the sort that PG creates?

    We should provide an API that lets the FDW return a cost estimate for a
    proposed access path.  Where it gets the cost estimate from is not
    something that should be presupposed.
    Unless there's some way for the FDW to have local tables for caching
    its statistics, the chances of this having decent performance seem to
    be near-zero.
    Perhaps, but that would be the FDW's problem to implement.  Trying to
    design such tables in advance of actually writing an FDW seems like a
    completely backwards design process.
    Oh, I agree. I don't want to dictate the structure of those tables; I
    just think it's inevitable that an FDW is going to need the ability to
    be bound to some local tables which the admin should set up before
    installing it. That is, we need a general capability, not a specific
    set of tables.
    (I'd also say that your performance estimate is miles in advance of any
    facts; but even if it's true, the caching ought to be inside the FDW,
    because we have no clear idea of what it will need to cache.)
    I can't imagine how an FDW could possibly be expected to perform well
    without some persistent local data storage. Even assume the remote
    end is PG. To return a cost, it's going to need the contents of
    pg_statistic cached locally, for each remote table. Do you really
    think it's going to work to incur that overhead once per table per
    backend startup? Or else every time we try to plan against a foreign
    table we can fire off an SQL query to the remote side instead of
    trying to compute the cost locally. That's got to be two orders of
    magnitude slower than planning based off local stats.

    We could punt the issue of stats altogether for the first version and
    simply say, hey, this is only intended for things like reading from
    CSV files. But if we're going to have it at all then I can't see how
    we're going to get by without persistent local storage.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Tom Lane at Oct 5, 2010 at 3:07 pm

    Robert Haas writes:
    On Tue, Oct 5, 2010 at 10:41 AM, Tom Lane wrote:
    (I'd also say that your performance estimate is miles in advance of any
    facts; but even if it's true, the caching ought to be inside the FDW,
    because we have no clear idea of what it will need to cache.)
    I can't imagine how an FDW could possibly be expected to perform well
    without some persistent local data storage. Even assume the remote
    end is PG. To return a cost, it's going to need the contents of
    pg_statistic cached locally, for each remote table.
    Or it could ask the remote side.
    Do you really
    think it's going to work to incur that overhead once per table per
    backend startup?
    If you have a cache, how are you going to manage updates of it?

    IMO this is a *clear* case of premature optimization being the root of
    all evil. We should get it working first and then see what needs to be
    optimized by measuring, rather than guessing in a vacuum.

    (BTW, if the remote end is PG I would hardly think that we'd send SQL
    queries at all. If we're interested in micro-optimization, we'd devise
    some lower-level protocol.)

    regards, tom lane
  • Robert Haas at Oct 5, 2010 at 3:39 pm

    On Tue, Oct 5, 2010 at 11:06 AM, Tom Lane wrote:
    Robert Haas <robertmhaas@gmail.com> writes:
    On Tue, Oct 5, 2010 at 10:41 AM, Tom Lane wrote:
    (I'd also say that your performance estimate is miles in advance of any
    facts; but even if it's true, the caching ought to be inside the FDW,
    because we have no clear idea of what it will need to cache.)
    I can't imagine how an FDW could possibly be expected to perform well
    without some persistent local data storage.  Even assume the remote
    end is PG.  To return a cost, it's going to need the contents of
    pg_statistic cached locally, for each remote table.
    Or it could ask the remote side.
    FWIW, I mentioned that option in that part you didn't quote.
    Do you really
    think it's going to work to incur that overhead once per table per
    backend startup?
    If you have a cache, how are you going to manage updates of it?
    I'm not. I'm going to let the FDW determine how often it would like
    to refresh its cache, as well as what it would like to cache and in
    what format it would like to cache it.
    IMO this is a *clear* case of premature optimization being the root of
    all evil.  We should get it working first and then see what needs to be
    optimized by measuring, rather than guessing in a vacuum.
    I have no problem with punting the issue of remote statistics to some
    time in the future. But I don't think we should have a half-baked
    implementation of remote statistics. We should either do it right
    (doing such testing as is necessary to establish what that means) or
    not do it at all. Frankly, if we could get from where we are today to
    a workable implementation of this technology for CSV files in time for
    9.1, I think that would be an impressive accomplishment. Making it
    work for more complicated cases is almost certainly material for 9.2,
    9.3, 9.4, and maybe further out than that.
    (BTW, if the remote end is PG I would hardly think that we'd send SQL
    queries at all.  If we're interested in micro-optimization, we'd devise
    some lower-level protocol.)
    Interesting.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Tom Lane at Oct 5, 2010 at 4:49 pm

    Robert Haas writes:
    On Tue, Oct 5, 2010 at 11:06 AM, Tom Lane wrote:
    IMO this is a *clear* case of premature optimization being the root of
    all evil.  We should get it working first and then see what needs to be
    optimized by measuring, rather than guessing in a vacuum.
    I have no problem with punting the issue of remote statistics to some
    time in the future. But I don't think we should have a half-baked
    implementation of remote statistics. We should either do it right
    (doing such testing as is necessary to establish what that means) or
    not do it at all. Frankly, if we could get from where we are today to
    a workable implementation of this technology for CSV files in time for
    9.1, I think that would be an impressive accomplishment. Making it
    work for more complicated cases is almost certainly material for 9.2,
    9.3, 9.4, and maybe further out than that.
    I quite agree that it's going to take multiple release cycles to have
    a really impressive version of SQL/MED. What I'm saying is that caching
    remote statistics is several versions out in my view of the world, and
    designing support for it shouldn't be happening now.

    Maybe we ought to take a step back and discuss what the development plan
    ought to be, before we spend more time on details like this. My idea of
    a good development process would involve working in parallel on at least
    two FDW adapters, so that we don't find we've skewed the API design to
    meet the needs of just one adapter. Probably a remote-PG adapter and a
    local-CSV-file adapter would be a good combination. I don't have any
    clear idea of how soon we might expect to see how much functionality,
    though. Thoughts?

    regards, tom lane
  • Robert Haas at Oct 5, 2010 at 6:43 pm

    On Tue, Oct 5, 2010 at 12:49 PM, Tom Lane wrote:
    Robert Haas <robertmhaas@gmail.com> writes:
    On Tue, Oct 5, 2010 at 11:06 AM, Tom Lane wrote:
    IMO this is a *clear* case of premature optimization being the root of
    all evil.  We should get it working first and then see what needs to be
    optimized by measuring, rather than guessing in a vacuum.
    I have no problem with punting the issue of remote statistics to some
    time in the future.  But I don't think we should have a half-baked
    implementation of remote statistics.  We should either do it right
    (doing such testing as is necessary to establish what that means) or
    not do it at all.  Frankly, if we could get from where we are today to
    a workable implementation of this technology for CSV files in time for
    9.1, I think that would be an impressive accomplishment.  Making it
    work for more complicated cases is almost certainly material for 9.2,
    9.3, 9.4, and maybe further out than that.
    I quite agree that it's going to take multiple release cycles to have
    a really impressive version of SQL/MED.  What I'm saying is that caching
    remote statistics is several versions out in my view of the world, and
    designing support for it shouldn't be happening now.
    Fair enough.
    Maybe we ought to take a step back and discuss what the development plan
    ought to be, before we spend more time on details like this.
    Good idea.
    My idea of
    a good development process would involve working in parallel on at least
    two FDW adapters, so that we don't find we've skewed the API design to
    meet the needs of just one adapter.  Probably a remote-PG adapter and a
    local-CSV-file adapter would be a good combination.  I don't have any
    clear idea of how soon we might expect to see how much functionality,
    though.  Thoughts?
    I'm somewhat afraid that a remote-PG adapter will turn into a can of
    worms. If we give up on remote statistics, does that mean we're also
    giving up on index use on the remote side? I fear that we'll end up
    crafting partial solutions that will only end up getting thrown away,
    but after a lot of work has been invested in them. I wonder if we
    should focus on first efforts on really simple cases like CSV files
    (as you mentioned) and perhaps something like memcached, which has
    different properties than a CSV file, but extremely simple ones. I
    think it's inevitable that the API is going to get more complicated
    from release to release and probably not in backward-compatible ways;
    I think it's too early to be worried about that.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise Postgres Company
  • Tom Lane at Oct 5, 2010 at 6:59 pm

    Robert Haas writes:
    I'm somewhat afraid that a remote-PG adapter will turn into a can of
    worms. If we give up on remote statistics, does that mean we're also
    giving up on index use on the remote side? I fear that we'll end up
    crafting partial solutions that will only end up getting thrown away,
    but after a lot of work has been invested in them.
    Per Fred Brooks: "Plan to throw one away. You will anyhow."
    I wonder if we
    should focus on first efforts on really simple cases like CSV files
    (as you mentioned) and perhaps something like memcached, which has
    different properties than a CSV file, but extremely simple ones. I
    think it's inevitable that the API is going to get more complicated
    from release to release and probably not in backward-compatible ways;
    I think it's too early to be worried about that.
    +1 ... we can *not* assume that we know enough to get the API right
    the first time. It'll have to be stated up front that it's subject to
    change and you probably shouldn't be trying to develop FDWs separately
    from the main project for awhile yet.

    I think that looking only at CSV might be too simple, though. It'd be
    good to be at least trying to build a remote-PG adapter, even with the
    knowledge that it's all throwaway code.

    regards, tom lane
  • Shigeru HANADA at Oct 7, 2010 at 11:21 am

    On Tue, 05 Oct 2010 14:59:27 -0400 Tom Lane wrote:
    Robert Haas <robertmhaas@gmail.com> writes:
    I wonder if we
    should focus on first efforts on really simple cases like CSV files
    (as you mentioned) and perhaps something like memcached, which has
    different properties than a CSV file, but extremely simple ones. I
    think it's inevitable that the API is going to get more complicated
    from release to release and probably not in backward-compatible ways;
    I think it's too early to be worried about that.
    +1 ... we can *not* assume that we know enough to get the API right
    the first time. It'll have to be stated up front that it's subject to
    change and you probably shouldn't be trying to develop FDWs separately
    from the main project for awhile yet.

    I think that looking only at CSV might be too simple, though. It'd be
    good to be at least trying to build a remote-PG adapter, even with the
    knowledge that it's all throwaway code.
    Thanks for comments, and sorry for delayed reply.

    I propose revised version of FDW API which supports both of CSV and PG.
    It would be a minimal API to support SQL/MED, so some features such as
    optimizing query or using remote index are not considered much.

    ======================
    This FDW API is designed to support scanning a foreign table. As
    SQL/MED standard says, updating external data is not supported. The
    instance of a foreign table may be a CSV file or a table on an
    external PostgreSQL server, or anything which could generate
    PG-style tuples.

    Scanning of a foreign table will be done in some steps below.
    Note: FdwRoutine->XXX() means FDW API function.

    1. Parser parses a query string and generate a query tree. In this
    step, a foreign table is not distinguished from a local table.

    2. Optimizer generates a Path pathnode (pathtype = T_ForeignPath) for
    each foreign table in the query. Because foreign tables don't have
    TID nor index (at least now), we don't add new Path type.

    In this step, optimizer calls FdwRoutine->GetStatistics() to get
    estimated costs of the foreign scan. FDW can calculate costs by
    itself with some statistics, or get costs from remote-side, or leave
    the costs as is.

    3. Planner generates a ForeignScan plannode for each foreign Path.
    ForeignScan could be typedef of Scan because it has no other member,
    but it have been defined as a independent structure to make it be able
    to add other members in the future.

    4. Executor executes each ForeignScan node in some steps. Through the
    Step-4, FDW can keep their own information about the foreign scan in the
    ForeignScan->FdwReply.

    4-a. To initialize a ForeignScan plannodes, ExecInitForeignScan() is
    called for each ForeignScan node. In ExecInitForeignScan(),
    ForeignScanState will be created from:

    * ScanState which hold same information as SeqScan.
    * FdwRoutine pointer as cache (for cache).
    * some catalog information about the foreign table (for cache).
    * connection established with FdwRoutine->ConnectServer().
    FDW can use GetFSConnectionByName(name) to get pooled connection by
    name in FdwRoutine->ConnectServer().

    Then, ExecInitForeignScan() calls FdwRouteine->Open() to tell FDW that
    it's time to start the query. FDW can do some initialization if
    necessary.

    4-b. To retrieve a tuple from the foreign table, ExecForeignScan()
    calls FdwRoutine->Iterate(). FDW should set next tuple into
    ss_ScanTupleSlot of ScanState or empty the slot to indicate EOF.
    Projection and qual evaluation will be done in ExecScan() later, so
    ExecForeignScan() should return all columns/tuples in the external
    data store.

    4-c. To reset the scan and rewind cursor to the head of the foreign
    table, ExecForeignReScan() calls FdwRoutine->ReOpen(). This occurs
    when a ForeignScan node is a inner node of a nested loop join.
    FDW is required to return the first tuple again at next
    FdwRoutine->Iterate() call.

    4-d. At the end of execution of ForeignScan, ExecEndForeignScan() is
    called. ExecEndForeignScan() calls FdwRoutine->Close() to tell FDW
    that no more Iterate will called. FDW can do some finalization if
    necessary.

    5. Connections which have established via FdwRoutine->ConnectServer()
    are pooled in the backend for future query which accesses same foreign
    server. Pooling mechanism is implemented in core module, not in the
    each FDW.

    When a user executes DISCARD ALL command, or backend dies,
    FdwRoutine->FreeFSConnection() is called for each foreign connection
    to discard it.

    To achieve features above, I propose following FDW API:

    /*
    * Almost same as SeqScan, but defined as a structure to allow add
    * members in the future.
    */
    typedef struct ForeignScan {
    Scan scan;
    } ForeignScan;

    /*
    * Handle to access FDW-depend data.
    * Each FDW can use ForeignScanState->reply with casting between FdwReply
    * and actual type.
    */
    typedef FdwReply FdwReply;

    /*
    * ForeignScanState node is used to store scan status.
    */
    typedef struct ForeignScanState {
    ScanState ss; /* its first field is NodeTag */

    FdwRoutine *routine; /* set of FDW routines */
    ForeignDataWrapper *wrapper;/* foreign data wrapper */
    ForeignServer *server; /* foreign server */
    FSConnection *conn; /* connection to the foreign server */
    UserMapping *user; /* user mapping */
    ForeignTable *table; /* foreign table */
    FdwReply *reply; /* private data for each data wrapper */
    } ForeignScanState;

    typedef struct FdwRoutine
    {
    /*
    * Connect to the foreign server identified by server and user.
    */
    FSConnection* (*ConnectServer)(ForeignServer *server, UserMapping *user);

    /*
    * Disconnect from the foreign server and free FSConnection object.
    */
    void (*FreeFSConnection)(FSConnection *conn);

    /*
    * Estimate costs of a foreign path. FDW should update startup_cost
    * and total_cost of the Path.
    * To estimate cost, PG FDW might generate "EXPLAIN" SQL and
    * execute it on the remote side, or collect statistics in
    * somewhere and calculate from them.
    * This parameters are same as cost_foreignscan() in
    * optimizer/path/costsize.c (the function is just a proxy).
    */
    void (*GetStatistics)(Path *path, PlannerInfo *root, RelOptInfo *baserel);

    /*
    * Prepare to return tuples.
    */
    void (*Open)(ForeignScanState *scanstate);

    /*
    * Fetch the next tuple and fill tupleslot with it, or clear the slot
    * to indicate EOF.
    */
    void (*Iterate)(ForeignScanState *scanstate);

    /*
    * End the foreign scan and do some cleanup if necessary.
    */
    void (*Close)(ForeignScanState *scanstate);

    /*
    * Re-initialize the foreign scan, used when the INNER executor node is
    * executed again.
    */
    void (*ReOpen)(ForeignScanState *scanstate);
    } FdwRoutine;
    ======================

    Issues from ideas above:

    == Connection Pooling ==
    Connect on each foreign scan is obviously useless, but it's open to
    argument that who should manage the pooled connections, because for
    CSV wrapper, no "connection" is needed at all, but PG wrapper might
    connect once per database per user.

    Maybe DBAs want to see how much and what kind of connections are
    established now, so we should show list of active connections via a
    view or function. To treat connections transparently, I propose
    that connection pooling mechanism in the core module, not in each FDW
    module. Or should we add API to return list of active connections in
    common format?

    If we have decided to leave connection management to each FDWs,
    ConnectServer() and FreeFSConnection() can be removed from API.

    == ANALYZE support ==
    Even if a FDW wants to keep statistics in local area such as
    pg_class.reltuples, current API doesn't provide appropriate timing.

    Should we add API which is called from ANALYZE to allow FDWs to handle
    statistics when user wants.

    Any comments are welcome.

    Regards,
    --
    Shigeru Hanada
  • Robert Haas at Oct 11, 2010 at 5:22 pm

    On Thu, Oct 7, 2010 at 7:20 AM, Shigeru HANADA wrote:
    [ design for FDW ]
    This seems mostly sensible.
    In this step, optimizer calls FdwRoutine->GetStatistics() to get
    estimated costs of the foreign scan.  FDW can calculate costs by
    itself with some statistics, or get costs from remote-side, or leave
    the costs as is.
    I think this should be called EstimateCosts rather than GetStatistics.
    Maybe DBAs want to see how much and what kind of connections are
    established now, so we should show list of active connections via a
    view or function.  To treat connections transparently, I propose
    that connection pooling mechanism in the core module, not in each FDW
    module.  Or should we add API to return list of active connections in
    common format?
    I think for starters we should make this the responsibility of each
    FDW. It seems a bit hard to speculate about what we might want down
    the road at this point.
    Should we add API which is called from ANALYZE to allow FDWs to handle
    statistics when user wants.
    I would probably put this off also, to a later phase of the project.

    --
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
  • Shigeru HANADA at Oct 12, 2010 at 6:28 am

    On Mon, 11 Oct 2010 13:22:13 -0400 Robert Haas wrote:
    On Thu, Oct 7, 2010 at 7:20 AM, Shigeru HANADA
    wrote:
    In this step, optimizer calls FdwRoutine->GetStatistics() to get
    estimated costs of the foreign scan. ?FDW can calculate costs by
    itself with some statistics, or get costs from remote-side, or leave
    the costs as is.
    I think this should be called EstimateCosts rather than GetStatistics.
    Agreed, the name you suggested would be better.
    I've chosen the name GetStatistics because a function with that name
    is defined in SQL/MED standard as below, but GetStatistics might
    rather match handling of ANALYZE request in PostgreSQL.

    Section 22.3.28 GetStatistics (ISO/IEC 9075-9:2003)
    Retrieve implementation-defined statistics associated with a
    foreign server request.
    Maybe DBAs want to see how much and what kind of connections are
    established now, so we should show list of active connections via a
    view or function. ?To treat connections transparently, I propose
    that connection pooling mechanism in the core module, not in each FDW
    module. ?Or should we add API to return list of active connections in
    common format?
    I think for starters we should make this the responsibility of each
    FDW. It seems a bit hard to speculate about what we might want down
    the road at this point.
    Agreed. I would move connection cache mechanism to postgresql_fdw.
    I also would remove pg_foreign_connections view because it might need
    new interface to get list of active connections from FDWs.
    Should we add API which is called from ANALYZE to allow FDWs to handle
    statistics when user wants.
    I would probably put this off also, to a later phase of the project.
    Agreed.

    Thanks for the review!
    I'll finish the SQL/MED patch by applying your comments.

    Regards,
    --
    Shigeru Hanada

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-hackers @
categoriespostgresql
postedSep 15, '10 at 6:04a
activeOct 15, '10 at 1:33a
posts66
users11
websitepostgresql.org...
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase