On 5/8/13 12:54 PM, Jonathan S. Katz wrote:
On May 8, 2013, at 1:16 PM, Tom Lane wrote:

Heikki Linnakangas <hlinnakangas@vmware.com> writes:
On 08.05.2013 19:44, Tom Lane wrote:
No there isn't; what you suggest would require FE/BE protocol
extensions, making it several orders of magnitude more work than the
other thing.
I'd imagine that the flow would go something like this:
That would require the backend to buffer the entire query response,
which isn't a great idea. I would expect that such an operation would
need to interleave CopyData to the backend with DataRow responses. Such
a thing could possibly be built on COPY_BOTH mode, but it would be a lot
of work (at both ends) for extremely debatable value.

The general idea of COPY is to load data as fast as possible, so weighing
it down with processing options seems like a pretty dubious idea even if
the implementation were easy.
There are cases where I do want to load data very quickly, but I also want to perform an operation on it immediately afterward, e.g. removing bad data that the copy just added. For instance, I have this scenario:

WITH new_data AS (
    COPY table FROM ...
    RETURNING id, field_to_check
)
DELETE FROM table
USING new_data
WHERE
    table.id = new_data.id AND
    new_data.field_to_check ~* 'bad data';

Now I can take care of all that in one step, and I know I'm only removing rows I just added. This comes up when I am importing external files from other sources, where I may not want all of the rows, or where some of the rows contain bad data.

This also presumes that COPY works in a CTE, which I'm not sure it does (and I will do the TIAS test after I hit send on this message).
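For comparison, the cleanup described above can already be done today in two separate statements, without COPY-in-a-CTE support. A sketch (table and column names are illustrative; the file path is elided as in the original):

```sql
BEGIN;
-- Bulk load: no filtering is possible at this stage.
COPY import_table FROM ...;
-- Remove unwanted rows afterward.
DELETE FROM import_table
WHERE field_to_check ~* 'bad data';
COMMIT;
```

The drawback, and the reason the CTE form is attractive, is that the DELETE here cannot distinguish newly loaded rows from pre-existing ones: it will also remove older rows matching the pattern unless the load is tracked separately (e.g. via a staging table). The hypothetical COPY ... RETURNING inside a CTE would scope the deletion to exactly the rows just added.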
What you're really asking for here is some kind of stream processing capability. There are spin-offs of Postgres that provide that capability (I know Neil Conway worked on some). Those are geared completely around stream processing, but I think it would be extremely interesting to provide some minimal support for that in community Postgres.

Using your use case as an example, something like this would be very interesting:

COPY table FROM ...
    WHERE field_to_check !~* 'bad data'

In this case we're just applying a simple WHERE clause against each incoming row.

Perhaps what I'm suggesting could be implemented with a CTE, but I'm not sure it makes sense to do it the way you propose, at least not initially. A CTE would provide so much flexibility that it would be difficult for the optimizer to handle efficiently. Something like a WHERE clause directly on COPY would be a lot easier to deal with. As someone mentioned, an FDW might be another option there.
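The FDW route mentioned above can be sketched with contrib's file_fdw, which exposes a flat file as a foreign table so that an ordinary INSERT ... SELECT can filter rows on the way in. All names and the file path below are illustrative, not from the thread:

```sql
CREATE EXTENSION file_fdw;
CREATE SERVER import_files FOREIGN DATA WRAPPER file_fdw;

-- Expose the incoming CSV file as a read-only foreign table.
CREATE FOREIGN TABLE incoming (
    id             integer,
    field_to_check text
) SERVER import_files
OPTIONS (filename '/path/to/data.csv', format 'csv');

-- Load only the rows that pass the filter, mirroring the
-- hypothetical COPY ... WHERE syntax from the example above.
INSERT INTO target_table (id, field_to_check)
SELECT id, field_to_check
FROM incoming
WHERE field_to_check !~* 'bad data';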
Jim C. Nasby, Data Architect
jim@nasby.net
512.569.9461 (cell)
http://jim.nasby.net

pgsql-hackers · posted May 8, 2013