On Thu, Aug 31, 2006 at 09:17:54AM -0400, Merlin Moncure wrote:
> On 8/30/06, Randall Lucas wrote:
>
>> An inverted way of thinking about the problem is the notion of getting a
>> source document (say, a "company registration form") and parsing and
>> storing it in a normalized format. How do you "tag" the origin of each
>> and every row in every table that resulted from that source document?
>
> your form should have a code, either entered by the customer or by the
> preparer who enters it into the database, which becomes the key that
> identifies the registration document. Put that key into other tables.

Yes -- I have flirted with this idea for the case of importing a set of
facts from a single source. But where it breaks down is in being able
to do the reverse -- ex post facto, to certify the results of a query
as being accurate, and thereby attesting to the underlying facts.
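
For what it's worth, the version I flirted with looked roughly like this
(the table and column names here are just invented for illustration):

create table source_document (
  id serial primary key,
  received timestamptz not null default now(),
  descr text
);

create table registration (
  id serial primary key,
  -- every row parsed out of a document carries the key back to its source
  source_doc_id int not null references source_document(id),
  company_name text not null
);

Tagging each fact table with a source_doc_id answers "where did this row
come from" well enough; it is certifying a query's output after the fact
that I cannot see how to do.
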
> be careful, you are flirting with EAV thinking. I think EAV designs
> are terrible.

(Though religion is not at issue here, I am pretty devoutly relational.
I just want a good way to perform audits and other functions on my
relations.)

>> It seems to me that the elegant way to do this would be to get the
>> entire graph of dependencies: not just every table that references the
>> entity, but the specific rows within those tables that refer to my
>> entity.
>>
>> The query engine should have a pretty good idea about which tables and
>> which rows would actually be used in forming the responses to a given
>> query. How can I get this information? Or am I off the deep end (-ency
>> graph) with this one?
>
> I am not sure where you are going with this. Maybe you should mock up
> some simple tables and repost your question.

Imagine that I am the SEC. I look for cases of insider trading, and
when I find a case, I have to go back and double-check all the pieces
of evidence I am using against them.

create table company (
  id serial primary key,
  name text not null
);

create table officer (
  id serial primary key,
  company_id int not null references company(id) on delete cascade,
  name text not null,
  title text not null
);

create table insider_trade (
  id serial primary key,
  officer_id int not null references officer(id) on delete cascade,
  shares_traded numeric,
  share_price numeric,
  trade_date timestamptz
);

insert into company (name) values ('goodco');
insert into company (name) values ('badco');
insert into officer (company_id, name, title) values (1, 'johnny b. good', 'ceo');
insert into officer (company_id, name, title) values (1, 'mother teresa', 'saint');
insert into officer (company_id, name, title) values (2, 'leroy brown', 'ceo');
insert into insider_trade (officer_id, shares_traded, share_price, trade_date) values
(3, '50000', '6.66', '2006-07-04 1:23 PM PST');

Now, my database looks like this:

select * from company left join officer on company.id=officer.company_id left join insider_trade on officer.id=insider_trade.officer_id;
id | name | id | company_id | name | title | id | officer_id | shares_traded | share_price | trade_date
----+--------+----+------------+----------------+-------+----+------------+---------------+-------------+------------------------
1 | goodco | 1 | 1 | johnny b. good | ceo | | | | |
1 | goodco | 2 | 1 | mother teresa | saint | | | | |
2 | badco | 3 | 2 | leroy brown | ceo | 1 | 3 | 50000 | 6.66 | 2006-07-04 14:23:00-07
(3 rows)

If I want to know whom to investigate, I might do a query like this:

select * from company left join officer on company.id=officer.company_id left join insider_trade on officer.id=insider_trade.officer_id where insider_trade.id is not null;
id | name | id | company_id | name | title | id | officer_id | shares_traded | share_price | trade_date
----+-------+----+------------+-------------+-------+----+------------+---------------+-------------+------------------------
2 | badco | 3 | 2 | leroy brown | ceo | 1 | 3 | 50000 | 6.66 | 2006-07-04 14:23:00-07
(1 row)
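
(As an aside, since the where clause discards exactly the rows the left
joins preserved, I believe the same result could be written with plain
inner joins:

select * from company
join officer on company.id=officer.company_id
join insider_trade on officer.id=insider_trade.officer_id;

but the left-join form above is what I actually ran.)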

Now that I have this query, in order to make my case, I need to "sign
off" on all of the individual data that went into it. I would like to
do something like:

select * from last_query_shown_tuples();
schema | table_name | pk_columns | pk_values
--------+---------------+------------+-----------
public | company | [id] | [2]
public | officer | [id] | [3]
public | insider_trade | [id] | [1]
(3 rows)

(I am aware that, at least theoretically, my query is just as reliant
upon the tuples that it *didn't* show, since they were acted upon by
the where clause.)

(I am also aware that this example is sub-optimal in that the primary
keys are all single ints called 'id' and that they are present
individually in the query result -- imagine that I used an aggregate
function in a GROUP BY, for example, and you can see how the ids relied
upon would become opaque, or imagine a multicolumn pk and you
understand my odd notation.)
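
The closest I can come today, as far as I can tell, is to rewrite each
query by hand so that it returns the contributing primary keys explicitly,
e.g.:

select company.id as company_pk,
  officer.id as officer_pk,
  insider_trade.id as insider_trade_pk
from company
join officer on company.id=officer.company_id
join insider_trade on officer.id=insider_trade.officer_id;

but that has to be redone for every query and every join, which is exactly
the bookkeeping I am hoping the server could do for me.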

Finally, I'd want to end up with something like:

select * from last_query_shown_tuples();
schema | table_name | pk_columns | pk_values | audited_ts
--------+---------------+------------+-----------+------------
public | company | [id] | [2] | 2006-08-31 18:52
public | officer | [id] | [3] | 2006-08-31 18:55
public | insider_trade | [id] | [1] | 2006-08-31 18:57
(3 rows)
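
In the meantime I suppose the audited_ts column could live in a sign-off
table that I maintain by hand (again, just something I sketched up, not an
existing facility):

create table audit_signoff (
  schema_name text not null,
  table_name text not null,
  pk_value text not null,
  audited_ts timestamptz not null default now(),
  primary key (schema_name, table_name, pk_value)
);

insert into audit_signoff (schema_name, table_name, pk_value)
values ('public', 'insider_trade', '1');

but that still leaves the hard part: knowing which rows to put into it for
a given query.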

Is something like this even possible, much less doable at present? I
have an inkling that with all of the capabilities of information_schema,
rules, and triggers, it should be pretty close to doable.
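
For instance, the primary-key columns of a given table can already be
pulled out of information_schema, which feels like half the battle:

select kcu.column_name
from information_schema.table_constraints tc
join information_schema.key_column_usage kcu
  on kcu.constraint_name = tc.constraint_name
 and kcu.table_schema = tc.table_schema
 and kcu.table_name = tc.table_name
where tc.constraint_type = 'PRIMARY KEY'
  and tc.table_schema = 'public'
  and tc.table_name = 'insider_trade'
order by kcu.ordinal_position;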

Best,

Randall

--
Randall Lucas Tercent, Inc. DF93EAD1
