For about 5 years now, I have been using a text search engine that I wrote
and maintain.

In the beginning, I hacked up function mechanisms to return multiple value
sets and columns. Then PostgreSQL aded "setof" and it is was cool. Then it
was able to return a set of rows, which was even better.

Lately, I have been thinking that a cool form of index would be some sort
of "persistent reference" index. Like the old ISAM days of yore, a fixed
number could point you right to the row that you want. I'm not sure if the
"persistent reference" is a specific auto numbering column type or
separate index structure or both.

I asked the question how do you get a record without going through an
index, the answer was CTID, which unfortunately changes when the row is
updated.

Now, what I want to brainstorm is some sort of "persistent reference"
where the value is not algorithmically stored, maybe just an offset into a
table. The number of operations should be about 1 per lookup.

Imagine a dynamically growing array that has one slot per row. Every row
is considered unique. Rows which are updated, their CTID is updated in the
reference. (with vacuum?)

Imagine something like this:

create table foobar(id reference, name varchar, value varchar);

select * from foobar where id = 100;

The reference type has an implicit index that is basically a lookup table.
On unique references where the reference value is fairly arbitrary, this
would be a HUGE gain for direct lookups. There is no need for the NlogN of
a tree.

On the surface level, this would be a huge win for websites that use
semi-fixed tables of data.

Search Discussions

  • Bort, Paul at Feb 10, 2005 at 5:55 pm
    If that ID is the only thing you use to access that data, why not just store
    it in a flat file with fixed-length records? seek() (or your language's
    equivalent) is usually fast.

    If you need to drive that from within PostgreSQL, you would need an
    untrusted language to read the file, but you could also generate it from a
    table using a trigger.

    Or maybe use a serial column, an index on that column, and cluster the table
    on that index. It's more than one lookup, but not much with a Btree index.
    (Not sure if this is better than just using a serial and an index.
    http://www.postgresql.org/docs/8.0/interactive/sql-cluster.html says it
    isn't, if I read it correctly.)

    Then anytime there is a batch of updates to the table, re-cluster it.
    -----Original Message-----
    From: pgsql@mohawksoft.com
    Sent: Thursday, February 10, 2005 11:22 AM
    To: pgsql-hackers@postgresql.org
    Subject: [HACKERS] New form of index "persistent reference"


    For about 5 years now, I have been using a text search engine
    that I wrote
    and maintain.

    In the beginning, I hacked up function mechanisms to return
    multiple value
    sets and columns. Then PostgreSQL aded "setof" and it is was
    cool. Then it
    was able to return a set of rows, which was even better.

    Lately, I have been thinking that a cool form of index would
    be some sort
    of "persistent reference" index. Like the old ISAM days of
    yore, a fixed
    number could point you right to the row that you want. I'm
    not sure if the
    "persistent reference" is a specific auto numbering column type or
    separate index structure or both.

    I asked the question how do you get a record without going through an
    index, the answer was CTID, which unfortunately changes when
    the row is
    updated.

    Now, what I want to brainstorm is some sort of "persistent reference"
    where the value is not algorithmically stored, maybe just an
    offset into a
    table. The number of operations should be about 1 per lookup.

    Imagine a dynamically growing array that has one slot per
    row. Every row
    is considered unique. Rows which are updated, their CTID is
    updated in the
    reference. (with vacuum?)

    Imagine something like this:

    create table foobar(id reference, name varchar, value varchar);

    select * from foobar where id = 100;

    The reference type has an implicit index that is basically a
    lookup table.
    On unique references where the reference value is fairly
    arbitrary, this
    would be a HUGE gain for direct lookups. There is no need for
    the NlogN of
    a tree.

    On the surface level, this would be a huge win for websites that use
    semi-fixed tables of data.



    ---------------------------(end of
    broadcast)---------------------------
    TIP 1: subscribe and unsubscribe commands go to
    majordomo@postgresql.org
  • Mark L. Woodward at Feb 10, 2005 at 6:08 pm

    If that ID is the only thing you use to access that data, why not just
    store
    it in a flat file with fixed-length records? seek() (or your language's
    equivalent) is usually fast.
    As a matter of policy, I would never manage data outside of the database.
    If you need to drive that from within PostgreSQL, you would need an
    untrusted language to read the file, but you could also generate it from a
    table using a trigger.
    Very ugly.
    Or maybe use a serial column, an index on that column, and cluster the
    table
    on that index. It's more than one lookup, but not much with a Btree index.
    (Not sure if this is better than just using a serial and an index.
    http://www.postgresql.org/docs/8.0/interactive/sql-cluster.html says it
    isn't, if I read it correctly.)
    Clustering is OK, but it doesn't handle updates and additions until you
    recluster the data.

    If a static reference is all that is needed, then merely using CTID would
    suffice. I was thinking a little overhead for a reference table would
    allow it to hook into PostgreSQL and keep it up to date.


    Then anytime there is a batch of updates to the table, re-cluster it.
    Yea, like I said, there are easier ways of doing that with fairly static
    data.

    -----Original Message-----
    From: pgsql@mohawksoft.com
    Sent: Thursday, February 10, 2005 11:22 AM
    To: pgsql-hackers@postgresql.org
    Subject: [HACKERS] New form of index "persistent reference"


    For about 5 years now, I have been using a text search engine
    that I wrote
    and maintain.

    In the beginning, I hacked up function mechanisms to return
    multiple value
    sets and columns. Then PostgreSQL aded "setof" and it is was
    cool. Then it
    was able to return a set of rows, which was even better.

    Lately, I have been thinking that a cool form of index would
    be some sort
    of "persistent reference" index. Like the old ISAM days of
    yore, a fixed
    number could point you right to the row that you want. I'm
    not sure if the
    "persistent reference" is a specific auto numbering column type or
    separate index structure or both.

    I asked the question how do you get a record without going through an
    index, the answer was CTID, which unfortunately changes when
    the row is
    updated.

    Now, what I want to brainstorm is some sort of "persistent reference"
    where the value is not algorithmically stored, maybe just an
    offset into a
    table. The number of operations should be about 1 per lookup.

    Imagine a dynamically growing array that has one slot per
    row. Every row
    is considered unique. Rows which are updated, their CTID is
    updated in the
    reference. (with vacuum?)

    Imagine something like this:

    create table foobar(id reference, name varchar, value varchar);

    select * from foobar where id = 100;

    The reference type has an implicit index that is basically a
    lookup table.
    On unique references where the reference value is fairly
    arbitrary, this
    would be a HUGE gain for direct lookups. There is no need for
    the NlogN of
    a tree.

    On the surface level, this would be a huge win for websites that use
    semi-fixed tables of data.



    ---------------------------(end of
    broadcast)---------------------------
    TIP 1: subscribe and unsubscribe commands go to
    majordomo@postgresql.org
  • Merlin Moncure at Feb 10, 2005 at 6:27 pm

    Lately, I have been thinking that a cool form of index would be some sort
    of "persistent reference" index. Like the old ISAM days of yore, a fixed
    number could point you right to the row that you want. I'm not sure if the
    "persistent reference" is a specific auto numbering column type or
    separate index structure or both.
    What you are talking about is a 'relative file'. It turns out on modern
    ISAM file systems, the win you get over b-tree indexing is not worth
    losing the ability to do simple things like run-length compression on
    strings.

    Anyways, while storing a physical offset is O(1), so is computing a
    hash. How would a hash index not fill your need?

    Merlin
  • Andreas Zeugswetter at Feb 11, 2005 at 11:44 am

    I asked the question how do you get a record without going through an
    index, the answer was CTID, which unfortunately changes when the row is
    updated.
    The ctid is a physical location of the row. On update a new tuple is written
    in a new location, that is why the ctid changes. The old tuple has a system field
    t_ctid which is then a forward pointer to the new tuple.

    Thus you can follow that chain until the visible tuple is found.
    The current tid = tid does not do that (I think because the ODBC driver
    which was the first to use it (for result set modification) needed to notice when the
    tuple was updated underneath).

    But you can use:
    select * from atab where ctid = currtid2('atab', '(0,1)'); -- '(0,1)' is the old ctid

    Of course the old ctid is only valid until (auto)vacuum marks it free.
    Without vacuum you are currently safe to use the currtid functions.

    Andreas

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-hackers @
categoriespostgresql
postedFeb 10, '05 at 4:18p
activeFeb 11, '05 at 11:44a
posts5
users4
websitepostgresql.org...
irc#postgresql

People

Translate

site design / logo © 2021 Grokbase