I have a table that looks something like this:

url - character varying(1024)
date - timestamptz
body - text

URL is a unique primary key. Body can potentially be a couple hundred
KB of text.

There will at first be perhaps 100,000 rows in this table, but at some
point it might get into the millions.

I need to be able to quickly insert into this table (I might be
inserting several rows per second at times). I also need to be able
to very quickly see if a URL already exists in the table, and what the
date value is. Or, query for all "urls" that have a "date" older than
x days.
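
In PostgreSQL terms, a minimal sketch of that single-table layout could
look like this (the table name "pages" is just a placeholder):

CREATE TABLE pages (
    url  varchar(1024) PRIMARY KEY,  -- the primary key gives the unique index for fast URL lookups
    date timestamptz NOT NULL,
    body text
);

-- The "older than x days" query needs its own index on date:
CREATE INDEX pages_date_idx ON pages (date);

-- Check whether a URL exists and get its date (uses the primary-key index):
SELECT date FROM pages WHERE url = 'http://example.com/';

-- All URLs older than 30 days (uses pages_date_idx):
SELECT url FROM pages WHERE date < now() - interval '30 days';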

Am I better off with two tables such as:

Table1:
id - uuid or integer (primary key)
url - unique index
date

Table2:

id - FK to Table1.id
body - text

It makes the program flow a bit more complicated, and I'd have to use
transactions when inserting new rows. However, for years I've been
told that having rows with large chunks of text is bad for performance,
forces that data to be paged into memory, and causes various other
issues. Any advice on this one? Thanks!
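
For comparison, a rough sketch of the two-table variant, with the two
inserts tied together so they commit atomically (names are placeholders;
the writable CTE needs PostgreSQL 9.1 or later, otherwise two INSERTs
inside BEGIN ... COMMIT do the same job):

CREATE TABLE page (
    id   serial PRIMARY KEY,           -- or uuid
    url  varchar(1024) NOT NULL UNIQUE,
    date timestamptz NOT NULL
);

CREATE TABLE page_body (
    id   integer PRIMARY KEY REFERENCES page (id),
    body text
);

-- Insert the metadata row and its body in one atomic statement:
WITH new_page AS (
    INSERT INTO page (url, date)
    VALUES ('http://example.com/', now())
    RETURNING id
)
INSERT INTO page_body (id, body)
SELECT id, 'a couple hundred KB of text ...' FROM new_page;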

Mike

  • Scott Marlowe at Sep 18, 2011 at 1:02 am

    What would be really cool is if PostgreSQL took values for body that
    were over a few KB, compressed them, and stored them out of line in
    another table. Luckily for you, that's EXACTLY what it already does:
    http://www.postgresql.org/docs/9.1/static/storage-toast.html Cool, eh?
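
    A quick way to see TOAST at work, assuming the "pages" table sketched
    earlier: pg_column_size() reports the compressed on-disk size of a
    value, and the storage mode is tunable per column.

    -- Compare the character length of body with its compressed on-disk size:
    SELECT url,
           length(body)         AS characters,
           pg_column_size(body) AS bytes_on_disk
    FROM pages
    ORDER BY pg_column_size(body) DESC
    LIMIT 10;

    -- text columns default to EXTENDED storage (compress, then move out of
    -- line into the toast table once the row gets big); this is tunable:
    ALTER TABLE pages ALTER COLUMN body SET STORAGE EXTENDED;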
  • Mike Christensen at Sep 18, 2011 at 1:15 am

    Man I've been reading this list for years now, and I kept on seeing
    this "TOAST" thing and just figured you people liked it for sandwiches
    or something.

    I feel like the programmer who thinks he's smart using a left bitshift
    operator to double an integer value, only to find out the compiler
    already applies that optimization anyway. Are you saying I don't
    actually need to defrag my hard drive these days either?

    Thanks for the quick reply! I will design my table in a way that
    logically makes sense to me.

    Mike
  • Scott Marlowe at Sep 18, 2011 at 1:20 am

    On Sat, Sep 17, 2011 at 7:15 PM, Mike Christensen wrote:
    Man I've been reading this list for years now, and I kept on seeing
    this "TOAST" thing and just figured you people liked it for sandwiches
    or something.

    I feel like the programmer who thinks he's smart using a left bitshift
    operator to double an integer value, only to find out the compiler
    already applies that optimization anyway. Are you saying I don't
    actually need to defrag my hard drive these days either?
    Not if you're running most flavors of unix and not filling up the hard
    drive too much. Ext3 and ext4 both tend to defrag as they go, and not
    frag much anyway, especially if you leave a little bit of free space
    for overhead. :)
    Thanks for the quick reply!  I will design my table in a way that
    logically makes sense to me.
    Cool. TOAST is one of those mad-scientist ideas that really works, and
    works well. Before it, PostgreSQL had a pretty severe limit on the size
    of text/varchar columns.
  • Jerry Sievers at Sep 19, 2011 at 10:30 pm

    Mike Christensen writes:


    Thanks for the quick reply! I will design my table in a way that
    logically makes sense to me.
    Do so at your own risk :-)

    Depending on the data model, it can still be good for performance to
    store the large payload data in a secondary table linked to the main
    table that holds the other data or metadata.

    That's especially true if the payload fields are often small enough
    *not* to trigger toasting. In that scenario, you can end up with very
    long physical tuples in the main table, which leads to a low
    tuple/page ratio and heavy disk reading for any query, including
    queries that don't care about the payload data at all.

    It's even more true if some of the non-payload data is frequently
    updated.

    YMMV
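
    One rough way to check for that, assuming the single "pages" table from
    earlier: after an ANALYZE, the planner's row and page estimates show
    how many tuples fit on each page.

    -- A low tuples-per-page figure on the main table means wide rows, which
    -- hurts scans that never touch the body column (these are estimates):
    SELECT relname,
           reltuples::bigint AS est_rows,
           relpages          AS pages,
           round(reltuples / NULLIF(relpages, 0)) AS tuples_per_page
    FROM pg_class
    WHERE relname = 'pages';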
    --
    Jerry Sievers
    Postgres DBA/Development Consulting
    e: postgres.consulting@comcast.net
    p: 305.321.1144
