It looks like the problem is that the default value is getting inserted
without benefit of conversion, i.e., whatever the given text is will get
dropped into the finished tuple without padding/truncation to the
specified char(n) length.

Later, when we try to read out the tuple, the tuple access routines
figure they know how big a char(n) is, so they don't actually look
to see what the varlena count is. This results in misalignment of
following fields, causing either wrong data readout or a full-bore
crash.

Test case:

CREATE TABLE test (
    plt   int2 PRIMARY KEY,
    state CHAR(5) NOT NULL DEFAULT 'new',
    used  boolean NOT NULL DEFAULT 'f',
    id    int4
);

INSERT INTO test (plt, id) VALUES (2, 3);

Examination of the stored tuple shows it contains 16 bytes of data:

0x400d7f30: 0x00 0x02 0x00 0x00 0x00 0x00 0x00 0x07
0x400d7f38: 0x6e 0x65 0x77 0x00 0x00 0x00 0x00 0x03

which deconstructs as follows:

00 02          int2 '2' (big-endian hardware here)
00 00          pad space to align varlena char field to long boundary
00 00 00 07    varlena header, size 7 => 3 bytes of actual data (whoops)
6e 65 77       ASCII 'new'
00             boolean 'f' (no pad needed for bool)
00 00 00 03    int4 '3' (no pad, it's on a long boundary already)
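
The breakdown above can be reproduced mechanically. What follows is a minimal sketch in C, not PostgreSQL code: `DecodedTuple`, `read_be`, `align_up`, and `decode_stored_tuple` are hypothetical names invented for illustration. It assumes the layout described above (4-byte varlena length word that counts itself, 4-byte alignment for varlena and int4, no alignment for bool) and walks the dump while trusting the stored length word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The 16 data bytes from the dump above (big-endian machine). */
static const uint8_t tuple_data[16] = {
    0x00, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x07,
    0x6e, 0x65, 0x77, 0x00, 0x00, 0x00, 0x00, 0x03
};

/* Hypothetical result type: where each field really sits in the dump. */
typedef struct {
    uint16_t plt;          /* int2 value */
    uint32_t varlena_len;  /* length word of 'state', header included */
    size_t   state_off;    /* offset of the first data byte of 'state' */
    size_t   used_off;     /* offset of the bool field */
    uint8_t  used;         /* bool value */
    size_t   id_off;       /* offset of the int4 field */
    uint32_t id;           /* int4 value */
} DecodedTuple;

/* Read an n-byte big-endian integer, so the sketch runs on any host. */
static uint32_t read_be(const uint8_t *p, size_t n)
{
    uint32_t v = 0;
    for (size_t i = 0; i < n; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Round off up to the next multiple of align (a power of two). */
static size_t align_up(size_t off, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (off + align - 1) & ~(align - 1);
}

/* Walk the tuple data, believing the varlena length word that was stored. */
static DecodedTuple decode_stored_tuple(const uint8_t *data, size_t len)
{
    DecodedTuple t;
    size_t off = 0;

    t.plt = (uint16_t) read_be(data + off, 2);
    off += 2;

    off = align_up(off, 4);                  /* varlena is long-aligned */
    t.varlena_len = read_be(data + off, 4);  /* counts its own 4 bytes */
    assert(t.varlena_len >= 4 && off + t.varlena_len <= len);
    t.state_off = off + 4;
    off += t.varlena_len;

    t.used_off = off;                        /* bool needs no alignment */
    t.used = data[off];
    off += 1;

    off = align_up(off, 4);                  /* int4 is long-aligned */
    assert(off + 4 <= len);
    t.id_off = off;
    t.id = read_be(data + off, 4);
    return t;
}
```

Calling `decode_stored_tuple(tuple_data, sizeof tuple_data)` puts the bool at offset 11 and the int4 at offset 12, which is where the insert actually wrote them.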

But the tuple readout routines will assume without looking that char(5)
occupies 9 bytes altogether, so they pick up the bool field 2 bytes over
from where it actually was put and pick up the int4 field 4 bytes over
from where it should be (due to alignment); result is garbage. If there
were another varlena field after the char(n) field, they'd pick up a
wrong field length and probably crash.
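
The "2 bytes over" and "4 bytes over" figures can be checked with a little arithmetic. This is a sketch under the same assumptions as the breakdown above (4-byte varlena header, 4-byte alignment for varlena and int4, none for bool); `FieldOffsets`, `align_up`, and `layout_for` are hypothetical names, not PostgreSQL routines.

```c
#include <assert.h>
#include <stddef.h>

#define VARLENA_HDR 4   /* size of the varlena length word, per the dump */

/* Hypothetical type: byte offsets of the three fields after 'plt'. */
typedef struct {
    size_t state_off;   /* start of the varlena header of 'state' */
    size_t used_off;    /* bool field */
    size_t id_off;      /* int4 field */
} FieldOffsets;

/* Round off up to the next multiple of align (a power of two). */
static size_t align_up(size_t off, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (off + align - 1) & ~(align - 1);
}

/*
 * Offsets in the test table when the char field occupies char_total
 * bytes altogether, header included.
 */
static FieldOffsets layout_for(size_t char_total)
{
    FieldOffsets o;
    size_t off = 2;                 /* int2 'plt' takes bytes 0..1 */

    off = align_up(off, 4);         /* varlena starts on a long boundary */
    o.state_off = off;
    off += char_total;

    o.used_off = off;               /* bool: no padding */
    off += 1;

    o.id_off = align_up(off, 4);    /* int4: long boundary */
    return o;
}
```

With `layout_for(VARLENA_HDR + 5)` (what readout assumes for char(5)) the bool lands at offset 13 and the int4 at 16; with `layout_for(7)` (what was stored) they land at 11 and 12. Note that offset 16 is already past the end of the 16 bytes shown in the dump.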


So, the question still remains "where and why"? My guess at this point
is that this is a bad side-effect of the fact that text and char(n) are
considered binary-equivalent. Probably, whatever bit of code ought to
be coercing the default value into the correct type for the column is
deciding that it doesn't have to do anything because they're already
equivalent types. I'm not sure where to look for that code (help
anyone?). But I am sure that it needs to be coercing the value to the
specified number of characters for char(n).
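
To make the missing step concrete, here is a rough sketch of what coercing a default to char(n) has to produce: exactly n data bytes, blank-padded or truncated, behind a length word of n + 4. `make_bpchar` is a hypothetical helper written for this illustration; it is not the backend's conversion routine, and it stores the length word in host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define VARLENA_HDR 4   /* size of the varlena length word, per the dump */

/*
 * Hypothetical helper: build a char(n) value as a varlena holding
 * exactly n data bytes. Short input is padded with blanks, long input
 * is truncated. Returns NULL if allocation fails; caller frees.
 */
static uint8_t *make_bpchar(const char *src, size_t n)
{
    size_t   src_len  = strlen(src);
    size_t   copy_len = src_len < n ? src_len : n;
    uint32_t total    = (uint32_t) (VARLENA_HDR + n);
    uint8_t *v        = malloc(total);

    if (v == NULL)
        return NULL;

    memcpy(v, &total, VARLENA_HDR);                 /* length counts the header */
    memcpy(v + VARLENA_HDR, src, copy_len);         /* the given text */
    memset(v + VARLENA_HDR + copy_len, ' ', n - copy_len);  /* blank padding */
    return v;
}
```

For the test case, `make_bpchar("new", 5)` yields a length word of 9 followed by `new` and two blanks, which matches what the readout side assumes for char(5).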

It also strikes me that there should be a check in the low-level
tuple construction routines that what they are handed for a char(n)
field is the right length. If tuple readout is going to assume that
char(n) is always n bytes of data, good software engineering dictates
that the tuple-writing code ought to enforce that assumption. At
the very least there should be an Assert() for it.
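
A write-side check of this kind might look roughly like the following. This is only a sketch: `bpchar_length_ok` and `check_char_n_before_store` are hypothetical names, the standard C `assert` stands in for the backend's Assert(), and the length word is assumed to be a 4-byte count in host byte order that includes itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VARLENA_HDR 4   /* size of the varlena length word, per the dump */

/* Hypothetical predicate: does this varlena hold exactly n data bytes? */
static bool bpchar_length_ok(const uint8_t *varlena, size_t n)
{
    uint32_t total;

    memcpy(&total, varlena, VARLENA_HDR);   /* avoids unaligned access */
    return total == VARLENA_HDR + n;
}

/*
 * Hypothetical hook for the tuple-writing side: refuse to store a
 * char(n) value whose length does not match the column definition.
 */
static void check_char_n_before_store(const uint8_t *varlena, size_t n)
{
    assert(bpchar_length_ok(varlena, n));
}
```

For the tuple in the dump, the stored length word is 7 while char(5) requires 9, so a check like this would have fired at insert time rather than leaving a bad tuple on disk.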

regards, tom lane


  • Bruce Momjian at May 14, 1999 at 1:11 am

    > But the tuple readout routines will assume without looking that char(5)
    > occupies 9 bytes altogether, so they pick up the bool field 2 bytes over
    > from where it actually was put and pick up the int4 field 4 bytes over
    > from where it should be (due to alignment); result is garbage. If there
    > were another varlena field after the char(n) field, they'd pick up a
    > wrong field length and probably crash.
    >
    > So, the question still remains "where and why"? My guess at this point
    > is that this is a bad side-effect of the fact that text and char(n) are
    > considered binary-equivalent. Probably, whatever bit of code ought to
    > be coercing the default value into the correct type for the column is
    > deciding that it doesn't have to do anything because they're already
    > equivalent types. I'm not sure where to look for that code (help
    > anyone?). But I am sure that it needs to be coercing the value to the
    > specified number of characters for char(n).

    Good analysis. I am sure this is a byproduct of my change in 6.? that
    allowed optimization of char() fields by assuming they are all a fixed
    length. Of course, 99% of the time they were, so it never bit us,
    except with default. Not sure if default was added before or after my
    optimization.

    > It also strikes me that there should be a check in the low-level
    > tuple construction routines that what they are handed for a char(n)
    > field is the right length. If tuple readout is going to assume that
    > char(n) is always n bytes of data, good software engineering dictates
    > that the tuple-writing code ought to enforce that assumption. At
    > the very least there should be an Assert() for it.

    At least an Assert(). However, the tuple access routines do an
    auto-compute of column offsets on the first table access, so it never
    really looks at the tuples in between. However, an Assert should check
    that when you access a char() field, that it is really the proper
    length. Good idea.

    BTW, I couldn't find the default stuffing code myself either.

    --
    Bruce Momjian | http://www.op.net/~candle
    maillist@candle.pha.pa.us | (610) 853-3000
    + If your life is a hard drive, | 830 Blythe Avenue
    + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
  • Tom Lane at May 14, 1999 at 1:33 am

    Bruce Momjian writes:
    > Good analysis. I am sure this is a byproduct of my change in 6.? that
    > allowed optimization of char() fields by assuming they are all a fixed
    > length. Of course, 99% of the time they were, so it never bit us,
    > except with default.

    There's nothing wrong with your optimization --- a char(n) field should
    be n characters 100% of the time. It's the default-insertion code
    that's busted.

    > At least an Assert(). However, the tuple access routines do an
    > auto-compute of column offsets on the first table access, so it never
    > really looks at the tuples in between. However, an Assert should check
    > that when you access a char() field, that it is really the proper
    > length. Good idea.

    No, I think the Assert ought to be on the output side. You might never
    try to access the char(n) field itself, only the following fields;
    if the attcacheoff fields are already set up when you come to the
    bogus tuple, an Assert in the reading side wouldn't catch it.

    regards, tom lane

group: pgsql-hackers
categories: postgresql
posted: May 14, '99 at 12:57a
active: May 14, '99 at 1:33a