Hi,
Suppose I create a table as follows:

CREATE TABLE test2 (name TEXT, age INTEGER) WITH oids;

Now, for every tuple in this table is associated with a unique oid, which I can retrieve by:

SELECT oid, name, age FROM test2;

which works great. So far so good.

Now, if look at the corresponding WAL entry for any insert into this relation, it creates the following WAL entry which I can decode.

XLogRecord ---> Followed by --> xl_heap_insert struct -->followed by --> xl_heap_header --> Followed by --> tuple data

If I use the macro  HeapTupleHeaderGetOid([xl_heap_header struct]) or equivalently if I test ([xl_heap_header struct]->t_infomask & HEAP_HASOID), it tells me that the tuple oid is not stored with this record.

Where is the OID of tuple stored in a WAL record of a tuple? If not with xl_heap_header, where is it stored? Is it stored at all?

Thanks for any responses.
Jagan

Search Discussions

  • Heikki Linnakangas at Apr 12, 2011 at 6:55 am

    On 11.04.2011 23:35, jagan wrote:
    Hi,
    Suppose I create a table as follows:

    CREATE TABLE test2 (name TEXT, age INTEGER) WITH oids;

    Now, for every tuple in this table is associated with a unique oid, which I can retrieve by:

    SELECT oid, name, age FROM test2;

    which works great. So far so good.

    Now, if look at the corresponding WAL entry for any insert into this relation, it creates the following WAL entry which I can decode.

    XLogRecord ---> Followed by --> xl_heap_insert struct -->followed by --> xl_heap_header --> Followed by --> tuple data

    If I use the macro HeapTupleHeaderGetOid([xl_heap_header struct])
    Yeah, that won't work because xl_heap_header is not a HeapTupleHeader.
    or equivalently if I test ([xl_heap_header struct]->t_infomask& HEAP_HASOID), it tells me that the tuple oid is not stored with this record.
    That's strange, the flag should be set.
    Where is the OID of tuple stored in a WAL record of a tuple? If not with xl_heap_header, where is it stored? Is it stored at all?
    It's stored in the "tuple data" portion. See the code that writes the
    WAL record, here:

    http://git.postgresql.org/gitweb?p=postgresql.git;a=blob;f=src/backend/access/heap/heapam.c;h=1fbd8b39b4a73f6aee337fccf2dcce548d0850bb;hb=HEAD#l1987

    Maybe the WAL record you're looking at is a full-page image? A record
    with a full-page image includes a verbatim copy of the page, and the
    individual tuple is omitted in that case.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Jagan at Apr 12, 2011 at 4:42 pm
    Hi,
    Where is the OID of tuple stored in a WAL record of a
    tuple? If not with xl_heap_header, where is it stored? Is it
    stored at all?
    It's stored in the "tuple data" portion.
    Is the OID also recorded with xl_heap_delete record as well or just the xl_heaptid? From my reading of the code it is not recorded.

    In general, why is OID of a tuple relegated as just another tuple data, when it can replace xl_heaptid as a much more stable tuple identifier. I understand that by recording tuples in terms of xl_heaptid (i.e., page/offset) means that the db can recover quickly but isn't there some value to storing a much "higher level" information in the WAL logs? For example, tuple manipulations, if recorded in terms of manipulation on tuple oids, can support applications such as these:

    http://blogs.enterprisedb.com/2011/03/03/more-musings-on-logical-replication/

    Just a thought.

    Thanks,
    Jagan
  • Heikki Linnakangas at Apr 12, 2011 at 4:49 pm

    On 12.04.2011 19:42, jagan wrote:
    Where is the OID of tuple stored in a WAL record of a
    tuple? If not with xl_heap_header, where is it stored? Is it
    stored at all?
    It's stored in the "tuple data" portion.
    Is the OID also recorded with xl_heap_delete record as well or just the xl_heaptid? From my reading of the code it is not recorded.
    Correct, it's not.
    In general, why is OID of a tuple relegated as just another tuple data, when it can replace xl_heaptid as a much more stable tuple identifier. I understand that by recording tuples in terms of xl_heaptid (i.e., page/offset) means that the db can recover quickly but isn't there some value to storing a much "higher level" information in the WAL logs? For example, tuple manipulations, if recorded in terms of manipulation on tuple oids, can support applications such as these:

    http://blogs.enterprisedb.com/2011/03/03/more-musings-on-logical-replication/
    Possibly. On the other hand, another common complaint is that PostgreSQL
    writes a lot of WAL, much more than other DBMSs. Adding more
    higher-level information to WAL records would make that issue even
    worse. At the moment we only include the minimum amount of information
    needed for recovery.

    --
    Heikki Linnakangas
    EnterpriseDB http://www.enterprisedb.com
  • Tom Lane at Apr 13, 2011 at 12:43 am

    Heikki Linnakangas writes:
    On 12.04.2011 19:42, jagan wrote:
    In general, why is OID of a tuple relegated as just another tuple
    data, when it can replace xl_heaptid as a much more stable tuple
    identifier.
    Possibly. On the other hand, another common complaint is that PostgreSQL
    writes a lot of WAL, much more than other DBMSs. Adding more
    higher-level information to WAL records would make that issue even
    worse.
    Actually, the real problem with that proposal is that it assumes that
    the OID (a) exists and (b) is unique, neither of which is guaranteed.
    Especially not when you consider that WAL log entries have to correctly
    distinguish which MVCC version of a tuple is being modified.

    regards, tom lane
  • Greg Stark at Apr 12, 2011 at 11:09 pm

    On Tue, Apr 12, 2011 at 7:55 AM, Heikki Linnakangas wrote:
    Maybe the WAL record you're looking at is a full-page image? A record with a
    full-page image includes a verbatim copy of the page, and the individual
    tuple is omitted in that case.
    It is? I thought full page images were only in addition to, not
    instead of any data in the record. Doesn't pg_lesslog depend on this?

    --
    greg
  • Tom Lane at Apr 12, 2011 at 11:35 pm

    Greg Stark writes:
    On Tue, Apr 12, 2011 at 7:55 AM, Heikki Linnakangas
    wrote:
    Maybe the WAL record you're looking at is a full-page image? A record with a
    full-page image includes a verbatim copy of the page, and the individual
    tuple is omitted in that case.
    It is? I thought full page images were only in addition to, not
    instead of any data in the record.
    No, that's not the case.
    Doesn't pg_lesslog depend on this?
    One hopes not.

    regards, tom lane
  • Shigeru Hanada at Apr 13, 2011 at 2:23 am

    (2011/04/13 8:34), Tom Lane wrote:
    Greg Stark<gsstark@mit.edu> writes:
    Doesn't pg_lesslog depend on this?
    One hopes not.
    AFAIK it's safe because pg_lesslog removes full-page image only when it
    has enough information for substitute incremental log.

    For example of XLOG_HEAP_INSERT, pg_lesslog generates incremental logs
    such as xl_heap_header and tuple image from the inserted tuple in the
    full-page image, and replaces full-page image with them.

    Regards,
    --
    Shigeru Hanada
  • Jagan at Apr 14, 2011 at 8:30 pm
    Hi,
    Where is the OID of tuple stored in a WAL record of a
    tuple? If not with xl_heap_header, where is it stored? Is it
    stored at all?

    It's stored in the "tuple data" portion.
    I see it now. I was having alignment issues, which I resolved. Thanks for your help. I am still not sure if I understand how attributes are packed in to a buffer. In particular, according to the inline comments in code, this is how a xl_heap_insert stores data:

    Oid + [padding bits] + [data]

    Now, to unpack the data, we would have to know what type of data is stored, which is obtained from pg_type. Different types require either a "no (char) align", "short align", "int aligned" or "double align". I went over the varattrib code and I am also able to unpack varchar, text and other variable length types.

    The code I wrote works fine but runs into alignment issues with some of the records. So, I am still missing something even though I have spent weeks reading postgres code.

    I notice the following:

    1. Oid starts at MAXALIGNed, which is fine.

    2. The subsequent attribute starts right away. There never seems to be padding even if the subsequent tuple needs to be double aligned. Hence, I skip aligning the first attribute by default, which is bizarre but works.

    3. When unpacking goes out of alignment, I print the buffer see if this is because they are externally stored. That is not the case as I can see text fields are actually stored in the buffer.

    4. The alignment goes out of whack even for entries with no varattrib for certain tuples.

    In general, what is the best way to unpack buffers containing tuples? I came across "heap_form_tuple" but not sure if that is most optimal way to go about. Is there some documentation on how tuples are internally stored?

    Thanks in advance for any pointers.
    Jagan
  • Alvaro Herrera at Apr 15, 2011 at 2:05 pm

    Excerpts from jagan's message of jue abr 14 17:30:17 -0300 2011:

    In general, what is the best way to unpack buffers containing tuples?
    I came across "heap_form_tuple" but not sure if that is most optimal
    way to go about. Is there some documentation on how tuples are
    internally stored?
    Unpack buffers containing tuples? You don't do that. If you have a
    page that comes from a table, you grab the tuple offset from the line
    pointer, and start decoding there (probably heap_deform_tuple is what
    you need; note you need a tuple descriptor). If you have something that
    comes from a WAL record, I don't think that's considered unpackable
    directly -- you have to copy it elsewhere first. I might be wrong about
    this though.

    What is the use case for this tool you're writing?

    --
    Álvaro Herrera <alvherre@commandprompt.com>
    The PostgreSQL Company - Command Prompt, Inc.
    PostgreSQL Replication, Consulting, Custom Development, 24x7 support
  • Jagan at Apr 15, 2011 at 5:29 pm
    Hi,
    In general, what is the best way to unpack buffers
    containing tuples?
    Unpack buffers containing tuples?  You don't do
    that.  If you have a page that comes from a table, you grab the tuple offset
    from the line pointer, and start decoding there (probably
    heap_deform_tuple is what you need; note you need a tuple descriptor).

    Thanks for the pointer. I will try that route of recreating tuple
    descriptors and using heap_deform_tuple, although I suspect that "heap_deform_tuple"
    cannot be used by a stand-alone application (as it is not externally visible).
    So, I will have to basically rewrite parts of it.
    I am hoping to build something that is robust that makes use of as much
    of the existing unpacking code as possible. Any additional
    pointers you can give me would be much appreciated.

    What is the use case for this tool you're writing?
    I am trying to build a WAL dumper and hoping to resurrect xlogdump along the way.

    Jagan
  • Greg Stark at Apr 15, 2011 at 10:45 pm

    On Fri, Apr 15, 2011 at 6:29 PM, jagan wrote:
    Thanks for the pointer. I will try that route of recreating tuple
    descriptors and using heap_deform_tuple, although I suspect that "heap_deform_tuple"
    cannot be used by a stand-alone application (as it is not externally visible).
    So, I will have to basically rewrite parts of it.
    I am hoping to build something that is robust that makes use of as much
    of the existing unpacking code as possible. Any additional
    pointers you can give me would be much appreciated.
    heap_deform_tuple isn't very complex itself, and doesn't have any
    external dependencies aside from the macros in postgres.h. However it
    *does* depend on having a tuple descriptor. This is the data structure
    which says what columns to expect and what types they are. You
    actually only need to know their sizes and whether they're variable
    size, but that in the server that comes from their type.

    The difficulty doing this from wal is that there's nowhere to get this
    information. If you get it from the live database the structure may
    not match the structure that was in place at the time the WAL was
    written.

    --
    greg

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-hackers @
categoriespostgresql
postedApr 11, '11 at 8:42p
activeApr 15, '11 at 10:45p
posts12
users6
websitepostgresql.org...
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase