Forwarded message:
Perhaps mmap() would be a good idea. My system has msync() to flush
mmap()'ed pages to the underlying file. You would still run fsync()
after that. This may give us the best of both worlds: a shared-memory
area of variable size, and control of when it gets flushed to disk. Do
I like it. FreeBSD supports

MAP_ANON Map anonymous memory not associated with any specific file.

It would be nice to use mmap to get more "shared" memory, but I don't see
any reason to mmap any particular file into memory. Keeping the last two
pg_log pages in memory, plus the xact commit/abort writeback optimization
(any scan updates the commit/abort status of xmin/xmax in tuples; we
already have this), reduces access to "old" pg_log pages to zero.
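The writeback optimization described above could be sketched roughly as
follows. This is only an illustration under assumed names: Tuple, Log, the
XMIN_* hint flags and xmin_committed() are hypothetical stand-ins, not the
actual backend structures, and the in-memory array stands in for pg_log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;

typedef enum { XACT_IN_PROGRESS, XACT_COMMITTED, XACT_ABORTED } XactStatus;

/* Hypothetical hint flags cached in the tuple itself. */
enum { XMIN_COMMITTED = 0x01, XMIN_ABORTED = 0x02 };

typedef struct {
    Xid     xmin;    /* inserting transaction */
    uint8_t hints;   /* cached outcome of xmin, 0 if unknown */
} Tuple;

/* Stand-in for pg_log: one status per transaction id, plus a counter so
 * the number of log accesses can be observed. */
typedef struct {
    const XactStatus *status;
    size_t            nxacts;
    unsigned          lookups;
} Log;

static XactStatus log_lookup(Log *log, Xid xid)
{
    assert(xid < log->nxacts);
    log->lookups++;
    return log->status[xid];
}

/* Returns whether the inserting transaction committed. The first scan that
 * learns a final outcome writes it back into the tuple, so later scans
 * never touch the log for this tuple again. */
static bool xmin_committed(Tuple *t, Log *log)
{
    if (t->hints & XMIN_COMMITTED)
        return true;
    if (t->hints & XMIN_ABORTED)
        return false;

    switch (log_lookup(log, t->xmin)) {
    case XACT_COMMITTED:
        t->hints |= XMIN_COMMITTED;
        return true;
    case XACT_ABORTED:
        t->hints |= XMIN_ABORTED;
        return false;
    default:
        return false;            /* still running: nothing to cache yet */
    }
}
```

Once every tuple touched by an old transaction carries a hint, the log pages
for that transaction are no longer read, which is the effect described above.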
I totally agree. There is no advantage to mmap() vs. shared memory for
us. I thought if we could control when the mmap() gets flushed to disk,
we could let the OS handle the syncing, but I doubt this is going to be

Though, we could mmap() pg_log, and that way backends would not have to
read/write the blocks, and they could all see the same data. But with
the new scheme, they have most transaction ids in shared memory.

Interesting you mention the scan updating the transaction status. We
would have a problem here. It is possible a backend will update the
commit status of a data page, and that data page will make it to disk,
but if there is a crash before the updated pg_log gets sync'ed, there
would be a partial transaction in the system.

I don't know any way that a backend would know the transaction has hit
disk, and the data commit flag could be set. You don't want to update
the commit flag of the data page until the entire transaction has been
sync'ed. The only way to do that would be to have a 'commit and synced'
flag, but you want to save that for nested transactions.

Another case where this could come in handy is allowing reuse of
superseded data rows. If the transaction is committed and synced, the row
space could be reused by another transaction.
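To make the 'commit and synced' idea concrete, here is a small sketch that
separates "committed" from "committed and known to be on disk". The state
names and helper functions are hypothetical, invented for this example; the
message does not define such an API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-transaction states. */
typedef enum {
    XS_IN_PROGRESS,
    XS_ABORTED,
    XS_COMMITTED,           /* commit recorded, not yet known to be on disk */
    XS_COMMITTED_SYNCED     /* commit recorded and flushed to disk */
} XactState;

/* Called once the commit record is known to have been synced. */
static void mark_synced(XactState *s)
{
    assert(*s == XS_COMMITTED);
    *s = XS_COMMITTED_SYNCED;
}

/* Other backends may treat the transaction's rows as committed either way. */
static bool visible_as_committed(XactState s)
{
    return s == XS_COMMITTED || s == XS_COMMITTED_SYNCED;
}

/* The commit flag on a data page is only safe to set after the sync,
 * otherwise the page could reach disk ahead of the log and survive a crash
 * that loses the commit. */
static bool may_set_commit_flag(XactState s)
{
    return s == XS_COMMITTED_SYNCED;
}

/* Same rule for reusing the space of a superseded row: the transaction
 * that superseded it must be committed and synced. */
static bool may_reuse_row_space(XactState superseding_xact)
{
    return superseding_xact == XS_COMMITTED_SYNCED;
}
```

The sketch leaves open how a backend learns that the sync has happened,
which is exactly the unresolved question raised above.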
I have been thinking about the mmap() issue, and it seems a natural fit for
pg_log. You can have every backend mmap() pg_log. It becomes a dynamic
shared memory area that is auto-initialized to the contents of pg_log,
and all changes can be made by all backends. No locking needed. We can
also flush the changes to the underlying file. Under bsdi, you can also
have the mmap area follow you across exec() calls, so each backend
doesn't have to do anything. I want to replace exec with fork also, so
the stuff would be auto-loaded in the address space of each backend.

This way, you don't have to have two on-line pages and move them around
as pg_log grows.
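A minimal sketch of the file-backed variant, assuming a POSIX system: the
file is mapped MAP_SHARED so every process mapping it sees the same bytes,
and a flush is msync() followed by fsync(), as suggested earlier in this
message. LogMap and the logmap_* functions are hypothetical names for this
example, and the sketch does not address concurrent updates or growing the
mapping as the log grows.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handle for a shared, file-backed mapping. */
typedef struct {
    int            fd;
    unsigned char *base;
    size_t         len;
} LogMap;

/* Open `path` (creating it if needed), make sure it is at least `len`
 * bytes long, and map the first `len` bytes shared and writable.
 * Returns 0 on success, -1 on failure. */
static int logmap_open(LogMap *m, const char *path, size_t len)
{
    assert(len > 0);
    m->fd = open(path, O_RDWR | O_CREAT, 0600);
    if (m->fd < 0)
        return -1;

    struct stat st;
    if (fstat(m->fd, &st) != 0 ||
        (st.st_size < (off_t)len && ftruncate(m->fd, (off_t)len) != 0)) {
        close(m->fd);
        return -1;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, m->fd, 0);
    if (p == MAP_FAILED) {
        close(m->fd);
        return -1;
    }
    m->base = p;
    m->len = len;
    return 0;
}

/* Write the modified pages back to the file, then ask for the file to be
 * forced to disk. Returns 0 on success, -1 on failure. */
static int logmap_flush(LogMap *m)
{
    if (msync(m->base, m->len, MS_SYNC) != 0)
        return -1;
    return fsync(m->fd);
}

static void logmap_close(LogMap *m)
{
    munmap(m->base, m->len);
    close(m->fd);
}
```

Note that the OS may also write dirty pages back on its own before
logmap_flush() is called, so this alone does not give control over which
changes reach the file first.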

The only remaining problem is how to mark certain transactions as synced,
or to force only synced transactions to hit the pg_log file itself; data
row commit status should be updated only for synced transactions.

--
Bruce Momjian


group: pgsql-hackers @
posted: Nov 10, '97 at 2:50a
active: Nov 10, '97 at 2:50a