On 2013-08-28 15:20:57 -0400, Robert Haas wrote:
That way any corruption in that area will prevent restarts without
reboot unless you use ipcrm, or such, right?
The way I've designed it, no. If what we expect to be the control
segment doesn't exist or doesn't conform to our expectations, we just
assume that it's not really the control segment after all - e.g.
someone rebooted, clearing all the segments, and then an unrelated
process (malicious, perhaps, or just a completely different cluster)
reused the same name. This is similar to what we do for the main
shared memory segment.
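
To make the "doesn't conform to our expectations" check concrete, here is a minimal sketch in Python. It is not the patch's code: the header layout, the magic value, the size limit, and the function names are all assumptions made up for illustration. The idea it shows is that every field is validated before anything is trusted, and a segment that fails any check is treated as "not ours", so cleanup does nothing rather than failing halfway.

```python
import struct

# Everything below is a hypothetical layout, not PostgreSQL's actual format.
CONTROL_MAGIC = 0x44534D31           # made-up marker value
HEADER = struct.Struct("<III")       # magic, max_items, n_items
SLOT = struct.Struct("<II")          # segment handle, reference count
MAX_ITEMS_LIMIT = 1024               # made-up sanity bound


def control_segment_sane(buf):
    """Return True only if buf looks like a control segment we wrote."""
    if buf is None or len(buf) < HEADER.size:
        return False
    magic, max_items, n_items = HEADER.unpack_from(buf, 0)
    if magic != CONTROL_MAGIC:
        return False
    if not (0 < max_items <= MAX_ITEMS_LIMIT) or n_items > max_items:
        return False
    # The buffer must be large enough to hold every slot it claims to have.
    return len(buf) >= HEADER.size + max_items * SLOT.size


def handles_to_clean(buf):
    """List segment handles recorded in buf, or nothing if buf is not ours."""
    if not control_segment_sane(buf):
        return []
    _, _, n_items = HEADER.unpack_from(buf, 0)
    return [SLOT.unpack_from(buf, HEADER.size + i * SLOT.size)[0]
            for i in range(n_items)]
```

A missing segment (None), a truncated one, or one with a wrong magic value all yield an empty cleanup list instead of an error.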
The case I am mostly wondering about is some process crashing and
overwriting random memory. We need to be pretty sure that we'll never
fail partially through cleaning up old segments because they are
corrupted or because we died halfway through our last cleanup attempt.
I think we want that during development, but I'd rather not go there
when releasing. After all, we don't support a manual choice between
anonymous mmap/sysv shmem either.
That's true, but that decision has not been uncontroversial - e.g. the
NetBSD guys don't like it, because they have a big performance
difference between those two types of memory. We have to balance the
possible harm of one more setting against the benefit of letting
people do what they want without needing to recompile or modify code.
But then, it made them fix the issue afaik :P
In addition, I've included an implementation based on mmap of a plain
file. As compared with a true shared memory implementation, this
obviously has the disadvantage that the OS may be more likely to
decide to write back dirty pages to disk, which could hurt
performance. However, I believe it's worthy of inclusion all the
same, because there are a variety of situations in which it might be
more convenient than one of the other implementations. One is
Hm. Not sure what's the advantage over a corefile here.
You can look at it while the server's running.
That's what debuggers are for.
On MacOS X, for example, there seems to be no way to list
POSIX shared memory segments, and no easy way to inspect the contents
of either POSIX or System V shared memory segments.
Shouldn't we ourselves know which segments are around?
Sure, that's the point of the control segment. But listing a
directory is a lot easier than figuring out what the current control
segment contents are.
But without a good amount of tooling - like in a debugger... - it's not
very interesting to look at those files either way? The mere presence of
a segment doesn't tell you much and the contents won't be easily
Another use case is working around an administrator-imposed or
OS-imposed shared memory limit. If you're not allowed to allocate
shared memory, but you are allowed to create files, then this
implementation will let you use whatever facilities we build on top
of dynamic shared memory anyway.
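
For readers who want to see what "mmap of a plain file" amounts to, here is a rough sketch using Python's standard mmap module. The directory layout and the function names are assumptions for illustration, not what the patch does. It shows the two properties discussed above: only permission to create files is needed, and existing segments can be found by listing a directory.

```python
import mmap
import os


def create_file_segment(directory, name, size):
    """Create a new file of the given size and map it as shared memory."""
    path = os.path.join(directory, name)
    # O_EXCL makes creation fail if a segment with this name already exists.
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.ftruncate(fd, size)
        return mmap.mmap(fd, size)   # default mapping is shared and writable
    finally:
        os.close(fd)                 # the mapping keeps its own descriptor


def attach_file_segment(directory, name):
    """Map an existing segment file; length 0 maps the whole file."""
    fd = os.open(os.path.join(directory, name), os.O_RDWR)
    try:
        return mmap.mmap(fd, 0)
    finally:
        os.close(fd)


def list_segments(directory):
    """Listing the directory is all it takes to see which segments exist."""
    return sorted(os.listdir(directory))
```

Writes made through one mapping are visible through any other mapping of the same file. The cost raised in this thread still applies: the OS may write dirty pages back to disk.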
I don't think we should try to work around limits like that.
I do. There's probably someone, somewhere in the world who thinks
that operating system shared memory limits are a good idea, but I have
not met any such person.
"Let's drive users away from sysv shmem" is the only one I heard so far ;)
I would never advocate deliberately trying to circumvent a
carefully-considered OS-level policy decision about resource
utilization, but I don't think that's the dynamic here. I think if we
insist on predetermining the dynamic shared memory implementation
based on the OS, we'll just be inconveniencing people needlessly, or
flat-out making things not work. [...]
But using file-backed memory will *suck* performance-wise. Why should we
ever want to offer that to a user? That's what I was arguing about
If we're SURE
that a Linux user will prefer "posix" to "sysv" or "mmap" or "none" in
100% of cases, and that a NetBSD user will always prefer "sysv" over
"mmap" or "none" in 100% of cases, then, OK, sure, let's bake it in.
But I'm not that sure.
I think posix shmem will be preferred to sysv shmem if present, in just
about any relevant case. I don't know of any system with lower limits on
posix shmem than on sysv.
I think this case is roughly similar
to wal_sync_method: there really shouldn't be a performance or
reliability difference between the ~6 ways of flushing a file to disk,
but as it turns out, there is, so we have an option.
Well, most of them actually give different guarantees, so it makes sense
to have differing performance...
Why do we want to expose something unreliable as preferred_address to
the external interface? I haven't read the code yet, so I might be
missing something here.
I share your opinion that preferred_address is never going to be
reliable, although FWIW Noah thinks it can be made reliable with a
large-enough hammer.
I think we need to have the arguments for that on list then. Those are
pretty damn fundamental design decisions.
I for one cannot see how you could even remotely make that work a) on
Windows (check the trouble we have to go through to get s_b
consistently placed, and that's directly after startup) b) on 32-bit systems.
But even if it isn't reliable, there doesn't seem to be all that much
value in forbidding access to that part of the OS-provided API. In
the world where it's not reliable, it may still be convenient to map
things at the same address when you can, so that plain pointers can be
used. Of course you'd have to have some fallback strategy for when
you don't get the same mapping, and maybe that's painful enough that
there's no point after all. Or maybe it's worth having one code path
for relativized pointers and another for non-relativized pointers.
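
As an illustration of what "relativized pointers" means here, the following sketch stores a linked list inside a segment using offsets from the segment start instead of addresses. The node layout and the function names are assumptions for illustration only. Because nothing in the segment depends on where it is mapped, the same bytes can be walked from any copy or mapping.

```python
import struct

# Hypothetical node layout: a value, then the offset of the next node.
# Offset 0 means "end of list", so the first bytes of the segment are
# reserved and no node is ever placed at offset 0.
NODE = struct.Struct("<qq")


def write_node(seg, offset, value, next_offset):
    """Store a node at the given offset within the segment."""
    NODE.pack_into(seg, offset, value, next_offset)


def walk(seg, head_offset):
    """Follow offsets from head_offset and collect the node values."""
    values = []
    off = head_offset
    while off != 0:
        value, off = NODE.unpack_from(seg, off)
        values.append(value)
    return values
```

The price is that every dereference goes through the segment base, which is the second code path mentioned above.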
It seems likely to me that we will end up with untested code in that
case. Or even unsupported platforms.
To be honest, I'm not real sure. I think it's clear enough that this
will meet the minimal requirements for parallel query - ONE dynamic
shared memory segment that's not guaranteed to be at the same address
in every backend, and can't be resized after creation. And we could
pare the API down to only support that. But I'd rather get some
experience with this first before we start taking away options.
Otherwise, we may never really find out the limits of what is possible
in this area, and I think that would be a shame.
On the other hand, adding capabilities annoys people far less than
deciding that we can't support them in the end and taking them away.


Andres Freund

  Andres Freund http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Discussion overview: group pgsql-hackers, posted Aug 14, '13 at 1:09a, active Aug 31, '13 at 12:27p