On Fri, Aug 30, 2013 at 11:45 AM, Andres Freund wrote:
The way I've designed it, no. If what we expect to be the control
segment doesn't exist or doesn't conform to our expectations, we just
assume that it's not really the control segment after all - e.g.
someone rebooted, clearing all the segments, and then an unrelated
process (malicious, perhaps, or just a completely different cluster)
reused the same name. This is similar to what we do for the main
shared memory segment.
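
To illustrate the "doesn't conform to our expectations" check, here is a minimal sketch in Python. The header layout, the magic value, and the name `control_segment_is_sane` are all hypothetical and are not taken from the patch; the point is only that any mismatch is treated as "not our segment" rather than as an error.

```python
import struct

# Hypothetical header layout (an assumption for illustration only):
# magic (uint32), number of slots (uint32), slots in use (uint32).
HEADER = struct.Struct("<III")
MAGIC = 0x9A503D32          # arbitrary illustrative value
MAX_SLOTS_LIMIT = 1024      # arbitrary illustrative upper bound
SLOT_SIZE = 4               # each slot holds one uint32 handle


def control_segment_is_sane(buf: bytes) -> bool:
    """Return True only if buf looks like a control segment we wrote.

    Every failed check returns False instead of raising, so the caller
    can simply conclude that the segment belongs to someone else.
    """
    if len(buf) < HEADER.size:
        return False
    magic, max_slots, used_slots = HEADER.unpack_from(buf, 0)
    if magic != MAGIC:
        return False
    if max_slots == 0 or max_slots > MAX_SLOTS_LIMIT:
        return False
    if used_slots > max_slots:
        return False
    # The segment must be large enough for the slots it claims to hold.
    return len(buf) >= HEADER.size + max_slots * SLOT_SIZE
```

A caller would run this check before trusting anything else in the segment, and skip cleanup entirely when it returns False.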
The case I am mostly wondering about is some process crashing and
overwriting random memory. We need to be pretty sure that we'll never
fail partially through cleaning up old segments because they are
corrupted or because we died halfway through our last cleanup attempt.
Right. I had those considerations in mind and I believe I have nailed
the hatch shut pretty tight. The cleanup code is designed never to
die with an error. Of course it might, but it would have to be
something like an out of memory failure or similar that isn't really
what we're concerned about here. You are welcome to look for holes,
but these issues are where most of my brainpower went during
development.
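
A rough sketch of the "never die with an error" idea, in Python. The names `cleanup_old_segments`, `handles`, and `destroy` are hypothetical stand-ins and not the patch's API; the sketch only shows that a failure on one segment is recorded and skipped so the remaining segments still get cleaned up.

```python
import logging


def cleanup_old_segments(handles, destroy):
    """Try to destroy every listed segment; return the handles that failed.

    `destroy` is a caller-supplied callable (hypothetical here). Any
    exception it raises is logged and swallowed, so one corrupted or
    already-removed segment cannot abort the rest of the cleanup.
    """
    failed = []
    for handle in handles:
        try:
            destroy(handle)
        except Exception as exc:  # deliberately broad: cleanup must go on
            logging.warning("could not remove segment %r: %s", handle, exc)
            failed.append(handle)
    return failed
```

Running the same cleanup a second time over the returned handles is harmless, which matters if the previous attempt died halfway through.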
That's true, but that decision has not been uncontroversial - e.g. the
NetBSD guys don't like it, because they have a big performance
difference between those two types of memory. We have to balance the
possible harm of one more setting against the benefit of letting
people do what they want without needing to recompile or modify code.
But then, it made them fix the issue afaik :P
Pah. :-)
You can look at it while the server's running.
That's what debuggers are for.
Tough crowd. I like it. YMMV.
I would never advocate deliberately trying to circumvent a
carefully-considered OS-level policy decision about resource
utilization, but I don't think that's the dynamic here. I think if we
insist on predetermining the dynamic shared memory implementation
based on the OS, we'll just be inconveniencing people needlessly, or
flat-out making things not work. [...]
But using file-backed memory will *suck* performance-wise. Why should we
ever want to offer that to a user? That's what I was arguing about
primarily.
I see. There might be additional writeback traffic, but it might not
be that bad in common cases. After all the data's pretty hot.
If we're SURE
that a Linux user will prefer "posix" to "sysv" or "mmap" or "none" in
100% of cases, and that a NetBSD user will always prefer "sysv" over
"mmap" or "none" in 100% of cases, then, OK, sure, let's bake it in.
But I'm not that sure.
I think posix shmem will be preferred to sysv shmem if present, in just
about any relevant case. I don't know of any system with lower limits on
posix shmem than on sysv.
OK, how about this.... SysV doesn't allow extending segments, but
mmap does. The thing here is that you're saying "remove mmap and keep
sysv" but Noah suggested to me that we remove sysv and keep mmap.
This suggests to me that the picture is not so black and white as you
think it is.
I shared your opinion that preferred_address is never going to be
reliable, although FWIW Noah thinks it can be made reliable with a
large-enough hammer.
I think we need to have the arguments for that on list then. Those are
pretty damn fundamental design decisions.
I for one cannot see how you could even remotely make that work a) on
Windows (check the troubles we have to go through to get s_b
consistently placed, and that's directly after startup) or b) on 32-bit systems.
Noah?
But even if it isn't reliable, there doesn't seem to be all that much
value in forbidding access to that part of the OS-provided API. In
the world where it's not reliable, it may still be convenient to map
things at the same address when you can, so that pointers can be
used. Of course you'd have to have some fallback strategy for when
you don't get the same mapping, and maybe that's painful enough that
there's no point after all. Or maybe it's worth having one code path
for relativized pointers and another for non-relativized pointers.
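
For readers unfamiliar with the term, here is a minimal sketch of what "relativized pointers" means, in Python. The node layout and the names `write_list` and `read_list` are hypothetical. Links are stored as offsets from the start of the segment instead of absolute addresses, so the structure stays valid no matter where the segment ends up mapped; copying the buffer stands in for mapping it at a different address.

```python
import struct

# Hypothetical node layout: value (int32), offset of next node (uint32).
# Offset 0 is reserved to mean "no next node", so nodes start at offset 4.
NODE = struct.Struct("<iI")
FIRST_NODE_OFFSET = 4


def write_list(buf, values):
    """Store values as a linked list in buf, linked by relative offsets.

    Returns the offset of the head node, or 0 for an empty list.
    """
    offset = FIRST_NODE_OFFSET
    for i, value in enumerate(values):
        is_last = i == len(values) - 1
        next_offset = 0 if is_last else offset + NODE.size
        NODE.pack_into(buf, offset, value, next_offset)
        offset += NODE.size
    return FIRST_NODE_OFFSET if values else 0


def read_list(buf, head):
    """Follow the offset links starting at head and collect the values."""
    values = []
    offset = head
    while offset != 0:
        value, offset = NODE.unpack_from(buf, offset)
        values.append(value)
    return values
```

The cost is that every dereference has to add the segment's base address, which is the extra code path being weighed against simply mapping at a fixed address.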
It seems likely to me that we will end up with untested code in that
case. Or even unsupported platforms.
Maybe. I think for the amount of code we're talking about here, it's
not worth getting excited about.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Discussion Overview
group: pgsql-hackers
category: postgresql
posted: Aug 14, '13 at 1:09a
active: Aug 31, '13 at 12:27p
posts: 8
users: 4
website: postgresql.org...
irc: #postgresql
