On 2011-02-25 20:40, Jaime Casanova wrote:
On Fri, Feb 25, 2011 at 10:41 AM, Yeb Havinga wrote:
I also did some initial testing on this patch and got the queue-related
errors with > 1 clients. With the code change from Jaime above I still got a
lot of 'not on queue' warnings.
I tried to understand how the queue was supposed to work, resulting in the
changes below, which also incorporate a suggestion from Fujii upthread to
exit early when myproc was found.
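For readers following along, here is a minimal sketch of what "exit early when myproc is found" means for a singly linked wait queue. The `WaitProc`/`WaitQueue` types and the `queue_insert`/`queue_remove` functions are hypothetical stand-ins, not the structures or code in the v17 patch; `queue_remove` returning false corresponds to the case that produced the 'not on queue' warnings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry for a backend waiting for its commit to be acknowledged. */
typedef struct WaitProc
{
    int              pid;
    struct WaitProc *next;
} WaitProc;

/* Hypothetical FIFO wait queue; head is the oldest waiter. */
typedef struct
{
    WaitProc *head;
} WaitQueue;

/* Append a waiting backend at the tail, keeping commit order. */
void
queue_insert(WaitQueue *q, WaitProc *proc)
{
    WaitProc **link = &q->head;

    assert(proc != NULL);
    while (*link != NULL)
        link = &(*link)->next;
    proc->next = NULL;
    *link = proc;
}

/*
 * Unlink myproc from the queue. The scan stops as soon as myproc is found
 * (the early exit). Returns false if myproc was not on the queue.
 */
bool
queue_remove(WaitQueue *q, WaitProc *myproc)
{
    assert(myproc != NULL);

    for (WaitProc **link = &q->head; *link != NULL; link = &(*link)->next)
    {
        if (*link == myproc)
        {
            *link = myproc->next;   /* splice myproc out of the list */
            myproc->next = NULL;
            return true;            /* found: no need to scan the rest */
        }
    }
    return false;                   /* caller would warn: not on queue */
}
```

Removing an entry twice returns false the second time, which is exactly the situation the warning is meant to flag.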
Yes, looking at the code, the warning and your patch... it seems yours
is the right solution.
I'm compiling right now to test again and see the effects. Robert,
maybe you can test your failure case again? I'm really sure it's
related to this...
I did some more testing over the weekend with this patched v17 patch.
Since you've posted a v18 patch, let me write up some findings from the v17
patch before continuing with v18.
The tests were done on an x86_64 platform with 1 Gbit network interfaces and 3
servers. Non-default configuration changes are copy-pasted at the end of this mail.
1) No automatic switch to another synchronous standby
- start master server, add synchronous standby 1
- change allow_standalone_primary to off
- add second synchronous standby
- wait until pg_stat_replication shows both standbys are in STREAMING state
- stop standby 1
What happens is that the master stalls, where I expected it to switch to
standby 2 for acknowledging commits.
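The behaviour I expected can be sketched as follows, assuming standbys are tried in the order in which they appear in synchronous_standby_names (standby1, standby2, standby3 in the configuration at the end of this mail). `StandbyState` and `pick_sync_standby` are hypothetical names for illustration only; this is not how the v17 patch selects a standby.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-walsender view: standby name and whether it is STREAMING. */
typedef struct
{
    const char *name;
    bool        streaming;
} StandbyState;

/*
 * Return the position in `priority` (same order as synchronous_standby_names)
 * of the first standby that is currently streaming, or -1 if none is.
 * Assumption: earlier names in the list have higher priority.
 */
int
pick_sync_standby(const char *const *priority, size_t npriority,
                  const StandbyState *standbys, size_t nstandbys)
{
    assert(priority != NULL && standbys != NULL);

    for (size_t i = 0; i < npriority; i++)
        for (size_t j = 0; j < nstandbys; j++)
            if (standbys[j].streaming &&
                strcmp(priority[i], standbys[j].name) == 0)
                return (int) i;     /* commits would wait for this standby */

    return -1;                      /* no sync standby attached */
}
```

In test 1, stopping standby 1 should have moved the choice from standby1 to standby2 rather than leaving every commit waiting.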
The following was pilot error, but since I was test-piloting a new
plane, I still think it might be useful feedback. In my opinion, no
number or order of pg_ctl stops and starts on the master and
standby servers, as long as they don't use -m immediate, should ever
cause the state I reached.
2) Reaching some sort of shutdown deadlock state
- start master server, add synchronous standby
- change allow_standalone_primary to off
Then I did all sorts of test things, and everything was still OK. Then I wanted
to shut down everything, and maybe because of some (stack-like) symmetry
I did the following, because I didn't think it through:
- pg_ctl stop on standby (I didn't actually wait until it was done, but
went on immediately in another terminal)
- pg_ctl stop on master
Oh wait... the master needs to sync transactions
- start standby again, but now: FATAL: the database system is shutting down
There is no clean way to get out of this situation.
allow_standalone_primary in the face of shutdowns might be tricky. Maybe
the master should be prevented from entering the shutting-down phase when
allow_standalone_primary = off and no sync standby is attached; that would
allow the sync standby to attach again.
3) PANIC on standby server
At some point a standby suddenly disconnected after I started a new
pgbench run on an existing master/standby pair, with the following error
in the logfile:
LOCATION: libpqrcv_connect, libpqwalreceiver.c:171
PANIC: XX000: heap_update_redo: failed to add tuple
CONTEXT: xlog redo hot_update: rel 1663/16411/16424; tid 305453/15; new
LOCATION: heap_xlog_update, heapam.c:4724
LOG: 00000: startup process (PID 32597) was terminated by signal 6: Aborted
This might be due to pilot error as well; I did several tests over the
weekend, and after this error I was more careful to remember immediate
shutdowns and to start with a clean backup after one, and I haven't seen
similar errors since.
4) The performance of the syncrep seems to be quite an improvement over
the previous syncrep patches: I've seen tps figures of O(650) where the
others were more like O(20). The O(650) tps is limited by the speed of
the standby server I used: several times the master would halt only
because of heavy disk activity on the standby. A warning in the docs
might be appropriate: be sure to use good I/O hardware for your synchronous
replicas! With that bottleneck gone, I suspect the current syncrep
version can go beyond 1000 tps over 1 Gbit.
recovery.conf:
standby_mode = 'on'
primary_conninfo = 'host=mg73 user=repuser password=pwd
trigger_file = '/tmp/postgresql.trigger.5432'
postgresql.conf non-default parameters:
log_error_verbosity = verbose
log_min_messages = warning
log_min_error_statement = warning
listen_addresses = '*' # what IP address(es) to listen on;
search_path='\"$user\", public, hl7'
archive_mode = on
archive_command = 'test ! -f /data/backup_in_progress || cp -i %p /archive/%f < /dev/null'
checkpoint_completion_target = 0.9
checkpoint_segments = 16
default_statistics_target = 500
constraint_exclusion = on
max_connections = 120
maintenance_work_mem = 128MB
effective_cache_size = 1GB
work_mem = 44MB
wal_buffers = 8MB
shared_buffers = 128MB
wal_level = 'archive'
max_wal_senders = 4
wal_keep_segments = 1000 # 16000MB (for production increase this)
synchronous_standby_names = 'standby1,standby2,standby3'
synchronous_replication = on
allow_standalone_primary = off