We recently discovered, quite by accident, that our streaming replication server was no longer replicating. We noticed these messages in the master server's log file:
2011-08-26 00:00:05 PDT postgres 192.168.17.4 [unknown]LOG: replication connection authorized: user=postgres host=192.168.17.4 port=53542
2011-08-26 00:00:05 PDT postgres 192.168.17.4 [unknown]FATAL: requested WAL segment 00000001000001D10000006B has already been removed
As it turned out, this had been going on for at least a week: every day's log file was crammed with these messages. Why the replication server ended up needing a WAL file that had already been removed is a mystery for another day. What I would like to do is set up a simple method of alerting us if replication stops. We could do a simple grep of the log files on the replication side, but I am guessing that there is some SQL command that could be run against the Postgres internals that would be cleaner. Is there such an animal?
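To make the kind of check I have in mind concrete, here is a rough sketch of the alert logic. It assumes each server can somehow report its current WAL position as text in the hex "logid/offset" form (the segment above would start at 1D1/6B000000). The function names and the 64 MB threshold are made up for illustration, and the byte arithmetic is only approximate. How to get those positions out of Postgres with SQL is exactly the part I am asking about.

```python
# Hypothetical sketch: decide whether a standby has fallen too far behind.
# Assumes WAL positions arrive as hex text "logid/offset", e.g. "1D1/6B000000".

def parse_wal_location(loc):
    """Turn 'logid/offset' hex text into a single comparable integer."""
    log_id, offset = loc.strip().split("/")
    # Approximation: treat the pair as one 64-bit number.
    return (int(log_id, 16) << 32) + int(offset, 16)


def replication_lag_bytes(master_loc, standby_loc):
    """Approximate distance between the master and standby positions."""
    return parse_wal_location(master_loc) - parse_wal_location(standby_loc)


def should_alert(master_loc, standby_loc, max_lag_bytes=64 * 1024 * 1024):
    """True when the standby is more than max_lag_bytes behind the master."""
    return replication_lag_bytes(master_loc, standby_loc) > max_lag_bytes
```

A cron job could call should_alert() with the two positions every few minutes and send mail when it returns True.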