Hi,
This is more of a blog post, but I'm not blogging, so maybe I'll share it here :)
It looks like we have hit this (in a 10.2.0.3 environment):
DBMS_SERVER_ALERT.SET_THRESHOLD HANGS FOREVER AT RELIABLE MESSAGE [ID 794589.1]
It does not look too bad (reliable message is an idle wait, right?), but
when I tried to deal with the hanging processes via kill -9, the processes
disappeared from the OS process list, yet from Oracle's point of view we
still have sessions for those ospids, and PMON is unable to clean those
sessions up properly.
From PMON trace:
**** 2011-11-25 13:44:35.047
found process 0x25f5f8bd0 pid=40 serial=2 ospid = 15291 dead
found process 0x25f5f93b8 pid=42 serial=1 ospid = 30345 dead
*** 2011-11-25 13:44:45.060
found process 0x25f5f8bd0 pid=40 serial=2 ospid = 15291 dead
found process 0x25f5f93b8 pid=42 serial=1 ospid = 30345 dead
*** 2011-11-25 13:44:47.064
found process 0x25f5f8bd0 pid=40 serial=2 ospid = 15291 dead
found process 0x25f5f93b8 pid=42 serial=1 ospid = 30345 dead
In the alert log, PMON reports that it is unable to clean up the process, and so on.
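For anyone who wants to cross-check the same thing, here is a minimal sketch in Python that pulls the "dead" ospids out of a PMON trace and asks the OS whether each one still exists. The function names (dead_processes, os_process_exists) are my own, and the regular expression only assumes the line format shown in the trace above.

```python
import os
import re

# Matches PMON trace lines such as:
#   found process 0x25f5f8bd0 pid=40 serial=2 ospid = 15291 dead
DEAD_RE = re.compile(
    r"found process (0x[0-9a-f]+) pid=(\d+) serial=(\d+) ospid = (\d+) dead"
)


def dead_processes(trace_text):
    """Return {ospid: (oracle_pid, serial)} for each process PMON reports as dead.

    Repeated reports of the same ospid collapse into a single entry.
    """
    found = {}
    for match in DEAD_RE.finditer(trace_text):
        _addr, pid, serial, ospid = match.groups()
        found[int(ospid)] = (int(pid), int(serial))
    return found


def os_process_exists(ospid):
    """Check for an OS process using signal 0, which delivers nothing."""
    try:
        os.kill(ospid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # process exists but belongs to another user
    return True


if __name__ == "__main__":
    sample = """\
*** 2011-11-25 13:44:45.060
found process 0x25f5f8bd0 pid=40 serial=2 ospid = 15291 dead
found process 0x25f5f93b8 pid=42 serial=1 ospid = 30345 dead
"""
    for ospid, (pid, serial) in sorted(dead_processes(sample).items()):
        print(ospid, pid, serial, os_process_exists(ospid))
```

If os_process_exists returns False for an ospid that PMON keeps reporting, that matches what I see here: the OS process is gone but Oracle still holds the process entry.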
After restarting the EM Grid agents there are two new processes hanging in
dbms_server_alert.set_threshold, still waiting on reliable message.
When you strace such a process you can see:
strace -p 30540
Process 30540 attached - interrupt to quit
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
getrusage(RUSAGE_SELF, {ru_utime={0, 352946}, ru_stime={0, 95985}, ...}) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 352946}, ru_stime={0, 95985}, ...}) = 0
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(819218, 0x7fbfff7270, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)
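So it is a timeout on the semaphore operation call: semtimedop is called with a 1-second timeout ({1, 0}) and returns EAGAIN each time that timeout expires.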
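To confirm that the process really does nothing except time out on the semaphore, the strace output can be tallied by system call and error name. This is only a rough sketch in Python: the tally function and its regular expression are my own, and they assume the plain "name(args) = result ERRNO" layout that strace printed above.

```python
import re
from collections import Counter

# Matches a completed strace line: name(args) = result [ERRNO ...]
CALL_RE = re.compile(r"^(\w+)\(.*\)\s+=\s+(-?\d+)(?:\s+(E[A-Z]+))?")


def tally(strace_text):
    """Count (syscall, errno-name or 'ok') pairs in strace output.

    Lines that are not system calls (the attach banner, wrapped
    continuation lines) do not match the pattern and are skipped.
    """
    counts = Counter()
    for line in strace_text.splitlines():
        match = CALL_RE.match(line.strip())
        if match:
            name, _ret, errno_name = match.groups()
            counts[(name, errno_name or "ok")] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: strace -p <ospid> 2>&1 | python tally.py
    for (name, result), n in tally(sys.stdin.read()).most_common():
        print(n, name, result)
```

A tally dominated by semtimedop with EAGAIN, and only the occasional getrusage, would suggest the process is just sleeping in one-second waits for a post that never arrives.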
There is an SR open, but Oracle has not responded so far.
I don't want to be too dramatic, but I'm sure shutdown immediate will not
help here :)
Any ideas how to deal with sessions hanging on that event (reliable
message)?
Regards
GregG