Unfortunately cycling the database is your only option since there is no parent session nor any O/S process to kill.

David Fitzjarrell

________________________________
From: Kellyn Pedersen
To: oracle Freelists
Sent: Tue, June 1, 2010 2:25:39 PM
Subject: Oracle 10g "Ghost" SID Eating up CPU

The last two weekends, due to some new code, my main data warehouse/OLTP database (yes, I know it's an oxymoron; it's 10.2.0.4 on Linux, 64-bit, with a number of one-off patches for parallel bugs...) has been overwhelmed by 32 CTAS statements running concurrently, all requesting a parallel degree of 4 on large table selects.  Parallelism was downgraded by 75% or more a number of times during this step in their package.
This is the second time I've come back in after the occurrence to find one parallel coordinator session running on its own: no other producers/consumers, no parent SID, just this one process eating up CPU-

SID:             540
SERIAL#:         20564
STATUS:          ACTIVE
OSUSER:          sdev_user
PROCESS:         31988
MACHINE:         appmachine
PROGRAM:         prodmachine (P039)
ROW_WAIT_OBJ#:   2815532
PDDL_STATUS:     ENABLED
PQ_STATUS:       ENABLED
EVENT:           PX Deq: Execution
P1TEXT:          sleeptime/senderid
SECONDS_IN_WAIT: 141170

If you try to search for the OS process (31988), it doesn't exist, and the SQL_ID is unknown, but I can see it was sitting on the primary key of a particular table (although a different one than the last time this ghost was present, last week!).  What I believe happened is that the parallel query died, but the coordinator is still out there.

ERROR at line 1:
ORA-12805: parallel query server died unexpectedly

I found a couple of these errors (12805) in trace files from the times that parallel was downgraded.  The process doesn't exist on the app server, I don't have an OS PID to kill, and I can't kill it at the Oracle session level (ORA-00030: User session ID does not exist).  Last time we had a maintenance window and solved the problem quickly with a database cycle, but here I am again-  HOW DO I GET RID of this thing!?!?  It's starting to eat up CPU and won't die... :(

SID:          540
PID:          134
Coordinator:  540
SPID:         31988
Group:        (blank)
Set:          (blank)
Degree:       (blank)
Req Degree:   (blank)
Wait Event:   PX Deq: Execution Msg
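
For reference, a listing like the one above can be pulled with something
along these lines (just a sketch against the 10.2 PX views; the exact
query I ran may have differed):

-- sketch only: join v$px_session to v$session and v$process
select ps.sid,
       pr.pid,
       ps.qcsid   as coordinator,
       pr.spid,
       ps.server_group,
       ps.server_set,
       ps.degree,
       ps.req_degree,
       s.event    as wait_event
  from v$px_session ps,
       v$session    s,
       v$process    pr
 where s.sid     = ps.sid
   and s.serial# = ps.serial#
   and pr.addr   = s.paddr;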

Anybody have any ideas?  I actually have two P039 processes in my database right now!  This cannot be good! :(

Kellyn Pedersen
Sr. Database Administrator
I-Behavior Inc.
http://www.linkedin.com/in/kellynpedersen
www.dbakevlar.blogspot.com

"Go away before I replace you with a very small and efficient shell script..."


  • Tim Gorman at Jun 2, 2010 at 1:02 pm

    Kellyn,
    Rather than just bouncing the database instance, especially if there is
    no direct proof that application performance is impacted, it is better
    to stick it out and research it into the ground.
    If there is no process in the OS, then how is it possible that CPU is
    being consumed?  Which OS utility is that observation coming from?
    What is the chain of evidence indicating massive CPU consumption from
    outside the Oracle database instance?
    You can use "top" or a hand-rolled version (i.e. "ps -eaf
    sort -n +3 | tail") next time this happens, to see if any
    process on the server is eating up CPU.  The nice thing about the
    hand-rolled "ps" command is that it doesn't just show you the
    executable name (i.e. "oracle") as "top" does, but
    shows you the functional command-name (i.e. "ora_pmon_XXX",
    etc) that "ps -eaf" shows, which can be helpful.  Whatever you
    find, you can backtrack the OS PID into the database using "select
    * from v$session where paddr in (select addr from v$process where spid
    = NNNN)" and see whether it is a "ghost" session or something more
    normal and expected.
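    As a runnable sketch (substitute the OS PID you spot in ps/top for
    &os_pid; note that SPID is stored as a string in V$PROCESS):

    -- sketch: backtrack an OS PID to its Oracle session
    select s.sid, s.serial#, s.username, s.status, s.program, s.event
      from v$session s
     where s.paddr in (select p.addr
                         from v$process p
                        where p.spid = '&os_pid');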
    Also, from within Oracle, what are the values for STATUS, EVENT, P1,
    P2, P3 for this session?  If the session seems to be waiting on
    something, then it is even less likely that CPU is being consumed,
    particularly if the wait event seems to be related to an OS system
    call, such as I/O.
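    In 10.2 those columns are all exposed directly in V$SESSION, so a
    quick sketch (using the SID from your listing) would be:

    -- sketch: current wait details for the suspect session
    select status, event, p1text, p1, p2text, p2, p3text, p3,
           state, seconds_in_wait
      from v$session
     where sid = 540;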
    If the Oracle instance is holding a session open when its underlying OS
    process has actually been killed, then certainly it is not chewing up
    CPU, but it may be holding locks, which could possibly be more
    devastating.  You may want to query V$LOCK for that session's SID value
    to see what, if any, locks are being held.  More importantly, you would
    want to look at the BLOCK column in V$LOCK to see if the session is
    blocking other sessions (i.e. BLOCK > 0) -- bear in mind that the
    column name (BLOCK) is being used as a verb in the context of V$LOCK,
    not as a noun.  If BLOCK > 0, you may want to look at what those
    other sessions are by querying V$LOCK again, this time by the same TYPE
    and ID1 value as the blocking lock.
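    Something along these lines (a sketch, again using SID 540 for the
    ghost session):

    -- locks held by the suspect session, and whether it blocks anyone
    select sid, type, id1, id2, lmode, request, block
      from v$lock
     where sid = 540;

    -- if BLOCK > 0 above, find the waiters on the same resource
    select sid, type, id1, id2, lmode, request
      from v$lock
     where (type, id1) in (select type, id1
                             from v$lock
                            where sid = 540
                              and block > 0)
       and request > 0;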
    -----
    As far as why this is happening, there should be trace files left
    behind in the BACKGROUND_DUMP_DEST directory for the terminated Pnnn
    processes, and those trace files might give some general clue, in the
    first 50-100 lines of text.  If I had to guess off the top of my head,
    purely speculating, I would guess that you're encountering some OS
    resource limitation.  The reason for this guess is that the failures
    are occurring when the demand for OS resources is sky-high (i.e. 32
    concurrent CTAS operations, each degree 4, etc).  Please verify that
    allocated swap space is 75% of physical RAM and that process limits
    are configured correctly.  There's more to check, and of course it varies
    by platform, so it can be fairly laborious when doing it the first time.
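    To see where those Pnnn trace files are landing, a quick check is:

    -- sketch: location of background (Pnnn) trace files in 10.2
    select value
      from v$parameter
     where name = 'background_dump_dest';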
    -----
    For a faster way to do this, especially on a regular basis, the RDA
    HCVE (i.e. Health Check Validation Engine) for your RDBMS version and
    platform is pretty helpful and authoritative.  To download, go to
    support.oracle.com, search on keywords "rda hcve download", download
    the appropriate ".tar" file to your database server, unpack, and just
    run the command "./rda.sh -T hcve" and follow the shell-script
    prompts.  Be aware that HCVE is intended as a "pre-installation"
    validation, and that some of the validations will fail expectedly in an
    already-installed environment (i.e. "LD_LIBRARY_PATH is set", etc).  If
    you need help interpreting the report, feel free to post back.  I get
    the feeling not many people know about HCVE.  I like to run it
    periodically on existing environments because the OS configuration gets
    changed as often as the RDBMS configuration, and a server which was
    configured correctly when Oracle was installed may no longer be
    configured within recommended guidelines.
    Hope this helps...
    Tim Gorman
    consultant -> Evergreen Database Technologies, Inc.
    postal => P.O. Box 630791, Highlands Ranch CO 80163-0791
    website => http://www.EvDBT.com/
    email => Tim_at_EvDBT.com
    mobile => +1-303-885-4526
    fax => +1-303-484-3608
    Lost Data? => http://www.ora600.be/ for info about DUDE...

