4K striping is a major Major MAJOR mistake. Doing it in software makes it
at least 3x as bad. Either fix alone would have provided at least a 2-4x
improvement. Given a choice, I'd fix the stripe size first, then convert
the mirroring to hardware.

hth
imho
ymmv
afaik,
etc.

- Ross Mohan
-----Original Message-----
From: VIVEK_SHARMA
Sent: Wednesday, December 20, 2000 11:13 AM
To: Multiple recipients of list ORACLE-L
Subject: RE: SOS Alert
CASE In a Bank, Interest Calculation Batch Processing Unable to go beyond
500 A/cs per minute
ACTION Moved the Database to Another IBM Machine ( 4 CPU, 4 GB RAM ) with
HARDWARE Mirroring & 64 K Stripe Size
NOTE - On the Under-performing DB Server (S80 Model of IBM) there exists
Software Mirroring (by the AIX O.S.) & a Stripe Size of 4K
Firing Off the Interest Calculation Runs from 2 APP Servers gives an
Overall Total Processing of 3000 A/cs per minute ( 1500 from each )
Qs. Which of the 2, i.e. Hardware Mirroring OR 64 K Striping, do you think
is the Cause of the Increase in performance, as we put BOTH features
Together at the SAME Time ? OR do Both Contribute Equally ?
Thanks to Vivek for everything
-----Original Message-----
From: VIVEK_SHARMA
Sent: Thursday, December 07, 2000 6:35 PM
Subject: RE: SOS Alert
ACTION Identified the 2 indexes having the waits of "db file
sequential read"
Created 2 NEW Tablespaces Each containing 1 Index
Also Specified parameter log_small_entry_max_size = 0
NOTE - ORA 7.3.4.5
RESULT Performance throughput increased to 1500 A/cs per minute from
1000 A/cs per minute
Allow me to begin :-
There were 2 routes between the APP & DB Servers.
One had a Bandwidth of 10 MBPS while the Other had a Bandwidth of 100 MBPS.
The 10 MBPS route was being used all this while. On Enabling the
100 MBPS route
(by Specifying the I.P. Address Instead of the Hostname in the
tnsnames.ora & listener.ora)
the performance has jumped to 1100 records per minute (from 500
records per minute)
i.e. Almost Doubled.
100 MBPS is the MAX Bandwidth Possible as per the SA
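For illustration, a minimal sketch of such a tnsnames.ora entry
(SQL*Net v2 syntax; the alias BANKDB, the I.P. 10.1.1.5 & the SID BANK
are hypothetical stand-ins, not the actual values):

    BANKDB =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = 10.1.1.5)(PORT = 1521))
        (CONNECT_DATA = (SID = BANK)))

Here 10.1.1.5 stands for the DB Server's address on the 100 MBPS route.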
Qs. Your Views on this please
Additional responses Follow your Views BELOW :-
-----Original Message-----
From: Vivek [SMTP:Vivek_at_1800FLOWERS.com]
Sent: Tuesday, November 28, 2000 6:30 PM
To: 'VIVEK_SHARMA'
Cc: Arun Chakrapani
Subject: RE: SOS Alert
Vivek:
Sorry for not replying to you yesterday itself. A Busy day. Took a
look at your report.txt. They had 2 different profiles. Since You had
mentioned that it also stalled in the Test DB, I concentrated more on
the test report.txt. One thing which stood out on the other one (live)
on opening your report.txt is that the SQL_AREA GET_HIT_RATIO is very
poor. I don't know whether you have already pinned your SYS objects and
other objects that are executed often in the shared pool. Please use
DBMS_SHARED_POOL.keep to pin packages, procedures, functions, sequences,
cursors and triggers that are used often.
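A minimal sketch of the pinning call (assuming DBMS_SHARED_POOL has
already been installed via dbmspool.sql; SYS.STANDARD is just an
example object):

    begin
      -- 'P' pins a package/procedure/function;
      -- 'Q' = sequence, 'R' = trigger, 'C' = cursor
      sys.dbms_shared_pool.keep('SYS.STANDARD', 'P');
    end;
    /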
Ans OK will do so
So now I will concentrate on the Test DB.
1. Even though the SQL_AREA GET_HIT_RATIO is higher (82%) when
compared to live, this is not good enough. It should be close to 100%
(somewhere like .98 at least). Please pin the SYS objects to start with.
OK
Allow me to say that the shared_pool_size has by experience been
kept at the bare minimum required,
for we did experience performance problems with a shared_pool_size of
300 MB
( Though on Another machine, on SUN OS )
It is recommended to keep shared_pool_size at rock bottom, just above
the Limit where ORA-4031 (out of shared pool segments) occurs
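One way to watch how close you are to that limit is to track the shared
pool free memory while the batch runs - a sketch (v$sgastat as of ORA 7.3
has just the NAME and BYTES columns):

    select name, bytes from v$sgastat where name = 'free memory';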
2. On seeing your system wide wait events, the following stand out.
   1. log file sync           2863    83922    29.31
   2. write complete waits     905    71401    78.9
   3. free buffer waits         60     4718    78.63
The first event, log file sync, mostly comes when there is a disk
i/o bottleneck for the online redo logs. This happens when the
processes have to wait for the LGWR to sync a particular REDO log.
The p1 parameter of a log file sync wait is the corresponding
block# in the Log Buffer.
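While the batch is running, a quick sketch to see which sessions are
stuck on this event (in the same style as the v$session_wait queries
further below):

    select sid, event, p1
    from v$session_wait
    where event = 'log file sync';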
The Write complete waits event comes when the user process is waiting
for the LGWR to complete the writing of the redo information. So Please
take a look at the placing of the online redo logs. On which disk have
you placed them? Also, if a log has been placed in its own file system,
it doesn't mean that it is not going to a busy disk. Since you have 0+1,
the slices may belong to a HOT disk. What you need to do is ask your SA
to monitor the disk I/O usage and see what's happening to the disk
containing the redo log file. Also you can use iostat -xPnce 5 5 (I
think it's available in AIX). The output will look something like:
us sy wt id
23  7 49 21
                   extended device statistics              ---- errors ---
 r/s  w/s  kr/s  kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
 0.0  0.0   0.0   0.0  0.0  0.0    0.0   13.4   0   0   0   0   0     c1t10d0s0
 0.0  0.0   0.0   0.0  0.0  0.0    0.0    2.4   0   0   0   0   0     c1t10d0s1
 0.0  0.0   0.0   0.0  0.0  0.0    0.0    0.0   0   0   0   0   0     c1t10d0s2
 0.0  1.3   0.3  10.1  0.0  0.1    4.6  105.8   0   1   0   0   0     c1t10d0s3
Watch out for the %b column. If it's somewhere around 60 then you may be
experiencing contention on those slices. Ask your SA, by giving him/her
the device name, and they will be able to tell you what the FS is.
NOTE - 4 Datafiles Containing a Particular Table & its Corresponding
indexes (to Which INSERTS happen) were Moved out of the RAID onto a
Local Disk, whereafter performance Jumped from 850 per minute to
1100 per minute
Ans - OK . Will Issue the Command
Also I see Free Buffer waits. This is caused because there are no free
buffers when they are requested - pointing directly to the inefficiency
of the DBWR. Again, the DBWR may be slow because you have an I/O
bottleneck for the DB files, and indirectly because of the Write
complete waits.
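Between report.txt snapshots, a sketch of watching these two events
directly (v$system_event values are cumulative since instance startup):

    select event, total_waits, time_waited, average_wait
    from v$system_event
    where event in ('write complete waits', 'free buffer waits');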
Qs. Would you recommend Setting Multiple db_writer processes
even though async_io is set to true ?
" Thus a Volume Group Contains 8 Disks with 4 Disks
Stripe Size = 4K "
[Vivek] Please confirm me that the Stripe size is 4K. Usually you
will have
a stripe size of 32K,64K,128K,256K .... If you are having a stripe
size of
4K and your db_block_size as 8K, It is going to take 2 reads to
satisfy one
request for a db_file_sequential_read (Usually requests 1 Oracle
block)
[Vivek] . This doesn't sound good if thats true.
NOTE - The "test" Database is a 4K DB_BLOCK_SIZE while the Live
Database is
an 8K DB_BLOCK_SIZE, But the SAME performance Limits were Observed
on Both
So if possible could you please do the following.
1. Please shutdown the Test db and start fresh.
2. Run the same process again and also run iostat -xPnce 60 10000 for
   the duration of the run (output this to a file)
3. Run vmstat and see what's the CPU usage for USER and SYSTEM
4. When the 500 A/cs per minute limit is reached - Please run
   set lines 132
   select sid, substr(event,1,40), p1, p2, p3 from v$session_wait
   where event in ('db file scattered read','db file sequential read',
   'latch free','buffer busy waits','log file sync','enqueue');
   and give me the output.
OK - Will Action & revert to you
5. If many waits are showing as "latch free" in the event column of
v$session_wait, Please query v$latch for
latch#, name, level#, GETS, MISSES, SLEEPS, IMMEDIATE_GETS,
WAITS_HOLDING_LATCH, SPIN_GETS, SLEEP1.
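Spelled out, that query might look like this (a sketch - sorting by
SLEEPS usually brings the troublesome latch to the top):

    select latch#, name, level#, gets, misses, sleeps,
           immediate_gets, waits_holding_latch, spin_gets, sleep1
    from v$latch
    order by sleeps desc;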
Ans OK
6. Preserve the output for v$latch_children.
Ans OK
Qs. Are there any IBM OS Specific Parameters which are considered
beneficial for performance ?
I will take a look at both the report.txt files in more detail today
and get back to you.
Thanks Again so much
Where are you Located / what is your Phone No., if you please ?
We are Currently Located in Bangalore & Addressing The Problem at
Mumbai over a WAN
Phone 91 80 6588668
HTH
Vivek
-----Original Message-----
From: VIVEK_SHARMA [SMTP:VIVEK_SHARMA_at_infy.com]
Sent: Monday, November 27, 2000 11:16 AM
To: 'Vivek'
Cc: Arun Chakrapani
Subject: RE: SOS Alert

THANKS so much for the association in this issue

Attached 2 report.txt, each taken for 1 Hour approx . Is it Sufficient ?

Attached report.txt of Problem on the "Live" Database

<<report.txt_Live>>
NOTE - A "test" Database Created on the DB Server Itself in the Same
Partitions as the "Live" Database
Runs Done on this "test" Database Also failing to go beyond 500 A/cs per
minute
>
>
<> >
NOTE - Interest Runs were Done on the "Test" Database During Down-Time
Periods of the "Live" Database >
You Wrote :-
    Since there is some write complete waits, Please take a look at
    where the online redo logs are placed and what the read, write is
    on these disks.

Ans Online Redo logs are ALSO present on the RAID 0+1 Disks & are
Multiplexed
Size of Online Redo logfile on the "Live" Database = 30 MB

NOTE - On the test Database we increased the Size of the Online Redo
logfile to 100MB, but with NO benefit
You wrote :-
    Please give me the DB Version as well

Ans Database Oracle version = 7.3.4.5.0

You wrote :-
    I see some waits on More data from client.
Qs. What is to be done about the Same ?

NOTE - Will Run the 2 SQL Scripts Suggested by you & revert back

Thanks Again
-----Original Message-----
From: Vivek [SMTP:Vivek_at_1800FLOWERS.com]
Sent: Monday, November 27, 2000 7:44 PM
To: 'VIVEK_SHARMA_at_infy.com'
Cc: Arun Chakrapani
Subject: SOS Alert
Hi Vivek:

Will it be possible to send the whole report.txt to me? Since you have
not given for how long this was taken (time from utlbstat and utlestat),
it is more difficult to say anything. Based on the report.txt snippet
you have given, there seem to me to be some issues with the enqueues
and latch free waits. Please run the following SQL as SYS at 2 different
times (say 1/2 hour apart) and send me both the values (It will be great
if you could run this when the batch processing is running).
> >
select KSQSTTYP, KSQSTGET, KSQSTWAT from x$ksqst
where KSQSTGET > 0 and KSQSTWAT > 0;
The delta between the 2 results will give you information on which
enqueue we see more waits on.
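A sketch of one way to capture the two snapshots for the delta (the
table name enq_snap is hypothetical; run as SYS, since the x$ tables
are visible only to SYS):

    create table enq_snap as
    select sysdate snap_time, KSQSTTYP, KSQSTGET, KSQSTWAT
    from x$ksqst where KSQSTGET > 0 and KSQSTWAT > 0;

    -- roughly half an hour later, while the batch is running:
    insert into enq_snap
    select sysdate, KSQSTTYP, KSQSTGET, KSQSTWAT
    from x$ksqst where KSQSTGET > 0 and KSQSTWAT > 0;

    -- then subtract the earlier KSQSTGET/KSQSTWAT from the later
    -- values, per KSQSTTYP, to see which enqueue accumulated the
    -- most waits.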
Also Please run the sql given below in a cron every minute or so and
output this to a file:
SELECT p1 "File", p2 "Block", p3 "Reason" FROM v$session_wait
WHERE event like 'buffer busy waits';
This will give you an idea about the buffer busy waits. Since there is
some write complete waits, Please take a look at where the online redo
logs are placed and what the read, write is on these disks. Also Please
give me information on the redo log file size. Please give me the DB
Version as well. I see some waits on More data from client. Again,
since I don't know for how long the stats were run, I can't say whether
they are significant or not.

HTH

Vivek
From: VIVEK_SHARMA
Date: Mon, 27 Nov 2000 12:11:59 +0530
Subject: SOS Alert

CASE In a Bank, Interest Calculation Batch Processing Unable to go
beyond 500 A/cs per minute

CAUSE of the Problem is UNKNOWN

ORACLE 7.3.4.5 on AIX 4.3.3
DB Server - IBM S80 Model - 12 CPUs, 3 GB RAM
APP Server 1 - IBM S80 Model - 6 CPUs, 2 GB RAM
APP Servers 2 & 3 - IBM Thin Nodes model - 4 CPUs, 1 GB RAM

Storage Box :-
===========
SS Class Storage
RAID 0+1 - (First Striped & then Mirrored)
NOTE - 2 Storage Boxes Exist, one being the Mirror of the Other
Striping exists across a set of 4 Disks (in one Box), with another 4
being its mirror (in another Box).
Thus a Volume Group Contains 8 Disks with 4 Disks
Stripe Size = 4K

NOTE - Runs were Tried from BOTH of the APP Servers 1 OR 2, But with
the SAME MAX of 500 A/cs processed per minute

CPU Utilizations on BOTH APP & DB Server = 40 %
wio% on BOTH APP & DB Servers = 35 %
No paging happening on Both APP & DB Servers

- Oracle Contention Values Seem Small, or so they seem to us, as shown
Below :-

SVRMGR> Rem System wide wait events for non-background processes (PMON,
SVRMGR> Rem SMON, etc). Times are in hundreths of seconds. Each one of
SVRMGR> Rem these is a context switch which costs CPU time. By looking at
SVRMGR> Rem the Total Time you can often determine what is the bottleneck
SVRMGR> Rem that processes are waiting for. This shows the total time spent
SVRMGR> Rem waiting for a specific event and the average time per wait on
SVRMGR> Rem that event.
SVRMGR> select n1.event "Event Name",
2> n1.event_count "Count",
3> n1.time_waited "Total Time",
4> round(n1.time_waited/n1.event_count, 2) "Avg Time"
5> from stats$event n1
6> where n1.event_count > 0
7> order by n1.time_waited desc;
Event Name                            Count    Total Time      Avg Time
-------------------------------- ------------- ------------- -------------
SQL*Net message from client           10856276      31977110          2.95
enqueue                                   1295        374980        289.56
db file sequential read                3614044        303848           .08
write complete waits                      5812        295937         50.92
latch free                             5045060        242170           .05
SQL*Net more data from client            13939        165275         11.86
log file sync                            12794        146409         11.44
buffer busy waits                       100443         92477           .92
- ALL Literal SQLs were Converted to using Bind variables
- ALL Tables Running on Indexes, Without Any FULL Scans happening

- All the Literal SQLs (Dynamic) Converted to using Bind variables
(Static Queries)

- event="10181 trace name context forever, level 1000"
NOTE - Set nevertheless; None of the SQLs taking Excessive time to
parse though

- NO statistics are Analyzed
- 7 tables are involved in the interest Calc.:
  inserts to 2 Tables
  selects, updates in 5 tables

Qs. Are there Any BASIC O.S./Hardware Features to Enhance Performance
on IBM AIX Systems ?

Qs. Is Any Other portion of the report.txt required for perusal ?

Please Revert to me for any further info / Clarifications

<< File: report.txt_Live >> << File: report.txt_TEST >>