FAQ
hi,

well, for a wannabe your quite good analyzing. I'd glad to have some wannabes like you..

first: the high free buffer waits indicates that you're reading a lot from disk. As the dbw usually writes asynchronously I suspect that it also writes with low priority. So this could explain the high average write times. I'd first check the buffer cache size to lower free buffer waits and than check again

regards

Felix Castillo Sanchez

Am 07.12.2010 um 17:35 schrieb Oracle Dba Wannabe :
Hi All, this is a 10.2.0.4 single instance database (non asm). I see the following events from awr (1 hour snapshot - however hourly snapshots after this show the same trend with respect to wait events):

Event Waits Time(s) Avg Wait(ms) % Total Call Time Wait Class
free buffer waits 17,926,869 193,146 11 67.7 Configuration
log file switch (private strand flush incomplete) 41,550 30,538 735 10.7 Configuration
log file sync 211,675 25,156 119 8.8 Commit
buffer busy waits 42,093 23,218 552 8.1 Concurrency
db file parallel write 376 14,274 37,963 5.0 System I/O

I know that db file parallel write only contributes to 5% of the total call time - but its avg wait time looks extremely poor - that and the fact that free buffer waits appear at top indicate that there's a db writer issue (db_writer_processes=4) - which leads me to believe perhaps its the IO subsystem. Now the storage team report there is nothing up with the storage. I was hoping someone could help with the following questions:
1. Is there someway from awr that I can determine that the Oracle server is issuing more IO than the storage system can handle for example:
Physical reads: 954.74 16.68
Physical writes: 418.89 7.32
Phy Reads + Phy Writes = 1372 IOPS
Can I then say that if each disk can do 100 IOPS, that the storage system should at least have 13 Disks? (13x100 IOPS)? Or is that an over simplification?

2. Interestingly this DB server was moved onto a new box with a different storage and the issue is no longer observed there.
Transactions Per Second on old box = 57.22
Transactions Per Second on new box = 225
Phy Reads + Phy Writes for New Box, are slightly under half compared to the old box:
Physical reads: 243.02 1.08
Physical writes: 564.62 2.51

That said, the redo size per second on the new box is twice that of the old box (7mb/s : 3mb/s). The buffer cache and db writer processes are the same on both boxes.
Other than the storage aspect of things, I'm thinking (and will check) whether theres an o.s/kernel misconfiguration w.r.t to async io, etc that might be causing this on the old box.
Appreciate any thoughts on 1 or/and 2
Thanks
--
http://www.freelists.org/webpage/oracle-l

Search Discussions

  • Taylor, Chris David at Dec 7, 2010 at 5:04 pm
    If you're going to post to the list, I'd personally rather address someone with a name. (From my own skeptical mind I want to question the motive behind not giving a real name, or semi-real name)

    You're questions show a pretty good understanding of some underlying principles - so I'm a little concerned what your angle is. Again, it may just be my skepticism, or you may in fact be angling for some information to use in a manner inconsistent with the principles of this list.

    Chris Taylor
    Sr. Oracle DBA
    Ingram Barge Company
    Nashville, TN 37205
    Office: 615-517-3355
    Cell: 615-663-1673
    Email: chris.taylor_at_ingrambarge.com

    CONFIDENTIALITY NOTICE: This e-mail and any attachments are confidential and may also be privileged. If you are not the named recipient, please notify the sender immediately and delete the contents of this message without disclosing the contents to anyone, using them for any purpose, or storing or copying the information on any medium.

    From: oracle-l-bounce_at_freelists.org On Behalf Of Oracle Dba Wannabe
    Sent: Tuesday, December 07, 2010 10:36 AM
    To: oracle-l@freelists.org
    Subject: Is my Oracle Server issuing more IO than it can handle

    Hi All, this is a 10.2.0.4 single instance database (non asm). I see the following events from awr (1 hour snapshot - however hourly snapshots after this show the same trend with respect to wait events):
    Event

    Waits

    Time(s)

    Avg Wait(ms)

    % Total Call Time

    Wait Class

    free buffer waits

    17,926,869

    193,146

    11

    67.7

    Configuration

    log file switch (private strand flush incomplete)

    41,550

    30,538

    735

    10.7

    Configuration

    log file sync

    211,675

    25,156

    119

    8.8

    Commit

    buffer busy waits

    42,093

    23,218

    552

    8.1

    Concurrency

    db file parallel write

    376

    14,274

    37,963

    5.0

    System I/O

    I know that db file parallel write only contributes to 5% of the total call time - but its avg wait time looks extremely poor - that and the fact that free buffer waits appear at top indicate that there's a db writer issue (db_writer_processes=4) - which leads me to believe perhaps its the IO subsystem. Now the storage team report there is nothing up with the storage. I was hoping someone could help with the following questions:
    1. Is there someway from awr that I can determine that the Oracle server is issuing more IO than the storage system can handle for example:
    Physical reads:

    954.74

    16.68

    Physical writes:

    418.89

    7.32

    Phy Reads + Phy Writes = 1372 IOPS
    Can I then say that if each disk can do 100 IOPS, that the storage system should at least have 13 Disks? (13x100 IOPS)? Or is that an over simplification?

    2. Interestingly this DB server was moved onto a new box with a different storage and the issue is no longer observed there.
    Transactions Per Second on old box = 57.22
    Transactions Per Second on new box = 225
    Phy Reads + Phy Writes for New Box, are slightly under half compared to the old box:
    Physical reads:

    243.02

    1.08

    Physical writes:

    564.62

    2.51

    That said, the redo size per second on the new box is twice that of the old box (7mb/s : 3mb/s). The buffer cache and db writer processes are the same on both boxes.
    Other than the storage aspect of things, I'm thinking (and will check) whether theres an o.s/kernel misconfiguration w.r.t to async io, etc that might be causing this on the old box.
    Appreciate any thoughts on 1 or/and 2
    Thanks
  • OKH at Dec 7, 2010 at 5:05 pm
    There is also another thing. Your log files are too small and you have too few of them. I suppose your log files and data files share the same disk(s).

    Felix Castillo Sanchez

    Am 07.12.2010 um 17:35 schrieb Oracle Dba Wannabe :
    Hi All, this is a 10.2.0.4 single instance database (non asm). I see the following events from awr (1 hour snapshot - however hourly snapshots after this show the same trend with respect to wait events):

    Event Waits Time(s) Avg Wait(ms) % Total Call Time Wait Class
    free buffer waits 17,926,869 193,146 11 67.7 Configuration
    log file switch (private strand flush incomplete) 41,550 30,538 735 10.7 Configuration
    log file sync 211,675 25,156 119 8.8 Commit
    buffer busy waits 42,093 23,218 552 8.1 Concurrency
    db file parallel write 376 14,274 37,963 5.0 System I/O

    I know that db file parallel write only contributes to 5% of the total call time - but its avg wait time looks extremely poor - that and the fact that free buffer waits appear at top indicate that there's a db writer issue (db_writer_processes=4) - which leads me to believe perhaps its the IO subsystem. Now the storage team report there is nothing up with the storage. I was hoping someone could help with the following questions:
    1. Is there someway from awr that I can determine that the Oracle server is issuing more IO than the storage system can handle for example:
    Physical reads: 954.74 16.68
    Physical writes: 418.89 7.32
    Phy Reads + Phy Writes = 1372 IOPS
    Can I then say that if each disk can do 100 IOPS, that the storage system should at least have 13 Disks? (13x100 IOPS)? Or is that an over simplification?

    2. Interestingly this DB server was moved onto a new box with a different storage and the issue is no longer observed there.
    Transactions Per Second on old box = 57.22
    Transactions Per Second on new box = 225
    Phy Reads + Phy Writes for New Box, are slightly under half compared to the old box:
    Physical reads: 243.02 1.08
    Physical writes: 564.62 2.51

    That said, the redo size per second on the new box is twice that of the old box (7mb/s : 3mb/s). The buffer cache and db writer processes are the same on both boxes.
    Other than the storage aspect of things, I'm thinking (and will check) whether theres an o.s/kernel misconfiguration w.r.t to async io, etc that might be causing this on the old box.
    Appreciate any thoughts on 1 or/and 2
    Thanks
    --
    http://www.freelists.org/webpage/oracle-l
  • Niall Litchfield at Dec 7, 2010 at 5:55 pm
    And finally the wait time for log file sync is way too high (like at least 1
    if not 2 orders of magnitude) . This can be an indication of either poorly
    performing storage (I got to learn this when we had a client with a duff
    disk in a raid 5 array on which the redo logs had been located!) or due to
    CPU starvation. I don't believe it is ever a symptom of memory problems
    though..

    Given your other stats, I'd *guess* at the disk subsystem, but want to take
    more observations.
    On Tue, Dec 7, 2010 at 5:05 PM, OKH wrote:

    There is also another thing. Your log files are too small and you have too
    few of them. I suppose your log files and data files share the same disk(s).



    Felix Castillo Sanchez

    Am 07.12.2010 um 17:35 schrieb Oracle Dba Wannabe <
    oracledbawannabe_at_yahoo.com>:

    Hi All, this is a 10.2.0.4 single instance database (non asm). I see the
    following events from awr (1 hour snapshot - however hourly snapshots after
    this show the same trend with respect to wait events):

    Event Waits Time(s) Avg Wait(ms) % Total Call Time Wait Class free buffer
    waits 17,926,869 193,146 11 67.7 Configuration log file switch (private
    strand flush incomplete) 41,550 30,538 735 10.7 Configuration log file
    sync 211,675 25,156 119 8.8 Commit buffer busy waits 42,093 23,218 552 8.1
    Concurrency db file parallel write 376 14,274 37,963 5.0 System I/O
    I know that db file parallel write only contributes to 5% of the total call
    time - but its avg wait time looks extremely poor - that and the fact that
    free buffer waits appear at top indicate that there's a db writer issue
    (db_writer_processes=4) - which leads me to believe perhaps its the IO
    subsystem. Now the storage team report there is nothing up with the storage.
    I was hoping someone could help with the following questions:
    1. Is there someway from awr that I can determine that the Oracle server is
    issuing more IO than the storage system can handle for example:
    Physical reads: 954.74 16.68 Physical writes: 418.89 7.32Phy Reads + Phy
    Writes = 1372 IOPS
    Can I then say that if each disk can do 100 IOPS, that the storage system
    should at least have 13 Disks? (13x100 IOPS)? Or is that an over
    simplification?

    2. Interestingly this DB server was moved onto a new box with a different
    storage and the issue is no longer observed there.
    Transactions Per Second on old box = 57.22
    Transactions Per Second on new box = 225
    Phy Reads + Phy Writes for New Box, are slightly under half compared to the
    old box:
    Physical reads: 243.02 1.08 Physical writes: 564.62 2.51
    That said, the redo size per second on the new box is twice that of the old
    box (7mb/s : 3mb/s). The buffer cache and db writer processes are the same
    on both boxes.
    Other than the storage aspect of things, I'm thinking (and will check)
    whether theres an o.s/kernel misconfiguration w.r.t to async io, etc that
    might be causing this on the old box.
    Appreciate any thoughts on 1 or/and 2
    Thanks
    --
    Niall Litchfield
    Oracle DBA
    http://www.orawin.info

    --
    http://www.freelists.org/webpage/oracle-l
  • Harel Safra at Dec 7, 2010 at 7:12 pm
    This calculation is only true if you work with disk drive directly
    attached to you servers, and even then when there is no cache in the
    raid controller.
    Once you start working with central storage systems things like cache
    size, i/o distribution and disk sharing start to come into play.

    For example, our EMC dMX4 storage has 96GB of cache. If your database is
    smaller than that (and nothing else uses the cache in this example) you
    could get very high throughput even with a single physical disk.

    Do you know which kind of storage you use? Is you system the only system
    attached to it?

    Harel Safra
    On 07/12/2010 20:05, Oracle Dba Wannabe wrote:
    Do any of you have any thoughts w.r.t to question 1 - whether those
    calculations can be representative of the disks i may need.
    thanks
    ------------------------------------------------------------------------
    **1. Is there someway from awr that I can determine that the Oracle
    server is issuing more IO than the storage system can handle for example:
    Physical reads: 954.74 16.68
    Physical writes: 418.89 7.32

    Phy Reads + Phy Writes = 1372 IOPS
    Can I then say that if each disk can do 100 IOPS, that the
    storage system should at least have 13 Disks? (13x100 IOPS)? Or
    is that an over simplification?
    --
    http://www.freelists.org/webpage/oracle-l
  • Niall Litchfield at Dec 7, 2010 at 7:15 pm
    Hi Harel,

    I assume you intended to compare the working set of database changes to the
    cache size rather than the database size itself. The 2 things are not well
    related.

    On 7 Dec 2010 19:12, "Harel Safra" wrote:

    This calculation is only true if you work with disk drive directly attached
    to you servers, and even then when there is no cache in the raid controller.
    Once you start working with central storage systems things like cache size,
    i/o distribution and disk sharing start to come into play.

    For example, our EMC dMX4 storage has 96GB of cache. If your database is
    smaller than that (and nothing else uses the cache in this example) you
    could get very high throughput even with a single physical disk.

    Do you know which kind of storage you use? Is you system the only system
    attached to it?

    Harel Safra
    On 07/12/2010 20:05, Oracle Dba Wannabe wrote:


    Do any of you have any thoughts w.r.t to question 1 - whether those
    calculations can be represe...
    thanks

    **
    1. Is there someway from awr that I can determine that the Oracle server
    is issuing more IO than ...
  • Harel Safra at Dec 7, 2010 at 7:32 pm
    Hi Niall,
    This example was in perfect physics land where cows are spherical
    (http://en.wikipedia.org/wiki/Spherical_cow) and there is no friction.

    In the real world you comment holds very true and you need to compare
    the active part of the database to the cache size.

    Harel Safra
    On 07/12/2010 21:15, Niall Litchfield wrote:

    Hi Harel,

    I assume you intended to compare the working set of database changes
    to the cache size rather than the database size itself. The 2 things
    are not well related.

    On 7 Dec 2010 19:12, "Harel Safra" > wrote:

    This calculation is only true if you work with disk drive directly
    attached to you servers, and even then when there is no cache in the
    raid controller.
    Once you start working with central storage systems things like cache
    size, i/o distribution and disk sharing start to come into play.

    For example, our EMC dMX4 storage has 96GB of cache. If your database
    is smaller than that (and nothing else uses the cache in this example)
    you could get very high throughput even with a single physical disk.

    Do you know which kind of storage you use? Is you system the only
    system attached to it?

    Harel Safra


    On 07/12/2010 20:05, Oracle Dba Wannabe wrote:

    Do any of you have any thoughts w.r.t to question 1 - whether those
    calculations can be represe...

    thanks
    ------------------------------------------------------------------------
    **

    1. Is there someway from awr that I can determine that the Oracle
    server is issuing more IO than ...
    --
    http://www.freelists.org/webpage/oracle-l
  • Taylor, Chris David at Dec 7, 2010 at 8:37 pm
    I don't "think" this is exactly right. One physical disk can handle [realistically - assuming 20ms response time on each disk]:

    15K SCSI = 135 IOPS
    10K SCSI = 90 IOPS
    10K SATA = 50 IOPS

    7200RPM SATA = 38 IOPS

    So, depending on the single physical disk, you're still going to run into the IOPS cap on that disk. Let's say your database is small but is generating enough changes to fill up the cache [assuming I understand the way the cache works in your case], the storage server is going to have to write that information out to that 1 disk at some point and I would assume it is going to write it out faster than 135 IOPS if the changes stored in the cache are changing faster than 135 IOPS.

    Am I off base on that?

    Chris Taylor
    Sr. Oracle DBA
    Ingram Barge Company
    Nashville, TN 37205
    Office: 615-517-3355
    Cell: 615-663-1673
    Email: chris.taylor_at_ingrambarge.com

    CONFIDENTIALITY NOTICE: This e-mail and any attachments are confidential and may also be privileged. If you are not the named recipient, please notify the sender immediately and delete the contents of this message without disclosing the contents to anyone, using them for any purpose, or storing or copying the information on any medium.

    From: oracle-l-bounce_at_freelists.org On Behalf Of Harel Safra
    Sent: Tuesday, December 07, 2010 1:13 PM
    To: oracledbawannabe_at_yahoo.com
    Cc: Niall Litchfield; okh_at_oraconsult.de; oracle-l@freelists.org
    Subject: Re: Is my Oracle Server issuing more IO than it can handle

    This calculation is only true if you work with disk drive directly attached to you servers, and even then when there is no cache in the raid controller.
    Once you start working with central storage systems things like cache size, i/o distribution and disk sharing start to come into play.

    For example, our EMC dMX4 storage has 96GB of cache. If your database is smaller than that (and nothing else uses the cache in this example) you could get very high throughput even with a single physical disk.

    Do you know which kind of storage you use? Is you system the only system attached to it?

    Harel Safra

    On 07/12/2010 20:05, Oracle Dba Wannabe wrote:
    Do any of you have any thoughts w.r.t to question 1 - whether those calculations can be representative of the disks i may need.
    thanks

    Is there someway from awr that I can determine that the Oracle server is issuing more IO than the storage system can handle for example:
    Physical reads:

    954.74

    16.68

    Physical writes:

    418.89

    7.32

    Phy Reads + Phy Writes = 1372 IOPS
    Can I then say that if each disk can do 100 IOPS, that the storage system should at least have 13 Disks? (13x100 IOPS)? Or is that an over simplification?
  • Amaral, Rui at Dec 7, 2010 at 7:32 pm
    you said moved to a different storage so a couple of questions come to mind on this:

    ask your storage guys how the 2 differ in configuration, ie., RAID types, FC configuration, I/O port sharing ( different db systems sharing the same port can be a problem)

    from the OS how the filesystem is configured. Misaligned block sizes between the array and the os filesystem can slow things down (think of it in terms of chained rows and you'll get close to the idea). This applies equally to ASM and regular filesystems

    and of course if they can tell if the luns are on the outside of the physical disks or not.

    in terms of calculating IOPS use Orion or Bonnie for pure OS throughput and then go from there. Just some thoughts

    Rui Amaral
    Database Administrator
    ITS - SSG

    TD Bank Financial Group
    220 Bay St., 11th Floor
    Toronto, ON, CA, M5K1A2
    (bb) (647) 204-9106

    From: oracle-l-bounce_at_freelists.org On Behalf Of Oracle Dba Wannabe
    Sent: Tuesday, December 07, 2010 2:26 PM
    To: Harel Safra
    Cc: Niall Litchfield; okh_at_oraconsult.de; oracle-l@freelists.org
    Subject: Re: Is my Oracle Server issuing more IO than it can handle

    No in fact a san is being used which is being shared with other database servers in the environment - so based on what you;ve said the san cache would be shared too. The servers are hp servers connected to an xp san.
    That said, if this hp server was the only one attached to it - the calculation i used below would then be valid? If not does anyone know how to calculate how many iops are possible from a storage and how much the oracle server is issuing?
    thanks

    From: Harel Safra
    To: oracledbawannabe_at_yahoo.com
    Cc: Niall Litchfield; okh_at_oraconsult.de; "oracle-l_at_freelists.org"
    Sent: Wed, December 8, 2010 12:12:42 AM
    Subject: Re: Is my Oracle Server issuing more IO than it can handle

    This calculation is only true if you work with disk drive directly attached to you servers, and even then when there is no cache in the raid controller.
    Once you start working with central storage systems things like cache size, i/o distribution and disk sharing start to come into play.

    For example, our EMC dMX4 storage has 96GB of cache. If your database is smaller than that (and nothing else uses the cache in this example) you could get very high throughput even with a single physical disk.

    Do you know which kind of storage you use? Is you system the only system attached to it?

    Harel Safra

    On 07/12/2010 20:05, Oracle Dba Wannabe wrote:
    Do any of you have any thoughts w.r.t to question 1 - whether those calculations can be representative of the disks i may need.
    thanks

    Is there someway from awr that I can determine that the Oracle server is issuing more IO than the storage system can handle for example:
    Physical reads: 954.74 16.68
    Physical writes: 418.89 7.32
    Phy Reads + Phy Writes = 1372 IOPS
    Can I then say that if each disk can do 100 IOPS, that the storage system should at least have 13 Disks? (13x100 IOPS)? Or is that an over simplification?

    NOTICE: Confidential message which may be privileged. Unauthorized use/disclosure prohibited. If received in error, please go to www.td.com/legal for instructions.
    AVIS : Message confidentiel dont le contenu peut être privilégié. Utilisation/divulgation interdites sans permission. Si reçu par erreur, prière d'aller au www.td.com/francais/avis_juridique pour des instructions.
  • OKH at Dec 7, 2010 at 7:36 pm
    As far as I know there is no way but talking with the SAN guys. But - you cannot issue more IOPS than the subsystem can handle as it would just increase the queue and you'd get higher wait times. The system performs as much as it can.
    You should take a look at v$event_histogram to check how this average values have been distributed. Using SAN you should have lots of IOs in 1ms, 2ms and 4ms. The higher the value the more the SAN is accessing the disks directly because it cannot cache. Everything that goes beyond the average access of the disk is an indication for too much IO Requests and therefore higher wait times.

    Take a look at http://blog.oraconsult.de/2010/09/analyse-von-wait-events-gewichtung-der-wartezeiten. it is in German but the graph is self explaining

    regards

    Felix Castillo Sanchez

    Am 07.12.2010 um 20:25 schrieb Oracle Dba Wannabe :
    No in fact a san is being used which is being shared with other database servers in the environment - so based on what you;ve said the san cache would be shared too. The servers are hp servers connected to an xp san.
    That said, if this hp server was the only one attached to it - the calculation i used below would then be valid? If not does anyone know how to calculate how many iops are possible from a storage and how much the oracle server is issuing?
    thanks

    From: Harel Safra
    To: oracledbawannabe_at_yahoo.com
    Cc: Niall Litchfield; okh_at_oraconsult.de; "oracle-l_at_freelists.org"
    Sent: Wed, December 8, 2010 12:12:42 AM
    Subject: Re: Is my Oracle Server issuing more IO than it can handle

    This calculation is only true if you work with disk drive directly attached to you servers, and even then when there is no cache in the raid controller.
    Once you start working with central storage systems things like cache size, i/o distribution and disk sharing start to come into play.

    For example, our EMC dMX4 storage has 96GB of cache. If your database is smaller than that (and nothing else uses the cache in this example) you could get very high throughput even with a single physical disk.

    Do you know which kind of storage you use? Is you system the only system attached to it?

    Harel Safra
    On 07/12/2010 20:05, Oracle Dba Wannabe wrote:


    Do any of you have any thoughts w.r.t to question 1 - whether those calculations can be representative of the disks i may need.
    thanks
    1. Is there someway from awr that I can determine that the Oracle server is issuing more IO than the storage system can handle for example:
    Physical reads: 954.74 16.68
    Physical writes: 418.89 7.32 Phy Reads + Phy Writes = 1372 IOPS
    Can I then say that if each disk can do 100 IOPS, that the storage system should at least have 13 Disks? (13x100 IOPS)? Or is that an over simplification?
    >
  • Amaral, Rui at Dec 7, 2010 at 8:36 pm
    When you have some time you can check :

    http://jamesmorle.wordpress.com/

    He's a founder of BAARF - Battle Against All Raid 5 - and very knowledgable about storage subsystems in the Oracle world.

    Rui Amaral
    Database Administrator
    ITS - SSG

    TD Bank Financial Group
    220 Bay St., 11th Floor
    Toronto, ON, CA, M5K1A2
    (bb) (647) 204-9106


    -----Original Message-----
    From: oracle-l-bounce_at_freelists.org On Behalf Of Amaral, Rui
    Sent: Tuesday, December 07, 2010 3:34 PM
    To: 'Oracle Dba Wannabe'; harel.safra_at_gmail.com
    Cc: niall.litchfield_at_gmail.com; okh_at_oraconsult.de; oracle-l_at_freelists.org
    Subject: RE: Is my Oracle Server issuing more IO than it can handle

    It's a good chance that it's cache coming into play - though that is a very big generalization. Reads on a storage subsystem are typically faster because it doesn't have to worry about such things as parity whereas writes do (this is where you're RAID levels come into play).

    Rui Amaral
    Database Administrator
    ITS - SSG

    TD Bank Financial Group
    220 Bay St., 11th Floor
    Toronto, ON, CA, M5K1A2
    (bb) (647) 204-9106


    -----Original Message-----
    From: Oracle Dba Wannabe
    Sent: Tuesday, December 07, 2010 3:28 PM
    To: Amaral, Rui; harel.safra_at_gmail.com
    Cc: niall.litchfield_at_gmail.com; okh_at_oraconsult.de; oracle-l_at_freelists.org
    Subject: RE: Is my Oracle Server issuing more IO than it can handle

    Thanks, just another thought - if I go with the theory of the storage having an issue, I notice other io related requests - reads such as db file sequential/scattered having avg wait times under or at 5ms during this issue. Would that then imply that perhaps the storage is not at fault. I would have thought if the storage was io bound, that reads would be poor too? Or is this cache coming into play....thanks again
    On Tue Dec 7th, 2010 2:32 PM EST Amaral, Rui wrote:

    you said moved to a different storage so a couple of questions come to mind on this:

    1) ask your storage guys how the 2 differ in configuration, ie., RAID
    types, FC configuration, I/O port sharing ( different db systems
    sharing the same port can be a problem)

    2) from the OS how the filesystem is configured. Misaligned block sizes
    between the array and the os filesystem can slow things down (think of
    it in terms of chained rows and you'll get close to the idea). This
    applies equally to ASM and regular filesystems

    3) and of course if they can tell if the luns are on the outside of the physical disks or not.

    in terms of calculating IOPS use Orion or Bonnie for pure OS throughput
    and then go from there. Just some thoughts

    Rui Amaral
    Database Administrator
    ITS - SSG
    TD Bank Financial Group
    220 Bay St., 11th Floor
    Toronto, ON, CA, M5K1A2
    (bb) (647) 204-9106



    ________________________________
    From: oracle-l-bounce_at_freelists.org
    On Behalf Of Oracle Dba Wannabe
    Sent: Tuesday, December 07, 2010 2:26 PM
    To: Harel Safra
    Cc: Niall Litchfield; okh_at_oraconsult.de; oracle-l_at_freelists.org
    Subject: Re: Is my Oracle Server issuing more IO than it can handle

    No in fact a san is being used which is being shared with other database servers in the environment - so based on what you;ve said the san cache would be shared too. The servers are hp servers connected to an xp san.
    That said, if this hp server was the only one attached to it - the calculation i used below would then be valid? If not does anyone know how to calculate how many iops are possible from a storage and how much the oracle server is issuing?
    thanks

    ________________________________
    From: Harel Safra
    To: oracledbawannabe_at_yahoo.com
    Cc: Niall Litchfield; okh_at_oraconsult.de;
    "oracle-l@freelists.org"
    Sent: Wed, December 8, 2010 12:12:42 AM
    Subject: Re: Is my Oracle Server issuing more IO than it can handle

    This calculation is only true if you work with disk drive directly attached to you servers, and even then when there is no cache in the raid controller.
    Once you start working with central storage systems things like cache size, i/o distribution and disk sharing start to come into play.

    For example, our EMC dMX4 storage has 96GB of cache. If your database is smaller than that (and nothing else uses the cache in this example) you could get very high throughput even with a single physical disk.

    Do you know which kind of storage you use? Is you system the only system attached to it?

    Harel Safra

    On 07/12/2010 20:05, Oracle Dba Wannabe wrote:
    Do any of you have any thoughts w.r.t to question 1 - whether those calculations can be representative of the disks i may need.
    thanks
    ________________________________
    1. Is there someway from awr that I can determine that the Oracle server is issuing more IO than the storage system can handle for example:
    Physical reads: 954.74 16.68
    Physical writes: 418.89 7.32
    Phy Reads + Phy Writes = 1372 IOPS
    Can I then say that if each disk can do 100 IOPS, that the storage system should at least have 13 Disks? (13x100 IOPS)? Or is that an over simplification?



    NOTICE: Confidential message which may be privileged. Unauthorized use/disclosure prohibited. If received in error, please go to www.td.com/legal for instructions.
    AVIS : Message confidentiel dont le contenu peut être privilégié. Utilisation/divulgation interdites sans permission. Si reçu par erreur, prière d'aller au www.td.com/francais/avis_juridique pour des instructions.


    --
    http://www.freelists.org/webpage/oracle-l

    --
    http://www.freelists.org/webpage/oracle-l
  • Rich Jesse at Dec 7, 2010 at 8:36 pm
    For the first question: your storage vendor should give you this number.
    I wouldn't rely on that number at all, as it's highly dependent on factors
    external to the SAN, most notably what's attached to it and how each is
    configured. Slower CPUs with slower IO cards cannot generate as many IO
    requests as faster CPUs and/or faster IO cards.

    The only way to be sure (is to nuke the site from orbit!) is to test it
    yourself with your own equipment.
    For the second question: You didn't mention you OS. In Linux/Unix you'd
    use iostat or something fancier if it's installed. In windows you'd use
    perfmon.
    I may have missed a response or three, but I don't think anyone's mentioned
    to ask if the filesystem is set for caching or DIO, as well as the
    asynchronous setting.

    Rich
  • Harel Safra at Dec 8, 2010 at 8:32 pm
  • Amaral, Rui at Dec 9, 2010 at 2:04 pm
    I am assuming you are referring to dbc_min_pct and dbc_max_pct and assuming that nbuf and bufpages are set to 0 then they would be dynamic so yes.

    But there's also a couple of other things to look at that might be different too:

    mincache
    convosync

    This would be regarding double buffering between the db and OS.

    Rui Amaral
    Database Administrator
    ITS - SSG

    TD Bank Financial Group
    220 Bay St., 11th Floor
    Toronto, ON, CA, M5K1A2
    (bb) (647) 204-9106


    -----Original Message-----
    From: oracle-l-bounce_at_freelists.org On Behalf Of Laimutis.Nedzinskas_at_seb.lt
    Sent: Thursday, December 09, 2010 2:04 AM
    To: Oracle-L_at_freelists.org
    Subject: Re: Is my Oracle Server issuing more IO than it can handle
    - is there a possibility that the extra memory on the server that
    performs
    well is being utilized by the file system buffer cache and therefore allowing writes or the acknowledgment of those writes to perform quicker (no direct io being used) - compared to the server with less memory?

    according to the recent findings we have on one of our databases - yes.
    After we switched to direct IO the performance of full table scans went down, dba_hist_seg_stats shows longer times.

    Please consider the environment before printing this e-mail

    Oracle Dba
    Wannabe
    Oracle-L_at_freelists.org
    Sent by: cc
    oracle-l-bounce_at_f
    reelists.org Subject
    Is my Oracle Server issuing more IO
    than it can handle
    2010.12.08 20:27

    Please respond to
    oracledbawannabe_at_
    yahoo.com

    Hi All,
    So I was wrong about a different storage being used - well basically the volumes on the poor performing server have been mounted onto another server which uses a different san. The volumes are veritas file systems for the HP-UX platform. So I'd assume the following would not really come into
    play:
    1. size and number of redo log files - since these are the same volumes on the server that perform poorly and the one that performs with acceptable performance 2. the possibility of a faulty disk - again its the same volumes so issues with the disk would apply to both servers - since again same volumes One difference I do see, however, is that the server that performs poorly has 32gb (and 5 procs) of memory whereas the one that performs well has 64gb (16 procs). Now both databases have an 8gb buffer cache (the good performing server however also hosts another oracle database). Now, that said, HP-UX has two kernel parameters that control the file system buffer cache - which by default (from what I've read) is pretty generous - is there a possibility that the extra memory on the server that performs well is being utilized by the file system buffer cache and therefore allowing writes or the acknowledgment of those writes to perform quicker (no direct io being used) - compared to
    the server with less memory?
    Here is a side by side comparison of the load profile:
    ---------+---------+---------+---------+---------+---------+------------|
    TPS |Redo |PhyW |PhyR |LogicalR |BlockChanges|
    ---------+---------+---------+---------+---------+---------+------------|
    Poor |54/s |4.2mb/s |428/s |1983/s |13,000/s |9155/s |
    Server | | | | | | |
    ---------+---------+---------+---------+---------+---------+------------|
    Good |225/s |7.4mb/s |564/s |243/s |110,000/s|7500/s |
    Server | | | | | | |
    ---------+---------+---------+---------+---------+---------+------------|
    Here's a comparison of top 5 events for both servers:

    Poor Server:

    Top 5 Timed Events
    ----------------+----------+-------+---------+---------+-------------|
    Event | Waits |Time(s)|Avg Wait | % Total | Wait Class |
    (ms) |Call Time| |
    ----------------+----------+-------+---------+---------+-------------|
    free buffer |12,471,382|133,152| 11| 52.7|Configuration|
    waits | | | | | |
    ----------------+----------+-------+---------+---------+-------------|
    log file switch | 50,033| 38,433| 768| 15.2|Configuration|
    (checkpoint | | | | | |
    incomplete) | | | | | |
    ----------------+----------+-------+---------+---------+-------------|
    log file sync | 198,985| 30,749| 155| 12.2|Commit |
    ----------------+----------+-------+---------+---------+-------------|
    buffer busy | 46,927| 17,681| 377| 7.0|Concurrency |
    waits? | | | | | |
    ----------------+----------+-------+---------+---------+-------------|
    db file parallel| 373| 13,772| 36,921| 5.5,|System I/O |
    write | | | | | |
    ----------------+----------+-------+---------+---------+-------------|
    Good Server:

    Top 5 Timed Events
    ---------------+-------+-------+-----------+-------------+---------|
    Event | Waits |Time(s)| Avg Wait |% Total Call | Wait |
    (ms) | Time | Class |
    ---------------+-------+-------+-----------+-------------+---------|
    log file sync |821,418| 35,503| 43| 83.2|Commit |
    ---------------+-------+-------+-----------+-------------+---------|
    CPU time | | 7,158| | 16.8| |
    ---------------+-------+-------+-----------+-------------+---------|
    db file |128,596| 2,851| 22| 6.7|System |
    parallel write | | | | |I/O |
    ---------------+-------+-------+-----------+-------------+---------|
    log file |235,049| 2,845| 12| 6.7|System |
    parallel write | | | | |I/O |
    ---------------+-------+-------+-----------+-------------+---------|
    db file |179,777| 727| 4| 1.7|User I/O |
    sequential read| | | | | |
    ---------------+-------+-------+-----------+-------------+---------|
    --
    http://www.freelists.org/webpage/oracle-l

    NOTICE: Confidential message which may be privileged. Unauthorized use/disclosure prohibited. If received in error, please go to www.td.com/legal for instructions.
    AVIS : Message confidentiel dont le contenu peut être privilégié. Utilisation/divulgation interdites sans permission. Si reçu par erreur, prière d'aller au www.td.com/francais/avis_juridique pour des instructions.
    --
    http://www.freelists.org/webpage/oracle-l
  • Amaral, Rui at Dec 9, 2010 at 2:14 pm
    Power failure is not the only thing can affect disk caches and yes my previous company had it happen during a patch on the array gone bad.

    I agree you're numbers are good but I have had it the other way around.... now I do admin maybe because not everyone was on the same page in configuring it because it was a shared resource with different demands on the array.

    Rui Amaral
    Database Administrator
    ITS - SSG

    TD Bank Financial Group
    220 Bay St., 11th Floor
    Toronto, ON, CA, M5K1A2
    (bb) (647) 204-9106

    From: Harel Safra
    Sent: Wednesday, December 08, 2010 11:42 AM
    To: Amaral, Rui
    Cc: 'Oracle Dba Wannabe'; niall.litchfield_at_gmail.com; okh_at_oraconsult.de; oracle-l_at_freelists.org
    Subject: Re: Is my Oracle Server issuing more IO than it can handle

    On 08/12/2010 17:13, Amaral, Rui wrote:

    Yes in theory that can be true but it depends on several factors:

    Why in theory? Please have a look at a snapshot of statistics from one node of our data warehouse cluster running on EMC DMX4 storage:
    AVG_WRITE_TIME TOTAL_WRITES AVG_READ_TIME TOTAL_READS

    1.1 40460 26.78 35864

    Much better write times than read times (the high read time is also contributed from the large IO the database tends to do).

    assuming write-back caches on the array have not been disabled (which in my opinion they should be)

    Why do you think write back cache should be disabled? Proper storage systems have sufficient battery backups built in to flush the whole cache to disk in case of power failure.

    Harel Safra

    NOTICE: Confidential message which may be privileged. Unauthorized use/disclosure prohibited. If received in error, please go to www.td.com/legal for instructions.
    AVIS : Message confidentiel dont le contenu peut être privilégié. Utilisation/divulgation interdites sans permission. Si reçu par erreur, prière d'aller au www.td.com/francais/avis_juridique pour des instructions.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouporacle-l @
categoriesoracle
postedDec 7, '10 at 4:57p
activeDec 9, '10 at 2:14p
posts15
users6
websiteoracle.com

People

Translate

site design / logo © 2022 Grokbase