Long connect time when one node in RAC goes down
We need some help.
We run RAC, Oracle 10.2.0.3, on 64-bit Windows 2003 servers.

We did a failover test. We disconnected one server from the network by
pulling the network cable.
The system worked fine, but once in a while a connection takes 6
seconds instead of 20 ms.
We understand that this happens because the VIP is moved to the second
computer and there is nothing there to handle calls on that TCP address.

I would like to know how to shorten the time from 6 seconds to almost
nothing.

--
Adar Yechiel
Rechovot, Israel

--
http://www.freelists.org/webpage/oracle-l


  • Andrey.Kriushin at Sep 2, 2008 at 10:02 am
    Hi,

    Are you using a cross-over cable or any active network device (hub,
    switch, ...) for the cluster interconnect?

    Also, you can search MetaLink as well as Microsoft support for the keywords "Media Sense".

    HTH

    Andrey

    --
    ________________________________________________________________________
    Andrey KRIUSHIN (Oracle9i Certified Master), Oracle products expert
    "Grid & Consolidation" Competence Center Director
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    E-mail: Andrey.Kriushin_at_rdtex.ru Mobile: +7 903 593 2408
    Phone: +7(4967)744581 (ext: 435) Fax: +7(4967)740754

  • Yechiel Adar at Sep 2, 2008 at 10:18 am
    We are using a switch for the interconnect.
    There are no network problems.
    The problem is that after the VIP is moved to the other node, connecting
    through this VIP causes a six-second wait before SQL*Net is notified
    that this VIP is not working and tries the other nodes in the RAC.

    Adar Yechiel
    Rechovot, Israel

  • Riyaj Shamsudeen at Sep 2, 2008 at 1:35 pm
    Adar

    The whole point of the VIP is to tell the client quickly that no
    listener is running on that IP address, so the connection attempt moves
    on to the next failover address. Without a VIP, the connection has to
    wait for a TCP timeout before failing over, and the default TCP timeout
    values are long, which slows failover down.
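
    Riyaj's point can be illustrated with plain TCP sockets rather than Oracle Net. This is a minimal sketch, assuming a hypothetical ordered list of (host, port) pairs standing in for the ADDRESS entries of a connect descriptor. Each address gets a short connect timeout: a refused connection (the reset a client receives when the relocated VIP answers but has no listener behind it) moves on at once, and only an address that never answers costs the full timeout.

```python
import socket


def connect_with_failover(addresses, timeout=1.0):
    """Try each (host, port) in order; return ((host, port), socket) for the first that accepts.

    Illustrative only: Oracle Net does its own address-list processing.
    """
    errors = []
    for host, port in addresses:
        try:
            # create_connection applies `timeout` to the connect attempt itself.
            sock = socket.create_connection((host, port), timeout=timeout)
            return (host, port), sock
        except ConnectionRefusedError as exc:
            # Host answered with a reset: nothing listens there, fail over immediately.
            errors.append(((host, port), "refused", exc))
        except socket.timeout as exc:
            # No answer at all: we only give up once the timeout has expired.
            errors.append(((host, port), "timeout", exc))
        except OSError as exc:
            # Anything else (unreachable network, resolution failure, ...).
            errors.append(((host, port), "error", exc))
    raise ConnectionError(f"all addresses failed: {errors}")
```

    For example, connect_with_failover([("vip-node2", 1521), ("vip-node1", 1521)]) would model the two-VIP list used in this thread; port 1521 is an assumption.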

    Can you post your tnsnames string and anything in sqlnet.ora from
    the client machine? I think there is a problem in the search order for
    the next failover address.

    Cheers
    Riyaj
    The Pythian Group http://www.pythian.com
    Personal : http://orainternals.wordpress.com


  • Yechiel Adar at Sep 2, 2008 at 3:18 pm
    Problem update:

    We ran a client trace at level 16 and found that failover works fine:
    the client immediately gets NO LISTENER and fails over as expected.
    From time to time, however, the response is slow and we get TNS TIME OUT
    instead of NO LISTENER.

    Adar Yechiel
    Rechovot, Israel

  • Bobak, Mark at Sep 2, 2008 at 3:23 pm
    Are you sure you are referencing *only* the VIP, and *NOT* the physical IP, not only in the connect string definition, but also in the local_listener and remote_listener parameters, on all nodes? Please check all of these.
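
    A rough way to audit this, as a sketch only: it assumes the connect string and the local_listener/remote_listener values are available as text in full address-descriptor form (not as tnsnames aliases) and that the VIP host names are known. vip-node1 and vip-node2 are the names used in this thread; node1 is a hypothetical physical host name. The regular expression only picks out HOST= values; it is not an Oracle Net parser.

```python
import re

# Matches "(HOST=value)" with optional whitespace, case-insensitively.
HOST_RE = re.compile(r"\(\s*HOST\s*=\s*([^)\s]+)\s*\)", re.IGNORECASE)


def non_vip_hosts(descriptor, vips):
    """Return the HOST values in `descriptor` that are not in the `vips` collection."""
    allowed = {v.lower() for v in vips}
    return [h for h in HOST_RE.findall(descriptor) if h.lower() not in allowed]


if __name__ == "__main__":
    # Hypothetical local_listener value that points at a physical host name.
    local_listener = "(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521))"
    print(non_vip_hosts(local_listener, ["vip-node1", "vip-node2"]))  # ['node1']
```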

    -Mark

    --
    Mark J. Bobak
    Senior Database Administrator, System & Product Technologies
    ProQuest
    789 E. Eisenhower, Parkway, P.O. Box 1346
    Ann Arbor MI 48106-1346
    +1.734.997.4059 or +1.800.521.0600 x 4059
    mark.bobak_at_proquest.com
    www.proquest.com
    www.csa.com

    ProQuest...Start here.

  • Yechiel Adar at Sep 2, 2008 at 4:19 pm
    Hello Mark

    We connect from the client with failover and load balancing to vip-node1
    and vip-node2.
    We pulled the plug on node2.
    Oracle moved vip-node2 to node1; of course there is no listener on that
    address and port on node1.
    We ran connections with tracing and found that whenever we connect via
    vip-node2 we see two kinds of errors in the trace file:
    1) 12541 (no listener): as expected; it takes about 1 second to return
    the error.
    2) 12535 (TNS timeout): it takes about 6-7 seconds to return the error.

    After the error, SQL*Net uses vip-node1 and makes a connection.

    We would like to eliminate the 12535 errors and shorten the 1 second for
    the 12541.
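
    To see which of the two failure modes an address produces, one could time a single raw TCP connect outside Oracle Net. A minimal diagnostic sketch: the function name probe and its outcome labels are made up for illustration, and pairing "refused" with 12541 and "timeout" with 12535 is an analogy, not how Oracle Net itself reports these errors.

```python
import socket
import time


def probe(host, port, timeout=10.0):
    """Return (outcome, seconds) for one TCP connect attempt.

    outcome is 'ok', 'refused' (reset received, compare 12541),
    'timeout' (no reply within `timeout`, compare 12535) or 'error'.
    """
    start = time.monotonic()
    try:
        # Connect and immediately close again; we only care about the handshake.
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "ok"
    except ConnectionRefusedError:
        outcome = "refused"
    except socket.timeout:
        outcome = "timeout"
    except OSError:
        outcome = "error"
    return outcome, time.monotonic() - start
```

    Running probe("vip-node2", 1521) repeatedly from the client (port 1521 assumed) would show whether the slow cases are plain TCP timeouts at the network level.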

    I do not see how the local listener and remote listener are of any
    concern here.

    Adar Yechiel
    Rechovot, Israel

  • freek D'Hooge at Sep 2, 2008 at 10:30 pm
    Hi,

    Can you post the output of the lsnrctl status command?
    What I would like to see is which IP address an instance uses to
    register with the remote listener. It could be that your session
    connected at node 1 but was redirected to node 2 (due to server-side
    load balancing) and was given the real address of node 2 instead of the
    VIP address.

    Controlling which IP will be used (real or VIP) is done via the
    local_listener parameter.

    Regards,



    Freek D'Hooge
    Uptime
    Oracle Database Administrator
    email: freek.dhooge_at_uptime.be
    tel +32(0)3 451 23 82
    http://www.uptime.be
    disclaimer

  • Finn Jorgensen at Sep 3, 2008 at 12:15 am
    Adar,

    Did you read note 226880.1, which has step-by-step instructions on how
    to set up TAF? It explains what local_listener and remote_listener are
    used for and should fix your problem. I've never had a problem with a
    setup when following that note.

    Finn
  • Ujang Jaenudin at Sep 3, 2008 at 8:08 am
    Adar,

    I had the same problem.
    It was caused by this line in the listener entry:
    (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))

    It should be the first line in the ADDRESS_LIST section.

    local_listener and remote_listener should be set, of course.
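
    Ujang's ordering can be checked mechanically. A sketch assuming the listener's ADDRESS_LIST is available as a single string; it only reads PROTOCOL= values in order and reports whether an IPC entry, if there is one, comes first. Whether the order actually matters is Ujang's experience; this check does not establish it.

```python
import re

# Matches "(PROTOCOL=value)" with optional whitespace, case-insensitively.
PROTOCOL_RE = re.compile(r"\(\s*PROTOCOL\s*=\s*(\w+)\s*\)", re.IGNORECASE)


def protocols_in_order(address_list):
    """Return the PROTOCOL values of an ADDRESS_LIST string, upper-cased, in order."""
    return [p.upper() for p in PROTOCOL_RE.findall(address_list)]


def ipc_first(address_list):
    """True if there is no IPC address at all, or if the IPC address is listed first."""
    protos = protocols_in_order(address_list)
    return "IPC" not in protos or protos[0] == "IPC"
```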

    --
    thanks and regards
    ujang | oracle dba
    jakarta | http://ora62.wordpress.com
  • KRIUSHIN, Andrey at Sep 4, 2008 at 1:35 am
    Hi,

    An intermediate summary:

    1. 6 seconds is a very strange number. It doesn't look like a TCP
    timeout (one to a few minutes by default).
    I.e. if the client managed to issue a SYN and the IP is not reachable,
    the client will wait for the SYN-ACK for a minute or more _*unless*_ the
    client has set an (asynchronous) callback interrupt (through a timer, for
    example) before entering the TCP stack.
    I see one of Oracle's magic numbers (3 seconds) times 2. I would like to
    see the 200-300 lines of sqlnet.trc on the client preceding the error
    report. That said, this points the investigation at the client side.
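
    The arithmetic behind "a TCP timeout is much longer than 6 seconds" can be sketched as follows. It assumes the classic scheme in which the first SYN retransmission timeout doubles on every retry; the initial timeout and the retry count are operating-system settings, and the 3 seconds and 2 retransmissions below are illustrative assumptions, not values measured on this cluster.

```python
def syn_timeline(initial_rto, retransmissions):
    """Return (send_times, give_up_time) for a connect whose SYNs are never answered.

    Assumes the first SYN goes out at t=0 and each retransmission timeout
    is double the previous one.
    """
    send_times = []
    t = 0.0
    rto = float(initial_rto)
    for _ in range(retransmissions + 1):
        send_times.append(t)  # a SYN (or retransmitted SYN) is sent now
        t += rto              # wait for a reply until the timer fires
        rto *= 2              # exponential backoff
    return send_times, t


if __name__ == "__main__":
    print(syn_timeline(3, 2))  # ([0.0, 3.0, 9.0], 21.0)
```

    With these assumed values the client gives up only after 3 + 6 + 12 = 21 seconds, well above the 6 seconds observed, which fits the suspicion that something other than the plain TCP timeout is ending the wait.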

    It would be intriguing if that functionality were somehow included in
    Oracle's _*native*_ Oracle*Net.

    A common pattern of poor client network-access design is:
    a) we need (business rules require) a guaranteed response within XXX [m]sec;
    b) thus the client sets up its own timeout for the case when the
    response is not received in the specified timeframe;
    c) the typical goblin's action: forget about the previous attempt (to
    establish the connection) and start a new one (nobody cares about the
    socket used, the server process started, the listener forked...).

    This usually leads to a kind of DoS attack from an approved application. Ghmm

    d) another symptom of goblin design: stateless connections, i.e.
    almost every request incurs the overhead of establishing a new connection
    and creating a session... Client connection pooling is too complex to understand.
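
    Point d) is about paying connection and session setup on almost every request. A toy sketch of the alternative, a client-side pool that reuses idle connections; factory is a hypothetical callable standing in for whatever actually opens a database connection. The pool keeps at most max_size idle connections and is not hardened for concurrent use; a driver's own pooling should be preferred where available.

```python
import queue


class SimplePool:
    """Toy connection pool: reuse idle connections instead of opening one per request."""

    def __init__(self, factory, max_size):
        self._factory = factory                         # hypothetical: opens a new connection
        self._idle = queue.LifoQueue(maxsize=max_size)  # idle connections kept for reuse
        self.created = 0                                # how many real connections were opened

    def acquire(self):
        try:
            return self._idle.get_nowait()  # reuse an idle connection if there is one
        except queue.Empty:
            self.created += 1
            return self._factory()          # otherwise pay the setup cost once

    def release(self, conn):
        try:
            self._idle.put_nowait(conn)     # keep it for the next request
        except queue.Full:
            close = getattr(conn, "close", None)
            if close is not None:
                close()                     # too many idle connections: really close it
```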

    2. Mark has raised a good point (even if it is not directly applicable to
    this particular issue): you should use only VIPs in any server
    tnsnames.ora/listener.ora, i.e. never mix a node's native IP with the VIP.
    Mixing them usually causes serious trouble when there are firewalls, IP
    remapping, etc., i.e. a complex network organization. Anyway, I like the point.

    3. To add to Ujang's comment: if EXTPROC is not used, drop it from
    listener.ora.

    4. Have you answered Riyaj's request for the configuration files?

    Andrey
    PS. Sorry for the emotions (goblins, etc.)
  • Dan Norris at Sep 4, 2008 at 1:46 am
    I wrote a blog post on TNS listener configuration for RAC a while back
    that might help (if not in this case, then maybe someone else stumbling
    upon this thread). The entry is at
    http://www.dannorris.com/2008/07/21/tns-listener-configuration-for-oracle-rac/

    Hope it helps. I think most of the points I covered in the post have
    already been addressed here. Perhaps most importantly, using only the
    VIP address in the listener's listening endpoints is something that is
    often misconfigured.

    Dan

  • Yechiel Adar at Sep 4, 2008 at 7:34 am
    With the help of Oracle Support we narrowed the problem down to name
    resolution.
    We shut down node 2 and started a session with client tracing.
    I saw in the trace that SQL*Net decides to use server2-vip.
    After that it tries to convert the name to a TCP/IP address, and that is
    where it gets stuck.

    It seems that somewhere in the network something is not updated when
    the VIP is moved to the other node, and it takes about 6 (or 6*2)
    seconds until SQL*Net gets an error from the network; it then tries to
    connect with the second entry, server-vip1, and this works.

    Have you heard anything about this problem?

    We are going to run a test using the IP addresses themselves instead of
    names in tnsnames.ora, and also use a sniffer to find out what happens
    during these 6 seconds.
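
    One cheap check before the sniffer is to time name resolution separately from the TCP connect, repeatedly, from the client. A sketch only: server2-vip is the name from the trace, port 1521 is an assumed listener port, and the result shows how long the operating system's resolver took, not what SQL*Net did internally.

```python
import socket
import time


def time_resolution(host, port=1521):
    """Return (sorted list of resolved addresses, seconds taken) for `host`."""
    start = time.monotonic()
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    elapsed = time.monotonic() - start
    # Each info is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos}), elapsed
```

    If time_resolution("server2-vip") consistently returns in milliseconds while the connect still stalls, the delay lies after name resolution.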

    Adar Yechiel
    Rechovot, Israel

  • Martin Klier at Sep 4, 2008 at 2:50 pm
    Hi Yechiel,

    It sounds a bit like an ARP refreshing issue. On large networks, the ARP
    caches of the switches might take their time to refresh their tables,
    and the more intermediate switches, routers and paths lie between the
    client and the DB, the longer it might take to renew them.

    Can you or your network admin trace the managed switches that are
    immediately involved on client and server side?

    Regards
    Martin

    --
    Usn's IT Blog for Linux, Oracle, Asterisk
    http://www.usn-it.de

  • Roman Podshivalov at Sep 4, 2008 at 3:07 pm
    I second Martin.

    It seems like an ARP refresh issue. So it's not name-to-IP resolution
    (that pair doesn't change, and you are most likely getting it from the
    client cache anyway) but IP-to-MAC resolution. With a sysadmin's help and
    a tool like snoop it could be narrowed down.
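
    The ARP hypothesis can be pictured with a toy model. This is an assumption-laden sketch, not a real ARP implementation: the IP address, the MAC labels and the 120-second lifetime are made up. A cache that has not yet learned the new MAC address for the relocated VIP (for example from a gratuitous ARP, if one is sent) keeps sending frames to the unplugged node, so the SYN goes unanswered and the client times out. Once the entry points at the surviving node, that node answers with a reset and the client fails over immediately.

```python
class ArpCache:
    """Toy model of an ARP cache: IP -> (MAC, expiry time)."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._entries = {}

    def learn(self, ip, mac, now):
        # Called when an ARP reply (or a gratuitous ARP) for `ip` is seen.
        self._entries[ip] = (mac, now + self.ttl)

    def lookup(self, ip, now):
        entry = self._entries.get(ip)
        if entry is None or entry[1] <= now:
            return None  # unknown or expired: a fresh ARP request would be needed
        return entry[0]


if __name__ == "__main__":
    vip = "10.0.0.12"                      # hypothetical VIP address
    stale = ArpCache(ttl=120)
    stale.learn(vip, "mac-node2", now=0)   # learned before node2 was unplugged
    print(stale.lookup(vip, now=35))       # mac-node2: frames still go to the dead node
    fresh = ArpCache(ttl=120)
    fresh.learn(vip, "mac-node2", now=0)
    fresh.learn(vip, "mac-node1", now=30)  # VIP relocated and the new MAC learned
    print(fresh.lookup(vip, now=35))       # mac-node1: node1 answers with a reset
```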

    --romas

Discussion Overview
group: oracle-l @
categories: oracle
posted: Sep 2, '08 at 5:37a
active: Sep 4, '08 at 3:07p
posts: 15
users: 10
website: oracle.com
