Saturday, January 10, 2009

Interconnect setup in RAC environment

http://momendba.blogspot.com/2007/07/how-to-find-cluster-interconnect-ip.html

How to find Cluster Interconnect IP address from Oracle Database

Hello,

The easiest way to find the cluster interconnect is to view the "hosts" file. The "hosts" file is located under:

UNIX .......... /etc
Windows ...... C:\WINDOWS\system32\drivers\etc

Following are the ways to find the cluster interconnect through the Oracle database:

1) Query X$KSXPIA

The following query provides the interconnect IP address registered with the Oracle database:
SQL> select IP_KSXPIA from x$ksxpia where PUB_KSXPIA = 'N';

IP_KSXPIA
----------------
192.168.10.11
This query should be run on all instances to find the private interconnect IP address used on their respective nodes.

2) Query GV$CLUSTER_INTERCONNECTS view

Querying the GV$CLUSTER_INTERCONNECTS view lists the interconnect used by all the participating instances of the RAC database.
SQL> select INST_ID, IP_ADDRESS from GV$CLUSTER_INTERCONNECTS;

   INST_ID IP_ADDRESS
---------- ----------------
         1 192.168.10.11
         2 192.168.10.12
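
From 10g onwards the same view also shows the interface name and where the setting was picked up from; a slightly extended query (a sketch, assuming the NAME, IS_PUBLIC and SOURCE columns of the 10g/11g view) makes that visible per instance:

SQL> select INST_ID, NAME, IP_ADDRESS, IS_PUBLIC, SOURCE from GV$CLUSTER_INTERCONNECTS;

The SOURCE column indicates whether the address came from the OCR, from OS-dependent (third party) clusterware, or from the cluster_interconnects parameter.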
3) ASM and Database log

We can find out the cluster interconnect used by viewing the log file of both the ASM instance and the database instance. When the ASM instance is started it registers the private interconnect used. Following is the excerpt from the ASM alert log on node 1:
:
:
Cluster communication is configured to use the following interface(s) for this instance
  192.168.10.11
:
:
Similarly, the database instance also registers the private interconnect it uses, and the following message is recorded in its alert log:
:
:
Cluster communication is configured to use the following interface(s) for this instance
  192.168.10.11
:
:
4) CLUSTER_INTERCONNECTS initialization parameter

This parameter provides Oracle with information on the availability of additional cluster interconnects. It overrides the default interconnect settings picked up at the OS level with a preferred one. You can view the current value of this parameter by issuing the following in SQL*Plus, connected as the SYS user:

SQL> show parameter cluster_interconnects

Thanks for reading :-)

http://www.rachelp.nl/index_kb.php?menu=articles&actie=show&id=35

Date 2007-10-12 22:22:21
Component RAC
Title Interconnect setup in RAC environment.
Version 9.2.0 - 11.1.0
Problem
Interconnect setup in RAC environment.

The interconnect is a very important part of the cluster environment; it is one of the aortas of a cluster environment. The interconnect is used as the physical layer between the cluster nodes for the heartbeats, and cache fusion uses it as well. The interconnect must be a private connection; a cross-over cable is not supported.

In day-to-day operation it is proven that, when the interconnect is configured correctly, it will not be the bottleneck in case of performance issues. The rest of this article will focus on how to validate that the interconnect is really used. A DBA must be able to validate the interconnect settings in case of performance problems. The physical attachment of the interconnect is out of scope.

Although you should treat performance issues in a cluster environment the way you would normally do in non-cluster environments, here are some areas you can focus on. Normally the average interconnect latency using gigabit is very low (a ping over the interconnect typically shows less than 1 ms, see further down).

Oradebug ipc
To see which IP address an instance is actually using for the interconnect, run the oradebug ipc command:

SQL> connect / as sysdba
SQL> alter session set tracefile_identifier='oradebug_interc';
SQL> oradebug setmypid
SQL> oradebug ipc
SQL> exit

Now if you open the tracefile, in the bdump location, you can find the IP address used for the interconnect. Here is the result of the above oradebug ipc command.

dmno 0x7902775e admport:
SSKGXPT 0x10569c44 flags SSKGXPT_READPENDING active network 0
info for network 0
socket no 7 IP 145.72.220.83 UDP 53032
HACMP network_id 0 sflags SSKGXPT_UP
context timestamp 0
no ports
sconno accono ertt state seq# sent async sync rtrans acks
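
Before exiting, the same session can also print the exact trace file name it is writing to with oradebug tracefile_name (the path below is purely illustrative):

SQL> oradebug tracefile_name
/u01/app/oracle/admin/RAC/udump/rac1_ora_12345.trc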

Query x$ksxpia
The last option is to query x$ksxpia, which is an instance-specific view. A query on this view shows from which setting the interconnect is picked up. Depending on the environment this can be useful to indicate if and where the configuration went wrong. This query works in Oracle 9i, 10g and 11g.

Below is an example of the output.

col picked_ksxpia format a15
col indx format 9999
col name_ksxpia format a5
col ip_ksxpia format a20
select * from x$KSXPIA;
ADDR INDX INST_ID PUB_KSXPIA PICKED_KSXPIA NAME_ IP_KSXPIA
---------------- ----- ---------- ------------------------------ --------------- ----- --------------------
00000001105D6540 0 1 N OCR en7 145.72.220.83
00000001105D6540 1 1 Y OCR en6 145.72.220.10

Note: pub_ksxpia indicates whether the interface is a public or a private one. picked_ksxpia indicates where the information was collected from. In the example the interface information from the OCR is used. Other possible values are OSD, which means third-party clusterware is used, and CI, which means the cluster_interconnects parameter is set (the last one I recommend not to do).
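
For completeness, setting that parameter would look roughly like the sketch below (hypothetical instance names RAC1 and RAC2 and the example addresses from the first article; as said, leaving the configuration to the OCR is normally the better choice):

SQL> alter system set cluster_interconnects='192.168.10.11' scope=spfile sid='RAC1';
SQL> alter system set cluster_interconnects='192.168.10.12' scope=spfile sid='RAC2';

The instances have to be restarted before the new value is picked up.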

Using Alert.log
Instead of using one or more of the above queries you can also check the alert.log of the instances involved. During the startup of the RDBMS instance the interfaces used for the public and private connections are mentioned. Note that this starts from Oracle 10g and is not available in Oracle 9i.

Example: alert.log

Interface type 1 eth1 10.10.10.0 configured from OCR for use as a cluster interconnect
Interface type 1 eth0 192.168.2.0 configured from OCR for use as a public interface
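
A quick way to pull these lines out of the alert.log is a simple grep (a sketch only; the path assumes a 10g-style bdump location and a hypothetical instance name RAC1):

racworkshop1:/export/home/oracle$ grep -i "configured from OCR" /u01/app/oracle/admin/RAC/bdump/alert_RAC1.log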

Using Oracle Cluster Registry to validate settings.
This method can be used to validate the settings in the OCR, but it does not mean they are also used by the database instances. To validate the settings in the OCR you use the oifcfg command to retrieve the information from the OCR.

The oifcfg iflist command lists all the interfaces available on the operating system; it does not get this from the OCR.
The oifcfg getif command lists the configuration from the OCR.

An example:

racworkshop1:/export/home/oracle$ oifcfg iflist
eth0 192.168.2.0
eth1 10.10.10.0
racworkshop1:/export/home/oracle$ oifcfg getif
eth0 192.168.2.0 global public
eth1 10.10.10.0 global cluster_interconnect
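
If getif shows a wrong or missing entry, the registration in the OCR can be corrected with oifcfg setif (a sketch reusing the interface names and subnets from the example above; run it as the clusterware owner and verify the subnet before changing anything):

racworkshop1:/export/home/oracle$ oifcfg setif -global eth1/10.10.10.0:cluster_interconnect
racworkshop1:/export/home/oracle$ oifcfg setif -global eth0/192.168.2.0:public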

Network statistics.
In case of performance issues, validate the network first to see if there is an issue there. This will quickly eliminate possible hardware/cabling issues.

Validating this first can save time.

Ping
Using the ping command it is easy to validate whether there is a network/hardware issue.
Besides this, the round trip time of a ping can also indicate whether there is an issue.
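
A minimal check from node 1 could look like this (a sketch, assuming a Linux ping and a hypothetical peer interconnect address of 10.10.10.2 on the eth1 subnet from the oifcfg example):

racworkshop1:/export/home/oracle$ ping -c 4 10.10.10.2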

When correct, the ping using the interconnect IP will show a time of less than 1 ms (the sample ping output showed four ICMP replies on eth1 with ttl=255 and times between 0 and 2 ms).

Cache fusion statistics.
If the average CR block receive time is more than 15 ms then you probably have cache fusion latency. Normally the average time is around or below 5 ms. Depending on the type of application (OLTP/DSS), multi_block_read_count can influence the sending part.

Example:

select b1.inst_id, b2.value "GCS CR BLOCKS RECEIVED",
       b1.value "GCS CR BLOCK RECEIVE TIME",
       ((b1.value / b2.value) * 10) "AVG CR BLOCK RECEIVE TIME (ms)"
from gv$sysstat b1, gv$sysstat b2
where (b1.name = 'global cache cr block receive time' and
       b2.name = 'global cache cr blocks received' and b1.inst_id = b2.inst_id)
   or (b1.name = 'gc cr block receive time' and
       b2.name = 'gc cr blocks received' and b1.inst_id = b2.inst_id);

   INST_ID GCS CR BLOCKS RECEIVED GCS CR BLOCK RECEIVE TIME AVG CR BLOCK RECEIVE TIME (ms)
---------- ---------------------- ------------------------- ------------------------------
         1                 702187                    214055                            3.0
         2                 795547                    182624                            2.3
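
Lost blocks, which the conclusion below also mentions, can be checked from the same view (a sketch, assuming the 10g statistic names 'gc blocks lost' and 'gc blocks corrupt'; in 9i the names start with 'global cache'):

select inst_id, name, value
from gv$sysstat
where name in ('gc blocks lost', 'gc blocks corrupt')
order by inst_id, name;

A steadily growing 'gc blocks lost' value usually points at the network or the UDP settings mentioned in the tips below.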


Conclusion.

If you have configured the interconnect correctly and did not see any problems related to dropped packets, latency or lost blocks, then you need to focus on the cache fusion part.

Although the above part does not focus on what can be done to fix problems, here are some tips:
- If you have dropped packets, start by validating the cables and switches.
- If you notice latency, start using Jumbo Frames. Make sure all the components involved are changed (set the MTU size to 9000); see the sketch after this list.
- If you notice lost blocks, change the UDP settings; check the platform-specific information for this, probably the message queue is not big enough.
- Install OSWatcher (download from Metalink) to collect OS-level information, of which network statistics are a part.
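
Checking and raising the MTU on the interconnect interface could look like the sketch below (assuming Linux and the eth1 interface from the examples above; the switches and NICs on all nodes must support jumbo frames):

# show the current MTU of the interconnect interface
/sbin/ifconfig eth1 | grep -i mtu
# as root, raise it to 9000 on every node (not persistent across reboots)
/sbin/ifconfig eth1 mtu 9000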

If the query related to the cache fusion statistics shows bad values, start finding out which part of cache fusion is causing the problem. But bad design can also be the cause of the problem.
