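After the node 2 host was shut down, its VIP did not fail over to node 1
Symptom:
After the node 2 host was shut down, its VIP did not fail over to node 1.
As shown below, a check from node 1 confirms that the VIP did not fail over:
[root@MAA01 ~]# ifconfig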
eth0 Link encap:Ethernet HWaddr A4:BA:DB:13:E2:AB
inet addr:10.8.32.111 Bcast:10.0.15.255 Mask:255.255.255.0
inet6 addr: fe80::a6ba:dbff:fe13:e2ab/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:12896778 errors:0 dropped:0 overruns:0 frame:0
TX packets:9488933 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2875695560 (2.6 GiB) TX bytes:2411913446 (2.2 GiB)
Interrupt:114 Memory:d6000000-d6012800
eth0:1 Link encap:Ethernet HWaddr A4:BA:DB:13:E2:AB
inet addr:10.8.32.115 Bcast:10.0.15.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:114 Memory:d6000000-d6012800
eth1 Link encap:Ethernet HWaddr A4:BA:DB:13:E2:AD
inet addr:192.168.127.101 Bcast:192.168.127.255 Mask:255.255.255.0
inet6 addr: fe80::a6ba:dbff:fe13:e2ad/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:203865 errors:0 dropped:0 overruns:0 frame:0
TX packets:309076 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:217218808 (207.1 MiB) TX bytes:66031839 (62.9 MiB)
Interrupt:122 Memory:d8000000-d8012800
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:2264776 errors:0 dropped:0 overruns:0 frame:0
TX packets:2264776 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:3249461652 (3.0 GiB) TX bytes:3249461652 (3.0 GiB)
[oracle@MAA01 ~]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....SM1.asm application ONLINE ONLINE maa01
ora....01.lsnr application ONLINE ONLINE maa01
ora....t01.gsd application ONLINE ONLINE maa01
ora....t01.ons application ONLINE ONLINE maa01
ora....t01.vip application ONLINE ONLINE maa01
ora....SM2.asm application ONLINE OFFLINE
ora....02.lsnr application ONLINE OFFLINE
ora....t02.gsd application ONLINE OFFLINE
ora....t02.ons application ONLINE OFFLINE
ora....t02.vip application ONLINE OFFLINE
ora.rac.db application ONLINE ONLINE maa01
ora....c1.inst application ONLINE ONLINE maa01
ora....c2.inst application ONLINE OFFLINE
ora...._taf.cs application OFFLINE OFFLINE
ora....ac1.srv application OFFLINE OFFLINE
ora....ac2.srv application OFFLINE OFFLINE
ora....rac1.cs application OFFLINE OFFLINE
ora....ac1.srv application OFFLINE OFFLINE
ora....rac2.cs application OFFLINE OFFLINE
ora....ac2.srv application OFFLINE OFFLINE
[oracle@MAA01 ~]$
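The crs_stat -t listing above can be scanned mechanically for resources whose Target is ONLINE but whose State is not. The following is a minimal sketch, assuming the whitespace-separated column layout shown above; the function name and the sample text are illustrative and not part of any Oracle tooling.

```python
def wanted_but_not_online(crs_stat_output):
    """Return names of resources with Target ONLINE and State not ONLINE."""
    names = []
    for line in crs_stat_output.splitlines():
        fields = line.split()
        # Data rows start with "ora." and carry Name, Type, Target, State.
        if len(fields) < 4 or not fields[0].startswith("ora."):
            continue
        name, _type, target, state = fields[:4]
        if target == "ONLINE" and state != "ONLINE":
            names.append(name)
    return names


# A few rows copied from the listing above (abbreviated names kept as-is).
SAMPLE = """\
Name Type Target State Host
------------------------------------------------------------
ora....t01.vip application ONLINE ONLINE maa01
ora....t02.vip application ONLINE OFFLINE
ora....c2.inst application ONLINE OFFLINE
ora...._taf.cs application OFFLINE OFFLINE
"""

print(wanted_but_not_online(SAMPLE))
```

In this listing the node 2 VIP (ora....t02.vip) is among the reported resources. Had the failover succeeded, that row would be expected to show ONLINE with maa01 in the Host column.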
At this point, node 2 cannot be pinged from node 1:
[oracle@MAA01 ~]$ ping 10.8.32.112
PING 10.8.32.112 (10.8.32.112) 56(84) bytes of data.
From 10.8.32.111 icmp_seq=1 Destination Host Unreachable
From 10.8.32.111 icmp_seq=2 Destination Host Unreachable
Analysis:
The following files on node 1 were examined. The listener configuration file showed nothing abnormal:
$CRS_HOME/log/<nodename>/*.log
$CRS_HOME/log/<nodename>/crsd/*.log
$CRS_HOME/log/<nodename>/cssd/*.log
$ORACLE_HOME/network/admin/listener.ora
[oracle@MAA01 ~]$
[oracle@MAA01 ~]$ cd $ORACLE_HOME
[oracle@MAA01 db]$ cd network/admin/
[oracle@MAA01 admin]$ cat listener.ora
# listener.ora.maa01 Network Configuration File: /oracle/app/11gR1/db/network/admin/listener.ora.maa01
# Generated by Oracle configuration tools.
INBOUND_CONNECT_TIMEOUT_LISTENER_MAA01=180
LISTENER_MAA01 =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = MAA01-vip)(PORT = 1521)(IP = FIRST))
)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.8.32.111)(PORT = 1521)(IP = FIRST))
)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
)
)
)
The relevant log files on node 1 show that CRS attempted to fail over the VIP, but the attempt failed.
[oracle@MAA01 admin]$
alertmaa01.log:
[crsd(5072)]CRS-1201:CRSD started on node maa01.
2013-08-09 17:18:57.513
[crsd(5072)]CRS-1205:Auto-start failed for the CRS resource . Details in maa01.
2013-08-09 17:28:01.175
[cssd(5555)]CRS-1612:node joadbtest02 (2) at 50% heartbeat fatal, eviction in 14.102 seconds
2013-08-09 17:28:02.177
[cssd(5555)]CRS-1612:node joadbtest02 (2) at 50% heartbeat fatal, eviction in 13.102 seconds
2013-08-09 17:28:09.181
[cssd(5555)]CRS-1611:node joadbtest02 (2) at 75% heartbeat fatal, eviction in 6.102 seconds
2013-08-09 17:28:13.179
[cssd(5555)]CRS-1610:node joadbtest02 (2) at 90% heartbeat fatal, eviction in 2.102 seconds
2013-08-09 17:28:14.181
[cssd(5555)]CRS-1610:node joadbtest02 (2) at 90% heartbeat fatal, eviction in 1.102 seconds
2013-08-09 17:28:15.183
[cssd(5555)]CRS-1610:node joadbtest02 (2) at 90% heartbeat fatal, eviction in 0.092 seconds <--------------heart beat loss
2013-08-09 17:28:16.045
[cssd(5555)]CRS-1607:CSSD evicting node joadbtest02. Details in /oracle/app/11gR1/crs/log/maa01/cssd/ocssd.log.
[cssd(5555)]CRS-1601:CSSD Reconfiguration complete. Active nodes are maa01 . <----------------------------------------------------Node2 was evicted
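To follow the eviction countdown without reading every line, the heartbeat warnings above can be reduced to (node, percent, seconds remaining) tuples. This is a sketch only, assuming the message wording is exactly as in the excerpt above; the regular expression and function name are illustrative.

```python
import re

# Matches the CRS-1610 / CRS-1611 / CRS-1612 heartbeat warnings shown above.
HEARTBEAT = re.compile(
    r"CRS-16(?:10|11|12):node (\S+) \((\d+)\) at (\d+)% heartbeat fatal, "
    r"eviction in ([\d.]+) seconds"
)


def heartbeat_countdown(lines):
    """Return (node, percent, seconds_to_eviction) for each heartbeat warning."""
    result = []
    for line in lines:
        m = HEARTBEAT.search(line)
        if m:
            result.append((m.group(1), int(m.group(3)), float(m.group(4))))
    return result


# Three lines copied from the excerpt above.
SAMPLE_LINES = [
    "[cssd(5555)]CRS-1612:node joadbtest02 (2) at 50% heartbeat fatal, eviction in 14.102 seconds",
    "[cssd(5555)]CRS-1611:node joadbtest02 (2) at 75% heartbeat fatal, eviction in 6.102 seconds",
    "[cssd(5555)]CRS-1607:CSSD evicting node joadbtest02. Details in /oracle/app/11gR1/crs/log/maa01/cssd/ocssd.log.",
]

for node, percent, seconds in heartbeat_countdown(SAMPLE_LINES):
    print(node, percent, seconds)
```

The CRS-1607 eviction line is deliberately not matched; only the countdown warnings are collected.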
ocssd.log:
[ CSSD]2013-08-09 17:28:31.188 [1158809920] >TRACE: clssnmUpdateNodeState: node 2, state (5/0) unique (1371182914/1371182914) prevConuni(1371182914) birth (244117402/244117402) (old/new)
[ CSSD]2013-08-09 17:28:31.188 [1158809920] >TRACE: clssnmDeactivateNode: node 2 (joadbtest02) left cluster
crsd.log:
2013-08-09 17:18:57.506: [ CRSRES][1488656704] startRunnable: setting CLI values
2013-08-09 17:18:57.512: [ CRSRES][1486555456] maa01 : CRS-1019: Resource ora.joadbtest02.LISTENER_JOADBTEST02.lsnr (application) cannot run on maa01
2013-08-09 17:18:57.519: [ CRSRES][1488656704] Attempting to start `ora.maa01.ASM1.asm` on member `maa01`
2013-08-09 17:18:57.531: [ CRSRES][1490757952] startRunnable: setting CLI values
2013-08-09 17:18:57.541: [ CRSRES][1490757952] Attempting to start `ora.maa01.vip` on member `maa01`
2013-08-09 17:19:01.054: [ CRSRES][1490757952] Start of `ora.maa01.vip` on member `maa01` succeeded.
2013-08-09 17:19:01.079: [ CRSRES][1490757952] startRunnable: setting CLI values
2013-08-09 17:19:01.093: [ CRSRES][1490757952] Attempting to start `ora.maa01.LISTENER_MAA01.lsnr` on member `maa01`
2013-08-09 17:19:04.660: [ CRSRES][1490757952] Start of `ora.maa01.LISTENER_MAA01.lsnr` on member `maa01` succeeded.
2013-08-09 17:19:05.204: [ CRSRES][1513838912] CRS-1002: Resource 'ora.maa01.LISTENER_MAA01.lsnr' is already running on member 'maa01'
2013-08-09 17:28:31.192: [ OCRMAS][1213802816]th_master:13: I AM THE NEW OCR MASTER at incar 14. Node Number 1 <---Node 1 is master.
2013-08-09 17:28:31.194: [ CRSCOMM][1486555456] CLEANUP: Searching for connections to failed node joadbtest02
2013-08-09 17:28:31.194: [ CRSEVT][1486555456] Processing member leave for joadbtest02, incarnation: 244117407
2013-08-09 17:28:31.195: [ CRSD][1486555456] SM: recovery in process: 8
2013-08-09 17:28:31.195: [ CRSEVT][1486555456] Do failover for: joadbtest02 <-------failover fails at this point.
2013-08-09 17:28:31.399: [ CRSRES][1513838912] startRunnable: setting CLI values
2013-08-09 17:28:31.414: [ CRSRES][1513838912] Attempting to start `ora.joadbtest02.vip` on member `maa01` <---attempt to fail the VIP over to node 1
2013-08-09 17:28:31.421: [ CRSRES][1530632512] startRunnable: setting CLI values
2013-08-09 17:28:31.434: [ CRSRES][1530632512] Attempting to start `ora.rac.db` on member `maa01`
2013-08-09 17:28:31.542: [ CRSRES][1530632512] Start of `ora.rac.db` on member `maa01` succeeded.
2013-08-09 17:28:37.863: [ CRSAPP][1513838912] StartResource error for ora.joadbtest02.vip error code = 1
2013-08-09 17:28:41.057: [ CRSRES][1513838912] Start of `ora.joadbtest02.vip` on member `maa01` failed. <---------VIP failover failed.
2013-08-09 17:28:41.085: [ CRSEVT][1486555456] Post recovery done evmd event for: joadbtest02
2013-08-09 17:28:41.085: [ CRSD][1486555456] SM: recoveryDone: 0
2013-08-09 17:28:41.098: [ CRSEVT][1486555456] Processing RecoveryDone
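The CRSRES entries above record the outcome of every resource start. A small filter can pull out only the failed starts, which is how the VIP failure stands out here. This is a minimal sketch, assuming the "Start of ... on member ... succeeded/failed." wording seen in the excerpt; the pattern and function name are illustrative.

```python
import re

# Timestamp, resource name, member name, and outcome of a start attempt.
START_RESULT = re.compile(
    r"^(\S+ \S+): .*Start of `([^`]+)` on member `([^`]+)` (succeeded|failed)\."
)


def failed_starts(lines):
    """Return (timestamp, resource, member) for every failed resource start."""
    failures = []
    for line in lines:
        m = START_RESULT.search(line)
        if m and m.group(4) == "failed":
            failures.append((m.group(1), m.group(2), m.group(3)))
    return failures


# Two lines copied from the excerpt above.
SAMPLE_LOG = [
    "2013-08-09 17:28:31.542: [ CRSRES][1530632512] Start of `ora.rac.db` on member `maa01` succeeded.",
    "2013-08-09 17:28:41.057: [ CRSRES][1513838912] Start of `ora.joadbtest02.vip` on member `maa01` failed.",
]

print(failed_starts(SAMPLE_LOG))
```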
Next, check the log file for ora.joadbtest02.vip:
ora.joadbtest02.vip:
2013-08-09 17:28:34.723: [ RACG][1353934704] [11316][1353934704][ora.joadbtest02.vip]: checkIf: interface eth2 is down <--- is it clue?
Invalid parameters, or failed to bring up VIP (host=MAA01)
2013-08-09 17:28:34.729: [ RACG][1353934704] [11316][1353934704][ora.joadbtest02.vip]: clsrcexecut: cmd = /oracle/app/11gR1/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /oracle/app/11gR1/crs/bin/racgvip start joadbtest02
2013-08-09 17:28:34.729: [ RACG][1353934704] [11316][1353934704][ora.joadbtest02.vip]: clsrcexecut: rc = 1, time = 3.150s
2013-08-09 17:28:37.861: [ RACG][1353934704] [11316][1353934704][ora.joadbtest02.vip]: clsrcexecut: cmd = /oracle/app/11gR1/crs/bin/racgeut -e _USR_ORA_DEBUG=0 54 /oracle/app/11gR1/crs/bin/racgvip check joadbtest02
2013-08-09 17:28:37.861: [ RACG][1353934704] [11316][1353934704][ora.joadbtest02.vip]: clsrcexecut: rc = 1, time = 3.130s
2013-08-09 17:28:37.861: [ RACG][1353934704] [11316][1353934704][ora.joadbtest02.vip]: end for resource = ora.joadbtest02.vip, action = start, status = 1, time = 6.350s
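The clue is visible here: the problem appears to lie with the network interface. On node 1 the Public IP is on eth0, but for some unknown reason the Public IP on node 2 is on eth2. The ifconfig output from node 1 shown earlier lists only eth0, eth1, and lo, so there is no active eth2 on node 1 on which the node 2 VIP could be brought up.
Because the customer's earlier messages logs were not retained, and the earlier Oracle and cluster logs are gone as well, the reason why the two nodes use different interfaces for the Public IP cannot be determined.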
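The interface comparison described above can be sketched as a small check: take the interface named in the VIP log's "checkIf" message and see whether it appears among the interfaces that ifconfig reports on the surviving node. This is an illustration under assumptions: it relies on the old net-tools ifconfig format ("Link encap:" header lines) and on the exact log wording shown earlier, and both function names are made up for this example.

```python
import re


def interfaces_from_ifconfig(text):
    """Collect base interface names from net-tools style ifconfig output."""
    names = set()
    for line in text.splitlines():
        if "Link encap:" in line:
            # "eth0:1" is an alias on eth0, so keep only the base name.
            names.add(line.split()[0].split(":")[0])
    return names


def interface_named_in_vip_log(line):
    """Return the interface reported as down by checkIf, or None."""
    m = re.search(r"checkIf: interface (\S+) is down", line)
    return m.group(1) if m else None


# Header lines copied from the ifconfig output on node 1.
IFCONFIG_NODE1 = """\
eth0 Link encap:Ethernet HWaddr A4:BA:DB:13:E2:AB
eth0:1 Link encap:Ethernet HWaddr A4:BA:DB:13:E2:AB
eth1 Link encap:Ethernet HWaddr A4:BA:DB:13:E2:AD
lo Link encap:Local Loopback
"""

# Line copied from the ora.joadbtest02.vip log.
VIP_LOG_LINE = (
    "2013-08-09 17:28:34.723: [ RACG][1353934704] [11316][1353934704]"
    "[ora.joadbtest02.vip]: checkIf: interface eth2 is down"
)

wanted = interface_named_in_vip_log(VIP_LOG_LINE)
active = interfaces_from_ifconfig(IFCONFIG_NODE1)
print(wanted, sorted(active), wanted in active)
```

Note that plain ifconfig lists only active interfaces, so a missing name means the interface is either absent or down on that node; the check cannot tell the two cases apart.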
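Solution:
Configure the Public IP on the same network interface on both nodes. For the detailed steps, see an earlier article of mine:
"VIP fails to start with error CRS-1006"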