When a cluster node fails, the failure is either recoverable or unrecoverable. GBase 8a provides two node states, one for each case.
failure state
Use this state for recoverable failures, including failures that are still being investigated.
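For a node marked failure, events related to the node continue to be recorded, but the cluster no longer checks the state of its services and no longer dispatches tasks to it. The node can later be restored to the normal state.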
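The following transcript shows a node with a simulated power failure that cannot be recovered in the short term. The node is forcibly switched from OFFLINE to FAILURE.
Note: judging by how long gcadmin takes to run, while the node is OFFLINE the command visibly stalls at the failed node, waiting for the check to time out. Once the node is in FAILURE, the system skips the check and the command returns immediately.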
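One way to observe this difference is to time the command before and after the state change. The helper below is a minimal sketch, not part of GBase 8a: `elapsed_seconds` is a hypothetical name, and it assumes a `date` that supports `+%s` (GNU and BSD `date` both do). It reports whole seconds only, which is enough to tell a timeout stall from an immediate return.

```shell
#!/bin/sh
# elapsed_seconds CMD [ARGS...]
# Hypothetical helper: runs the given command, discards its output,
# and prints the wall-clock time it took in whole seconds.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Usage on a coordinator node (run as the DBA user):
#   elapsed_seconds gcadmin
```

Running it once while the node is OFFLINE and once after the node has been set to FAILURE should make the stall described above visible as a difference in the reported seconds.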
[root@rh6-1 ~]# gcadmin
CLUSTER STATE: ACTIVE
CLUSTER MODE: NORMAL
=================================================================
| GBASE COORDINATOR CLUSTER INFORMATION |
=================================================================
| NodeName | IpAddress |gcware |gcluster |DataState |
-----------------------------------------------------------------
| coordinator1 | 10.0.2.201 | OPEN | OPEN | 0 |
-----------------------------------------------------------------
================================================================
| GBASE DATA CLUSTER INFORMATION |
================================================================
|NodeName | IpAddress | gnode |syncserver |DataState |
----------------------------------------------------------------
| node1 | 10.0.2.201 | OPEN | OPEN | 0 |
----------------------------------------------------------------
| node2 | 10.0.2.202 | OFFLINE | | |
----------------------------------------------------------------
[root@rh6-1 ~]# gcadmin setnodestate 10.0.2.202 failure
current user is not DBA user, please switch user to [gbase]
gcadmin set node state failed
[root@rh6-1 ~]# su - gbase
[gbase@rh6-1 ~]$ gcadmin setnodestate 10.0.2.202 failure
load gbase client dll start ......
load gbase client dll end ......
[gbase@rh6-1 ~]$ gcadmin
CLUSTER STATE: ACTIVE
CLUSTER MODE: NORMAL
=================================================================
| GBASE COORDINATOR CLUSTER INFORMATION |
=================================================================
| NodeName | IpAddress |gcware |gcluster |DataState |
-----------------------------------------------------------------
| coordinator1 | 10.0.2.201 | OPEN | OPEN | 0 |
-----------------------------------------------------------------
================================================================
| GBASE DATA CLUSTER INFORMATION |
================================================================
|NodeName | IpAddress | gnode |syncserver |DataState |
----------------------------------------------------------------
| node1 | 10.0.2.201 | OPEN | OPEN | 0 |
----------------------------------------------------------------
| node2 | 10.0.2.202 | FAILURE | | |
----------------------------------------------------------------
[gbase@rh6-1 ~]$
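When a cluster has many nodes, it can help to pull the affected IP addresses out of the gcadmin output instead of reading the table by eye. The following is a rough sketch that assumes the column layout shown in the transcript above (pipe-separated, IP address in the second column and gnode state in the third); the layout may differ between versions, and `nodes_in_state` is a hypothetical helper name, not a GBase 8a command.

```shell
#!/bin/sh
# nodes_in_state STATE
# Hypothetical helper: reads gcadmin output on stdin and prints the IP
# address of every data node whose gnode column equals STATE.
# Assumes the table layout shown in the transcript above.
nodes_in_state() {
    awk -F'|' -v want="$1" '
        # Only look at rows after the data cluster banner,
        # so coordinator rows are ignored.
        /GBASE DATA CLUSTER INFORMATION/ { in_data = 1; next }
        in_data && NF >= 6 {
            ip = $3; state = $4
            gsub(/[ \t]/, "", ip)
            gsub(/[ \t]/, "", state)
            if (state == want) print ip
        }
    '
}

# Usage:
#   gcadmin | nodes_in_state OFFLINE
```

The function only reads text, so it is safe to run at any time; it does not change any node state.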
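unavailable state
Set this state when a node is judged to have an unrecoverable failure, in particular RAID damage, file system corruption, or data loss.
For a node set to unavailable, events are [no longer] recorded, the node's state is no longer checked, and no tasks are dispatched to it. The node cannot be restored to the normal state; the only option is to replace it.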
gcadmin setnodestate 10.0.2.202 unavailable
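Because unavailable cannot be undone, it may be worth reviewing the exact command before running it. The sketch below is a hypothetical dry-run wrapper, not part of GBase 8a: `print_setnodestate` only validates its arguments and prints the `gcadmin setnodestate` command line, and it accepts only the two state names used in this article.

```shell
#!/bin/sh
# print_setnodestate IP STATE
# Hypothetical dry-run helper: validates the arguments and prints the
# gcadmin command instead of executing it.
print_setnodestate() {
    ip=$1
    state=$2
    # Only the two states covered in this article are accepted.
    case "$state" in
        failure|unavailable) ;;
        *) echo "unsupported state: $state" >&2; return 1 ;;
    esac
    # Loose dotted-quad check; octet ranges are not validated.
    if ! printf '%s\n' "$ip" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}$'; then
        echo "not an IPv4 address: $ip" >&2
        return 1
    fi
    if [ "$state" = "unavailable" ]; then
        echo "warning: an unavailable node cannot return to normal; it must be replaced" >&2
    fi
    echo "gcadmin setnodestate $ip $state"
}

# Usage:
#   print_setnodestate 10.0.2.202 unavailable
```

The printed command still has to be run as the DBA user; as the earlier transcript shows, gcadmin refuses to set a node state when invoked as root.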
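For the detailed procedure, see the article on forcing a node offline and replacing a node in GBase 8a (original title: GBase 8a 强制节点离线和节点替换replace).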