If a node suffers a permanent, unrepairable failure, and the remaining nodes can still carry the existing workload, GBase 8a can evict the failed node by scaling the cluster down and rebuilding the primary/replica relationships. This article walks through the procedure with a concrete example.
The failed node in this article is a data (compute) node. It is strongly recommended to deploy management, coordinator, and data nodes separately rather than mixing roles, unless the cluster is small and cost comes first.
Environment
A 3-node cluster in which node 10.0.2.115 has failed. This operation removes not only the failed 115 but, while we are at it, node 10.0.2.102 as well.
The database version is 9.5.2.
[gbase@gbase_rh7_001 ~]$ gcadmin
CLUSTER STATE: ACTIVE
VIRTUAL CLUSTER MODE: NORMAL
=============================================================
|           GBASE COORDINATOR CLUSTER INFORMATION           |
=============================================================
|   NodeName   | IpAddress  | gcware | gcluster | DataState |
-------------------------------------------------------------
| coordinator1 | 10.0.2.101 |  OPEN  |   OPEN   |     0     |
-------------------------------------------------------------
===========================================================================
|                     GBASE DATA CLUSTER INFORMATION                      |
===========================================================================
| NodeName | IpAddress  | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------
|  node1   | 10.0.2.101 |       7        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------
|  node2   | 10.0.2.102 |       7        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------
|  node3   | 10.0.2.115 |       7        | CLOSE |   CLOSE    |     0     |
---------------------------------------------------------------------------
Scale-down procedure
The process is exactly the same as a normal scale-down; the point of this walkthrough is to demonstrate that the cluster can still be scaled down while a node is failed. For orientation, the whole procedure condenses to the command sequence shown below.
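A condensed outline of the steps that follow (every command, ID, and IP below is taken from this example):
# 1. Generate a new distribution that excludes the failed/removed nodes
gcadmin distribution gcChangeInfo_one.xml p 1 d 0
# 2. Redistribute the data onto it (inside gccli)
#      initnodedatamap; rebalance instance;
# 3. Clean up: drop the old nodedatamap (refreshnodedatamap drop 7;),
#    clear the events left on the failed node, remove the old distribution
gcadmin rmdmlevent 2 10.0.2.115
gcadmin rmddlevent 2 10.0.2.115
gcadmin rmdistribution 7
# 4. Evict the nodes from the cluster and wipe their local files
gcadmin rmnodes rmnodes.xml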
Create a distribution policy that excludes the failed node and the nodes planned for removal
Create a brand-new policy
[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$ cat gcChangeInfo_one.xml
<?xml version="1.0" encoding="utf-8"?>
<servers>
<rack>
<node ip="10.0.2.101"/>
</rack>
</servers>
Keep the existing policy
Note: the IPs below come from a different example (a separate V95 case); take care not to confuse them with the cluster above.
If you want to keep the previous distribution policy, so that the node can later be replaced by scaling back out, you must first export the old distribution:
[gbase@localhost gcinstall]$ gcadmin getdistribution 7 distribution_info_7.xml
gcadmin getdistribution 7 distribution_info_7.xml ...
get segments information
write segments information to file [distribution_info_7.xml]
gcadmin getdistribution information successful
[gbase@localhost gcinstall]$ cat distribution_info_7.xml
<?xml version='1.0' encoding="utf-8"?>
<distributions>
<distribution>
<segments>
<segment>
<primarynode ip="10.0.2.102"/>
<duplicatenodes>
<duplicatenode ip="10.0.2.202"/>
</duplicatenodes>
</segment>
<segment>
<primarynode ip="10.0.2.202"/>
<duplicatenodes>
<duplicatenode ip="10.0.2.203"/>
</duplicatenodes>
</segment>
<segment>
<primarynode ip="10.0.2.203"/>
<duplicatenodes>
<duplicatenode ip="10.0.2.102"/>
</duplicatenodes>
</segment>
</segments>
</distribution>
</distributions>
[gbase@localhost gcinstall]$
Create a policy that excludes the failed node
The task is to remove the failed IP from the configuration. There are two cases:
A. The IP appears as a duplicatenode: simply delete that line.
B. The IP appears as a primarynode: promote one of its duplicatenode IPs (pick one if there are several) to primarynode, and delete that duplicatenode entry. In other words, the surviving replica becomes the new primary.
In this example we copied the configuration file and then:
1. Deleted 202, the replica of 102's segment.
2. Replaced the segment whose primary was 202 with its replica 203, making 203 the primary.
[gbase@localhost gcinstall]$ cp distribution_info_7.xml distribution_info_8.xml
[gbase@localhost gcinstall]$ vi distribution_info_8.xml
[gbase@localhost gcinstall]$ cat distribution_info_8.xml
<?xml version='1.0' encoding="utf-8"?>
<distributions>
<distribution>
<segments>
<segment>
<primarynode ip="10.0.2.102"/>
<duplicatenodes>
</duplicatenodes>
</segment>
<segment>
<primarynode ip="10.0.2.203"/>
<duplicatenodes>
</duplicatenodes>
</segment>
<segment>
<primarynode ip="10.0.2.203"/>
<duplicatenodes>
<duplicatenode ip="10.0.2.102"/>
</duplicatenodes>
</segment>
</segments>
</distribution>
</distributions>
[gbase@localhost gcinstall]$
Generate the new distribution
[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$ gcadmin distribution gcChangeInfo_one.xml p 1 d 0
gcadmin generate distribution ...
[warning]: parameter [d num] is 0, the new distribution will has no segment backup
please ensure this is ok, input [Y,y] or [N,n]: y
gcadmin generate distribution successful
[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$
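As the warning above confirms, d is the number of segment backups, and d 0 creates a distribution with no replicas, which is all a single surviving data node can hold. If enough nodes remained to host replicas, you would generate the distribution with one backup instead, for example (hypothetical here, not part of this walkthrough):
gcadmin distribution gcChangeInfo.xml p 1 d 1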
Initialize and redistribute
[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$ gccli
GBase client 9.5.2.44.1045e3118. Copyright (c) 2004-2022, GBase. All Rights Reserved.
gbase> initnodedatamap;
Query OK, 0 rows affected, 3 warnings (Elapsed: 00:00:00.53)
gbase> rebalance instance;
Query OK, 11 rows affected (Elapsed: 00:00:00.74)
Wait for the redistribution to finish.
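Progress can be watched from gccli. A minimal check, assuming the gclusterdb.rebalancing_status table that GBase 8a 9.5 normally provides (verify the column names against your version's manual):
gbase> select index_name, status, percentage from gclusterdb.rebalancing_status;
-- done once every row reports a completed status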
Clean up the environment
Drop the nodedatamap
[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$ gccli
GBase client 9.5.2.44.1045e3118. Copyright (c) 2004-2022, GBase. All Rights Reserved.
gbase> refreshnodedatamap drop 7;
Query OK, 0 rows affected, 3 warnings (Elapsed: 00:00:00.62)
gbase> ^CAborted
[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$
Clear the events: the redistribution leaves DDL/DML events queued against the failed node.
[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$ gcadmin rmdmlevent 2 10.0.2.115
[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$ gcadmin rmddlevent 2 10.0.2.115
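To inspect the pending events first, or to confirm they are gone afterwards, gcadmin also has matching show commands (assuming the 9.5 syntax):
gcadmin showdmlevent
gcadmin showddlevent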
Remove the old distribution
[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$ gcadmin rmdistribution 7
cluster distribution ID [7]
it will be removed now
please ensure this is ok, input [Y,y] or [N,n]: y
gcadmin remove distribution [7] success
[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$ gcadmin
CLUSTER STATE: ACTIVE
VIRTUAL CLUSTER MODE: NORMAL
=============================================================
|           GBASE COORDINATOR CLUSTER INFORMATION           |
=============================================================
|   NodeName   | IpAddress  | gcware | gcluster | DataState |
-------------------------------------------------------------
| coordinator1 | 10.0.2.101 |  OPEN  |   OPEN   |     0     |
-------------------------------------------------------------
===========================================================================
|                     GBASE DATA CLUSTER INFORMATION                      |
===========================================================================
| NodeName | IpAddress  | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------
|  node1   | 10.0.2.101 |       8        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------
|  node2   | 10.0.2.102 |                | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------
|  node3   | 10.0.2.115 |                | CLOSE |   CLOSE    |     0     |
---------------------------------------------------------------------------
[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$
Remove the scaled-down nodes
[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$ cat rmnodes.xml
<?xml version="1.0" encoding="utf-8"?>
<servers>
<rack>
<node ip="10.0.2.102"/>
<node ip="10.0.2.115"/>
</rack>
</servers>
[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$ gcadmin rmnodes rmnodes.xml
gcadmin remove nodes ...
gcadmin rmnodes from cluster success
[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$ gcadmin
CLUSTER STATE: ACTIVE
VIRTUAL CLUSTER MODE: NORMAL
=============================================================
|           GBASE COORDINATOR CLUSTER INFORMATION           |
=============================================================
|   NodeName   | IpAddress  | gcware | gcluster | DataState |
-------------------------------------------------------------
| coordinator1 | 10.0.2.101 |  OPEN  |   OPEN   |     0     |
-------------------------------------------------------------
===========================================================================
|                     GBASE DATA CLUSTER INFORMATION                      |
===========================================================================
| NodeName | IpAddress  | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------
|  node1   | 10.0.2.101 |       8        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------
[gbase@gbase_rh7_001 gcinstall_9.5.2.44.10]$
Delete the database files on the removed nodes
rm -fr /opt/gbase/gcluster
rm -fr /opt/gbase/gnode
rm -fr /opt/gbase/gcware
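These directories are deleted on the removed nodes themselves. On the removed node that is still alive (10.0.2.102 here; 10.0.2.115 is already dead), it is safer to stop the GBase services before wiping the directories. A minimal sketch, assuming the standard 9.5 service script and the default /opt/gbase install path:
# run as user gbase on 10.0.2.102
gcluster_services all stop
rm -fr /opt/gbase/gcluster
rm -fr /opt/gbase/gnode
rm -fr /opt/gbase/gcware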
Summary
When a node is permanently unavailable, a GBase 8a cluster does support forcibly scaling it down and evicting it. The only difference from a normal scale-down is that the redistribution generates events against the failed node, and those must be cleaned up before the old distribution policy can be removed.