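GBase GCDW uses FoundationDB as its metadata database service by default. This article describes how to configure a FoundationDB cluster and how to test its high availability.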
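References
FoundationDB cluster configuration
This article configures a three-node FoundationDB cluster with one replica, using the IP addresses 10.0.2.210, 10.0.2.211, and 10.0.2.212.
Download
For downloading the RPM packages, see the FoundationDB section of the reference article mentioned earlier.
Installing the service
Install the server package with rpm -ivh: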
[root@localhost ~]# rpm -ivh foundationdb-server-6.3.24-1.el7.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing...
1:foundationdb-server-6.3.24-1 ################################# [100%]
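For testing, the client package was installed as well; that step is omitted here.
Editing the configuration file
/etc/foundationdb/fdb.cluster
Change 127.0.0.1 in this file to the machine's externally reachable IP, for example 10.0.2.210.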
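The same edit can be scripted. The following is a minimal sketch, assuming the file holds a single connection string of the form description:ID@127.0.0.1:4500 and that GNU sed is available. The function name set_cluster_ip is hypothetical, and it is worth trying on a copy of the file first.

```shell
#!/bin/sh
# Hypothetical helper: replace the loopback address in a cluster file
# with the node's externally reachable IP.
set_cluster_ip() {
    cluster_file=$1
    new_ip=$2
    # The dots are escaped so that only the literal 127.0.0.1 is matched.
    sed -i "s/127\.0\.0\.1/${new_ip}/g" "$cluster_file"
}

# Example (run as root on the node itself):
# set_cluster_ip /etc/foundationdb/fdb.cluster 10.0.2.210
```

The service still has to be restarted afterwards for the change to take effect.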
Restarting the service
systemctl restart foundationdb
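Configuring multiple coordinators for the FDB cluster
Once all nodes are installed, configure the cluster from fdbcli on any one of them. Use the coordinators command to designate several addresses as coordinators; an odd number is recommended.
The following example sets three coordinators: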
fdb> coordinators 10.0.2.210:4500 10.0.2.211:4500
fdb> coordinators 10.0.2.210:4500 10.0.2.211:4500 10.0.2.212:4500
Coordination state changed
fdb>
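For a scripted setup, the coordinators command can be assembled from a list of IPs and passed to fdbcli through its --exec option. This is only a sketch: the helper name coordinators_cmd is hypothetical, and it assumes every node listens on port 4500 as in this article.

```shell
#!/bin/sh
# Hypothetical helper: build a "coordinators" command from a list of IPs.
# Assumes port 4500 on every node, as used in this article.
coordinators_cmd() {
    # Warn on an even count, since an odd number is recommended.
    if [ $(( $# % 2 )) -eq 0 ]; then
        echo "warning: $# coordinators given; an odd number is recommended" >&2
    fi
    cmd="coordinators"
    for ip in "$@"; do
        cmd="$cmd ${ip}:4500"
    done
    echo "$cmd"
}

# Example (needs a reachable cluster, so it is left commented out):
# fdbcli --exec "$(coordinators_cmd 10.0.2.210 10.0.2.211 10.0.2.212)"
```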
After this succeeds, status reports 3 for Coordinators, but FoundationDB processes is still 1 and Machines is also 1.
fdb> status
Using cluster file `fdb.cluster'.
Configuration:
Redundancy mode - single
Storage engine - memory-2
Coordinators - 3
Usable Regions - 1
Cluster:
FoundationDB processes - 1
Zones - 1
Machines - 1
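The fdb.cluster file on the local machine has changed: its connection string now contains all three IPs.
Check the configuration file on the other nodes. It may have been updated automatically to contain the three IPs; if it has not, copy the file over from the first node. Then remember to restart the service: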
systemctl restart foundationdb
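Copying the file and restarting each node can be done in a loop. The sketch below assumes root SSH access to the other nodes; the helper names run and sync_cluster_file and the DRY_RUN switch are hypothetical. With DRY_RUN=1 the commands are only printed, which is a safe way to check them first.

```shell
#!/bin/sh
# Hypothetical helper: copy the local cluster file to other nodes and
# restart FoundationDB there. Assumes root SSH access to each node.
CLUSTER_FILE=/etc/foundationdb/fdb.cluster

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

sync_cluster_file() {
    for host in "$@"; do
        run scp "$CLUSTER_FILE" "root@${host}:${CLUSTER_FILE}"
        run ssh "root@${host}" systemctl restart foundationdb
    done
}

# Example: print what would be done for the other two nodes.
# DRY_RUN=1 sync_cluster_file 10.0.2.211 10.0.2.212
```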
Check status again: FoundationDB processes is now 3 and Machines is also 3. Note that Redundancy mode is still single; the redundancy configuration is changed in the next step.
fdb> status
Using cluster file `/etc/foundationdb/fdb.cluster'.
Configuration:
Redundancy mode - single
Storage engine - memory-2
Coordinators - 3
Usable Regions - 1
Cluster:
FoundationDB processes - 3 (less 0 excluded; 1 with errors)
Zones - 3
Machines - 3
Memory availability - 2.9 GB per process on machine with least available
>>>>> (WARNING: 4.0 GB recommended) <<<<<
Retransmissions rate - 0 Hz
Fault Tolerance - 1 machines
Server time - 07/07/22 09:57:55
Data:
Replication health - (Re)initializing automatic data distribution
Moving data - unknown (initializing)
Sum of key-value sizes - unknown
Disk space used - 325 MB
Operating space:
Storage server - 1.0 GB free on most full server
Log server - 20.8 GB free on most full server
Workload:
Read rate - 17 Hz
Write rate - 3 Hz
Transactions started - 9 Hz
Transactions committed - 2 Hz
Conflict rate - 0 Hz
Backup and DR:
Running backups - 0
Running DRs - 0
Client time: 07/07/22 09:57:55
fdb>
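Configuring the redundancy mode of the FDB cluster
The default mode, single, keeps one copy of the data; double keeps two copies and triple keeps three. See the official documentation: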
https://apple.github.io/foundationdb/configuration.html#configuration-choosing-redundancy-mode
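Configure the mode through fdbcli, and note that Redundancy mode changes to double.
Watch the Fault Tolerance line: it takes a little while to change from 0 machines to 1 machines, which means the cluster can tolerate the failure of one machine.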
fdb> configure double
Configuration changed
fdb> status
Using cluster file `/etc/foundationdb/fdb.cluster'.
Configuration:
Redundancy mode - double
Storage engine - memory-2
Coordinators - 3
Usable Regions - 1
Cluster:
FoundationDB processes - 3 (less 0 excluded; 1 with errors)
Zones - 3
Machines - 3
Memory availability - 3.0 GB per process on machine with least available
>>>>> (WARNING: 4.0 GB recommended) <<<<<
Fault Tolerance - 1 machines
Server time - 07/07/22 10:04:08
Data:
Replication health - (Re)initializing automatic data distribution
Moving data - unknown (initializing)
Sum of key-value sizes - unknown
Disk space used - 330 MB
Operating space:
Storage server - 1.0 GB free on most full server
Log server - 20.8 GB free on most full server
Workload:
Read rate - 16 Hz
Write rate - 0 Hz
Transactions started - 0 Hz
Transactions committed - 0 Hz
Conflict rate - 0 Hz
Backup and DR:
Running backups - 0
Running DRs - 0
Client time: 07/07/22 10:04:08
fdb>
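Since the Fault Tolerance value takes a while to update, a script can poll for it instead of rerunning status by hand. The sketch below parses the status text shown above; the helper names fault_tolerance and wait_for_fault_tolerance and the 5-second interval are assumptions, and the parsing relies on the line format seen in this article's output.

```shell
#!/bin/sh
# Hypothetical helper: read "status" output on stdin and print the number
# from the "Fault Tolerance - N machines" line.
fault_tolerance() {
    sed -n 's/^ *Fault Tolerance *- *\([0-9][0-9]*\) machine.*/\1/p'
}

# Poll until the cluster tolerates at least $1 machine failures.
# An unreachable cluster yields no number and is treated as 0.
wait_for_fault_tolerance() {
    want=$1
    while :; do
        have=$(fdbcli --exec status | fault_tolerance)
        [ "${have:-0}" -ge "$want" ] && break
        sleep 5
    done
}

# Example (needs a running cluster):
# fdbcli --exec "configure double" && wait_for_fault_tolerance 1
```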
FoundationDB high-availability test
Simulating a failure
Stop the service on node 210:
[root@rh7_210 ~]# systemctl stop foundationdb
[root@rh7_210 ~]# systemctl status foundationdb
● foundationdb.service - FoundationDB Key-Value Store
Loaded: loaded (/usr/lib/systemd/system/foundationdb.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Thu 2022-07-07 09:44:12 CST; 37s ago
Process: 3871 ExecStart=/usr/lib/foundationdb/fdbmonitor --conffile /etc/foundationdb/foundationdb.conf --lockfile /var/run/fdbmonitor.pid --daemonize (code=exited, status=0/SUCCESS)
Main PID: 3873 (code=exited, status=0/SUCCESS)
Jul 07 09:33:08 rh7_210 fdbmonitor[3873]: LogGroup="default" Process="fdbmonitor": Watching conf dir /etc/foundationdb/ (2)
Jul 07 09:33:08 rh7_210 fdbmonitor[3873]: LogGroup="default" Process="fdbmonitor": Loading configuration /etc/foundationdb/...db.conf
Jul 07 09:33:08 rh7_210 fdbmonitor[3873]: LogGroup="default" Process="fdbmonitor": Starting backup_agent.1
Jul 07 09:33:08 rh7_210 fdbmonitor[3873]: LogGroup="default" Process="fdbmonitor": Starting fdbserver.4500
Jul 07 09:33:08 rh7_210 fdbmonitor[3873]: LogGroup="default" Process="backup_agent.1": Launching /usr/lib/foundationdb/back...agent.1
Jul 07 09:33:08 rh7_210 fdbmonitor[3873]: LogGroup="default" Process="fdbserver.4500": Launching /usr/sbin/fdbserver (3875)...er.4500
Jul 07 09:33:08 rh7_210 fdbmonitor[3873]: LogGroup="default" Process="fdbserver.4500": FDBD joined cluster.
Jul 07 09:44:12 rh7_210 systemd[1]: Stopping FoundationDB Key-Value Store...
Jul 07 09:44:12 rh7_210 fdbmonitor[3873]: LogGroup="default" Process="fdbmonitor": Received signal 15 (Terminated), shutting down
Jul 07 09:44:12 rh7_210 systemd[1]: Stopped FoundationDB Key-Value Store.
Hint: Some lines were ellipsized, use -l to show in full.
Status
The status output shows the change: 10.0.2.210:4500 (unreachable) and Fault Tolerance - 0 machines. A failure has occurred, but the cluster can still serve requests.
fdb> status
Using cluster file `/etc/foundationdb/fdb.cluster'.
Could not communicate with all of the coordination servers.
The database will remain operational as long as we
can connect to a quorum of servers, however the fault
tolerance of the system is reduced as long as the
servers remain disconnected.
10.0.2.210:4500 (unreachable)
10.0.2.211:4500 (reachable)
10.0.2.212:4500 (reachable)
Configuration:
Redundancy mode - double
Storage engine - memory-2
Coordinators - 3
Usable Regions - 1
Cluster:
FoundationDB processes - 2 (less 0 excluded; 1 with errors)
Zones - 2
Machines - 2
Memory availability - 2.9 GB per process on machine with least available
>>>>> (WARNING: 4.0 GB recommended) <<<<<
Fault Tolerance - 0 machines
Server time - 07/07/22 10:57:39
Data:
Replication health - Healthy
Moving data - 0.000 GB
Sum of key-value sizes - 1 MB
Disk space used - 346 MB
Operating space:
Storage server - 1.0 GB free on most full server
Log server - 20.8 GB free on most full server
Workload:
Read rate - 5 Hz
Write rate - 0 Hz
Transactions started - 0 Hz
Transactions committed - 0 Hz
Conflict rate - 0 Hz
Backup and DR:
Running backups - 0
Running DRs - 0
Client time: 07/07/22 10:57:35
fdb>
The cluster still serves reads and writes
fdb> writemode on
fdb> set sign 1234
Committed (290192586305)
fdb> get sign
`sign' is `1234'
fdb>
Simulating a second failure
Stop the service on 211 as well.
[root@rh7_211 ~]# systemctl stop foundationdb
[root@rh7_211 ~]#
The cluster can no longer serve requests
fdb> status
Using cluster file `/etc/foundationdb/fdb.cluster'.
Could not communicate with a quorum of coordination servers:
10.0.2.210:4500 (unreachable)
10.0.2.211:4500 (unreachable)
10.0.2.212:4500 (reachable)
fdb>
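The quorum message above comes down to simple arithmetic: a majority of the coordinators must be reachable. The sketch below only illustrates that majority rule (the function names are hypothetical); it also shows why an even number of coordinators adds no extra tolerance over the odd number below it.

```shell
#!/bin/sh
# Majority quorum for n coordinators: more than half must be reachable.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Number of coordinators that can be lost while a quorum remains.
tolerated_failures() {
    echo $(( $1 - $(quorum_size "$1") ))
}

for n in 1 2 3 4 5; do
    echo "coordinators=$n quorum=$(quorum_size "$n") tolerated=$(tolerated_failures "$n")"
done
```

With three coordinators the quorum is two, so losing 210 alone is tolerated, while losing both 210 and 211 is not.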
Recovery
Start the service again on 210 and 211:
[root@rh7_210 ~]# systemctl start foundationdb
[root@rh7_210 ~]#
[root@rh7_211 ~]# systemctl start foundationdb
[root@rh7_211 ~]#
Service is back to normal
fdb> status
Using cluster file `/etc/foundationdb/fdb.cluster'.
Configuration:
Redundancy mode - double
Storage engine - memory-2
Coordinators - 3
Usable Regions - 1
Cluster:
FoundationDB processes - 3 (less 0 excluded; 1 with errors)
Zones - 3
Machines - 3
Memory availability - 2.9 GB per process on machine with least available
>>>>> (WARNING: 4.0 GB recommended) <<<<<
Fault Tolerance - 1 machines
Server time - 07/07/22 11:03:51
Data:
Replication health - Healthy
Moving data - 0.000 GB
Sum of key-value sizes - 1 MB
Disk space used - 336 MB
Operating space:
Storage server - 1.0 GB free on most full server
Log server - 20.8 GB free on most full server
Workload:
Read rate - 35 Hz
Write rate - 0 Hz
Transactions started - 8 Hz
Transactions committed - 0 Hz
Conflict rate - 0 Hz
Backup and DR:
Running backups - 0
Running DRs - 0
Client time: 07/07/22 11:03:51
fdb> get sign
`sign' is `1234'
fdb>
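Scaling out
Install the service as described above, then change the IP in the configuration file and restart the service.
Adding coordinators and data nodes
Add the node to the cluster with the coordinators command. For the command and its output, see the cluster configuration section above.
The coordinators can be changed at any time, but an odd number is recommended.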
coordinators 10.0.2.210:4500 10.0.2.211:4500
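Adding data nodes only
If you do not want to add a coordinator, overwrite the new node's configuration file with the existing fdb.cluster and restart the service.
Scaling in
Run the exclude command. If the node being removed is a coordinator, adjust the coordinator list with the coordinators command first.
The argument can be an IP, which covers all processes on that IP, or a specific port on an IP.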
exclude 1.2.3.4 1.2.3.5 1.2.3.6
To avoid side effects if the service is restarted later, it is recommended to stop the service and then uninstall the package.
systemctl stop foundationdb
yum remove XXXXX
or
rpm -e XXXX
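The scale-in steps can be strung together as follows. This is a sketch under stated assumptions: the node is not a coordinator (otherwise adjust the coordinators first), root SSH access is available, and the helper names run and decommission_node plus the DRY_RUN switch are hypothetical. Uninstalling the package is left out because the package name depends on what was installed.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
# Note: the printed form does not preserve the original quoting.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: exclude a node from the cluster, then stop its
# service. Assumes the node is not a coordinator.
decommission_node() {
    ip=$1
    run fdbcli --exec "exclude ${ip}"
    run ssh "root@${ip}" systemctl stop foundationdb
}

# Example: print the commands for removing one node.
# DRY_RUN=1 decommission_node 10.0.2.212
```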