GBase 8a支持2个结构相同(主分片数一样)的集群间,通过gcluster_rsynctool工具做表级的数据同步,常用于跨集群的表数据迁移,主备集群的数据同步等。本文介绍该工具的一个使用样例。
本例子只提供了2个单节点集群的样例,实际环境只要2套集群的主分片数相同即可同步,与副本数量无关。
目录导航
参考
GBase 8a【私网/内网】集群间同步的方法gcluster_rsynctoolIP映射功能配置
环境
版本
9.5.2.44.10
主集群
[gbase@gbase_rh7_001 ~]$ gcadmin
CLUSTER STATE: ACTIVE
VIRTUAL CLUSTER MODE: NORMAL
=============================================================
| GBASE COORDINATOR CLUSTER INFORMATION |
=============================================================
| NodeName | IpAddress | gcware | gcluster | DataState |
-------------------------------------------------------------
| coordinator1 | 10.0.2.101 | OPEN | OPEN | 0 |
-------------------------------------------------------------
=========================================================================================================
| GBASE DATA CLUSTER INFORMATION |
=========================================================================================================
| NodeName | IpAddress | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------------------------------------
| node1 | 10.0.2.101 | 8 | OPEN | OPEN | 0 |
---------------------------------------------------------------------------------------------------------
备集群
[gbase@localhost gcinstall_9.5.2.44.10]$ gcadmin
CLUSTER STATE: ACTIVE
VIRTUAL CLUSTER MODE: NORMAL
=============================================================
| GBASE COORDINATOR CLUSTER INFORMATION |
=============================================================
| NodeName | IpAddress | gcware | gcluster | DataState |
-------------------------------------------------------------
| coordinator1 | 10.0.2.102 | OPEN | OPEN | 0 |
-------------------------------------------------------------
=========================================================================================================
| GBASE DATA CLUSTER INFORMATION |
=========================================================================================================
| NodeName | IpAddress | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------------------------------------
| node1 | 10.0.2.102 | 1 | OPEN | OPEN | 0 |
---------------------------------------------------------------------------------------------------------
数据
主集群有testdb库,以及一些表。
[gbase@gbase_rh7_001 ~]$ gccli
GBase client 9.5.2.44.1045e3118. Copyright (c) 2004-2022, GBase. All Rights Reserved.
gbase> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| performance_schema |
| gbase |
| gctmpdb |
| gclusterdb |
| testdb |
+--------------------+
6 rows in set (Elapsed: 00:00:00.00)
gbase> use testdb;
Query OK, 0 rows affected (Elapsed: 00:00:00.00)
gbase> show tables;
+------------------+
| Tables_in_testdb |
+------------------+
| t1 |
| t2 |
| t2_inc |
| t3_inc |
| t_inc |
| t_pid |
| tidname |
| tmp2 |
| tmp3 |
| tt |
| tt2 |
+------------------+
11 rows in set (Elapsed: 00:00:00.00)
同步准备
现有版本不支持自动建库,所有备库要手工创建。所以首次使用,要创建testdb库。
请注意数据库用户,以及对应库的权限,避免表无法访问。
[gbase@localhost gcinstall_9.5.2.44.10]$ gccli
GBase client 9.5.2.44.1045e3118. Copyright (c) 2004-2022, GBase. All Rights Reserved.
gbase> create database testdb;
Query OK, 1 row affected (Elapsed: 00:00:00.01)
同步工具参数
- --master_mpp_ip 主集群IP,当前版本只支持ipv4,且单个IP
- --slave_mpp_ip 备集群IP,当前版本只支持ipv4,且单个IP
- --master_mpp_ip_mapping_file 指定主集群的IP映射文件
- --slave_mpp_ip_mapping_file 指定从集群的IP映射文件
- --table_list_filet 被同步的表列表文件
- --log_level 日志级别
- 0: nolog level
- 1: critical level
- 2: error level
- 3: warning level
- 4: info level
- 5: debug level
- --rsync_mode 同步模式
- 0:主分片同步,备分片设定状态,通过集群内部的自动恢复机制进行恢复。该参数值不建议使用,仅仅为了版本兼容保留。
- 1:同时同步备集群的主、备分片,单表同步的性能能够达到最大化。该参数值不建议使用,存在一个分片的主备都同步失败,表不可用的情况,该参数是当初设计是为了数据安全性不高的场景。
- 2:先主后备同步方式。先同步备集群表的主分片,主分片同步成功后再同步备分片,确保备集群的表在同步失败后存在一组可用分片,用来回滚同步操作。
- --master_mpp_gc_pw 主集群的数据库用户密码
- --slave_mpp_gc_pw 备集群的数据库用户密码
- --database_user 同步使用的数据库用户,两面必须相同,所有只有1个参数
- --slave_create_table_if_not_exists 如果备集群表不存在,则自动根据主集群创建
其它参数请看--help的信息
[gbase@gbase_rh7_001 gcluster_rsynctool]$ ./gcluster_rsynctool.py --help
Usage: gcluster_rsynctool.py [optinos]
Options:
-h, --help show this help message and exit
-v, --version output version information and exit
--master_mpp_ip=MASTER_MPP_IP
The ipaddress of master mpp gcluster,only support IPV4
--master_mpp_ip_mapping_file=MASTER_MPP_IP_MAPPING_FILE
The file for master mpp gcluster ipaddress transform
file format is json,only support IPV4
--master_mpp_gc_port=MASTER_MPP_GC_PORT
The port of master mpp gcluster[default:5258]
--master_mpp_gn_port=MASTER_MPP_GN_PORT
The port of master mpp gcluster's gnode[default:5050]
--slave_mpp_ip=SLAVE_MPP_IP
The ipaddress of slave mpp gcluster,only support IPV4
--slave_mpp_ip_mapping_file=SLAVE_MPP_IP_MAPPING_FILE
The file for slave mpp gcluster ipaddress transform
file format is json,only support IPV4
--slave_mpp_gc_port=SLAVE_MPP_GC_PORT
The port of slave mpp gcluster[default:5258]
--slave_mpp_gn_port=SLAVE_MPP_GN_PORT
The port of slave mpp gcluster's gnode[default:5050]
--master_mpp_gc_pw=MASTER_MPP_GC_PW
The password of master mpp gcluster's database
user[default:000000]
--slave_mpp_gc_pw=SLAVE_MPP_GC_PW
The password of slave mpp gcluster's database
user[default:000000]
--sync_vc_name=SYNC_VC_NAME
The name of vc that will be sync,only support one vc
per time[default:vcname000001]
--database_user=DATABASE_USER
A database user having rights to access tables defined
in 'table_list_file', this user
should exist in both master gcluster and slave
gcluster[default:gbase]
--table_list_file=TABLE_LIST_FILE
The file which contain the table list should be synced
--table_parallel_degree=TABLE_PARALLEL_DEGREE
The parallel degree for how many tables should be
synced [default:1, min:1, max:128]
--lock_table_timeout=LOCK_TABLE_TIMEOUT
The timeout use by gcluster locking table
[default:600,min:1,max:3600,unit:second]
--retry_times=RETRY_TIMES
Synctool retry times[default:1, min:1,
max:10,unit:times]
--retry_interval=RETRY_INTERVAL
The interval time between synctool retry [default:10,
min:1, max:1800,unit:second]
--sync_mode=SYNC_MODE
The synctool mode [default:2, min:1, max:3];
1: do table level scn check; 2:
not to do table level scn check,just check each column
scn; 3: not to do scn check
,just sync whole table data;
--error_table_list_file=ERROR_TABLE_LIST_FILE
The file output for failed table list[default:{$table_
list_file}_error_table_list_{%Y_%m_%d-%H:%M:%S}.log]
--log_name=LOG_NAME The log file
name[default:{$table_list_file}_{%Y_%m_%d}.log]
--log_level=LOG_LEVEL
The gcluster_rsynctool's log level [default:3, min:0,
max:5]; 0: nolog level;
1: critical level 2: error level
3: warning level 4: info level
5: debug level
--rsync_mode=RSYNC_MODE
The gcluster_rsynctool's rsync_mode [default:2, min:0,
max:2]; 0: sync major slice and
set status on backup slice; 1:
sync major slice and backup slice at the same;
2: sync major slice first and then sync the backup
slice if major slice sync success
.
--double_check check data that already write on disk.[default:false]
--slave_create_table_if_not_exists
create table if not exists on slave
mpp.[default:false]
--cleanup_uncontrolled_data=CLEANUP_UNCONTROLLED_DATA
whether clean up uncontrolled data on target table
[default:0, min:0, max:1]; 0:
not clean up uncontrolled data on target data;
1: clean up uncontrolled data on target data;
同步单个表
我们先测试同步一张表,多张的看后面例子
同步表的列表文件
[gbase@gbase_rh7_001 gcluster_rsynctool]$ cat test.list
testdb.t2
同步操作
[gbase@gbase_rh7_001 gcluster_rsynctool]$ ./gcluster_rsynctool.py --master_mpp_ip=10.0.2.101 --slave_mpp_ip=10.0.2.102 --table_list_file=test.list --log_level=5 --rsync_mode=1 --master_mpp_gc_pw=gbase20110531 --slave_mpp_gc_pw=gbase20110531 --database_user=gbase --slave_create_table_if_not_exists
*************Gcluster Sync Tool Start*************************
Table [ vcname000001: testdb: t2] Sync Start
Table [ vcname000001 testdb: t2] Sync End cost : <0 s,142 ms>
*************Gcluster Sync Tool End With Success**************
[gbase@gbase_rh7_001 gcluster_rsynctool]$
同步结果
备用集群的testdb.t2表自动创建并同步了数据。
[gbase@gbase_rh7_001 gcluster_rsynctool]$ gccli -h10.0.2.102 -e"select * from testdb.t2"
+------+
| id |
+------+
| 1000 |
+------+
[gbase@gbase_rh7_001 gcluster_rsynctool]$ gccli -h10.0.2.101 -e"select * from testdb.t2"
+------+
| id |
+------+
| 1000 |
+------+
多表同步
同步命令可以增加并行参数
--table_parallel_degree 同时做同步的任务数量
同步表的列表
[gbase@gbase_rh7_001 gcluster_rsynctool]$ cat test.list
testdb.t1
testdb.t2
testdb.t2_inc
testdb.t3_inc
testdb.t_inc
testdb.t_pid
testdb.tidname
testdb.tmp2
testdb.tmp3
testdb.tt
testdb.tt2
同步操作
[gbase@gbase_rh7_001 gcluster_rsynctool]$ ./gcluster_rsynctool.py --master_mpp_ip=10.0.2.101 --slave_mpp_ip=10.0.2.102 --table_list_file=test.list --log_level=5 --rsync_mode=1 --master_mpp_gc_pw=gbase20110531 --slave_mpp_gc_pw=gbase20110531 --database_user=gbase --slave_create_table_if_not_exists --table_parallel_degree=2
*************Gcluster Sync Tool Start*************************
Table [ vcname000001: testdb: t1] Sync Start
Table [ vcname000001: testdb: t2] Sync Start
Table [ vcname000001 testdb: t1] Sync End cost : <0 s,400 ms>
Table [ vcname000001 testdb: t2] Sync End cost : <0 s,362 ms>
Table [ vcname000001: testdb: t2_inc] Sync Start
Table [ vcname000001: testdb: t3_inc] Sync Start
Table [ vcname000001 testdb: t2_inc] Sync End cost : <0 s,370 ms>
Table [ vcname000001: testdb: t_inc] Sync Start
Table [ vcname000001 testdb: t3_inc] Sync End cost : <0 s,369 ms>
Table [ vcname000001: testdb: t_pid] Sync Start
Table [ vcname000001 testdb: t_inc] Sync End cost : <0 s,336 ms>
Table [ vcname000001: testdb: tidname] Sync Start
Table [ vcname000001 testdb: t_pid] Sync End cost : <0 s,384 ms>
Table [ vcname000001 testdb: tidname] Sync End cost : <0 s,125 ms>
Table [ vcname000001: testdb: tmp2] Sync Start
Table [ vcname000001: testdb: tmp3] Sync Start
Table [ vcname000001 testdb: tmp2] Sync End cost : <0 s,372 ms>
Table [ vcname000001: testdb: tt] Sync Start
Table [ vcname000001 testdb: tmp3] Sync End cost : <0 s,388 ms>
Table [ vcname000001 testdb: tt] Sync End cost : <0 s,114 ms>
Table [ vcname000001: testdb: tt2] Sync Start
Table [ vcname000001 testdb: tt2] Sync End cost : <0 s,145 ms>
*************Gcluster Sync Tool End With Success**************
[gbase@gbase_rh7_001 gcluster_rsynctool]$
同步结果
[gbase@gbase_rh7_001 gcluster_rsynctool]$ gccli -h10.0.2.102 -e"show tables" -Dtestdb
+------------------+
| Tables_in_testdb |
+------------------+
| t1 |
| t2 |
| t2_inc |
| t3_inc |
| t_inc |
| t_pid |
| tidname |
| tmp2 |
| tmp3 |
| tt |
| tt2 |
+------------------+
[gbase@gbase_rh7_001 gcluster_rsynctool]$ gccli -h10.0.2.102 -e"select count(*) from t1" -Dtestdb
+----------+
| count(*) |
+----------+
| 590000 |
+----------+
[gbase@gbase_rh7_001 gcluster_rsynctool]$
影响
根据命令参数,可以看到对表是有锁的(meta_lock的独占锁),默认是600秒。
--lock_table_timeout=LOCK_TABLE_TIMEOUT
The timeout use by gcluster locking table
[default:600,min:1,max:3600,unit:second]
所以无论是主库还是备库,都不建议在业务高峰期做同步,除非同步的是当前不常用的历史表。
主集群锁
Total : 1
+====================================================================================+
| GCLUSTER LOCK |
+====================================================================================+
+-------------------------------+----------+--------------+--------------+------+----+
| Lock name | owner | content | create time |locked|type|
+-------------------------------+----------+--------------+--------------+------+----+
| gc-event-lock |10.0.2.101|global master |20220822112351| TRUE | E |
+-------------------------------+----------+--------------+--------------+------+----+
| vc00001.hashmap_lock |10.0.2.101|416(LWP:21132)|20220822154731| TRUE | S |
+-------------------------------+----------+--------------+--------------+------+----+
| vc00001.testdb.db_lock |10.0.2.101|416(LWP:21132)|20220822154731| TRUE | S |
+-------------------------------+----------+--------------+--------------+------+----+
| vc00001.testdb.t1.meta_lock |10.0.2.101|416(LWP:21132)|20220822154731| TRUE | E |
+-------------------------------+----------+--------------+--------------+------+----+
|vc00001.testdb.table_space_lock|10.0.2.101|416(LWP:21132)|20220822154731| TRUE | S |
+-------------------------------+----------+--------------+--------------+------+----+
备集群锁
+====================================================================================+
| GCLUSTER LOCK |
+====================================================================================+
+-------------------------------+----------+--------------+--------------+------+----+
| Lock name | owner | content | create time |locked|type|
+-------------------------------+----------+--------------+--------------+------+----+
| gc-event-lock |10.0.2.102|global master |20220822141805| TRUE | E |
+-------------------------------+----------+--------------+--------------+------+----+
| vc00001.hashmap_lock |10.0.2.102|339(LWP:11140)|20220822154608| TRUE | S |
+-------------------------------+----------+--------------+--------------+------+----+
| vc00001.testdb.db_lock |10.0.2.102|339(LWP:11140)|20220822154608| TRUE | S |
+-------------------------------+----------+--------------+--------------+------+----+
| vc00001.testdb.t1.meta_lock |10.0.2.102|339(LWP:11140)|20220822154608| TRUE | E |
+-------------------------------+----------+--------------+--------------+------+----+
|vc00001.testdb.table_space_lock|10.0.2.102|339(LWP:11140)|20220822154608| TRUE | S |
+-------------------------------+----------+--------------+--------------+------+----+