本文介绍GBase 8a数据库集群自带的备份还原工具gcrcman的使用,尝试备份还原一个表。
目录导航
环境
2节点集群,8.6.2Build43版本
[gbase@862b43 gcluster_0_0]$ gcadmin
CLUSTER STATE: ACTIVE
CLUSTER MODE: NORMAL
=================================================================
| GBASE COORDINATOR CLUSTER INFORMATION |
=================================================================
| NodeName | IpAddress |gcware |gcluster |DataState |
-----------------------------------------------------------------
| coordinator1 | 10.0.2.107 | OPEN | OPEN | 0 |
-----------------------------------------------------------------
| coordinator2 | 10.0.2.106 | OPEN | OPEN | 0 |
-----------------------------------------------------------------
=============================================================
| GBASE DATA CLUSTER INFORMATION |
=============================================================
|NodeName | IpAddress |gnode |syncserver |DataState |
-------------------------------------------------------------
| node1 | 10.0.2.107 | OPEN | OPEN | 0 |
-------------------------------------------------------------
| node2 | 10.0.2.106 | OPEN | OPEN | 0 |
表和数据
1个testdb.t1表,有4行数据。
gbase> select count(*) from t1;
+----------+
| count(*) |
+----------+
| 4 |
+----------+
1 row in set (Elapsed: 00:00:00.04)
gcrcman命令介绍
GBase 8a MPP Cluster 提供专用的备份恢复工具(gcrcman),用户使用它可以方便地对整个集群中的数据进行集群的备份和恢复。集群的备份、恢复工具随集群的安装自动安装,该工具被安装在/opt/gcluster/server/bin 目录下。一次全量备份开启一个新的周期。一次增量备份则续写最后一个备份周期,使其增加一个备份点。备份恢复工具会对磁盘空间进行检查,如果存放备份数据的磁盘空间不足,会报告错误信息。
参数
[gbase@localhost ~]$ gcrcman.py --help
SYNOPSIS
gcrcman.py [options] <-d|--path BACKUP_PATH>
OPTIONS:
-h,--help
Print this help
-V,--version
Print version
-d BACKUP_PATH,--path=BACKUP_PATH
Specify backup path, it must be absolute path
-e COMMAND,--execute=COMMAND
Specify backup or recover command
-P HOST_PASSWD,--ospassword=HOST_PASSWD
Specify operation system user gbase's password
-p DATABASE_PASSWD,--dbpassword=DATABASE_PASSWD
Specify database user gbase's password
-r PARALLEL_LEVEL,--parallel=PARALLEL_LEVEL
Specify parallel level. Parallel level is digital in range [1,128], default 4
-D,--disk_space_estimate
Skip estimate disk space before backup or recover
-c,--checksum_database
Skip database data checksum
-C,--checksum_backup_data
Skip backup data checksum
-t SECOND,--timeout=SECOND
Block on read, write or both transaction timeout, should be integar value
in range [0,3600], default[300 secs]. if zero then block indefinitely
backup or recover commands:
1.show backup show backup data
2.backup level <0|1> backup instance
3.backup table <dbname.tablename> level <0|1> backup single table
4.recover [<cycle_id> [point_id]] recover instance
5.recover [force] table <dbname.tablename> [<cycle_id> [point_id]] recover single table
6.delete <cycle_id | last> delete backup data
7.cleanup clean invalid backup data
8.quit exit
9.help show help info
gcrcman 的具体功能:
支持集群级、库级、表级的全量备份
支持集群级、库级、表级的增量备份
支持集群级、库级、表级恢复到指定备份周期的指定备份点
支持集群级、库级、表级恢复到最新备份周期最新备份点
支持异地备份恢复(nfs 挂载异地备份恢复数据存放磁盘) - 支持删除备份和清除无效备份
数据
支持查看备份信息
支持删除备份数据
支持删除垃圾备份数据
备份前的要求
- 当前用户必须是dbaUser,也就是数据库安装时的操作系统用户,一般是gbase。
- 备份目录在每个节点必须存在,且dbaUser有读写权限。
- 各节点运行正常,没有 offline 节点,没有 event 事件日志,没有CLOSED服务等异常情况。
- 实例级级备份前,数据库必须处于readonly状态,还原时必须时recovery状态。如果是多vc,每个vc都要设置。表级不需要。
- 实例级备份时操作,会独占备份恢复锁,其它gcrcman运行集群备份会报错。
- 实例级备份目录不能用于库级和表级备份。
备份恢复相关锁的信息
集群级并行备份恢复时无法拿到锁的报错
gcrcman>backup level 0
fail to get gcrcman instance backup or recover lock
gcrcman>
实例级级备份恢复锁信息
[root@rh6-2 ~]# gcadmin showlock
+========================================================================================================================+
| GCLUSTER LOCK |
+========================================================================================================================+
+------------------------------------------------+---------+----------------------------------+--------------+------+----+
| Lock name | owner | content | create time |orphan|type|
+------------------------------------------------+---------+----------------------------------+--------------+------+----+
|gcrcmanlock_8c5d910d-85fa-4a45-b7b9-f0282b0c63eb|2001::202|gcrcman instance backup or recover|20230414091550|FALSE | E |
+------------------------------------------------+---------+----------------------------------+--------------+------+----+
Total : 1
库级备份的锁
[gbase@gbase_rh7_001 root]$ gcadmin showlock
+==================================================================================================+
| GCLUSTER LOCK |
+==================================================================================================+
+----------------------+----------+-------------------------------------+--------------+------+----+
| Lock name | owner | content | create time |locked|type|
+----------------------+----------+-------------------------------------+--------------+------+----+
| gc-event-lock |10.0.2.101| global master |20230414080742| TRUE | E |
+----------------------+----------+-------------------------------------+--------------+------+----+
| vc00001.hashmap_lock |10.0.2.101|gcrcman for table backup lock hashmap|20230414102329| TRUE | S |
+----------------------+----------+-------------------------------------+--------------+------+----+
|vc00001.testdb.db_lock|10.0.2.101| gcrcman for table backup lock db |20230414102329| TRUE | E |
+----------------------+----------+-------------------------------------+--------------+------+----+
Total : 3
实例级备份目录不能用以库级和表级备份
gcrcman>backup database vc1.testdb level 0
backup dir already has backup data, and type is 'INSTANCE', please change another backup dir to do 'DATABASE' backup!
gcrcman>
日志
gcrcman.py的日志报错在gcluster/log/gcluster/gcrcman.log
交互模式
不带-e参数,进入交互模式 ,按quit或者Ctrl+c退出。
[gbase@localhost ~]$ gcrcman.py -d /home/gbase/backup -P gbase1234 -pXXXXXX
gcrcman>show backup
cycle point level time
0 0 0 2020-09-05 08:09:35
0 1 1 2020-09-05 08:46:35
gcrcman>quit
[gbase@localhost ~]$
批量执行模式
用 -e参数指定要执行的命令即可
[gbase@localhost ~]$ gcrcman.py -d /home/gbase/backup -P gbase1234 -e "show backup"
cycle point level time
0 0 0 2020-09-05 08:09:35
0 1 1 2020-09-05 08:46:35
[gbase@localhost ~]$
开始全量备份
通过gcrcman命令备份这个表。其中命令行的
-d 是指定备份的目录,必须在每个节点都存在,且gbase用户可写入。
-P 参数是操作系统gbase用户的密码
-p 执行数据库gbase用户的密码
backup table 库名.表名 level 0|1
其中0是全备份,1是增量。本测试只测试全备份。
[gbase@localhost ~]$ gcrcman.py -d /home/gbase/backup -P gbase1234
gcrcman>backup table testdb.t1 level 0
09.05 08:09:35 BackUp table testdb.t1 start
--------------------------------------------
09.05 08:09:35 node (10.0.2.107) backup table begin
09.05 08:09:35 node (10.0.2.106) backup table begin
09.05 08:10:03 node (10.0.2.107) backup table success
09.05 08:10:03 node (10.0.2.106) backup table success
--------------------------------------------
09.05 08:10:03 BackUp table testdb.t1 end
查看备份
show backup命令, 其中
cycle 是备份周期编号
point 是备份点编号
level 是备份级别,0是全备,1是增量
gcrcman>show backup
cycle point level time
0 0 0 2020-09-05 08:09:35
删除表,模拟故障
gbase> drop table t1;
Query OK, 0 rows affected (Elapsed: 00:00:00.12)
gbase> select count(*) from t1;
ERROR 1146 (42S02): Table 'testdb.t1' doesn't exist
还原表
通过recover table 库名.表名 cycleId pointId参数,本例是0 0
gcrcman>recover table testdb.t1 0 0
09.05 08:14:28 check Table topology start
--------------------------------------------
09.05 08:14:28 node (10.0.2.107) check table topology begin
09.05 08:14:31 node (10.0.2.107) check table topology success
--------------------------------------------
09.05 08:14:31 check table topology success
09.05 08:14:31 check Table BackUp start
--------------------------------------------
09.05 08:14:31 node (10.0.2.107) check table backup begin
09.05 08:14:31 node (10.0.2.106) check table backup begin
09.05 08:14:48 node (10.0.2.107) check table backup success
09.05 08:14:48 node (10.0.2.106) check table backup success
--------------------------------------------
09.05 08:14:48 check table BackUp success
09.05 08:14:48 refresh table testdb.t1 start
09.05 08:14:48 node (10.0.2.107) recreate table begin
09.05 08:14:51 node (10.0.2.107) recreate table success
09.05 08:14:51 refresh table testdb.t1 end
09.05 08:14:52 Recover table testdb.t1 start
--------------------------------------------
09.05 08:14:52 node (10.0.2.107) Recover table begin
09.05 08:14:52 node (10.0.2.106) Recover table begin
09.05 08:15:09 node (10.0.2.107) Recover table success
09.05 08:15:09 node (10.0.2.106) Recover table success
--------------------------------------------
09.05 08:15:09 Recover table testdb.t1 success, please refresh it!
验证
根据提示,需要先刷新表,然后查询。不refresh table, 会只看到表结构,但看不到数据。
[gbase@862b43 gcluster_0_0]$ gccli testdb -e"select count(*) from t1"
+----------+
| count(*) |
+----------+
| 0 |
+----------+
[gbase@862b43 gcluster_0_0]$ gccli testdb -e"refresh table testdb.t1;select count(*) from t1"
+----------+
| count(*) |
+----------+
| 4 |
+----------+
变动数据
增加一行数据
gbase> select * from t1;
+------+-------+-------+
| id | value | birth |
+------+-------+-------+
| 1 | 234 | NULL |
| 2 | 567 | NULL |
| 3 | 888 | NULL |
| 4 | 999 | NULL |
+------+-------+-------+
4 rows in set (Elapsed: 00:00:00.01)
gbase> insert into t1 values (5,555555,'2020-01-03');
Query OK, 1 row affected (Elapsed: 00:00:02.91)
gbase> select * from t1;
+------+--------+---------------------+
| id | value | birth |
+------+--------+---------------------+
| 1 | 234 | NULL |
| 2 | 567 | NULL |
| 3 | 888 | NULL |
| 4 | 999 | NULL |
| 5 | 555555 | 2020-01-03 00:00:00 |
+------+--------+---------------------+
5 rows in set (Elapsed: 00:00:00.00)
增量备份
再次做备份,这回选择增量, level 参数为1. 最后得到了cycle等于0的,point 为1,leve为1的一个增量备份。
[gbase@localhost ~]$ gcrcman.py -d /home/gbase/backup -P gbase1234
gcrcman>show backup
cycle point level time
0 0 0 2020-09-05 08:09:35
gcrcman>backup table testdb.t1 level 1
09.05 08:46:32 check cluster topology begin
09.05 08:46:32 node (10.0.2.107) check topology begin
09.05 08:46:35 node (10.0.2.107) check topology success
09.05 08:46:35 check cluster topology end
09.05 08:46:35 BackUp table testdb.t1 start
--------------------------------------------
09.05 08:46:35 node (10.0.2.107) backup table begin
09.05 08:46:35 node (10.0.2.106) backup table begin
09.05 08:47:06 node (10.0.2.107) backup table success
09.05 08:47:06 node (10.0.2.106) backup table success
--------------------------------------------
09.05 08:47:06 BackUp table testdb.t1 end
gcrcman>show backup
cycle point level time
0 0 0 2020-09-05 08:09:35
0 1 1 2020-09-05 08:46:35
再次还原
在还原前,需要先删除掉表,否则会报错。Table testdb.t1 exist, please drop it at first
gcrcman>show backup
cycle point level time
0 0 0 2020-09-05 08:09:35
0 1 1 2020-09-05 08:46:35
gcrcman>recover table testdb.t1 0 1
09.05 08:48:36 check Table topology start
--------------------------------------------
09.05 08:48:36 node (10.0.2.107) check table topology begin
09.05 08:48:39 node (10.0.2.107) check table topology success
--------------------------------------------
09.05 08:48:39 check table topology success
09.05 08:48:39 check Table BackUp start
--------------------------------------------
09.05 08:48:39 node (10.0.2.107) check table backup begin
09.05 08:48:39 node (10.0.2.106) check table backup begin
09.05 08:48:59 node (10.0.2.107) check table backup success
09.05 08:48:59 node (10.0.2.106) check table backup success
--------------------------------------------
09.05 08:48:59 check table BackUp success
Table testdb.t1 exist, please drop it at first
删除表
gbase> select * from t1;
+------+-------+-------+
| id | value | birth |
+------+-------+-------+
| 1 | 234 | NULL |
| 2 | 567 | NULL |
| 3 | 888 | NULL |
| 4 | 999 | NULL |
+------+-------+-------+
4 rows in set (Elapsed: 00:00:00.01)
gbase> drop table t1;
Query OK, 0 rows affected (Elapsed: 00:00:00.07)
再次还原
gcrcman>recover table testdb.t1 0 1
09.05 08:50:10 check Table topology start
--------------------------------------------
09.05 08:50:10 node (10.0.2.107) check table topology begin
09.05 08:50:14 node (10.0.2.107) check table topology success
--------------------------------------------
09.05 08:50:14 check table topology success
09.05 08:50:14 check Table BackUp start
--------------------------------------------
09.05 08:50:14 node (10.0.2.107) check table backup begin
09.05 08:50:14 node (10.0.2.106) check table backup begin
09.05 08:50:34 node (10.0.2.107) check table backup success
09.05 08:50:34 node (10.0.2.106) check table backup success
--------------------------------------------
09.05 08:50:34 check table BackUp success
09.05 08:50:34 refresh table testdb.t1 start
09.05 08:50:34 node (10.0.2.107) recreate table begin
09.05 08:50:38 node (10.0.2.107) recreate table success
09.05 08:50:38 refresh table testdb.t1 end
09.05 08:50:38 Recover table testdb.t1 start
--------------------------------------------
09.05 08:50:38 node (10.0.2.107) Recover table begin
09.05 08:50:38 node (10.0.2.106) Recover table begin
09.05 08:50:56 node (10.0.2.107) Recover table success
09.05 08:50:56 node (10.0.2.106) Recover table success
--------------------------------------------
09.05 08:50:56 Recover table testdb.t1 success, please refresh it!
gcrcman>
验证
gbase> refresh table t1;
Query OK, 0 rows affected (Elapsed: 00:00:00.01)
gbase> select * from t1;
+------+--------+---------------------+
| id | value | birth |
+------+--------+---------------------+
| 1 | 234 | NULL |
| 2 | 567 | NULL |
| 3 | 888 | NULL |
| 4 | 999 | NULL |
| 5 | 555555 | 2020-01-03 00:00:00 |
+------+--------+---------------------+
5 rows in set (Elapsed: 00:00:00.01)
总结
GBase 8a数据库集群自带的gcrcman工具可以实现表级备份,期间不影响数据库的整体状态,不需要切换到readonly,原则上备份和还原时,该表是被锁住独占状态。