南大通用GBase 8a表级全量和增量备份还原功能测试gcrcman.py

gcrcman 备份还原

本文介绍GBase 8a数据库集群自带的备份还原工具gcrcman的使用,尝试备份还原一个表。

环境

2节点集群,8.6.2Build43版本

[gbase@862b43 gcluster_0_0]$ gcadmin
CLUSTER STATE:  ACTIVE
CLUSTER MODE:   NORMAL

=================================================================
|             GBASE COORDINATOR CLUSTER INFORMATION             |
=================================================================
|   NodeName   |     IpAddress     |gcware |gcluster |DataState |
-----------------------------------------------------------------
| coordinator1 |    10.0.2.107     | OPEN  |  OPEN   |    0     |
-----------------------------------------------------------------
| coordinator2 |    10.0.2.106     | OPEN  |  OPEN   |    0     |
-----------------------------------------------------------------
=============================================================
|              GBASE DATA CLUSTER INFORMATION               |
=============================================================
|NodeName |     IpAddress     |gnode |syncserver |DataState |
-------------------------------------------------------------
|  node1  |    10.0.2.107     | OPEN |   OPEN    |    0     |
-------------------------------------------------------------
|  node2  |    10.0.2.106     | OPEN |   OPEN    |    0     |

表和数据

1个testdb.t1表,有4行数据。

gbase> select count(*) from t1;
+----------+
| count(*) |
+----------+
|        4 |
+----------+
1 row in set (Elapsed: 00:00:00.04)

gcrcman命令介绍

GBase 8a MPP Cluster 提供专用的备份恢复工具(gcrcman),用户使用它可以方便地对整个集群中的数据进行集群的备份和恢复。集群的备份、恢复工具随集群的安装自动安装,该工具被安装在/opt/gcluster/server/bin 目录下。一次全量备份开启一个新的周期。一次增量备份则续写最后一个备份周期,使其增加一个备份点。备份恢复工具会对磁盘空间进行检查,如果存放备份数据的磁盘空间不足,会报告错误信息。

参数

[gbase@localhost ~]$ gcrcman.py --help
SYNOPSIS
    gcrcman.py [options] <-d|--path BACKUP_PATH>

OPTIONS:
    -h,--help
            Print this help
    -V,--version
            Print version
    -d BACKUP_PATH,--path=BACKUP_PATH
            Specify backup path, it must be absolute path
    -e COMMAND,--execute=COMMAND
            Specify backup or recover command
    -P HOST_PASSWD,--ospassword=HOST_PASSWD
            Specify operation system user gbase's password
    -p DATABASE_PASSWD,--dbpassword=DATABASE_PASSWD
            Specify database user gbase's password
    -r PARALLEL_LEVEL,--parallel=PARALLEL_LEVEL
            Specify parallel level. Parallel level is digital in range [1,128], default 4
    -D,--disk_space_estimate
            Skip estimate disk space before backup or recover
    -c,--checksum_database
            Skip database data checksum
    -C,--checksum_backup_data
            Skip backup data checksum
    -t SECOND,--timeout=SECOND
            Block on read, write or both transaction timeout, should be integar value
            in range [0,3600], default[300 secs]. if zero then block indefinitely

backup or recover commands:
    1.show backup                                                       show backup data
    2.backup level <0|1>                                                backup instance
    3.backup table <dbname.tablename> level <0|1>                       backup single table
    4.recover [<cycle_id> [point_id]]                                   recover instance
    5.recover [force] table <dbname.tablename> [<cycle_id> [point_id]]  recover single table
    6.delete <cycle_id | last>                                          delete backup data
    7.cleanup                                                           clean invalid backup data
    8.quit                                                              exit
    9.help                                                              show help info
gcrcman 的具体功能:
 支持集群级、库级、表级的全量备份
 支持集群级、库级、表级的增量备份
 支持集群级、库级、表级恢复到指定备份周期的指定备份点
 支持集群级、库级、表级恢复到最新备份周期最新备份点
 支持异地备份恢复(nfs 挂载异地备份恢复数据存放磁盘) - 支持删除备份和清除无效备份
数据
 支持查看备份信息
 支持删除备份数据
 支持删除垃圾备份数据

备份前的要求

  • 当前用户必须是dbaUser,也就是数据库安装时的操作系统用户,一般是gbase。
  • 备份目录在每个节点必须存在,且dbaUser有读写权限。
  • 各节点运行正常,没有 offline 节点,没有 event 事件日志,没有CLOSED服务等异常情况。
  • 实例级级备份前,数据库必须处于readonly状态,还原时必须时recovery状态。如果是多vc,每个vc都要设置。表级不需要。
  • 实例级备份时操作,会独占备份恢复锁,其它gcrcman运行集群备份会报错。
  • 实例级备份目录不能用于库级和表级备份。

备份恢复相关锁的信息

集群级并行备份恢复时无法拿到锁的报错

gcrcman>backup level 0
fail to get gcrcman instance backup or recover lock
gcrcman>

实例级级备份恢复锁信息

[root@rh6-2 ~]# gcadmin showlock
 +========================================================================================================================+
 |                                                     GCLUSTER LOCK                                                      |
 +========================================================================================================================+
 +------------------------------------------------+---------+----------------------------------+--------------+------+----+
 |                   Lock name                    |  owner  |             content              | create time  |orphan|type|
 +------------------------------------------------+---------+----------------------------------+--------------+------+----+
 |gcrcmanlock_8c5d910d-85fa-4a45-b7b9-f0282b0c63eb|2001::202|gcrcman instance backup or recover|20230414091550|FALSE | E  |
 +------------------------------------------------+---------+----------------------------------+--------------+------+----+
 Total : 1

库级备份的锁

[gbase@gbase_rh7_001 root]$ gcadmin showlock
 +==================================================================================================+
 |                                          GCLUSTER LOCK                                           |
 +==================================================================================================+
 +----------------------+----------+-------------------------------------+--------------+------+----+
 |      Lock name       |  owner   |               content               | create time  |locked|type|
 +----------------------+----------+-------------------------------------+--------------+------+----+
 |    gc-event-lock     |10.0.2.101|            global master            |20230414080742| TRUE | E  |
 +----------------------+----------+-------------------------------------+--------------+------+----+
 | vc00001.hashmap_lock |10.0.2.101|gcrcman for table backup lock hashmap|20230414102329| TRUE | S  |
 +----------------------+----------+-------------------------------------+--------------+------+----+
 |vc00001.testdb.db_lock|10.0.2.101|  gcrcman for table backup lock db   |20230414102329| TRUE | E  |
 +----------------------+----------+-------------------------------------+--------------+------+----+
 Total : 3

实例级备份目录不能用以库级和表级备份

gcrcman>backup database vc1.testdb level 0
backup dir already has backup data, and type is 'INSTANCE', please change another backup dir to do 'DATABASE' backup!
gcrcman>

日志

gcrcman.py的日志报错在gcluster/log/gcluster/gcrcman.log

交互模式

不带-e参数,进入交互模式 ,按quit或者Ctrl+c退出。

[gbase@localhost ~]$ gcrcman.py -d /home/gbase/backup -P gbase1234 -pXXXXXX
gcrcman>show backup
cycle   point   level   time
0       0       0       2020-09-05 08:09:35
0       1       1       2020-09-05 08:46:35
gcrcman>quit
[gbase@localhost ~]$

批量执行模式

用 -e参数指定要执行的命令即可

[gbase@localhost ~]$ gcrcman.py -d /home/gbase/backup -P gbase1234 -e "show backup"
cycle   point   level   time
0       0       0       2020-09-05 08:09:35
0       1       1       2020-09-05 08:46:35
[gbase@localhost ~]$

开始全量备份

通过gcrcman命令备份这个表。其中命令行的

-d 是指定备份的目录,必须在每个节点都存在,且gbase用户可写入。

-P 参数是操作系统gbase用户的密码

-p 执行数据库gbase用户的密码

backup table 库名.表名 level 0|1

其中0是全备份,1是增量。本测试只测试全备份。

[gbase@localhost ~]$ gcrcman.py -d /home/gbase/backup -P gbase1234
gcrcman>backup table testdb.t1 level 0
09.05 08:09:35  BackUp table testdb.t1 start
--------------------------------------------
09.05 08:09:35  node (10.0.2.107)  backup table begin
09.05 08:09:35  node (10.0.2.106)  backup table begin
09.05 08:10:03  node (10.0.2.107)  backup table success
09.05 08:10:03  node (10.0.2.106)  backup table success
--------------------------------------------
09.05 08:10:03  BackUp table testdb.t1 end

查看备份

show backup命令, 其中

cycle 是备份周期编号

point 是备份点编号

level 是备份级别,0是全备,1是增量

gcrcman>show backup
cycle   point   level   time
0       0       0       2020-09-05 08:09:35

删除表,模拟故障

gbase> drop table t1;
Query OK, 0 rows affected (Elapsed: 00:00:00.12)

gbase> select count(*) from t1;
ERROR 1146 (42S02): Table 'testdb.t1' doesn't exist

还原表

通过recover table 库名.表名 cycleId pointId参数,本例是0 0

gcrcman>recover table testdb.t1 0 0
09.05 08:14:28  check Table topology start
--------------------------------------------
09.05 08:14:28  node (10.0.2.107)  check table topology begin
09.05 08:14:31  node (10.0.2.107)  check table topology success
--------------------------------------------
09.05 08:14:31  check table topology success
09.05 08:14:31  check Table BackUp start
--------------------------------------------
09.05 08:14:31  node (10.0.2.107)  check table backup begin
09.05 08:14:31  node (10.0.2.106)  check table backup begin
09.05 08:14:48  node (10.0.2.107)  check table backup success
09.05 08:14:48  node (10.0.2.106)  check table backup success
--------------------------------------------
09.05 08:14:48  check table BackUp success
09.05 08:14:48  refresh table testdb.t1 start
09.05 08:14:48  node (10.0.2.107)  recreate table begin
09.05 08:14:51  node (10.0.2.107)  recreate table success
09.05 08:14:51  refresh table testdb.t1 end
09.05 08:14:52  Recover table testdb.t1 start
--------------------------------------------
09.05 08:14:52  node (10.0.2.107)  Recover table begin
09.05 08:14:52  node (10.0.2.106)  Recover table begin
09.05 08:15:09  node (10.0.2.107)  Recover table success
09.05 08:15:09  node (10.0.2.106)  Recover table success
--------------------------------------------
09.05 08:15:09  Recover table testdb.t1 success, please refresh it!

验证

根据提示,需要先刷新表,然后查询。不refresh table, 会只看到表结构,但看不到数据。

[gbase@862b43 gcluster_0_0]$ gccli testdb -e"select count(*) from t1"
+----------+
| count(*) |
+----------+
|        0 |
+----------+
[gbase@862b43 gcluster_0_0]$ gccli testdb -e"refresh table testdb.t1;select count(*) from t1"
+----------+
| count(*) |
+----------+
|        4 |
+----------+

变动数据

增加一行数据

gbase> select * from t1;
+------+-------+-------+
| id   | value | birth |
+------+-------+-------+
|    1 |   234 | NULL  |
|    2 |   567 | NULL  |
|    3 |   888 | NULL  |
|    4 |   999 | NULL  |
+------+-------+-------+
4 rows in set (Elapsed: 00:00:00.01)

gbase> insert into t1 values (5,555555,'2020-01-03');
Query OK, 1 row affected (Elapsed: 00:00:02.91)

gbase> select * from t1;
+------+--------+---------------------+
| id   | value  | birth               |
+------+--------+---------------------+
|    1 |    234 | NULL                |
|    2 |    567 | NULL                |
|    3 |    888 | NULL                |
|    4 |    999 | NULL                |
|    5 | 555555 | 2020-01-03 00:00:00 |
+------+--------+---------------------+
5 rows in set (Elapsed: 00:00:00.00)

增量备份

再次做备份,这回选择增量, level 参数为1. 最后得到了cycle等于0的,point 为1,leve为1的一个增量备份。

[gbase@localhost ~]$ gcrcman.py -d /home/gbase/backup -P gbase1234
gcrcman>show backup
cycle   point   level   time
0       0       0       2020-09-05 08:09:35
gcrcman>backup table testdb.t1 level 1
09.05 08:46:32  check cluster topology begin
09.05 08:46:32  node (10.0.2.107)  check topology begin
09.05 08:46:35  node (10.0.2.107)  check topology success
09.05 08:46:35  check cluster topology end
09.05 08:46:35  BackUp table testdb.t1 start
--------------------------------------------
09.05 08:46:35  node (10.0.2.107)  backup table begin
09.05 08:46:35  node (10.0.2.106)  backup table begin
09.05 08:47:06  node (10.0.2.107)  backup table success
09.05 08:47:06  node (10.0.2.106)  backup table success
--------------------------------------------
09.05 08:47:06  BackUp table testdb.t1 end
gcrcman>show backup
cycle   point   level   time
0       0       0       2020-09-05 08:09:35
0       1       1       2020-09-05 08:46:35

再次还原

在还原前,需要先删除掉表,否则会报错。Table testdb.t1 exist, please drop it at first

gcrcman>show backup
cycle   point   level   time
0       0       0       2020-09-05 08:09:35
0       1       1       2020-09-05 08:46:35
gcrcman>recover table testdb.t1 0 1
09.05 08:48:36  check Table topology start
--------------------------------------------
09.05 08:48:36  node (10.0.2.107)  check table topology begin
09.05 08:48:39  node (10.0.2.107)  check table topology success
--------------------------------------------
09.05 08:48:39  check table topology success
09.05 08:48:39  check Table BackUp start
--------------------------------------------
09.05 08:48:39  node (10.0.2.107)  check table backup begin
09.05 08:48:39  node (10.0.2.106)  check table backup begin
09.05 08:48:59  node (10.0.2.107)  check table backup success
09.05 08:48:59  node (10.0.2.106)  check table backup success
--------------------------------------------
09.05 08:48:59  check table BackUp success
Table testdb.t1 exist, please drop it at first

删除表

gbase> select * from t1;
+------+-------+-------+
| id   | value | birth |
+------+-------+-------+
|    1 |   234 | NULL  |
|    2 |   567 | NULL  |
|    3 |   888 | NULL  |
|    4 |   999 | NULL  |
+------+-------+-------+
4 rows in set (Elapsed: 00:00:00.01)

gbase> drop table t1;
Query OK, 0 rows affected (Elapsed: 00:00:00.07)

再次还原

gcrcman>recover table testdb.t1 0 1
09.05 08:50:10  check Table topology start
--------------------------------------------
09.05 08:50:10  node (10.0.2.107)  check table topology begin
09.05 08:50:14  node (10.0.2.107)  check table topology success
--------------------------------------------
09.05 08:50:14  check table topology success
09.05 08:50:14  check Table BackUp start
--------------------------------------------
09.05 08:50:14  node (10.0.2.107)  check table backup begin
09.05 08:50:14  node (10.0.2.106)  check table backup begin
09.05 08:50:34  node (10.0.2.107)  check table backup success
09.05 08:50:34  node (10.0.2.106)  check table backup success
--------------------------------------------
09.05 08:50:34  check table BackUp success
09.05 08:50:34  refresh table testdb.t1 start
09.05 08:50:34  node (10.0.2.107)  recreate table begin
09.05 08:50:38  node (10.0.2.107)  recreate table success
09.05 08:50:38  refresh table testdb.t1 end
09.05 08:50:38  Recover table testdb.t1 start
--------------------------------------------
09.05 08:50:38  node (10.0.2.107)  Recover table begin
09.05 08:50:38  node (10.0.2.106)  Recover table begin
09.05 08:50:56  node (10.0.2.107)  Recover table success
09.05 08:50:56  node (10.0.2.106)  Recover table success
--------------------------------------------
09.05 08:50:56  Recover table testdb.t1 success, please refresh it!
gcrcman>

验证

gbase> refresh table t1;
Query OK, 0 rows affected (Elapsed: 00:00:00.01)

gbase> select * from t1;
+------+--------+---------------------+
| id   | value  | birth               |
+------+--------+---------------------+
|    1 |    234 | NULL                |
|    2 |    567 | NULL                |
|    3 |    888 | NULL                |
|    4 |    999 | NULL                |
|    5 | 555555 | 2020-01-03 00:00:00 |
+------+--------+---------------------+
5 rows in set (Elapsed: 00:00:00.01)

总结

GBase 8a数据库集群自带的gcrcman工具可以实现表级备份,期间不影响数据库的整体状态,不需要切换到readonly,原则上备份和还原时,该表是被锁住独占状态。