南大通用 (GBASE) GBase 8a: Forcing a Rollback with Restore.py After an Upgrade Fails Unexpectedly

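This article describes how to force a Restore when an upgrade of a GBase 8a database cluster is interrupted by an unexpected event, such as a power outage or a machine hang. In that situation the database has already been partially updated, but the upgrade cannot finish normally.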

Environment

A three-node cluster running V95.

[gbase@localhost gcinstall]$ gcadmin
CLUSTER STATE:         ACTIVE
VIRTUAL CLUSTER MODE:  NORMAL

=============================================================
|           GBASE COORDINATOR CLUSTER INFORMATION           |
=============================================================
|   NodeName   | IpAddress  | gcware | gcluster | DataState |
-------------------------------------------------------------
| coordinator1 | 10.0.2.102 | CLOSE  |  CLOSE   |     0     |
-------------------------------------------------------------
| coordinator2 | 10.0.2.202 |  OPEN  |  CLOSE   |     0     |
-------------------------------------------------------------
| coordinator3 | 10.0.2.203 | CLOSE  |  CLOSE   |     0     |
-------------------------------------------------------------
=========================================================================================================
|                                    GBASE DATA CLUSTER INFORMATION                                     |
=========================================================================================================
| NodeName |                IpAddress                 | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------------------------------------
|  node1   |                10.0.2.102                |       6        | CLOSE |   CLOSE    |     0     |
---------------------------------------------------------------------------------------------------------
|  node2   |                10.0.2.202                |       6        | CLOSE |   CLOSE    |     0     |
---------------------------------------------------------------------------------------------------------
|  node3   |                10.0.2.203                |       6        | CLOSE |   CLOSE    |     0     |
---------------------------------------------------------------------------------------------------------
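The tables that gcadmin prints are plain pipe-delimited text, so the service states can also be checked by script. For example, you could use this to confirm that services are stopped before a restore.

This is a minimal sketch under stated assumptions:
- It relies only on the layout shown above.
- parse_gcadmin and open_services are hypothetical helper names.
- The __main__ part assumes gcadmin is on the PATH of the gbase user.

```python
import subprocess


def parse_gcadmin(text):
    """Turn the pipe-delimited tables printed by gcadmin into a list of dicts.

    Assumption: every table starts with a header row whose first cell is
    'NodeName', as in the output shown above.
    """
    rows, headers = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # '====' / '----' rules and the CLUSTER STATE lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if cells[0] == "NodeName":
            headers = cells  # header row: a new table starts here
        elif headers is not None and len(cells) == len(headers):
            rows.append(dict(zip(headers, cells)))
        # anything else is a banner such as 'GBASE DATA CLUSTER INFORMATION'
    return rows


def open_services(rows):
    """Return (node, column) pairs whose value reads OPEN."""
    return [(row["NodeName"], col)
            for row in rows
            for col, value in row.items() if value == "OPEN"]


if __name__ == "__main__":
    out = subprocess.run(["gcadmin"], capture_output=True, text=True).stdout
    print(open_services(parse_gcadmin(out)))
```

The sketch only reports which cells read OPEN. It is up to the operator to decide which services must be stopped before rolling back.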

Upgrade failure

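The upgrade hung while starting the gcluster nodes and did not respond to Ctrl+C, so it was killed from the background.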

10.0.2.202              install cluster on host 10.0.2.202 successfully.
10.0.2.102              install cluster on host 10.0.2.102 successfully.
Starting all gcluster nodes...
^[[B^C^C^C^C^C^C^C^C^C^C^C^C^C^C^CKilled
[gbase@localhost gcinstall]$

In a second terminal, find the stuck gcinstall.py process and kill it:
[gbase@localhost gcinstall]$
[gbase@localhost ~]$ ps -ef|grep gcin
gbase    17087 15973  0 02:08 pts/1    00:00:02 python ./gcinstall.py -U --silent=demo.options
gbase    17889 17133  0 02:32 pts/0    00:00:00 grep --color=auto gcin
[gbase@localhost ~]$ kill -9 17087
[gbase@localhost ~]$
[gbase@localhost ~]$

Clean up upgrade leftovers

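When an upgrade fails partway through, some nodes may still have programs from it running. Stop them so that they do not interfere with Restore. These include, but are not limited to:

  • gcinstall.py: the program that launches the install/upgrade
  • InstallTar.py: the per-node installation program
  • tar XXX: the backup packaging process, which uses the tar command
  • gbase services: stop the database services before rolling back and confirm they have stopped cleanly; ps -ef|grep gbase should show no database processes
  • python: any other .py programs that have been running for a long time without exiting on their own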
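The checklist above can be turned into a quick scan of the process table. This is a minimal sketch under stated assumptions:
- It needs Python 3.7+ and a procps ps that accepts -eo pid=,args=.
- is_leftover and find_leftovers are hypothetical names.
- The matching rules are a heuristic, not part of GBase.

The script only lists candidates and kills nothing. Killing is left to the operator, for example with kill -9 as was done for gcinstall.py earlier.

```python
import os
import subprocess


def is_leftover(cmd):
    """Heuristic: does this command line look like an upgrade leftover?"""
    argv = cmd.split()
    if not argv:
        return False
    prog = os.path.basename(argv[0])
    if prog == "tar":
        return True  # backup packaging started by the upgrade
    if any(arg.endswith(".py") for arg in argv):
        return True  # gcinstall.py, InstallTar.py or another Python script
    # Rough stand-in for `ps -ef | grep gbase`; unlike that grep, this only
    # searches the command line, not the process owner column.
    return "gbase" in cmd


def find_leftovers(ps_lines, self_pid=None):
    """Return (pid, command) pairs from `ps -eo pid=,args=` output lines."""
    hits = []
    for line in ps_lines:
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, cmd = int(parts[0]), parts[1]
        if pid == self_pid or os.path.basename(cmd.split()[0]) == "grep":
            continue  # ignore this script itself and any grep commands
        if is_leftover(cmd):
            hits.append((pid, cmd))
    return hits


if __name__ == "__main__":
    out = subprocess.run(["ps", "-eo", "pid=,args="],
                         capture_output=True, text=True, check=True).stdout
    for pid, cmd in find_leftovers(out.splitlines(), self_pid=os.getpid()):
        print(pid, cmd)
```

Run it on every node, because leftovers can remain on any host that took part in the upgrade.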

Check the backups

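Choose the backup whose timestamp is closest to the time the upgrade was started, for example gcluster_backup_9.5.2.17.115980_20200908021544.tar.bz2. This is usually the newest one.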

[gbase@localhost gcinstall]$ ll /home/gbase/
total 391200
drwx------ 2 gbase gbase        20 Mar  6  2020 data
-rw-r--r-- 1 root  root  138437196 Sep  8 00:57 GBase8a_MPP_Cluster-NoLicense-9.5.2.26-redhat7.3-x86_64.tar.bz2__
drwxrwxr-x 3 gbase gbase      4096 Sep  8 02:20 gcinstall
-rw-rw-r-- 1 gbase gbase 131070648 Sep  8 01:14 gcluster_backup_9.5.2.17.115980_20200908011242.tar.bz2
-rw-rw-r-- 1 gbase gbase 131070631 Sep  8 02:17 gcluster_backup_9.5.2.17.115980_20200908021544.tar.bz2
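When several backups exist, the choice can be automated by parsing the timestamp in the file name. This is a sketch under stated assumptions:
- The names follow the gcluster_backup_<version>_<YYYYMMDDHHMMSS>.tar.bz2 pattern. This is inferred from the listing above, not from GBase documentation.
- backup_time and closest_backup are hypothetical helpers.

```python
import re
import sys
from datetime import datetime
from pathlib import Path

# Assumed naming, inferred from the listing above:
# gcluster_backup_<old version>_<YYYYMMDDHHMMSS>.tar.bz2
BACKUP_RE = re.compile(r"^gcluster_backup_(?P<ver>.+)_(?P<ts>\d{14})\.tar\.bz2$")


def backup_time(name):
    """Return the timestamp embedded in a backup file name, or None."""
    m = BACKUP_RE.match(name)
    return datetime.strptime(m.group("ts"), "%Y%m%d%H%M%S") if m else None


def closest_backup(names, upgrade_time):
    """Return the backup name whose timestamp is nearest to upgrade_time."""
    dated = [(backup_time(n), n) for n in names]
    dated = [(t, n) for t, n in dated if t is not None]  # skip other files
    if not dated:
        return None
    return min(dated, key=lambda tn: abs(tn[0] - upgrade_time))[1]


if __name__ == "__main__":
    # Optional argument: upgrade start time as YYYYMMDDHHMMSS (default: now).
    when = (datetime.strptime(sys.argv[1], "%Y%m%d%H%M%S")
            if len(sys.argv) > 1 else datetime.now())
    names = [p.name for p in Path("/home/gbase").glob("gcluster_backup_*")]
    print(closest_backup(names, when))
```

With the ps output above showing gcinstall.py started at 02:08 on Sep 8, 2020, the script would be called as python pick_backup.py 20200908020800. The 02:15:44 backup is about 8 minutes from the start time and the 01:12:42 one about 55 minutes, so it would pick gcluster_backup_9.5.2.17.115980_20200908021544.tar.bz2. Here pick_backup.py is a hypothetical file name for the sketch.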

Start the restore

Run Restore.py with the backup file (--backupFile) and the same configuration file that was used for the upgrade (--silent):

[gbase@localhost gcinstall]$ ./Restore.py --help
Usage: Restore.py [options]

Options:
  -h, --help            show this help message and exit
  -a                    do not prompt the user for confirmation
  --backupFile=BACKUPFILE
                        backup package
  --silent=SILENTCONFIG
                        use the supplied properties file for a 'silent'
                        restore
  --passwordInputMode=PASSWORDINPUTMODE
                        get password method[file,pwdsame,pwddiff],
                        file:  get from config file,default
                        pwdsame: nodes have the same user passwd
                        pwddiff: nodes have different user passwds
[gbase@localhost gcinstall]$ ./Restore.py --backupFile=/home/gbase/gcluster_backup_9.5.2.17.115980_20200908021544.tar.bz2 --silent=demo.options
CoordinateHost:
10.0.2.102    10.0.2.202    10.0.2.203
DataHost:
10.0.2.102    10.0.2.202    10.0.2.203
Are you sure to restore these gcluster nodes from /home/gbase/gcluster_backup_9.5.2.17.115980_20200908021544.tar.bz2 ([Y,y]/[N,n])? y
10.0.2.203      Success to RestoreLocal.
10.0.2.202      Success to RestoreLocal.
10.0.2.102      Success to RestoreLocal.
[gbase@localhost gcinstall]$
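For an unattended restore, the same command can be assembled from the options listed in the --help output above. The -a option skips the confirmation prompt. This is a sketch under stated assumptions:
- restore_argv is a hypothetical helper.
- It assumes the script is run from the gcinstall directory.
- It reports Restore.py's exit code without assuming what a non-zero value means.

```python
import shlex
import subprocess


def restore_argv(backup_file, silent_config, assume_yes=False, password_mode=None):
    """Build a Restore.py command line from the options shown by --help."""
    argv = ["./Restore.py",
            "--backupFile=" + backup_file,
            "--silent=" + silent_config]
    if password_mode is not None:
        # Values listed in the help text: file (default), pwdsame, pwddiff.
        if password_mode not in ("file", "pwdsame", "pwddiff"):
            raise ValueError("unknown passwordInputMode: " + password_mode)
        argv.append("--passwordInputMode=" + password_mode)
    if assume_yes:
        argv.append("-a")  # do not prompt for [Y,y]/[N,n] confirmation
    return argv


if __name__ == "__main__":
    cmd = restore_argv(
        "/home/gbase/gcluster_backup_9.5.2.17.115980_20200908021544.tar.bz2",
        "demo.options")
    print(shlex.join(cmd))  # shlex.join requires Python 3.8+
    result = subprocess.run(cmd)  # prompt (if any) uses this terminal
    print("Restore.py exit code:", result.returncode)
```

Passing a list rather than a shell string avoids quoting problems if a backup path ever contains spaces.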

Done. All three nodes report Success to RestoreLocal.