GBase 8a: Handling the Case Where Hardware Failure Destroys Both the Primary and All Replica Copies and the User Accepts Partial Data Loss

GBase 8a implements its MPP architecture by spreading data across multiple servers, and each data shard is kept highly available through replicas (at most 2 replicas are allowed). If an unrecoverable accident, such as RAID card failures on several servers at once or outright destruction of the servers, takes out both the primary and every replica of a shard, then even though the data on the surviving servers is intact, the cluster as a whole cannot answer queries because part of the shard data is missing (an integrity constraint). This article provides a procedure for the case where the user accepts losing the unrecoverable data, wants the remaining data to stay queryable, intends to replace the failed nodes with new servers, and needs new tables to keep providing normal service.

This article applies to version V9.5 only.

Warning: this is a non-standard procedure. If the data volume is small, it is better to export the data from gnode, rebuild the cluster, and load the data back in; at the very least, randomly distributed tables will end up with a more even data distribution.

Overview

The shard data is irretrievably lost

No matter how many replicas there are, once the primary and all replicas of a shard are lost, that part of the data is gone for good. More replicas bring higher safety, but consume more hardware resources.

Goals

Before the failed nodes are restored, the data on the surviving nodes can continue to provide query service; DML, DDL, and other write operations remain unavailable.

After the failed nodes are restored, the tables regain full functionality.

Overall approach

Serving queries with part of the data missing

GBase 8a provides the following parameter, which keeps query service available even when all copies of a shard are CLOSE/OFFLINE. It is also recommended to make the same change in the cluster configuration file so the setting persists.

set global gcluster_allow_sg_lost=1;

At this point the cluster can only serve queries, and the results are missing the data from the damaged shards. Until the failed nodes are restored, no DML or DDL operations are possible.
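A minimal sketch of persisting the parameter in the configuration file, assuming the default install layout used later in this demo (/opt/gbase) and the conventional gcluster config file name; verify both against your own installation before editing:

```ini
# Assumed path: /opt/gbase/gcluster/config/gbase_8a_gcluster.cnf
# Add under the server section so the setting survives a service restart:
gcluster_allow_sg_lost=1
```

The SET GLOBAL statement above takes effect immediately but is lost on restart; the config-file entry covers restarts.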

Node replacement procedure

After the servers are repaired, the cluster programs and data must be copied to the new nodes.

Replacing programs and metadata

Copy everything the cluster needs to run from a healthy node, including metadata but excluding user data (which is usually very large):

gcware: copy everything from another node

gcluster: copy everything from another node

gnode: copy everything except the user tables under userdata/gbase/. For the user table data, see the "Replacing user data" section below.

Replacing user data

For shards whose primary and all replica copies are confirmed lost, manually create empty shard tables on the compute nodes.

For shards that still have a surviving replica, set the sync flag on the coordinator node; the shard tables are then created and the data restored automatically.

Environment setup

Build a cluster with 1 coordinator and 3 data/compute nodes, then stop the services on 2 of the data nodes and delete their user data files (the programs are left in place to save the trouble of copying them back).

Cluster status

[gbase@gbase_rh7_001 gcinstall_9.5.2.43.5]$ gcadmin
CLUSTER STATE:         ACTIVE
VIRTUAL CLUSTER MODE:  NORMAL

=============================================================
|           GBASE COORDINATOR CLUSTER INFORMATION           |
=============================================================
|   NodeName   | IpAddress  | gcware | gcluster | DataState |
-------------------------------------------------------------
| coordinator1 | 10.0.2.101 |  OPEN  |   OPEN   |     0     |
-------------------------------------------------------------
=========================================================================================================
|                                    GBASE DATA CLUSTER INFORMATION                                     |
=========================================================================================================
| NodeName |                IpAddress                 | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------------------------------------
|  node1   |                10.0.2.101                |       1        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
|  node2   |                10.0.2.102                |       1        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
|  node3   |                10.0.2.115                |       1        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------

Data

[gbase@gbase_rh7_001 gcinstall_9.5.2.43.5]$ gccli

GBase client 9.5.2.43.5f8fd4b2. Copyright (c) 2004-2022, GBase.  All Rights Reserved.

gbase> initnodedatamap;
Query OK, 0 rows affected (Elapsed: 00:00:00.47)

gbase> create database testdb;
Query OK, 1 row affected (Elapsed: 00:00:00.01)

gbase> use testdb;
Query OK, 0 rows affected (Elapsed: 00:00:00.00)

gbase> create table t1(id int) distributed by ('id');
Query OK, 0 rows affected (Elapsed: 00:00:00.14)

gbase> insert into t1 values(1),(10),(100),(1000);
Query OK, 4 rows affected (Elapsed: 00:00:00.17)
Records: 4  Duplicates: 0  Warnings: 0

gbase> select * from t1;
+------+
| id   |
+------+
|   10 |
|  100 |
|    1 |
| 1000 |
+------+
4 rows in set (Elapsed: 00:00:00.01)

gbase> exit
Bye

Simulating the loss of both the primary and the replica

Stop 2 of the nodes:

[gbase@gbase_rh7_001 gcinstall_9.5.2.43.5]$ ssh 10.0.2.102 "gcluster_services all stop"
gbase@10.0.2.102's password:
Stopping GCMonit success!
Stopping gbase :                                           [  OK  ]
Stopping syncserver :                                      [  OK  ]
[gbase@gbase_rh7_001 gcinstall_9.5.2.43.5]$ ssh 10.0.2.115 "gcluster_services all stop"
gbase@10.0.2.115's password:
Stopping GCMonit success!
Stopping gbase :                                           [  OK  ]
Stopping syncserver :                                      [  OK  ]
[gbase@gbase_rh7_001 gcinstall_9.5.2.43.5]$ gcadmin
CLUSTER STATE:         ACTIVE
VIRTUAL CLUSTER MODE:  NORMAL

=============================================================
|           GBASE COORDINATOR CLUSTER INFORMATION           |
=============================================================
|   NodeName   | IpAddress  | gcware | gcluster | DataState |
-------------------------------------------------------------
| coordinator1 | 10.0.2.101 |  OPEN  |   OPEN   |     0     |
-------------------------------------------------------------
=========================================================================================================
|                                    GBASE DATA CLUSTER INFORMATION                                     |
=========================================================================================================
| NodeName |                IpAddress                 | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------------------------------------
|  node1   |                10.0.2.101                |       1        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
|  node2   |                10.0.2.102                |       1        | CLOSE |   CLOSE    |     0     |
---------------------------------------------------------------------------------------------------------
|  node3   |                10.0.2.115                |       1        | CLOSE |   CLOSE    |     0     |
---------------------------------------------------------------------------------------------------------

Deleting the primary and replica shard data files

Delete the metadata and data files of table t1 on 102 and 115 to simulate an unrecoverable failure. The operations on 115 are identical and are not shown.

Clean up the data files on node 102
[gbase@localhost ~]$ cd /opt/gbase/gnode/userdata/gbase/
[gbase@localhost gbase]$ ll
total 12
-rw------- 1 gbase gbase   19 May 19 15:59 express.seq
drwx------ 2 gbase gbase 4096 May 19 15:59 gbase
drwx------ 4 gbase gbase   44 May 19 15:59 gclusterdb
drwx------ 4 gbase gbase   44 May 19 15:59 gctmpdb
drwx------ 4 gbase gbase   44 May 19 16:01 testdb
[gbase@localhost gbase]$ cd testdb
[gbase@localhost testdb]$ ll
total 0
drwx------ 4 gbase gbase 88 May 19 16:02 metadata
drwx------ 4 gbase gbase 32 May 19 16:02 sys_tablespace
[gbase@localhost testdb]$ cd metadata/
[gbase@localhost metadata]$ ll
total 28
-rw------- 1 gbase gbase   61 May 19 16:01 db.opt
-rw------- 1 gbase gbase 8558 May 19 16:02 t1_n1.frm
drwx------ 2 gbase gbase   65 May 19 16:02 t1_n1.GED
-rw------- 1 gbase gbase 8558 May 19 16:02 t1_n2.frm
drwx------ 2 gbase gbase   38 May 19 16:02 t1_n2.GED
[gbase@localhost metadata]$ mkdir bak
[gbase@localhost metadata]$ mv t1* bak/
[gbase@localhost metadata]$ ll
total 4
drwxrwxr-x 4 gbase gbase 74 May 19 16:06 bak
-rw------- 1 gbase gbase 61 May 19 16:01 db.opt
[gbase@localhost metadata]$ cd ..
[gbase@localhost testdb]$ cd sys_tablespace/
[gbase@localhost sys_tablespace]$ ll
total 0
drwx------ 2 gbase gbase 26 May 19 16:02 t1_n1
drwx------ 2 gbase gbase  6 May 19 16:02 t1_n2
[gbase@localhost sys_tablespace]$ mkdir bak
[gbase@localhost sys_tablespace]$ mv t1* bak/
[gbase@localhost sys_tablespace]$ ll
total 0
drwxrwxr-x 4 gbase gbase 32 May 19 16:07 bak
[gbase@localhost sys_tablespace]$

Queries fail

[gbase@gbase_rh7_001 gcinstall_9.5.2.43.5]$ gccli testdb

GBase client 9.5.2.43.5f8fd4b2. Copyright (c) 2004-2022, GBase.  All Rights Reserved.

gbase> select * from t1;
ERROR 1708 (HY000): (GBA-02EX-0004) Failed to get metadata:
DETAIL: check nodes, no valid node for suffix: n2,
please execute 'show datacopymap database.table_name;' to see the detail.

Using the parameter to serve queries with missing data

This example is not ideal: shard n2 happened to receive no rows at all. That does not change the point, though: without the parameter set, the query fails as shown above.

Note the warning message: there is no available copy of shard n2.

gbase> show variables like '%sg%';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| gcluster_allow_sg_lost | 0     |
+------------------------+-------+
1 row in set (Elapsed: 00:00:00.00)

gbase> set global gcluster_allow_sg_lost=1;
Query OK, 0 rows affected (Elapsed: 00:00:00.01)

gbase> show variables like '%sg%';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| gcluster_allow_sg_lost | 1     |
+------------------------+-------+
1 row in set (Elapsed: 00:00:00.00)

gbase> select * from t1;
+------+
| id   |
+------+
|   10 |
|  100 |
|    1 |
| 1000 |
+------+
4 rows in set, 1 warning (Elapsed: 00:00:00.00)

gbase> show warnings;
+-------+------+------------------------------------+
| Level | Code | Message                            |
+-------+------+------------------------------------+
| Note  | 1702 | No valid nodes for table part 'n2' |
+-------+------+------------------------------------+
1 row in set (Elapsed: 00:00:00.00)

DML and DDL fail

gbase> insert into t1 values(20);
ERROR 1705 (HY000): gcluster DML error: segment id 2 has no valid node.

Restoring programs and metadata

Copy the whole directory tree over from a healthy node, except gnode's user databases under userdata/gbase/.

That step is omitted here; it is just a series of scp -r commands.
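As a rough sketch, the copy might look like the following, assuming the /opt/gbase layout shown in this demo; rsync's --exclude is used here as a convenience instead of plain scp -r for the gnode tree (adjust the paths, IPs, and excluded database names to your environment):

```shell
# Run from a healthy node (10.0.2.101); 10.0.2.102 is the repaired node.
scp -r /opt/gbase/gcware   gbase@10.0.2.102:/opt/gbase/
scp -r /opt/gbase/gcluster gbase@10.0.2.102:/opt/gbase/

# For gnode, copy everything except the user databases under userdata/gbase/
# (only testdb in this demo); system databases such as gbase, gclusterdb and
# gctmpdb stay included.
rsync -a --exclude 'userdata/gbase/testdb' \
      /opt/gbase/gnode/ gbase@10.0.2.102:/opt/gbase/gnode/
```

If rsync is not available on the target, the same effect can be had with scp -r followed by deleting the copied user database directories.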

Starting the services

Node 102 is started first.

[gbase@localhost root]$ gcluster_services all start
Starting gbase :                                           [  OK  ]
Starting syncserver :                                      [  OK  ]
Starting GCMonit success!
[gbase@localhost root]$

Queries now report that the table does not exist

Once the service is started, the gcluster_allow_sg_lost parameter no longer takes effect for that node, so the user data must be restored as soon as possible.

[gbase@gbase_rh7_001 gcinstall_9.5.2.43.5]$ gcadmin
CLUSTER STATE:         ACTIVE
VIRTUAL CLUSTER MODE:  NORMAL

=============================================================
|           GBASE COORDINATOR CLUSTER INFORMATION           |
=============================================================
|   NodeName   | IpAddress  | gcware | gcluster | DataState |
-------------------------------------------------------------
| coordinator1 | 10.0.2.101 |  OPEN  |   OPEN   |     0     |
-------------------------------------------------------------
=========================================================================================================
|                                    GBASE DATA CLUSTER INFORMATION                                     |
=========================================================================================================
| NodeName |                IpAddress                 | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------------------------------------
|  node1   |                10.0.2.101                |       1        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
|  node2   |                10.0.2.102                |       1        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
|  node3   |                10.0.2.115                |       1        | CLOSE |   CLOSE    |     0     |
---------------------------------------------------------------------------------------------------------

[gbase@gbase_rh7_001 gcinstall_9.5.2.43.5]$ gccli testdb

GBase client 9.5.2.43.5f8fd4b2. Copyright (c) 2004-2022, GBase.  All Rights Reserved.

gbase> select * from t1;
ERROR 1708 (HY000): [10.0.2.102:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: Table 'testdb.t1_n2' doesn't exist
SQL: SELECT /*10.0.2.101_30_1_2022-05-19_16:09:18*/ /*+ TID('55') */ `vcname000001.testdb.t1`.`id` AS `id` FROM `testdb`.`t1_n2` `vcname000001.testdb.t1`
gbase> show variables like '%sg%';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| gcluster_allow_sg_lost | 1     |
+------------------------+-------+
1 row in set (Elapsed: 00:00:00.01)

Restoring user data

Tables whose data is confirmed lost

Create the empty shard tables on the compute node's gnode service:

Create the table on the gnode of 102
[gbase@localhost root]$ gncli

GBase client 9.5.2.43.5f8fd4b2. Copyright (c) 2004-2022, GBase.  All Rights Reserved.

gbase> use testdb;
Query OK, 0 rows affected (Elapsed: 00:00:00.00)

gbase> show tables;
Empty set (Elapsed: 00:00:00.00)

gbase> create table t1_n2(id int);
Query OK, 0 rows affected (Elapsed: 00:00:00.03)

Queries work again

Query on 101
gbase> select * from t1;
+------+
| id   |
+------+
|    1 |
| 1000 |
|   10 |
|  100 |
+------+
4 rows in set (Elapsed: 00:00:00.01)

Setting automatic sync for recoverable tables

For details, see https://www.gbase8.cn/5799

gbase> show nodes;
+------------+------------+-------+--------------+----------------+---------+-----------+
| Id         | ip         | name  | primary part | duplicate part | status  | datastate |
+------------+------------+-------+--------------+----------------+---------+-----------+
| 1694629898 | 10.0.2.101 | node1 | n1           | n3             | online  |         0 |
| 1711407114 | 10.0.2.102 | node2 | n2           | n1             | online  |         0 |
| 1929510922 | 10.0.2.115 | node3 | n3           | n2             | offline |         0 |
+------------+------------+-------+--------------+----------------+---------+-----------+
3 rows in set (Elapsed: 00:00:00.00)

gbase> show vcs;
+---------+--------------+---------+
| id      | name         | default |
+---------+--------------+---------+
| vc00001 | vcname000001 | Y       |
+---------+--------------+---------+
1 row in set (Elapsed: 00:00:00.00)

gbase> set self gcluster_node_status_list='vc00001.testdb.t1:n1:1711407114:2:2';
Query OK, 0 rows affected (Elapsed: 00:00:00.02)

Checking the recovered table data

Query on 102; the data appears after a short while
gbase> select * from testdb.t1_n1;
+------+
| id   |
+------+
|   10 |
|  100 |
+------+
2 rows in set (Elapsed: 00:00:00.00)

Apply the same handling on the other nodes

Depending on each shard's situation, either create an empty table or set up automatic sync.
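For node3 (10.0.2.115) in this demo, once its service is back up, that would mean: shard n2's replica there is also unrecoverable, so create an empty t1_n2 on its gnode; shard n3 still has a surviving replica on node1, so set the sync flag on the coordinator. A sketch following the same commands used above (the node id 1929510922 is taken from the show nodes output; verify the flag value format against the linked article before use):

```sql
-- On the gnode of 115 (gncli): the n2 copy is unrecoverable, create it empty
use testdb;
create table t1_n2(id int);

-- On the coordinator (gccli): n3 can be resynced from node1's surviving copy
set self gcluster_node_status_list='vc00001.testdb.t1:n3:1929510922:2:2';
```

After the sync completes, the cluster again has a full set of copies and normal DML/DDL service can resume.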