南大通用GBase 8a的SQL重试功能测试,gcluster_query_retry参数介绍

GBase 8a支持在某个节点出现临时故障时,某些SQL可以自动从备用节点进行查询。本文介绍该参数gcluster_query_retry的使用样例。

参数

gcluster_query_retry

这个参数用于控制是否启用查询重试机制。重试机制是指当主节点不工作时,集群将发送查询给备份节点。默认是不启用。
参数=0关闭;
参数=1启动;
该参数的默认值是1。

适用于高可用备份查询场景:

集群查询中某一步骤,由于连接的节点offline或者其它一些原因查询失败,那么执行器会将该查询由主节点转到备份节点重新查询,但高可用备份查询的前提为该查询没有进入结果处理阶段,一旦进入该阶段(例如已经有结果发送到客户端),则不会启动高可用备份查询,而直接查询报错。

该参数不支持session设置,只能global。

测试环境

2节点集群

[gbase@gbase_rh7_001 gcinstall]$ gcadmin
CLUSTER STATE:         ACTIVE
VIRTUAL CLUSTER MODE:  NORMAL

=============================================================
|           GBASE COORDINATOR CLUSTER INFORMATION           |
=============================================================
|   NodeName   | IpAddress  | gcware | gcluster | DataState |
-------------------------------------------------------------
| coordinator1 | 10.0.2.101 |  OPEN  |   OPEN   |     0     |
-------------------------------------------------------------
=========================================================================================================
|                                    GBASE DATA CLUSTER INFORMATION                                     |
=========================================================================================================
| NodeName |                IpAddress                 | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------------------------------------
|  node1   |                10.0.2.101                |       1        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
|  node2   |                10.0.2.102                |       1        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------

[gbase@gbase_rh7_001 gcinstall]$ 

测试表和数据

最简单的1个列的表,3行数据


gbase> create database testdb;
Query OK, 1 row affected (Elapsed: 00:00:00.00)

gbase> use testdb;
Query OK, 0 rows affected (Elapsed: 00:00:00.00)

gbase> create table t1(id int);
Query OK, 0 rows affected (Elapsed: 00:00:00.12)

gbase> insert into t1 values(1),(11),(111);
Query OK, 3 rows affected (Elapsed: 00:00:00.11)
Records: 3  Duplicates: 0  Warnings: 0

gbase> select count(*) from t1;
+----------+
| count(*) |
+----------+
|        3 |
+----------+
1 row in set (Elapsed: 00:00:00.01)

构造故障

手工进入101节点,将分片表删除,模拟表不存在的故障

gbase> ^CAborted
[gbase@gbase_rh7_001 gcinstall]$ gncli

GBase client 9.5.2.45.58675fc1. Copyright (c) 2004-2022, GBase.  All Rights Reserved.

gbase> use testdb;
Query OK, 0 rows affected (Elapsed: 00:00:00.00)

gbase> show tables;
+------------------+
| Tables_in_testdb |
+------------------+
| t1_n1            |
| t1_n2            |
+------------------+
2 rows in set (Elapsed: 00:00:00.00)

gbase> drop table t1_n1;
Query OK, 0 rows affected (Elapsed: 00:00:00.00)

gbase> exit

测试查询重试功能

在集群查询,发现简单的SQL,比如select * , select count(*)等没有报错

gbase> use testdb;
Query OK, 0 rows affected (Elapsed: 00:00:00.00)

gbase> select count(*) from t1;
+----------+
| count(*) |
+----------+
|        3 |
+----------+
1 row in set (Elapsed: 00:00:00.53)

查看集群日志

可以看到集群层发现了表不存在的情况,并记录了日志。集群内部通过重试,自动查询了备用分片,结果成功返回。

[gbase@gbase_rh7_001 gcinstall]$ tail /opt/gbase/gcluster/log/gcluster/express.log  -n 30

。。。。
2022-10-25 09:43:09.050 [SQLDISP][ERROR][S:26][Q:96]<ExecSQL|2567>:Query failed, SessionCtx(0x3df0000) HOST(10.0.2.101:5050->27) reason: Table 'testdb.t1_n1' doesn't exist SQL(SELECT  COUNT(1) FROM `testdb`.`t1_n1`  `vcname000001.testdb.t1`)
[gbase@gbase_rh7_001 gcinstall]$ g

其它SQL测试

发现order by ,group 目前还不支持。

gbase> select * from t1 order by id;
ERROR 1708 (HY000): [10.0.2.101:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: Table 'testdb.t1_n1' doesn't exist
SQL: CREATE TABLE `gctmpdb`._tmp_1694629898_28_t4_1_1666661350_s AS SELECT /*10.0.2.101_28_4_2022-10-25_09:44:37*/ /*+ TID('43') */ `vcname000001.testdb.t1`.`id` AS `c1` FROM `testdb`.`t1_n1` `vcname000001.testdb.t1` LIMIT 0 ;
gbase> 
gbase> select id,count(*) from t1 group by id;
ERROR 1708 (HY000): [10.0.2.101:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: Table 'testdb.t1_n1' doesn't exist
SQL: CREATE TABLE `gctmpdb`._tmp_rht_1694629898_32_t9_1_1666661350_s_n1 AS SELECT /*10.0.2.101_32_4_2022-10-25_09:59:39*/ /*+ TID('63') */ `vcname000001.testdb.t1`.`id` AS `c1`, COUNT(0) AS `c2` FROM `testdb`.`t1_n1` `vcname000001.testdb.t1` GROUP BY `vcname000001.testdb.t1`.`id` LIMIT 0 ;
gbase>

关闭自动重试参数测试

关闭重试参数后,SQL直接报错。

gbase> set global  gcluster_query_retry=0;
Query OK, 0 rows affected (Elapsed: 00:00:00.01)

gbase> select count(*) from t1;
ERROR 1708 (HY000): [10.0.2.101:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: Table 'testdb.t1_n1' doesn't exist
SQL: SELECT  COUNT(1) FROM `testdb`.`t1_n1`  `vcname000001.testdb.t1`
gbase>