GBase 8a支持在某个节点出现临时故障时,某些SQL可以自动从备用节点进行查询。本文介绍该参数gcluster_query_retry的使用样例。
目录导航
参数
gcluster_query_retry
这个参数用于控制是否启用查询重试机制。重试机制是指当主节点不工作时,集群将发送查询给备份节点。默认是不启用。
参数=0关闭;
参数=1启动;
该参数的默认值是1。
适用于高可用备份查询场景:
集群查询中某一步骤,由于连接的节点offline或者其它一些原因查询失败,那么执行器会将该查询由主节点转到备份节点重新查询,但高可用备份查询的前提为该查询没有进入结果处理阶段,一旦进入该阶段(例如已经有结果发送到客户端),则不会启动高可用备份查询,而直接查询报错。
该参数不支持session设置,只能global。
测试环境
2节点集群
[gbase@gbase_rh7_001 gcinstall]$ gcadmin
CLUSTER STATE: ACTIVE
VIRTUAL CLUSTER MODE: NORMAL
=============================================================
| GBASE COORDINATOR CLUSTER INFORMATION |
=============================================================
| NodeName | IpAddress | gcware | gcluster | DataState |
-------------------------------------------------------------
| coordinator1 | 10.0.2.101 | OPEN | OPEN | 0 |
-------------------------------------------------------------
=========================================================================================================
| GBASE DATA CLUSTER INFORMATION |
=========================================================================================================
| NodeName | IpAddress | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------------------------------------
| node1 | 10.0.2.101 | 1 | OPEN | OPEN | 0 |
---------------------------------------------------------------------------------------------------------
| node2 | 10.0.2.102 | 1 | OPEN | OPEN | 0 |
---------------------------------------------------------------------------------------------------------
[gbase@gbase_rh7_001 gcinstall]$
测试表和数据
最简单的1个列的表,3行数据
gbase> create database testdb;
Query OK, 1 row affected (Elapsed: 00:00:00.00)
gbase> use testdb;
Query OK, 0 rows affected (Elapsed: 00:00:00.00)
gbase> create table t1(id int);
Query OK, 0 rows affected (Elapsed: 00:00:00.12)
gbase> insert into t1 values(1),(11),(111);
Query OK, 3 rows affected (Elapsed: 00:00:00.11)
Records: 3 Duplicates: 0 Warnings: 0
gbase> select count(*) from t1;
+----------+
| count(*) |
+----------+
| 3 |
+----------+
1 row in set (Elapsed: 00:00:00.01)
构造故障
手工进入101节点,将分片表删除,模拟表不存在的故障
gbase> ^CAborted
[gbase@gbase_rh7_001 gcinstall]$ gncli
GBase client 9.5.2.45.58675fc1. Copyright (c) 2004-2022, GBase. All Rights Reserved.
gbase> use testdb;
Query OK, 0 rows affected (Elapsed: 00:00:00.00)
gbase> show tables;
+------------------+
| Tables_in_testdb |
+------------------+
| t1_n1 |
| t1_n2 |
+------------------+
2 rows in set (Elapsed: 00:00:00.00)
gbase> drop table t1_n1;
Query OK, 0 rows affected (Elapsed: 00:00:00.00)
gbase> exit
测试查询重试功能
在集群查询,发现简单的SQL,比如select * , select count(*)等没有报错
gbase> use testdb;
Query OK, 0 rows affected (Elapsed: 00:00:00.00)
gbase> select count(*) from t1;
+----------+
| count(*) |
+----------+
| 3 |
+----------+
1 row in set (Elapsed: 00:00:00.53)
查看集群日志
可以看到集群层发现了表不存在的情况,并记录了日志。集群内部通过重试,自动查询了备用分片,结果成功返回。
[gbase@gbase_rh7_001 gcinstall]$ tail /opt/gbase/gcluster/log/gcluster/express.log -n 30
。。。。
2022-10-25 09:43:09.050 [SQLDISP][ERROR][S:26][Q:96]<ExecSQL|2567>:Query failed, SessionCtx(0x3df0000) HOST(10.0.2.101:5050->27) reason: Table 'testdb.t1_n1' doesn't exist SQL(SELECT COUNT(1) FROM `testdb`.`t1_n1` `vcname000001.testdb.t1`)
[gbase@gbase_rh7_001 gcinstall]$ g
其它SQL测试
发现order by ,group 目前还不支持。
gbase> select * from t1 order by id;
ERROR 1708 (HY000): [10.0.2.101:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: Table 'testdb.t1_n1' doesn't exist
SQL: CREATE TABLE `gctmpdb`._tmp_1694629898_28_t4_1_1666661350_s AS SELECT /*10.0.2.101_28_4_2022-10-25_09:44:37*/ /*+ TID('43') */ `vcname000001.testdb.t1`.`id` AS `c1` FROM `testdb`.`t1_n1` `vcname000001.testdb.t1` LIMIT 0 ;
gbase>
gbase> select id,count(*) from t1 group by id;
ERROR 1708 (HY000): [10.0.2.101:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: Table 'testdb.t1_n1' doesn't exist
SQL: CREATE TABLE `gctmpdb`._tmp_rht_1694629898_32_t9_1_1666661350_s_n1 AS SELECT /*10.0.2.101_32_4_2022-10-25_09:59:39*/ /*+ TID('63') */ `vcname000001.testdb.t1`.`id` AS `c1`, COUNT(0) AS `c2` FROM `testdb`.`t1_n1` `vcname000001.testdb.t1` GROUP BY `vcname000001.testdb.t1`.`id` LIMIT 0 ;
gbase>
关闭自动重试参数测试
关闭重试参数后,SQL直接报错。
gbase> set global gcluster_query_retry=0;
Query OK, 0 rows affected (Elapsed: 00:00:00.01)
gbase> select count(*) from t1;
ERROR 1708 (HY000): [10.0.2.101:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: Table 'testdb.t1_n1' doesn't exist
SQL: SELECT COUNT(1) FROM `testdb`.`t1_n1` `vcname000001.testdb.t1`
gbase>