南大通用GBase 8a相同子查询多次使用的优化方法

在GBase 8a中,会将子查询保存到临时表里。 如果多个子查询是完全相同的,则会导致资源浪费。本文提供了CTE和参数共2个可行方案来优化这个场景。

场景

某个子查询的结果,会被多次使用,类似如下的

select XXX from (SubQueryA) ta
union all
select YYY from (SubQueryA) tb
union all
select ZZZ from (SubQueryA) tc

优化方案

采用CTE(Common Table Express)

SQL将优化成如下写法,该SQL将只执行一次,节省资源,同时SQL语句也更易读。

with t as SubQueryA
select XXX from t
union all
select YYY from t
union all
select ZZZ from t

参考 GBase 8a 支持CTE 功能with as 介绍

调整数据库参数

CTE方法需要重写SQL语句,如果已上线的系统,短时间内无法改写,可以通过数据库的参数来【复用】临时表。

_t_gcluster_reuse_tmp_table_optimize

该参数默认为0,关闭,设置为2From子查询可以复用。

支持session 级。

如下是开启前和开启后的执行计划对比

默认执行计划

可以看到步骤0和步骤1是各自独立运行的,生成了2个临时表,步骤2将0和1的临时表做了最终处理。

gbase> explain select sum(id) from (select id,count(*) cnt from t1 group by id) t1 union all  select sum(cnt) from (select id,count(*) cnt from t1 group by id) t1 ;
+----+----------+------------+--------+-------------+
| ID | MOTION   | OPERATION  | TABLE  | CONDITION   |
+----+----------+------------+--------+-------------+
| 02 | [RESULT] |  Step      | <00>   |             |
|    |          |  AGG       |        |             |
|    |          |  UNION ALL |        |             |
|    |          |  Step      | <01>   |             |
|    |          |  AGG       |        |             |
| 01 | [GATHER] |  SubQuery3 | t1     |             |
|    |          |   Table    | t1[id] |             |
|    |          |   GROUP    |        | GROUP BY id |
|    |          |  AGG       |        |             |
| 00 | [GATHER] |  SubQuery1 | t1     |             |
|    |          |   Table    | t1[id] |             |
|    |          |   GROUP    |        | GROUP BY id |
|    |          |  AGG       |        |             |
+----+----------+------------+--------+-------------+
13 rows in set (Elapsed: 00:00:00.00)

开启参数后的执行计划

可以看到子查询会先执行,生成临时表0, 然后两部分各自在0的临时表的基础上,执行查询,并生成01,02的临时表,最后03将01和02汇总得到结果。

gbase> explain select sum(id) from (select id,count(*) cnt from t1 group by id) t1 union all  select sum(cnt) from (select id,count(*) cnt from t1 group by id) t1 ;
+----+-------------+------------+--------+-------------+
| ID | MOTION      | OPERATION  | TABLE  | CONDITION   |
+----+-------------+------------+--------+-------------+
| 03 | [RESULT]    |  Step      | <01>   |             |
|    |             |  AGG       |        |             |
|    |             |  UNION ALL |        |             |
|    |             |  Step      | <02>   |             |
|    |             |  AGG       |        |             |
| 02 | [GATHER]    |  Step      | <00>   |             |
|    |             |  AGG       |        |             |
| 01 | [GATHER]    |  Step      | <00>   |             |
|    |             |  AGG       |        |             |
| 00 | [NO REDIST] |  Table     | t1[id] |             |
|    |             |  GROUP     |        | GROUP BY id |
+----+-------------+------------+--------+-------------+
11 rows in set (Elapsed: 00:00:00.01)

总结

对于查询的复用,特别是耗时较久的,建议用CTE方案改写。 短期内可以考虑数据库参数。