本文介绍2个和distinct 有关的参数。_t_gcluster_agg_distinct_redist_optimize 和 _t_gcluster_agg_distinct_redist_optimize_with_groupby。
如下参数,仅在count distinct列或者count distinct group列不是hash分布列的情况下。如果distinct或group有hash列,全部本地计算即可。
目录导航
_t_gcluster_agg_distinct_redist_optimize
aggr(distinct)优化。注意带group by时不参与
1:打开,默认。本地
0:关闭,数据会先拉到一个节点。
执行计划样例
默认情况下,数据先在本地根据distinct列做group, 然后将结果按照distinct的列做二次分发。每个节点本地做完agg后,将结果汇总到一个节点,最后得到结果。
如果参数为0,则数据本地做group后,直接全部汇总到一个节点,得到结果。
gbase> set _t_gcluster_agg_distinct_redist_optimize=1;
Query OK, 0 rows affected (Elapsed: 00:00:00.00)
gbase> explain select count(distinct name) from t1;
+----+----------------+-----------+--------+---------------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------------+-----------+--------+---------------+
| 02 | [RESULT] | Step | <01> | |
| | | AGG | | |
| 01 | [GATHER] | Step | <00> | |
| | | AGG | | |
| 00 | [REDIST(name)] | Table | t1[id] | |
| | | GROUP | | GROUP BY name |
+----+----------------+-----------+--------+---------------+
6 rows in set (Elapsed: 00:00:00.00)
gbase> set _t_gcluster_agg_distinct_redist_optimize=0;
Query OK, 0 rows affected (Elapsed: 00:00:00.00)
gbase> explain select count(distinct name) from t1;
+----+----------+-----------+--------+---------------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------+-----------+--------+---------------+
| 01 | [RESULT] | Step | <00> | |
| | | AGG | | |
| 00 | [GATHER] | Table | t1[id] | |
| | | GROUP | | GROUP BY name |
+----+----------+-----------+--------+---------------+
4 rows in set (Elapsed: 00:00:00.01)
_t_gcluster_agg_distinct_redist_optimize_with_groupby
agg(distinct A) group by B时,强行指定按A列做重分布,必须在_t_gcluster_agg_distinct_redist_optimize打开时才生效。
0:默认按照group by 的B列重分布
1:按照distinct的A列重分布
执行计划样例
如下例子,是按照id列,也就是group by 的列做动态重分布。
设置参数为1,则多了一步,先按照distinct做重分布,然后再做一次根据group 列的二次重分布。
gbase> explain select count(distinct name) from t2 group by id;
+----+--------------+-----------+---------+-------------------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+--------------+-----------+---------+-------------------+
| 01 | [RESULT] | Step | <00> | |
| | | GROUP | | GROUP BY id |
| 00 | [REDIST(id)] | Table | t2[DIS] | |
| | | GROUP | | GROUP BY id, name |
+----+--------------+-----------+---------+-------------------+
4 rows in set (Elapsed: 00:00:00.01)
gbase> set _t_gcluster_agg_distinct_redist_optimize_with_groupby=1;
Query OK, 0 rows affected (Elapsed: 00:00:00.01)
gbase> explain select count(distinct name) from t2 group by id;
+----+----------------+-----------+---------+-------------------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------------+-----------+---------+-------------------+
| 02 | [RESULT] | Step | <01> | |
| | | GROUP | | GROUP BY id |
| 01 | [REDIST(id)] | Step | <00> | |
| | | GROUP | | GROUP BY id |
| 00 | [REDIST(name)] | Table | t2[DIS] | |
| | | GROUP | | GROUP BY id, name |
+----+----------------+-----------+---------+-------------------+
6 rows in set (Elapsed: 00:00:00.02)