本文根据网上常见的SQL面试题,查询出每门课都大于80 分的学生姓名,给出分析和在GBase 8a数据库集群上的执行结果。
目录导航
题目
已知学生成绩如下:
gbase> select * from exam_1;
+--------+--------+-------+
| name | class | score |
+--------+--------+-------+
| 张三 | 语文 | 81 |
| 张三 | 数学 | 75 |
| 李四 | 语文 | 76 |
| 李四 | 数学 | 90 |
| 王五 | 语文 | 81 |
| 王五 | 数学 | 100 |
| 王五 | 英语 | 90 |
+--------+--------+-------+
7 rows in set (Elapsed: 00:00:00.00)
查询出每门课都大于80 分的学生姓名
分析
1、题目中,课程数量不是相同的,所以不能提前假设只有3门成绩,解答方案必须能适应任意课程数量。
2、题目只要求学生姓名,所以课程列几乎就是多余的,算干扰因素了。
3、大于80分,指最低分是>80的。
答案样例
通过group
以name作为维度,最小分数min(score)做聚合运算,然后选择那些最小分数大于80分的
gbase> select name from exam_1 group by name having(min(score)>80);
+--------+
| name |
+--------+
| 王五 |
+--------+
1 row in set (Elapsed: 00:00:00.03)
执行计划如下
gbase> explain select name from exam_1 group by name having(min(score)>80);
+----+----------------+-----------+-------------+-------------------------------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------------+-----------+-------------+-------------------------------+
| 01 | [RESULT] | Step | <00> | |
| | | GROUP | | GROUP BY name |
| | | HAVING | | HAVING (MIN(MIN(score)) > 80) |
| 00 | [REDIST(name)] | Table | exam_1[DIS] | |
| | | GROUP | | GROUP BY name |
+----+----------------+-----------+-------------+-------------------------------+
5 rows in set (Elapsed: 00:00:00.01)
通过not exists
将那些存在分数<=80的name 排除在外。
gbase> select distinct name from exam_1 a where not exists (select 1 from exam_1 b where score<=80 and a.name=b.name);
+--------+
| name |
+--------+
| 王五 |
+--------+
1 row in set (Elapsed: 00:00:00.08)
执行计划如下
gbase> explain select distinct name from exam_1 a where not exists (select 1 from exam_1 b where score<=80 and a.name=b.name);
+----+----------------+------------+-------------+--------------------------+
| ID | MOTION | OPERATION | TABLE | CONDITION |
+----+----------------+------------+-------------+--------------------------+
| 02 | [RESULT] | Step | <00> | |
| | | WHERE | | NOT EXISTS ([SubQuery1]) |
| | | AGG | | |
| 01 | [REDIST(name)] | SubQuery1 | placeholder | |
| | | SCAN | b[DIS] | (score{S} <= 80) |
| 00 | [REDIST(name)] | Table | a[DIS] | |
+----+----------------+------------+-------------+--------------------------+
6 rows in set (Elapsed: 00:00:00.00)
总结
2个方案,从执行计划看,方案1的group性能预计更优一些。方案2存在join的情况。