我们经常遇到的一个问题就是,都是过滤条件,注意不是join列,放在where里和on里有啥区别呢?本文以GBase数据库为例,研究下本问题。
整体看,on就是对主表和关联表符合条件的数据做关联,不符合的不关联,填充null,属于【事中】处理。而where 是对哪些主表数据参与关联,以及哪些关联【后】的结果符合要求进行过滤,更像是【事前】和【事后】处理。
目录导航
业务需求
如下是一个获取40岁以下人员流水订单的业务场景。其中id是关联条件,而b.age是过滤条件,分别放在了on和where部分。
select a.*,b.age from a join b on a.id=b.id and b.age<40;
select a.*,b.age from a join b on a.id=b.id where b.age<40;
模拟数据
gbase> select * from a;
+------+------+
| id | bill |
+------+------+
| 1 | 100 |
| 2 | 200 |
| 1 | 101 |
| 2 | 202 |
+------+------+
4 rows in set (Elapsed: 00:00:00.00)
gbase> select * from b;
+------+------+
| id | age |
+------+------+
| 1 | 30 |
| 2 | 50 |
+------+------+
2 rows in set (Elapsed: 00:00:00.00)
执行结果
分别执行2个SQL,可以看到结果完全相同。
gbase> select a.*,b.age from a join b on a.id=b.id and b.age<40;
+------+------+------+
| id | bill | age |
+------+------+------+
| 1 | 100 | 30 |
| 1 | 101 | 30 |
+------+------+------+
2 rows in set (Elapsed: 00:00:00.03)
gbase> select a.*,b.age from a join b on a.id=b.id where b.age<40
-> ;
+------+------+------+
| id | bill | age |
+------+------+------+
| 1 | 100 | 30 |
| 1 | 101 | 30 |
+------+------+------+
2 rows in set (Elapsed: 00:00:00.04)
查看执行计划
从执行计划看,也是完全相同。都是先在B表过滤age<40的数据,然后再和A表做id=id的join。
gbase> explain select a.*,b.age from a join b on a.id=b.id and b.age<40;
+----+-------------+-------------+--------+---------------+-----------------+
| ID | MOTION | OPERATION | TABLE | CONDITION | NO STAT Tab/Col |
+----+-------------+-------------+--------+---------------+-----------------+
| 01 | [RESULT] | INNER JOIN | | (id = id) | |
| | | Step | <00> | | |
| | | Table | a[DIS] | | |
| 00 | [BROADCAST] | SCAN | b[DIS] | (age{S} < 40) | b |
+----+-------------+-------------+--------+---------------+-----------------+
4 rows in set (Elapsed: 00:00:00.01)
gbase> explain select a.*,b.age from a join b on a.id=b.id where b.age<40;
+----+-------------+-------------+--------+---------------+-----------------+
| ID | MOTION | OPERATION | TABLE | CONDITION | NO STAT Tab/Col |
+----+-------------+-------------+--------+---------------+-----------------+
| 01 | [RESULT] | INNER JOIN | | (id = id) | |
| | | Step | <00> | | |
| | | Table | a[DIS] | | |
| 00 | [BROADCAST] | SCAN | b[DIS] | (age{S} < 40) | b |
+----+-------------+-------------+--------+---------------+-----------------+
4 rows in set (Elapsed: 00:00:00.01)
难道where和on是完全一样的吗?
我们前面做的是inner join, 我们来看看 outer join的情况。业务SQL改造成 a left join b。
select a.*,b.age from a left join b on a.id=b.id and b.age<40;
select a.*,b.age from a left join b on a.id=b.id where b.age<40;
再次运行看结果。
gbase> select a.* from a left join b on a.id=b.id and b.age<40;
+------+------+
| id | bill |
gbase> select a.*,b.age from a left join b on a.id=b.id and b.age<40;
+------+------+------+
| id | bill | age |
+------+------+------+
| 1 | 100 | 30 |
| 1 | 101 | 30 |
| 2 | 200 | NULL |
| 2 | 202 | NULL |
+------+------+------+
4 rows in set (Elapsed: 00:00:00.06)
注意到on条件的写法,返回了更多的数据,包括部分age为null的。
gbase> select a.*,b.age from a left join b on a.id=b.id where b.age<40;
+------+------+------+
| id | bill | age |
+------+------+------+
| 1 | 100 | 30 |
| 1 | 101 | 30 |
+------+------+------+
2 rows in set (Elapsed: 00:00:00.07)
写在where里的,和最开始的结果是一样的。
查看执行计划,也是不一样的。可以看到where方式,会转化成inner join进行。而on部分的,则是left join。
gbase> explain select a.*,b.age from a left join b on a.id=b.id and b.age<40;
+----+-------------+------------+--------+---------------+-----------------+
| ID | MOTION | OPERATION | TABLE | CONDITION | NO STAT Tab/Col |
+----+-------------+------------+--------+---------------+-----------------+
| 01 | [RESULT] | LEFT JOIN | | (id = id) | |
| | | Table | a[DIS] | | |
| | | Step | <00> | | |
| 00 | [BROADCAST] | SCAN | b[DIS] | (age{S} < 40) | a |
+----+-------------+------------+--------+---------------+-----------------+
4 rows in set (Elapsed: 00:00:00.03)
gbase> explain select a.*,b.age from a left join b on a.id=b.id where b.age<40;
+----+-------------+-------------+--------+---------------+-----------------+
| ID | MOTION | OPERATION | TABLE | CONDITION | NO STAT Tab/Col |
+----+-------------+-------------+--------+---------------+-----------------+
| 01 | [RESULT] | INNER JOIN | | (id = id) | |
| | | Step | <00> | | |
| | | Table | a[DIS] | | |
| 00 | [BROADCAST] | SCAN | b[DIS] | (age{S} < 40) | b |
+----+-------------+-------------+--------+---------------+-----------------+
4 rows in set (Elapsed: 00:00:00.01)
结果不一样的原因是什么?
区别在于:
inner join 是订单必须是40岁一下的人完成。如果该订单没有找到对应的人,或者对应的人年龄不小于40岁,则不符合要求。
left join 是以订单a表为准,去查询符合条件(小于40岁)的人员的信息,如果查不到或者不符合要求就返回null。
结论
对于inner join, 过滤条件写在on里和where里没有区别。
对于outer join, 过滤条件写在on里和where里是有区别的。
实际业务,需要谨慎的根据需求,确认条件的位置,避免结果与预期不符。