本文是一道网上的面试题目,查询某位名字叫“张三"的学生,有哪些课程没有选修。要考虑到姓名重复的可能性。
目录导航
背景
我们构造了4个学生,2门课程,其中学生张三,只选修了1门课程。
学生表(student)
学生表S(SNO,SNAME,AGE,SEX),其属性分别表示学号、学生姓名、年龄、性别;
gbase> create table student(sno varchar(20),sname varchar(20),age int, sex int);
Query OK, 0 rows affected (Elapsed: 00:00:00.12)
gbase> insert into student values ('20210001','张三','18',1);
Query OK, 1 row affected (Elapsed: 00:00:00.07)
gbase> insert into student values ('20210002','李四','19',1);
Query OK, 1 row affected (Elapsed: 00:00:00.05)
gbase> insert into student values ('20210003','王五','18',0);
Query OK, 1 row affected (Elapsed: 00:00:00.06)
gbase> insert into student values ('20210004','赵六','18',0);
Query OK, 1 row affected (Elapsed: 00:00:00.06)
gbase> select * from student;
+----------+--------+------+------+
| sno | sname | age | sex |
+----------+--------+------+------+
| 20210001 | 张三 | 18 | 1 |
| 20210002 | 李四 | 19 | 1 |
| 20210003 | 王五 | 18 | 0 |
| 20210004 | 赵六 | 18 | 0 |
+----------+--------+------+------+
4 rows in set (Elapsed: 00:00:00.01)
课程表(course)
课程表C(CNO,CNAME,TEACHER),其属性分别表示课程号、课程名、上课教师名;
gbase> create table course(cno varchar(10),cname varchar(20),teacher varchar(20));
Query OK, 0 rows affected (Elapsed: 00:00:00.10)
gbase> insert into course values('990001','大数','张老师');
Query OK, 1 row affected (Elapsed: 00:00:00.07)
gbase> insert into course values('990002','大英','李老师');
Query OK, 1 row affected (Elapsed: 00:00:00.04)
gbase> select * from course;
+--------+--------+-----------+
| cno | cname | teacher |
+--------+--------+-----------+
| 990001 | 大数 | 张老师 |
| 990002 | 大英 | 李老师 |
+--------+--------+-----------+
2 rows in set (Elapsed: 00:00:00.01)
选修表(student_course)
选修表SC(SNO,CNO,GRADE),其属性分别表示学号、课程号、成绩。
gbase> create table student_course(sno varchar(20),cno varchar(10),grade int);
Query OK, 0 rows affected (Elapsed: 00:00:00.11)
gbase> insert into student_course values ('20210001','990001',1);
Query OK, 1 row affected (Elapsed: 00:00:00.07)
gbase> insert into student_course values ('20210002','990001',1);
Query OK, 1 row affected (Elapsed: 00:00:00.05)
gbase> insert into student_course values ('20210003','990001',1);
Query OK, 1 row affected (Elapsed: 00:00:00.05)
gbase> insert into student_course values ('20210004','990001',1);
Query OK, 1 row affected (Elapsed: 00:00:00.07)
gbase> insert into student_course values ('20210002','990002',1);
Query OK, 1 row affected (Elapsed: 00:00:00.08)
gbase> insert into student_course values ('20210003','990002',1);
Query OK, 1 row affected (Elapsed: 00:00:00.04)
gbase> insert into student_course values ('20210004','990002',1);
Query OK, 1 row affected (Elapsed: 00:00:00.05)
gbase> select * from student_course;
+----------+--------+-------+
| sno | cno | grade |
+----------+--------+-------+
| 20210001 | 990001 | 1 |
| 20210002 | 990001 | 1 |
| 20210003 | 990001 | 1 |
| 20210004 | 990001 | 1 |
| 20210002 | 990002 | 1 |
| 20210003 | 990002 | 1 |
| 20210004 | 990002 | 1 |
+----------+--------+-------+
7 rows in set (Elapsed: 00:00:00.01)
查询张三同学没有选修的课程的课程号
分析
张三没有选修 = 所有课程 - 张三已经选修的。
SQL实现
张三选修的课程
通过名字,从学生表(student),可以查询到其学生编号,再从学生编号,从选修课程表里,获得其已经选修的课程列表。本处使用了一个子查询。
gbase> select * from student_course where sno=(select sno from student where sname='张三');
+----------+--------+-------+
| sno | cno | grade |
+----------+--------+-------+
| 20210001 | 990001 | 1 |
+----------+--------+-------+
1 row in set (Elapsed: 00:00:00.05)
张三没有选修的课程
通过课程表,查询 cno not in 已经选修的课程信息,拿到没有选修的课程。
gbase> select * from course where cno not in (select cno from student_course where sno=(select sno from student where sname='张三'));
+--------+--------+-----------+
| cno | cname | teacher |
+--------+--------+-----------+
| 990002 | 大英 | 李老师 |
+--------+--------+-----------+
1 row in set (Elapsed: 00:00:00.05)
gbase> select cno from course where cno not in (select cno from student_course where sno=(select sno from student where sname='张三'));
+--------+
| cno |
+--------+
| 990002 |
+--------+
1 row in set (Elapsed: 00:00:00.04)
如果张三存在重名呢?
数据变动
我们增加了一个学生,也叫张三,其cno=20210099, 同时该学生和前一个张三不同:选修了第一个课程,而没有选择第二个课程。
gbase> insert into student values ('20210099','张三','18',0);
Query OK, 1 row affected (Elapsed: 00:00:00.07)
gbase> insert into student_course values ('20210099','990002',1);
Query OK, 1 row affected (Elapsed: 00:00:00.05)
gbase> select * from student;
+----------+--------+------+------+
| sno | sname | age | sex |
+----------+--------+------+------+
| 20210001 | 张三 | 18 | 1 |
| 20210002 | 李四 | 19 | 1 |
| 20210003 | 王五 | 18 | 0 |
| 20210004 | 赵六 | 18 | 0 |
| 20210099 | 张三 | 18 | 0 |
+----------+--------+------+------+
5 rows in set (Elapsed: 00:00:00.00)
gbase> select * from course;
+--------+--------+-----------+
| cno | cname | teacher |
+--------+--------+-----------+
| 990001 | 大数 | 张老师 |
| 990002 | 大英 | 李老师 |
+--------+--------+-----------+
2 rows in set (Elapsed: 00:00:00.01)
gbase> select * from student_course;
+----------+--------+-------+
| sno | cno | grade |
+----------+--------+-------+
| 20210001 | 990001 | 1 |
| 20210002 | 990001 | 1 |
| 20210003 | 990001 | 1 |
| 20210004 | 990001 | 1 |
| 20210002 | 990002 | 1 |
| 20210003 | 990002 | 1 |
| 20210004 | 990002 | 1 |
| 20210099 | 990002 | 1 |
+----------+--------+-------+
8 rows in set (Elapsed: 00:00:00.01)
直接查询报错
由于叫张三的有多个,所以前一个实现SQL里面的cno=子查询会因为1对多,而报错。在GBase 8a数据库报错样例如下
gbase> select cno from course where cno not in (select cno from student_course where sno=(select sno from student where sname='张三'));
ERROR 1708 (HY000): [10.0.2.101:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: (GBA-01EX-0002) Subquery return more than 1 row
SQL: SELECT /
等于改用in,结果错误
由于2人分别选择了不同的选修,导致结果为空。
gbase> select cno from course where cno not in (select cno from student_course where sno in(select sno from student where sname='张三'));
Empty set (Elapsed: 00:00:00.04)
改进方案
虽然都叫张三,但其学号肯定不同,所以最终结果,必然是根据学号,而不是根据姓名来区分。
拿到张三能选修的课程列表
通过笛卡尔,无条件的join,得到张三能选修的所有课程对照表。其中包含了所有叫张三的人。
gbase> select * from student s,course c where s.sname='张三';
+----------+--------+------+------+--------+--------+-----------+
| sno | sname | age | sex | cno | cname | teacher |
+----------+--------+------+------+--------+--------+-----------+
| 20210001 | 张三 | 18 | 1 | 990001 | 大数 | 张老师 |
| 20210001 | 张三 | 18 | 1 | 990002 | 大英 | 李老师 |
| 20210099 | 张三 | 18 | 0 | 990001 | 大数 | 张老师 |
| 20210099 | 张三 | 18 | 0 | 990002 | 大英 | 李老师 |
+----------+--------+------+------+--------+--------+-----------+
关联课程表
左关联课程表,可以发现有没有关联上的。
gbase> select * from (select s.sno,c.cno from student s,course c where s.sname='张三')t left join student_course sc on sc.sno=t.sno and sc.cno=t.cno;
+----------+--------+----------+--------+-------+
| sno | cno | sno | cno | grade |
+----------+--------+----------+--------+-------+
| 20210001 | 990001 | 20210001 | 990001 | 1 |
| 20210099 | 990002 | 20210099 | 990002 | 1 |
| 20210001 | 990002 | NULL | NULL | NULL |
| 20210099 | 990001 | NULL | NULL | NULL |
+----------+--------+----------+--------+-------+
4 rows in set (Elapsed: 00:00:00.04)
张三没有选修的课程
通过过滤课程表里null的数据,就是没有选修的课程了。
gbase> select t.sno,t.cno from (select s.sno,c.cno from student s,course c where s.sname='张三')t left join student_course sc on sc.sno=t.sno and sc.cno=t.cno where sc.sno is null;
+----------+--------+
| sno | cno |
+----------+--------+
| 20210001 | 990002 |
| 20210099 | 990001 |
+----------+--------+
2 rows in set (Elapsed: 00:00:00.06)