Cliclhouse支持跳数索引(Data Skipping Indexes),本文从执行计划上查找跳数/稀疏索引对执行计划和性能的影响。
目录导航
环境
普通表
CREATE TABLE testdb.t1
(
`id` Int32,
`name` String,
`birth` Date
)
ENGINE = MergeTree
ORDER BY id
带跳数/稀疏索引的表
本例子中,用名字的长度,length(name),做索引。
CREATE TABLE testdb.tt1
(
`id` Int32,
`name` String,
`birth` Date,
INDEX a length(name) TYPE set(1000) GRANULARITY 4
)
ENGINE = MergeTree
ORDER BY id
数据量
都是6710万行(67108864)。从相同的表,insert select 复制过来的。
localhost :) select count() from t1;
SELECT count()
FROM t1
Query id: f84c9c2a-a93c-4eaa-b3cd-a5edd171be0b
┌──count()─┐
│ 67108864 │
└──────────┘
1 rows in set. Elapsed: 0.003 sec.
localhost :) select count() from tt1;
SELECT count()
FROM tt1
Query id: d82f42cd-56f0-4697-b2ea-89c743d75b78
┌──count()─┐
│ 67108864 │
└──────────┘
1 rows in set. Elapsed: 0.003 sec.
localhost :)
查看Clickhouse的执行计划
注意其中的数据读取位置,分别是从原始的存储(ReadFromStorage),和预处理的源(ReadFromPreparedSource)。
无索引表的执行计划
localhost :) explain select * from t1 where length(name)=10;
EXPLAIN
SELECT *
FROM t1
WHERE length(name) = 10
Query id: a9431920-a102-4e98-bb63-1bcf27d1cfe4
┌─explain───────────────────────────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY)) │
│ SettingQuotaAndLimits (Set limits and quota after reading from storage) │
│ ReadFromStorage (MergeTree) │
└───────────────────────────────────────────────────────────────────────────┘
3 rows in set. Elapsed: 0.004 sec.
localhost :)
带跳数/稀疏索引的执行计划
localhost :) explain select * from tt1 where length(name)=10;
EXPLAIN
SELECT *
FROM tt1
WHERE length(name) = 10
Query id: cbe2695b-7f0a-4c59-b4f4-bd69eba72dab
┌─explain───────────────────────────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY)) │
│ SettingQuotaAndLimits (Set limits and quota after reading from storage) │
│ ReadFromPreparedSource (Read from NullSource) │
└───────────────────────────────────────────────────────────────────────────┘
3 rows in set. Elapsed: 0.004 sec.
localhost :)
执行性能结果对比
命中数量少时,索引效果明显
6700万只命中1行,性能差距一个数量级以上。
localhost :) select count(*) from t1 where length(name)=8;
SELECT count(*)
FROM t1
WHERE length(name) = 8
Query id: 46c953f0-5b52-4162-a030-c628b0261e2e
┌─count()─┐
│ 1 │
└─────────┘
1 rows in set. Elapsed: 0.458 sec. Processed 67.11 million rows, 939.52 MB (146.47 million rows/s., 2.05 GB/s.)
localhost :) select count(*) from tt1 where length(name)=8;
SELECT count(*)
FROM tt1
WHERE length(name) = 8
Query id: ecbba9cc-a728-4194-b814-9ffe6dffdcc7
┌─count()─┐
│ 1 │
└─────────┘
1 rows in set. Elapsed: 0.013 sec.
localhost :)
命中数量少时执行计划
localhost :) explain select count(*) from t1 where length(name)=6;
EXPLAIN
SELECT count(*)
FROM t1
WHERE length(name) = 6
Query id: 3e40c992-cb35-427c-a666-cdec4497d430
┌─explain─────────────────────────────────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY)) │
│ Aggregating │
│ Expression (Before GROUP BY) │
│ Filter (WHERE) │
│ SettingQuotaAndLimits (Set limits and quota after reading from storage) │
│ ReadFromStorage (MergeTree) │
└─────────────────────────────────────────────────────────────────────────────────┘
6 rows in set. Elapsed: 0.013 sec.
localhost :) explain select count(*) from tt1 where length(name)=6;
EXPLAIN
SELECT count(*)
FROM tt1
WHERE length(name) = 6
Query id: 3ef5e2fd-d210-4549-8cb4-b8db899f0a8d
┌─explain─────────────────────────────────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY)) │
│ Aggregating │
│ Expression (Before GROUP BY) │
│ Filter (WHERE) │
│ SettingQuotaAndLimits (Set limits and quota after reading from storage) │
│ ReadFromPreparedSource (Read from NullSource) │
└─────────────────────────────────────────────────────────────────────────────────┘
6 rows in set. Elapsed: 0.014 sec.
localhost :)
命中大量结果,无实质影响
因为评估的结果,命中大量数据,不再走索引,后面的执行计划也能看出差距。
localhost :) select count(*) from t1 where length(name)=5;
SELECT count(*)
FROM t1
WHERE length(name) = 5
Query id: b72a577a-6c1b-4fa5-aaea-d65a7900b14e
┌──count()─┐
│ 67108864 │
└──────────┘
1 rows in set. Elapsed: 0.447 sec. Processed 67.11 million rows, 939.52 MB (150.24 million rows/s., 2.10 GB/s.)
localhost :) select count(*) from tt1 where length(name)=5;
SELECT count(*)
FROM tt1
WHERE length(name) = 5
Query id: 984daf4c-f674-417b-b251-e53b31ba01e8
┌──count()─┐
│ 67108864 │
└──────────┘
1 rows in set. Elapsed: 0.454 sec. Processed 67.11 million rows, 939.52 MB (147.71 million rows/s., 2.07 GB/s.)
localhost :)
命中大量结果执行计划
完全一样的执行计划,都是从存储读取了。
localhost :) explain select count(*) from t1 where length(name)=5;
EXPLAIN
SELECT count(*)
FROM t1
WHERE length(name) = 5
Query id: c5ce5b52-bc5b-4315-8546-48bb74eb510c
┌─explain─────────────────────────────────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY)) │
│ Aggregating │
│ Expression (Before GROUP BY) │
│ Filter (WHERE) │
│ SettingQuotaAndLimits (Set limits and quota after reading from storage) │
│ ReadFromStorage (MergeTree) │
└─────────────────────────────────────────────────────────────────────────────────┘
6 rows in set. Elapsed: 0.004 sec.
localhost :) explain select count(*) from tt1 where length(name)=5;
EXPLAIN
SELECT count(*)
FROM tt1
WHERE length(name) = 5
Query id: 9d1c9b43-344e-4e8b-91a1-2af55e417090
┌─explain─────────────────────────────────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY)) │
│ Aggregating │
│ Expression (Before GROUP BY) │
│ Filter (WHERE) │
│ SettingQuotaAndLimits (Set limits and quota after reading from storage) │
│ ReadFromStorage (MergeTree) │
└─────────────────────────────────────────────────────────────────────────────────┘
6 rows in set. Elapsed: 0.011 sec.