在Clickhouse里,每次数据装在都会生成一个part, 内部数据是按照列存储的,每个列一组文件。当数据量很少时,会浪费很多的磁盘IO次数。新版本Clickhouse里,在建表时可以通过min_bytes_for_wide_part和min_rows_for_wide_part两个参数,来设置一个阈值。当参数低于阈值时,会将part数据合并存储来减少IO次数。
目录导航
参数
min_bytes_for_wide_part
数据从合并存储(Compact)转成按列存储(Wide)的最小文件大小。
min_rows_for_wide_part
数据从合并存储(Compact)转成按列存储(Wide)的最小行数。
样例
建表
设置参数大小为10K。
localhost :) create table t3_2(id int, name String, birth date)engine=MergeTree order by id settings min_bytes_for_wide_part=10240;
CREATE TABLE t3_2
(
`id` int,
`name` String,
`birth` date
)
ENGINE = MergeTree
ORDER BY id
SETTINGS min_bytes_for_wide_part = 10240
Query id: 58aeb53d-c8e5-44f3-9ceb-9137216c366d
Ok.
0 rows in set. Elapsed: 0.048 sec.
插入一行数据
localhost :) insert into t3_2 values(1,'111','2011-01-02');
INSERT INTO t3_2 VALUES
Query id: 36ec247b-0033-4c07-881f-319b88132681
Ok.
1 rows in set. Elapsed: 0.005 sec.
查看part数据
可以看到part数据是合并在一起的。
[root@localhost t3_2]# ll *
-rw-r-----. 1 clickhouse clickhouse 1 May 17 16:28 format_version.txt
all_1_1_0:
total 28
-rw-r-----. 1 clickhouse clickhouse 185 May 17 16:29 checksums.txt
-rw-r-----. 1 clickhouse clickhouse 75 May 17 16:29 columns.txt
-rw-r-----. 1 clickhouse clickhouse 1 May 17 16:29 count.txt
-rw-r-----. 1 clickhouse clickhouse 88 May 17 16:29 data.bin
-rw-r-----. 1 clickhouse clickhouse 112 May 17 16:29 data.mrk3
-rw-r-----. 1 clickhouse clickhouse 10 May 17 16:29 default_compression_codec.txt
-rw-r-----. 1 clickhouse clickhouse 8 May 17 16:29 primary.idx
detached:
total 0
再次插入100行数据
localhost :) insert into t3_2 select * from t1 limit 100;
INSERT INTO t3_2 SELECT *
FROM t1
LIMIT 100
Query id: a22e737b-0ff2-4463-9682-aec04d675ce8
Ok.
0 rows in set. Elapsed: 0.007 sec. Processed 8.19 thousand rows, 193.98 KB (1.21 million rows/s., 28.59 MB/s.)
查看100行数据的part
可以看到还是没有超过。
[root@localhost t3_2]# ll *
-rw-r-----. 1 clickhouse clickhouse 1 May 17 16:28 format_version.txt
all_1_1_0:
total 28
-rw-r-----. 1 clickhouse clickhouse 185 May 17 16:29 checksums.txt
-rw-r-----. 1 clickhouse clickhouse 75 May 17 16:29 columns.txt
-rw-r-----. 1 clickhouse clickhouse 1 May 17 16:29 count.txt
-rw-r-----. 1 clickhouse clickhouse 88 May 17 16:29 data.bin
-rw-r-----. 1 clickhouse clickhouse 112 May 17 16:29 data.mrk3
-rw-r-----. 1 clickhouse clickhouse 10 May 17 16:29 default_compression_codec.txt
-rw-r-----. 1 clickhouse clickhouse 8 May 17 16:29 primary.idx
all_2_2_0:
total 28
-rw-r-----. 1 clickhouse clickhouse 187 May 17 16:30 checksums.txt
-rw-r-----. 1 clickhouse clickhouse 75 May 17 16:30 columns.txt
-rw-r-----. 1 clickhouse clickhouse 3 May 17 16:30 count.txt
-rw-r-----. 1 clickhouse clickhouse 303 May 17 16:30 data.bin
-rw-r-----. 1 clickhouse clickhouse 112 May 17 16:30 data.mrk3
-rw-r-----. 1 clickhouse clickhouse 10 May 17 16:30 default_compression_codec.txt
-rw-r-----. 1 clickhouse clickhouse 8 May 17 16:30 primary.idx
detached:
total 0
插入1000行数据
localhost :) insert into t3_2 select * from t1 limit 1000;
INSERT INTO t3_2 SELECT *
FROM t1
LIMIT 1000
Query id: bf969c32-3b51-4686-a1e4-ba53c3412179
Ok.
0 rows in set. Elapsed: 0.006 sec. Processed 8.19 thousand rows, 193.98 KB (1.31 million rows/s., 31.02 MB/s.)
localhost :)
查看1000行数据part
可以看到第三个part:all_3_3_0已经是按列存储了。每个列一组文件。
[root@localhost t3_2]# ll *
-rw-r-----. 1 clickhouse clickhouse 1 May 17 16:28 format_version.txt
all_1_1_0:
total 28
-rw-r-----. 1 clickhouse clickhouse 185 May 17 16:29 checksums.txt
-rw-r-----. 1 clickhouse clickhouse 75 May 17 16:29 columns.txt
-rw-r-----. 1 clickhouse clickhouse 1 May 17 16:29 count.txt
-rw-r-----. 1 clickhouse clickhouse 88 May 17 16:29 data.bin
-rw-r-----. 1 clickhouse clickhouse 112 May 17 16:29 data.mrk3
-rw-r-----. 1 clickhouse clickhouse 10 May 17 16:29 default_compression_codec.txt
-rw-r-----. 1 clickhouse clickhouse 8 May 17 16:29 primary.idx
all_2_2_0:
total 28
-rw-r-----. 1 clickhouse clickhouse 187 May 17 16:30 checksums.txt
-rw-r-----. 1 clickhouse clickhouse 75 May 17 16:30 columns.txt
-rw-r-----. 1 clickhouse clickhouse 3 May 17 16:30 count.txt
-rw-r-----. 1 clickhouse clickhouse 303 May 17 16:30 data.bin
-rw-r-----. 1 clickhouse clickhouse 112 May 17 16:30 data.mrk3
-rw-r-----. 1 clickhouse clickhouse 10 May 17 16:30 default_compression_codec.txt
-rw-r-----. 1 clickhouse clickhouse 8 May 17 16:30 primary.idx
all_3_3_0:
total 44
-rw-r-----. 1 clickhouse clickhouse 76 May 17 16:30 birth.bin
-rw-r-----. 1 clickhouse clickhouse 48 May 17 16:30 birth.mrk2
-rw-r-----. 1 clickhouse clickhouse 326 May 17 16:30 checksums.txt
-rw-r-----. 1 clickhouse clickhouse 75 May 17 16:30 columns.txt
-rw-r-----. 1 clickhouse clickhouse 4 May 17 16:30 count.txt
-rw-r-----. 1 clickhouse clickhouse 10 May 17 16:30 default_compression_codec.txt
-rw-r-----. 1 clickhouse clickhouse 483 May 17 16:30 id.bin
-rw-r-----. 1 clickhouse clickhouse 48 May 17 16:30 id.mrk2
-rw-r-----. 1 clickhouse clickhouse 921 May 17 16:30 name.bin
-rw-r-----. 1 clickhouse clickhouse 48 May 17 16:30 name.mrk2
-rw-r-----. 1 clickhouse clickhouse 8 May 17 16:30 primary.idx
detached:
total 0