本文用于验证,相同的数字,分别存在在tinyint, smallint ,int 和 bigint 类型时,GBase 8a数据库集群在存储层的磁盘占用是否有实质性差异。结论是占用磁盘一样无差别。
提示:默认GBase8a数据库集群是开启压缩的(5,5)。
目录导航
测试环境
2个节点的虚拟机
[gbase@localhost test_int_n1]$ gcadmin
CLUSTER STATE: ACTIVE
CLUSTER MODE: NORMAL
=================================================================
| GBASE COORDINATOR CLUSTER INFORMATION |
=================================================================
| NodeName | IpAddress |gcware |gcluster |DataState |
-----------------------------------------------------------------
| coordinator1 | 10.0.2.107 | OPEN | OPEN | 0 |
-----------------------------------------------------------------
| coordinator2 | 10.0.2.106 | OPEN | OPEN | 0 |
-----------------------------------------------------------------
=============================================================
| GBASE DATA CLUSTER INFORMATION |
=============================================================
|NodeName | IpAddress |gnode |syncserver |DataState |
-------------------------------------------------------------
| node1 | 10.0.2.107 | OPEN | OPEN | 0 |
-------------------------------------------------------------
| node2 | 10.0.2.106 | OPEN | OPEN | 0 |
-------------------------------------------------------------
测试表
包含tinyint, smallint, int 和bigint四个字段。
gbase> desc test_int;
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| id | tinyint(4) | YES | | NULL | |
| id2 | smallint(6) | YES | | NULL | |
| id3 | int(11) | YES | | NULL | |
| id4 | bigint(20) | YES | | NULL | |
+-------+-------------+------+-----+---------+-------+
4 rows in set (Elapsed: 00:00:00.00)
测试数据
通过如下shell产生,每行相同的数字输出4次
[gbase@localhost ~]$ cat gen_int.sh
for((i=1;i<=10000000;i++))
do
declare -i number=$RANDOM*100/32767;
echo ${number},${number},${number},${number}
done
[gbase@localhost ~]$ sh gen_int.sh > 1000w.dat
第一次加载结果
可以看到,4个列数据部分占用的磁盘空间完全一样。
gbase> load data infile 'file://10.0.2.107/home/gbase/1000w.dat' into table test_int fields terminated by ',' ;
Query OK, 10000000 rows affected (Elapsed: 00:00:24.84)
Task 1835156 finished, Loaded 10000000 records, Skipped 0 records
[gbase@localhost test_int_n1]$ ll
total 32512
-rw------- 1 gbase gbase 5002613 Sep 4 21:41 C00000.seg.1
-rw------- 1 gbase gbase 5002613 Sep 4 21:41 C00001.seg.1
-rw------- 1 gbase gbase 5002613 Sep 4 21:41 C00002.seg.1
-rw------- 1 gbase gbase 5002613 Sep 4 21:41 C00003.seg.1
[gbase@localhost test_int_n1]$
[gbase@localhost test_int_n1]$ ll ../../metadata/test_int_n1.GED
total 80
-rw------- 1 gbase gbase 129 Sep 4 21:41 C00000.ctl.A
-rw------- 1 gbase gbase 129 Sep 4 20:59 C00000.ctl.B
-rw------- 1 gbase gbase 4134 Sep 4 21:41 C00000.map
-rw------- 1 gbase gbase 130 Sep 4 21:41 C00001.ctl.A
-rw------- 1 gbase gbase 130 Sep 4 20:59 C00001.ctl.B
-rw------- 1 gbase gbase 4134 Sep 4 21:41 C00001.map
-rw------- 1 gbase gbase 130 Sep 4 21:41 C00002.ctl.A
-rw------- 1 gbase gbase 130 Sep 4 20:59 C00002.ctl.B
-rw------- 1 gbase gbase 4134 Sep 4 21:41 C00002.map
-rw------- 1 gbase gbase 130 Sep 4 21:41 C00003.ctl.A
-rw------- 1 gbase gbase 130 Sep 4 20:59 C00003.ctl.B
-rw------- 1 gbase gbase 4134 Sep 4 21:41 C00003.map
-rw------- 1 gbase gbase 249 Sep 4 21:41 table.des.A
-rw------- 1 gbase gbase 249 Sep 4 20:59 table.des.B
-rw------- 1 gbase gbase 52 Sep 4 21:41 table.state.A
-rw------- 1 gbase gbase 52 Sep 4 20:59 table.state.B
[gbase@localhost test_int_n1]$
第二次加载
结果一致,A/B版本切换,新数据文件也一致。
和第一次略有差距,是因为DC包 65536行。上一个版本的不足65536行的尾块要复制到下一个版本里。
gbase> load data infile 'file://10.0.2.107/home/gbase/1000w.dat' into table test_int fields terminated by ',' ;
Query OK, 10000000 rows affected (Elapsed: 00:00:25.39)
Task 1835157 finished, Loaded 10000000 records, Skipped 0 records
gbase> select count(*) from test_int;
+----------+
| count(*) |
+----------+
| 20000000 |
+----------+
1 row in set (Elapsed: 00:00:00.01)
gbase> select * from test_int limit 10;
+------+------+------+------+
| id | id2 | id3 | id4 |
+------+------+------+------+
| 52 | 52 | 52 | 52 |
| 74 | 74 | 74 | 74 |
| 51 | 51 | 51 | 51 |
| 60 | 60 | 60 | 60 |
| 22 | 22 | 22 | 22 |
| 38 | 38 | 38 | 38 |
| 23 | 23 | 23 | 23 |
| 21 | 21 | 21 | 21 |
| 5 | 5 | 5 | 5 |
| 17 | 17 | 17 | 17 |
+------+------+------+------+
10 rows in set (Elapsed: 00:00:00.01)
[gbase@localhost test_int_n1]$ ll
total 39184
-rw------- 1 gbase gbase 5021878 Sep 4 21:44 C00000.seg
-rw------- 1 gbase gbase 5002613 Sep 4 21:41 C00000.seg.1
-rw------- 1 gbase gbase 5021878 Sep 4 21:44 C00001.seg
-rw------- 1 gbase gbase 5002613 Sep 4 21:41 C00001.seg.1
-rw------- 1 gbase gbase 5021878 Sep 4 21:44 C00002.seg
-rw------- 1 gbase gbase 5002613 Sep 4 21:41 C00002.seg.1
-rw------- 1 gbase gbase 5021878 Sep 4 21:44 C00003.seg
-rw------- 1 gbase gbase 5002613 Sep 4 21:41 C00003.seg.1
[gbase@localhost test_int_n1]$
[gbase@localhost test_int_n1]$ ll ../../metadata/test_int_n1.GED
total 80
-rw------- 1 gbase gbase 129 Sep 4 21:41 C00000.ctl.A
-rw------- 1 gbase gbase 129 Sep 4 21:44 C00000.ctl.B
-rw------- 1 gbase gbase 8162 Sep 4 21:44 C00000.map
-rw------- 1 gbase gbase 130 Sep 4 21:41 C00001.ctl.A
-rw------- 1 gbase gbase 130 Sep 4 21:44 C00001.ctl.B
-rw------- 1 gbase gbase 8162 Sep 4 21:44 C00001.map
-rw------- 1 gbase gbase 130 Sep 4 21:41 C00002.ctl.A
-rw------- 1 gbase gbase 130 Sep 4 21:44 C00002.ctl.B
-rw------- 1 gbase gbase 8162 Sep 4 21:44 C00002.map
-rw------- 1 gbase gbase 130 Sep 4 21:41 C00003.ctl.A
-rw------- 1 gbase gbase 130 Sep 4 21:44 C00003.ctl.B
-rw------- 1 gbase gbase 8162 Sep 4 21:44 C00003.map
-rw------- 1 gbase gbase 0 Sep 4 21:44 lock
-rw------- 1 gbase gbase 249 Sep 4 21:41 table.des.A
-rw------- 1 gbase gbase 249 Sep 4 21:44 table.des.B
-rw------- 1 gbase gbase 52 Sep 4 21:41 table.state.A
-rw------- 1 gbase gbase 52 Sep 4 21:44 table.state.B
[gbase@localhost test_int_n1]$
第三次加载
一致
gbase> load data infile 'file://10.0.2.107/home/gbase/1000w.dat' into table test_int fields terminated by ',' ;
Query OK, 10000000 rows affected (Elapsed: 00:00:24.95)
Task 1835158 finished, Loaded 10000000 records, Skipped 0 records
gbase> select count(*) from test_int;
+----------+
| count(*) |
+----------+
| 30000000 |
+----------+
1 row in set (Elapsed: 00:00:00.01)
[gbase@localhost test_int_n1]$ ll
total 71856
-rw------- 1 gbase gbase 5021878 Sep 4 21:44 C00000.seg
-rw------- 1 gbase gbase 10024464 Sep 4 21:46 C00000.seg.1
-rw------- 1 gbase gbase 5021878 Sep 4 21:44 C00001.seg
-rw------- 1 gbase gbase 10024464 Sep 4 21:46 C00001.seg.1
-rw------- 1 gbase gbase 5021878 Sep 4 21:44 C00002.seg
-rw------- 1 gbase gbase 10024464 Sep 4 21:46 C00002.seg.1
-rw------- 1 gbase gbase 5021878 Sep 4 21:44 C00003.seg
-rw------- 1 gbase gbase 10024464 Sep 4 21:46 C00003.seg.1
[gbase@localhost test_int_n1]$
[gbase@localhost test_int_n1]$ ll ../../metadata/test_int_n1.GED
total 96
-rw------- 1 gbase gbase 129 Sep 4 21:46 C00000.ctl.A
-rw------- 1 gbase gbase 129 Sep 4 21:44 C00000.ctl.B
-rw------- 1 gbase gbase 12190 Sep 4 21:46 C00000.map
-rw------- 1 gbase gbase 130 Sep 4 21:46 C00001.ctl.A
-rw------- 1 gbase gbase 130 Sep 4 21:44 C00001.ctl.B
-rw------- 1 gbase gbase 12190 Sep 4 21:46 C00001.map
-rw------- 1 gbase gbase 130 Sep 4 21:46 C00002.ctl.A
-rw------- 1 gbase gbase 130 Sep 4 21:44 C00002.ctl.B
-rw------- 1 gbase gbase 12190 Sep 4 21:46 C00002.map
-rw------- 1 gbase gbase 130 Sep 4 21:46 C00003.ctl.A
-rw------- 1 gbase gbase 130 Sep 4 21:44 C00003.ctl.B
-rw------- 1 gbase gbase 12190 Sep 4 21:46 C00003.map
-rw------- 1 gbase gbase 249 Sep 4 21:46 table.des.A
-rw------- 1 gbase gbase 249 Sep 4 21:44 table.des.B
-rw------- 1 gbase gbase 52 Sep 4 21:46 table.state.A
-rw------- 1 gbase gbase 52 Sep 4 21:44 table.state.B
[gbase@localhost test_int_n1]$
结论
GBase 8a数据库,在对数据压缩时,不考虑字段定义,所以相同数据占用的磁盘空间是一样的。