南大通用GBase 8a相同数据在不同数字字段类型下的磁盘占用情况验证

GBase 8a相同数字不同类型的存储区别

本文用于验证,相同的数字,分别存在在tinyint, smallint ,int 和 bigint 类型时,GBase 8a数据库集群在存储层的磁盘占用是否有实质性差异。结论是占用磁盘一样无差别。

提示:默认GBase8a数据库集群是开启压缩的(5,5)。

测试环境

2个节点的虚拟机

[gbase@localhost test_int_n1]$ gcadmin
CLUSTER STATE:  ACTIVE
CLUSTER MODE:   NORMAL

=================================================================
|             GBASE COORDINATOR CLUSTER INFORMATION             |
=================================================================
|   NodeName   |     IpAddress     |gcware |gcluster |DataState |
-----------------------------------------------------------------
| coordinator1 |    10.0.2.107     | OPEN  |  OPEN   |    0     |
-----------------------------------------------------------------
| coordinator2 |    10.0.2.106     | OPEN  |  OPEN   |    0     |
-----------------------------------------------------------------
=============================================================
|              GBASE DATA CLUSTER INFORMATION               |
=============================================================
|NodeName |     IpAddress     |gnode |syncserver |DataState |
-------------------------------------------------------------
|  node1  |    10.0.2.107     | OPEN |   OPEN    |    0     |
-------------------------------------------------------------
|  node2  |    10.0.2.106     | OPEN |   OPEN    |    0     |
-------------------------------------------------------------

测试表

包含tinyint, smallint, int 和bigint四个字段。

gbase> desc test_int;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| id    | tinyint(4)  | YES  |     | NULL    |       |
| id2   | smallint(6) | YES  |     | NULL    |       |
| id3   | int(11)     | YES  |     | NULL    |       |
| id4   | bigint(20)  | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+
4 rows in set (Elapsed: 00:00:00.00)

测试数据

通过如下shell产生,每行相同的数字输出4次

[gbase@localhost ~]$ cat gen_int.sh
for((i=1;i<=10000000;i++))
do
  declare -i number=$RANDOM*100/32767;
  echo ${number},${number},${number},${number}
done
[gbase@localhost ~]$ sh gen_int.sh > 1000w.dat

第一次加载结果

可以看到,4个列数据部分占用的磁盘空间完全一样。

gbase> load data infile 'file://10.0.2.107/home/gbase/1000w.dat' into table test_int fields terminated by ',' ;
Query OK, 10000000 rows affected (Elapsed: 00:00:24.84)
Task 1835156 finished, Loaded 10000000 records, Skipped 0 records


[gbase@localhost test_int_n1]$ ll
total 32512
-rw------- 1 gbase gbase 5002613 Sep  4 21:41 C00000.seg.1
-rw------- 1 gbase gbase 5002613 Sep  4 21:41 C00001.seg.1
-rw------- 1 gbase gbase 5002613 Sep  4 21:41 C00002.seg.1
-rw------- 1 gbase gbase 5002613 Sep  4 21:41 C00003.seg.1
[gbase@localhost test_int_n1]$

[gbase@localhost test_int_n1]$ ll ../../metadata/test_int_n1.GED
total 80
-rw------- 1 gbase gbase  129 Sep  4 21:41 C00000.ctl.A
-rw------- 1 gbase gbase  129 Sep  4 20:59 C00000.ctl.B
-rw------- 1 gbase gbase 4134 Sep  4 21:41 C00000.map
-rw------- 1 gbase gbase  130 Sep  4 21:41 C00001.ctl.A
-rw------- 1 gbase gbase  130 Sep  4 20:59 C00001.ctl.B
-rw------- 1 gbase gbase 4134 Sep  4 21:41 C00001.map
-rw------- 1 gbase gbase  130 Sep  4 21:41 C00002.ctl.A
-rw------- 1 gbase gbase  130 Sep  4 20:59 C00002.ctl.B
-rw------- 1 gbase gbase 4134 Sep  4 21:41 C00002.map
-rw------- 1 gbase gbase  130 Sep  4 21:41 C00003.ctl.A
-rw------- 1 gbase gbase  130 Sep  4 20:59 C00003.ctl.B
-rw------- 1 gbase gbase 4134 Sep  4 21:41 C00003.map
-rw------- 1 gbase gbase  249 Sep  4 21:41 table.des.A
-rw------- 1 gbase gbase  249 Sep  4 20:59 table.des.B
-rw------- 1 gbase gbase   52 Sep  4 21:41 table.state.A
-rw------- 1 gbase gbase   52 Sep  4 20:59 table.state.B
[gbase@localhost test_int_n1]$

第二次加载

结果一致,A/B版本切换,新数据文件也一致。

和第一次略有差距,是因为DC包 65536行。上一个版本的不足65536行的尾块要复制到下一个版本里。

gbase> load data infile 'file://10.0.2.107/home/gbase/1000w.dat' into table test_int fields terminated by ',' ;
Query OK, 10000000 rows affected (Elapsed: 00:00:25.39)
Task 1835157 finished, Loaded 10000000 records, Skipped 0 records
gbase> select count(*) from test_int;
+----------+
| count(*) |
+----------+
| 20000000 |
+----------+
1 row in set (Elapsed: 00:00:00.01)

gbase> select * from test_int limit 10;
+------+------+------+------+
| id   | id2  | id3  | id4  |
+------+------+------+------+
|   52 |   52 |   52 |   52 |
|   74 |   74 |   74 |   74 |
|   51 |   51 |   51 |   51 |
|   60 |   60 |   60 |   60 |
|   22 |   22 |   22 |   22 |
|   38 |   38 |   38 |   38 |
|   23 |   23 |   23 |   23 |
|   21 |   21 |   21 |   21 |
|    5 |    5 |    5 |    5 |
|   17 |   17 |   17 |   17 |
+------+------+------+------+
10 rows in set (Elapsed: 00:00:00.01)


[gbase@localhost test_int_n1]$ ll
total 39184
-rw------- 1 gbase gbase 5021878 Sep  4 21:44 C00000.seg
-rw------- 1 gbase gbase 5002613 Sep  4 21:41 C00000.seg.1
-rw------- 1 gbase gbase 5021878 Sep  4 21:44 C00001.seg
-rw------- 1 gbase gbase 5002613 Sep  4 21:41 C00001.seg.1
-rw------- 1 gbase gbase 5021878 Sep  4 21:44 C00002.seg
-rw------- 1 gbase gbase 5002613 Sep  4 21:41 C00002.seg.1
-rw------- 1 gbase gbase 5021878 Sep  4 21:44 C00003.seg
-rw------- 1 gbase gbase 5002613 Sep  4 21:41 C00003.seg.1
[gbase@localhost test_int_n1]$
[gbase@localhost test_int_n1]$ ll ../../metadata/test_int_n1.GED
total 80
-rw------- 1 gbase gbase  129 Sep  4 21:41 C00000.ctl.A
-rw------- 1 gbase gbase  129 Sep  4 21:44 C00000.ctl.B
-rw------- 1 gbase gbase 8162 Sep  4 21:44 C00000.map
-rw------- 1 gbase gbase  130 Sep  4 21:41 C00001.ctl.A
-rw------- 1 gbase gbase  130 Sep  4 21:44 C00001.ctl.B
-rw------- 1 gbase gbase 8162 Sep  4 21:44 C00001.map
-rw------- 1 gbase gbase  130 Sep  4 21:41 C00002.ctl.A
-rw------- 1 gbase gbase  130 Sep  4 21:44 C00002.ctl.B
-rw------- 1 gbase gbase 8162 Sep  4 21:44 C00002.map
-rw------- 1 gbase gbase  130 Sep  4 21:41 C00003.ctl.A
-rw------- 1 gbase gbase  130 Sep  4 21:44 C00003.ctl.B
-rw------- 1 gbase gbase 8162 Sep  4 21:44 C00003.map
-rw------- 1 gbase gbase    0 Sep  4 21:44 lock
-rw------- 1 gbase gbase  249 Sep  4 21:41 table.des.A
-rw------- 1 gbase gbase  249 Sep  4 21:44 table.des.B
-rw------- 1 gbase gbase   52 Sep  4 21:41 table.state.A
-rw------- 1 gbase gbase   52 Sep  4 21:44 table.state.B
[gbase@localhost test_int_n1]$

第三次加载

一致

gbase> load data infile 'file://10.0.2.107/home/gbase/1000w.dat' into table test_int fields terminated by ',' ;
Query OK, 10000000 rows affected (Elapsed: 00:00:24.95)
Task 1835158 finished, Loaded 10000000 records, Skipped 0 records

gbase> select count(*) from test_int;
+----------+
| count(*) |
+----------+
| 30000000 |
+----------+
1 row in set (Elapsed: 00:00:00.01)



[gbase@localhost test_int_n1]$ ll
total 71856
-rw------- 1 gbase gbase  5021878 Sep  4 21:44 C00000.seg
-rw------- 1 gbase gbase 10024464 Sep  4 21:46 C00000.seg.1
-rw------- 1 gbase gbase  5021878 Sep  4 21:44 C00001.seg
-rw------- 1 gbase gbase 10024464 Sep  4 21:46 C00001.seg.1
-rw------- 1 gbase gbase  5021878 Sep  4 21:44 C00002.seg
-rw------- 1 gbase gbase 10024464 Sep  4 21:46 C00002.seg.1
-rw------- 1 gbase gbase  5021878 Sep  4 21:44 C00003.seg
-rw------- 1 gbase gbase 10024464 Sep  4 21:46 C00003.seg.1
[gbase@localhost test_int_n1]$
[gbase@localhost test_int_n1]$ ll ../../metadata/test_int_n1.GED
total 96
-rw------- 1 gbase gbase   129 Sep  4 21:46 C00000.ctl.A
-rw------- 1 gbase gbase   129 Sep  4 21:44 C00000.ctl.B
-rw------- 1 gbase gbase 12190 Sep  4 21:46 C00000.map
-rw------- 1 gbase gbase   130 Sep  4 21:46 C00001.ctl.A
-rw------- 1 gbase gbase   130 Sep  4 21:44 C00001.ctl.B
-rw------- 1 gbase gbase 12190 Sep  4 21:46 C00001.map
-rw------- 1 gbase gbase   130 Sep  4 21:46 C00002.ctl.A
-rw------- 1 gbase gbase   130 Sep  4 21:44 C00002.ctl.B
-rw------- 1 gbase gbase 12190 Sep  4 21:46 C00002.map
-rw------- 1 gbase gbase   130 Sep  4 21:46 C00003.ctl.A
-rw------- 1 gbase gbase   130 Sep  4 21:44 C00003.ctl.B
-rw------- 1 gbase gbase 12190 Sep  4 21:46 C00003.map
-rw------- 1 gbase gbase   249 Sep  4 21:46 table.des.A
-rw------- 1 gbase gbase   249 Sep  4 21:44 table.des.B
-rw------- 1 gbase gbase   52 Sep   4 21:46 table.state.A
-rw------- 1 gbase gbase   52 Sep   4 21:44 table.state.B
[gbase@localhost test_int_n1]$

结论

GBase 8a数据库,在对数据压缩时,不考虑字段定义,所以相同数据占用的磁盘空间是一样的。