本文介绍GBase 8a数据库集群内部,对没一行数据的行号rowid的方案。如Oracle等数据库,提供了rowid来唯一标识一行数据,在GBase里也提供了rowid,但因为是MPP,所以在随机分布表里,每个节点是各自独立的,只有复制表才能保证一致性。
目录导航
测试环境
2节点集群
[gbase@localhost ~]$ gcadmin
CLUSTER STATE: ACTIVE
CLUSTER MODE: NORMAL
=================================================================
| GBASE COORDINATOR CLUSTER INFORMATION |
=================================================================
| NodeName | IpAddress |gcware |gcluster |DataState |
-----------------------------------------------------------------
| coordinator1 | 10.0.2.107 | OPEN | OPEN | 0 |
-----------------------------------------------------------------
| coordinator2 | 10.0.2.106 | OPEN | OPEN | 0 |
-----------------------------------------------------------------
=============================================================
| GBASE DATA CLUSTER INFORMATION |
=============================================================
|NodeName | IpAddress |gnode |syncserver |DataState |
-------------------------------------------------------------
| node1 | 10.0.2.107 | OPEN | OPEN | 0 |
-------------------------------------------------------------
| node2 | 10.0.2.106 | OPEN | OPEN | 0 |
-------------------------------------------------------------
测试hash分布表的行号rowid
随机表的rowid, 会出现重复的,因为每个数据节点都是从0开始的rowid。
gbase> create table test_rowid(id int, value varchar(100)) distributed by('id');
Query OK, 0 rows affected (Elapsed: 00:00:00.26)
gbase> insert into test_rowid values(1,'111'),(2,'2222'),(3,'3333');
Query OK, 3 rows affected (Elapsed: 00:00:00.16)
Records: 3 Duplicates: 0 Warnings: 0
gbase> insert into test_rowid values(4,'444'),(5,'555'),(6,'666');
Query OK, 3 rows affected (Elapsed: 00:00:00.21)
Records: 3 Duplicates: 0 Warnings: 0
gbase> insert into test_rowid values(7,'777'),(8,'888'),(9,'999');
Query OK, 3 rows affected (Elapsed: 00:00:00.13)
Records: 3 Duplicates: 0 Warnings: 0
gbase> select rowid,t.* from test_rowid t;
+-------+------+-------+
| rowid | id | value |
+-------+------+-------+
| 0 | 1 | 111 |
| 1 | 2 | 2222 |
| 2 | 3 | 3333 |
| 3 | 4 | 444 |
| 4 | 5 | 555 |
| 0 | 6 | 666 |
| 1 | 7 | 777 |
| 2 | 8 | 888 |
| 3 | 9 | 999 |
+-------+------+-------+
9 rows in set (Elapsed: 00:00:00.00)
gbase> ^CAborted
测试随机分布表行号rowid
依然会出现重复的行号rowid
[gbase@localhost ~]$ cat test_rowid.txt
11,1111111
22,22222222
33,333333333
44,44444444
55,555555555
66,666666666
77,7777777777
88,8888888888
[gbase@localhost ~]$
gbase> create table test_rowid_2(id int, value varchar(100));
Query OK, 0 rows affected (Elapsed: 00:00:00.27)
gbase> load data infile 'sftp://gbase:gbase1234@10.0.2.107//home/gbase/test_rowid.txt' into table test_rowid_2 fields terminated by ',';
Query OK, 8 rows affected (Elapsed: 00:00:00.98)
Task 1835164 finished, Loaded 8 records, Skipped 0 records
gbase> select rowid,t.* from test_rowid_2 t;
+-------+------+------------+
| rowid | id | value |
+-------+------+------------+
| 0 | 11 | 1111111 |
| 1 | 33 | 333333333 |
| 2 | 55 | 555555555 |
| 3 | 77 | 7777777777 |
| 0 | 22 | 22222222 |
| 1 | 44 | 44444444 |
| 2 | 66 | 666666666 |
| 3 | 88 | 8888888888 |
+-------+------+------------+
8 rows in set (Elapsed: 00:00:00.01)
测试复制表行号rowid
因为复制表,每个节点数据都一样,所以行号rowid是唯一的。
gbase> create table test_rowid_r(id int, value varchar(100)) replicated;
Query OK, 0 rows affected (Elapsed: 00:00:00.17)
gbase> load data infile 'sftp://gbase:gbase1234@10.0.2.107//home/gbase/test_rowid.txt' into table test_rowid_r fields terminated by ',';
Query OK, 8 rows affected (Elapsed: 00:00:00.90)
Task 1835166 finished, Loaded 8 records, Skipped 0 records
gbase> select rowid,t.* from test_rowid_r t;
+-------+------+------------+
| rowid | id | value |
+-------+------+------------+
| 0 | 11 | 1111111 |
| 1 | 22 | 22222222 |
| 2 | 33 | 333333333 |
| 3 | 44 | 44444444 |
| 4 | 55 | 555555555 |
| 5 | 66 | 666666666 |
| 6 | 77 | 7777777777 |
| 7 | 88 | 8888888888 |
+-------+------+------------+
8 rows in set (Elapsed: 00:00:00.01)
结论
在当前已经发行的版本,由于行号rowid是依赖数据节点本身的,所以集群层并没有统一的一个行号,所以如果需要行号,可以用复制表。
从业务上,还是希望用户自行管理类似功能,比如新的V9版本支持了自增列,不要依赖行号。