GBase 8a 数据库集群,将数据按照用户指定的分布格式切分,保存在多台数据计算节点上。本文介绍在新版9.5.3版本里,如果通过segment_id直接查询某些计算节点的数据。
目录导航
参考
限制
从V9.5.3版本才支持,该版本也同时支持多分片部署,可以参考【多实例有关文章】
背景
GBase 8a数据库集群的每个数据计算节点,都是一台数据库单机,负责本地计算和存储。老版本集群,可以通过直接连接计算节点(比如通过命令行工具gncli, -h参数指定IP)访问每个节点的分片数据,在V953版本里,引入了segment_id功能,可以直接访问对应的节点分片,简化了操作。
当然,如果该功能只用于方位当前服务中的分片,如为了查询备份分片,还是需要用命令行工具指定IP进行。
参数
数据库参数
gcluster_segment_id_replace
0:不启用segment_id功能,默认值
1:开启segnemtn_id功能
SQL语法
where segment_id=分片号
其中分片号,从1开始,详细可以看gcadmin showdistribution的输出
[gbase@gbase_rh7_003 ~]$ gcadmin showdistribution
Distribution ID: 2 | State: new | Total segment num: 4
Primary Segment Node IP Segment ID Duplicate Segment node IP
========================================================================================================================
| 10.0.2.103 | 1 | 10.0.2.104 |
------------------------------------------------------------------------------------------------------------------------
| 10.0.2.104 | 2 | 10.0.2.105 |
------------------------------------------------------------------------------------------------------------------------
| 10.0.2.105 | 3 | 10.0.2.106 |
------------------------------------------------------------------------------------------------------------------------
| 10.0.2.106 | 4 | 10.0.2.103 |
========================================================================================================================
样例
环境
4分片集群。
[gbase@gbase_rh7_003 ~]$ gcadmin
CLUSTER STATE: ACTIVE
VIRTUAL CLUSTER MODE: NORMAL
====================================
| GBASE GCWARE CLUSTER INFORMATION |
====================================
| NodeName | IpAddress | gcware |
------------------------------------
| gcware1 | 10.0.2.103 | OPEN |
------------------------------------
====================================================
| GBASE COORDINATOR CLUSTER INFORMATION |
====================================================
| NodeName | IpAddress | gcluster | DataState |
----------------------------------------------------
| coordinator1 | 10.0.2.103 | OPEN | 0 |
----------------------------------------------------
=========================================================================================================
| GBASE DATA CLUSTER INFORMATION |
=========================================================================================================
| NodeName | IpAddress | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------------------------------------
| node1 | 10.0.2.103 | 2 | OPEN | OPEN | 0 |
---------------------------------------------------------------------------------------------------------
| node2 | 10.0.2.104 | 2 | OPEN | OPEN | 0 |
---------------------------------------------------------------------------------------------------------
| node3 | 10.0.2.105 | 2 | OPEN | OPEN | 0 |
---------------------------------------------------------------------------------------------------------
| node4 | 10.0.2.106 | 2 | OPEN | OPEN | 0 |
---------------------------------------------------------------------------------------------------------
设置参数
本例设置session级别,只对当前连接有效。
gbase> show variables like '%seg%';
+-------------------------------------------------+------------+
| Variable_name | Value |
+-------------------------------------------------+------------+
| _gbase_segment_size | 2147483648 |
| _t_gcluster_single_segment_cluster_optimization | 0 |
| gcluster_segment_id_replace | 0 |
+-------------------------------------------------+------------+
3 rows in set (Elapsed: 00:00:00.00)
gbase> set gcluster_segment_id_replace=1;
Query OK, 0 rows affected (Elapsed: 00:00:00.00)
数据
16行数据的hash分布表
gbase> select count(*) from t1;
+----------+
| count(*) |
+----------+
| 16 |
+----------+
1 row in set (Elapsed: 00:00:00.08)
gbase> show create table t1;
+-------+----------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+----------------------------------------------------------------------------------------------------------------------------------------+
| t1 | CREATE TABLE "t1" (
"id" int(11) DEFAULT NULL
) ENGINE=EXPRESS DISTRIBUTED BY('id') DEFAULT CHARSET=utf8 TABLESPACE='sys_tablespace' |
+-------+----------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (Elapsed: 00:00:00.00)
查询分片
通过segment_id在4个分片里查询。注意分片号不进行校验,如不存在则返回空。
gbase> select * from t1 where segment_id=1;
+------+
| id |
+------+
| 4 |
| 7 |
| 6 |
| 12 |
| 10 |
| 16 |
| 11 |
+------+
7 rows in set (Elapsed: 00:00:00.01)
gbase> select * from t1 where segment_id=2;
+------+
| id |
+------+
| 8 |
+------+
1 row in set (Elapsed: 00:00:00.01)
gbase> select * from t1 where segment_id=3;
+------+
| id |
+------+
| 1 |
| 9 |
| 13 |
+------+
3 rows in set (Elapsed: 00:00:00.01)
gbase> select * from t1 where segment_id=4;
+------+
| id |
+------+
| 2 |
| 15 |
| 3 |
| 5 |
| 14 |
+------+
5 rows in set (Elapsed: 00:00:00.01)
gbase> select * from t1 where segment_id=5;
Empty set (Elapsed: 00:00:00.01)
gbase> select * from t1 where segment_id=6;
Empty set (Elapsed: 00:00:00.01)
在查询结果里包含分片
在select 里包含分片号,比如
gbase> select segment_id,t1.* from t1;
+------------+------+
| segment_id | id |
+------------+------+
| 1 | 4 |
| 1 | 7 |
| 1 | 6 |
| 1 | 12 |
| 1 | 10 |
| 1 | 16 |
| 1 | 11 |
| 4 | 2 |
| 4 | 15 |
| 4 | 3 |
| 4 | 5 |
| 4 | 14 |
| 3 | 1 |
| 3 | 9 |
| 3 | 13 |
| 2 | 8 |
+------------+------+
16 rows in set (Elapsed: 00:00:00.02)
多表时,指定表名字
gbase> select segment_id(t1),segment_id(t2),t1.* from t1,t2 where t1.id=t2.id;
+------------+------------+------+
| segment_id | segment_id | id |
+------------+------------+------+
| 1 | 1 | 4 |
| 4 | 1 | 2 |
| 4 | 1 | 3 |
| 3 | 1 | 1 |
+------------+------------+------+
4 rows in set (Elapsed: 00:00:00.06)
统计各个分片的数据量
注意,此处返回的是正常分片的数据,用于排查数据倾斜很有用处。
gbase> select segment_id,count(*) from t1 group by segment_id;
+------------+----------+
| segment_id | count(*) |
+------------+----------+
| 4 | 5 |
| 1 | 7 |
| 2 | 1 |
| 3 | 3 |
+------------+----------+
4 rows in set (Elapsed: 00:00:00.17)