南大通用GBase 8a扩容缩容时重分布进度查询rebalancing_status

发表于2021年9月6日2022年6月1日作者 laozizhu

GBase 8a在做扩容缩容等操作时，会将需要重分布的任务，放到gclusterdb.rebalancing_status元数据表里，本文介绍该进度表的使用方法。

目录导航

表结构

字段	类型	说明
index_name	varchar(129)	任务索引，包含了库名和表名，用小数点分割
db_name	varchar(64)	库名
table_name	varchar(64)	表名
tmptable	varchar(129)	数据迁移用到的临时表名字，一般以rebalanc开头
start_time	datetime	开始时间，RUNNING状态有效
end_time	datetime	完成时间，FINISHED状态有效
status	varchar(32)	任务状态。STARTING、RUNNING、COMPLETED、PAUSED、CANCELED
percentage	int	完成百分比，无百分号。RUNNING状态有效。默认0
priority	int	优先级。越小优先级越高，默认5。
host	varchar(60)	任务所在IP
distribution_id	bigint	目标distribution_id

gbase> desc rebalancing_status;
+-----------------+--------------+------+-----+---------+-------+
| Field           | Type         | Null | Key | Default | Extra |
+-----------------+--------------+------+-----+---------+-------+
| index_name      | varchar(129) | YES  |     | NULL    |       |
| db_name         | varchar(64)  | YES  |     | NULL    |       |
| table_name      | varchar(64)  | YES  |     | NULL    |       |
| tmptable        | varchar(129) | YES  |     | NULL    |       |
| start_time      | datetime     | YES  |     | NULL    |       |
| end_time        | datetime     | YES  |     | NULL    |       |
| status          | varchar(32)  | YES  |     | NULL    |       |
| percentage      | int(11)      | YES  |     | NULL    |       |
| priority        | int(11)      | YES  |     | NULL    |       |
| host            | varchar(60)  | YES  |     | NULL    |       |
| distribution_id | bigint(8)    | YES  |     | NULL    |       |
+-----------------+--------------+------+-----+---------+-------+
11 rows in set (Elapsed: 00:00:00.00)

重分布任务状态

STARTING 准备状态，排队中
RUNNING 正在重分布中
COMPLETED 已经完成了重分布
PAUSED 任务暂停
CANCELED 任务取消

状态转换规则如下所示

表处于 STARTING 状态时，Coordinator 后台线程开始执行表的 rebalance 操作，表状态转换成 RUNNING。
表处于 PAUSED 状态时，对这个表执行 continue rebalance 操作，表状态转换成RUNNING。
表处于 RUNNING 状态时，coordinator 后台线程执行表的 rebalance 操作失败，表状态转换成 STARTING。
表处于 RUNNING 状态时，coordinator 后台线程完成了表的 rebalance 操作，表状态转换成 COMPLETED。

查询总体重分布进度

根据STATUS统计每个状态的数量。如下例子，等待中的15个，正在重分布的3个，完成了3个。

gbase> select status,count(*) from gclusterdb.rebalancing_status group by status;
+-----------+----------+
| status    | count(*) |
+-----------+----------+
| STARTING  |       15 |
| RUNNING   |        3 |
| COMPLETED |        3 |
+-----------+----------+
3 rows in set (Elapsed: 00:00:00.61)

当所有的表都处于COMPLETED状态，则重分布全部完成。

gbase> select status,count(*) from gclusterdb.rebalancing_status group by status;
+-----------+----------+
| status    | count(*) |
+-----------+----------+
| COMPLETED |       21 |
+-----------+----------+
1 row in set (Elapsed: 00:00:00.19)

查看正在重分布的任务进度

查看状态为RUNNING的进度。如下是4个正在运行的任务。

gbase> select * from rebalancing_status where status='RUNNING';
+-----------------------------------+------------+----------------------------+----------+----------------------------+------------+-----------+------------+----------+----------+-----------------+
| index_name                        | db_name    | table_name                 | tmptable | start_time                 | end_time   | status    | percentage | priority | host     | distribution_id |
+-----------------------------------+------------+----------------------------+----------+----------------------------+------------+-----------+------------+----------+----------+-----------------+
| testdb.t1                         | testdb     | t1                         |          | 2018-11-07 09:05:35.216000 |            | RUNNING   |        30  |        5 | 2001::61 |               2 |
| testdb.aggr_http_ci_hour_20180904 | testdb     | aggr_http_ci_hour_20180904 |          | 2018-11-07 09:05:35.276000 |            | RUNNING   |        10  |        5 | 2001::61 |               2 |
| testdb.dim_ci                     | testdb     | dim_ci                     |          | 2018-11-07 09:05:35.333000 |            | RUNNING   |        50  |        5 | 2001::61 |               2 |
| gclusterdb.audit_log_express      | gclusterdb | audit_log_express          |          | 2018-11-07 09:05:35.251000 |            | RUNNING   |        20  |        5 | 2001::61 |               2 |
+-----------------------------------+------------+----------------------------+----------+----------------------------+------------+-----------+------------+----------+----------+-----------------+
4 rows in set (Elapsed: 00:00:00.02)

需要注意，默认重分布时，每个节点，每次处理的数据是通过参数（gcluster_rebalancing_step）配置的，在V8里默认是2000万行，在V95里是1亿行。

再次提醒，是每个节点。比如100一行，10个节点平均，则每个节点处理10亿行，需要处理10轮。对应的percentage在处理时会变动。如果节点处理的数据少于参数值，则一次性处理，percentage从0直接到99或者100。

暂定重分布

pause rebalance table 库名.表名

以rebalancing_status里面的index_name为准。

gbase> select * from rebalancing_status;
+---------------------------+---------+--------------------+----------+----------------------------+----------+----------+------------+----------+------+-----------------+
| index_name                | db_name | table_name         | tmptable | start_time                 | end_time | status   | percentage | priority | host | distribution_id |
+---------------------------+---------+--------------------+----------+----------------------------+----------+----------+------------+----------+------+-----------------+
| testdb.t1                 | testdb  | t1                 | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
| testdb.t2                 | testdb  | t2                 | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
| testdb.t_procedure        | testdb  | t_procedure        | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
| testdb.t_procedure_testdb | testdb  | t_procedure_testdb | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
| testdb.t_enc              | testdb  | t_enc              | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
| testdb.ta                 | testdb  | ta                 | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
+---------------------------+---------+--------------------+----------+----------------------------+----------+----------+------------+----------+------+-----------------+
6 rows in set (Elapsed: 00:00:00.01)

gbase> pause rebalance table testdb.t1;
Query OK, 1 row affected (Elapsed: 00:00:00.08)

gbase> select * from rebalancing_status;
+---------------------------+---------+--------------------+----------+----------------------------+----------+----------+------------+----------+------+-----------------+
| index_name                | db_name | table_name         | tmptable | start_time                 | end_time | status   | percentage | priority | host | distribution_id |
+---------------------------+---------+--------------------+----------+----------------------------+----------+----------+------------+----------+------+-----------------+
| testdb.t1                 | testdb  | t1                 | NULL     | 2022-06-01 09:49:42.633000 | NULL     | PAUSED   |          0 |        5 | NULL |               4 |
| testdb.t2                 | testdb  | t2                 | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
| testdb.t_procedure        | testdb  | t_procedure        | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
| testdb.t_procedure_testdb | testdb  | t_procedure_testdb | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
| testdb.t_enc              | testdb  | t_enc              | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
| testdb.ta                 | testdb  | ta                 | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
+---------------------------+---------+--------------------+----------+----------------------------+----------+----------+------------+----------+------+-----------------+
6 rows in set (Elapsed: 00:00:00.14)

继续重分布

continue rebalance table 库名.表名

以rebalancing_status里面的index_name为准。

gbase> select * from rebalancing_status;
+---------------------------+---------+--------------------+----------+----------------------------+----------+----------+------------+----------+------+-----------------+
| index_name                | db_name | table_name         | tmptable | start_time                 | end_time | status   | percentage | priority | host | distribution_id |
+---------------------------+---------+--------------------+----------+----------------------------+----------+----------+------------+----------+------+-----------------+
| testdb.t1                 | testdb  | t1                 | NULL     | 2022-06-01 09:49:42.633000 | NULL     | PAUSED   |          0 |        5 | NULL |               4 |
| testdb.t2                 | testdb  | t2                 | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
| testdb.t_procedure        | testdb  | t_procedure        | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
| testdb.t_procedure_testdb | testdb  | t_procedure_testdb | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
| testdb.t_enc              | testdb  | t_enc              | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
| testdb.ta                 | testdb  | ta                 | NULL     | 2022-06-01 09:49:42.633000 | NULL     | STARTING |          0 |        5 | NULL |               4 |
+---------------------------+---------+--------------------+----------+----------------------------+----------+----------+------------+----------+------+-----------------+
6 rows in set (Elapsed: 00:00:00.14)

gbase>
gbase>
gbase>
gbase> continue rebalance table testdb.t1;
Query OK, 1 row affected (Elapsed: 00:00:00.08)

gbase> select * from rebalancing_status;
+---------------------------+---------+--------------------+----------+----------------------------+----------------------------+-----------+------------+----------+------------+-----------------+
| index_name                | db_name | table_name         | tmptable | start_time                 | end_time                   | status    | percentage | priority | host       | distribution_id |
+---------------------------+---------+--------------------+----------+----------------------------+----------------------------+-----------+------------+----------+------------+-----------------+
| testdb.t1                 | testdb  | t1                 | NULL     | 2022-06-01 09:49:42.633000 | NULL                       | STARTING  |          0 |        5 | NULL       |               4 |
| testdb.t2                 | testdb  | t2                 | NULL     | 2022-06-01 09:49:42.633000 | NULL                       | STARTING  |          0 |        5 | NULL       |               4 |
| testdb.t_procedure        | testdb  | t_procedure        | NULL     | 2022-06-01 09:49:42.633000 | NULL                       | STARTING  |          0 |        5 | NULL       |               4 |
| testdb.t_procedure_testdb | testdb  | t_procedure_testdb | NULL     | 2022-06-01 09:49:42.633000 | NULL                       | STARTING  |          0 |        5 | NULL       |               4 |
| testdb.t_enc              | testdb  | t_enc              | NULL     | 2022-06-01 09:49:42.633000 | NULL                       | STARTING  |          0 |        5 | NULL       |               4 |
| testdb.ta                 | testdb  | ta                 | NULL     | 2022-06-01 09:56:27.395000 | 2022-06-01 09:56:27.395000 | COMPLETED |          0 |        5 | 10.0.2.101 |               4 |
+---------------------------+---------+--------------------+----------+----------------------------+----------------------------+-----------+------------+----------+------------+-----------------+
6 rows in set (Elapsed: 00:00:00.01)

取消重分布

cancel rebalance table 库名.表名

以rebalancing_status里面的index_name为准。

不保证一定能成功。

总结

通过统计已经FINISHED状态的表数量，计算出整体进度。通过查询RUNNING状态的数据，查看当前表的进度。

由于不同的表，数据量不同，耗时不同，所以通过进度来评估预计耗时，并没有那么精确，只能作为参考。用户可以根据业务数据量情况，划分不同的优先级，以此来评估每个不同优先级的完成进度，相对更准确些。

Post Views: 896

2025年4月
一	二	三	四	五	六	日
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30

表结构

重分布任务状态

查询总体重分布进度

查看正在重分布的任务进度

暂定重分布

继续重分布

取消重分布

总结

相关文章: