南大通用GBase 8a管理集群gcware的日志-vote leader、flower、candidate部分

本文分析下GBase 8a的管理集群gcware启动日志里,关于vote选举leader的部分内容,同时查看 当节点故障时,vote相关的变动内容。其中leader是主,flower是从,candidate是候选人。

环境

2节点集群

[gbase@gbase_rh7_001 log]$ gcadmin
CLUSTER STATE:         ACTIVE
VIRTUAL CLUSTER MODE:  NORMAL

=============================================================
|           GBASE COORDINATOR CLUSTER INFORMATION           |
=============================================================
|   NodeName   | IpAddress  | gcware | gcluster | DataState |
-------------------------------------------------------------
| coordinator1 | 10.0.2.101 |  OPEN  |   OPEN   |     0     |
-------------------------------------------------------------
| coordinator2 | 10.0.2.115 |  OPEN  |   OPEN   |     0     |
-------------------------------------------------------------
=========================================================================================================
|                                    GBASE DATA CLUSTER INFORMATION                                     |
=========================================================================================================
| NodeName |                IpAddress                 | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------------------------------------
|  node1   |                10.0.2.101                |       1        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
|  node2   |                10.0.2.115                |       1        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------

第一节点启动日志

本机的nodeid

该id将作为选举时的标识。


Mar 28 16:25:44.504496 NOTIC [MAIN  ] local nodeid:1694629898
Mar 28 16:25:44.504500 NOTIC [MAIN  ] bind addr:10.0.2.101

vote部分

其中如下部分,节点转化角色为候选人。

node 1694629898 switch to candidate

Mar 28 16:25:47.541343 NOTIC [GCWARE] begin load CCS meta
Mar 28 16:25:47.541347 NOTIC [GCWARE] finish load raft meta with term: 36019, voteFor:1694629898
Mar 28 16:25:47.541351 INFO  [GCWARE] open raft log file /opt/gbase/gcware/data/gcware/REDOLOG.111899 for load success, fd:11
Mar 28 16:25:47.541355 INFO  [GCWARE] close raft log /opt/gbase/gcware/data/gcware/REDOLOG.111899, fd:11
Mar 28 16:25:47.541359 INFO  [GCWARE] open raft log file /opt/gbase/gcware/data/gcware/REDOLOG.112041 for load success, fd:11
Mar 28 16:25:47.541363 INFO  [GCWARE] close raft log /opt/gbase/gcware/data/gcware/REDOLOG.112041, fd:11
Mar 28 16:25:47.541367 INFO  [GCWARE] open raft log file /opt/gbase/gcware/data/gcware/REDOLOG.112845 for load success, fd:11
Mar 28 16:25:47.591568 INFO  [GCWARE] close raft log /opt/gbase/gcware/data/gcware/REDOLOG.112845, fd:11
Mar 28 16:25:47.591596 INFO  [GCWARE] Showy: PN5GBCCS9TcpClientE::CreateEventWithNodeid start, ip 10.0.2.101 port 5918 nodeid 1694629898
Mar 28 16:25:47.591602 INFO  [GCWARE] Showy: PN5GBCCS9TcpClientE::CreateEventWithNodeid ok, ip 10.0.2.101 port 5918 nodeid 1694629898
Mar 28 16:25:47.591607 INFO  [GCWARE] Showy: PN5GBCCS9TcpClientE::CreateEventWithNodeid start, ip 10.0.2.115 port 5918 nodeid 1929510922
Mar 28 16:25:47.591765 INFO  [GCWARE] Showy: PN5GBCCS9TcpClientE::CreateEventWithNodeid ok, ip 10.0.2.115 port 5918 nodeid 1929510922
Mar 28 16:25:47.591771 INFO  [GCWARE] node 1694629898 switch to follower with term:36019 leader:0
Mar 28 16:25:47.591776 NOTIC [GCWARE] client server begin provide service
Mar 28 16:25:47.591889 INFO  [GCWARE] open raft log file /opt/gbase/gcware/data/gcware/REDOLOG.112845 for append success, fd:23
Mar 28 16:25:47.591899 NOTIC [GCWARE] client server event loop start
Mar 28 16:25:47.591903 NOTIC [GCWARE] client server acceptor begin
Mar 28 16:25:47.591907 WARN  [GCWARE] Imp: PN5GBCCS11TcpAcceptorE::run start accept...
Mar 28 16:25:47.591911 INFO  [GCWARE] Showy: PN5GBCCS11TcpAcceptorE::run accepted!, fd 24 ip  port 0
Mar 28 16:25:51.523786 INFO  [GCWARE] node 1694629898 switch to candidate
Mar 28 16:25:51.524076 INFO  [GCWARE] vote req from 1694629898, term:36020, candidate:1694629898, term:36019,index:117353,mine term:36019 index:117353, previous vote:0, current vote :1694629898 grant:1
Mar 28 16:25:51.533938 INFO  [GCWARE] finish save raft meta with term: 36020, voteFor:0
Mar 28 16:25:51.538713 INFO  [GCWARE] finish save raft meta with term: 36020, voteFor:1694629898
Mar 28 16:25:51.538840 INFO  [GCWARE] dealVoteRpcRes node 1694629898 grant me with term 36020

每隔一段时间,就会发起一次vote,直到成功,间隔大约在2-3秒。

当第二节点启动后的vote

其中如下代表收到了2个节点的投票信息,分别来自1694629898(本机)和另一个节点1929510922,然后1694629898转换成为了leader。

Mar 28 16:28:46.654368 INFO  [GCWARE] ignore message with lower term 36042 from 1929510922
Mar 28 16:28:47.567140 INFO  [GCWARE] node 1694629898 switch to candidate
Mar 28 16:28:47.567662 INFO  [GCWARE] vote req from 1694629898, term:36075, candidate:1694629898, term:36019,index:117353,mine term:36019 index:117353, previous vote:0, current vote :1694629898 grant:1
Mar 28 16:28:47.577373 INFO  [GCWARE] finish save raft meta with term: 36075, voteFor:0
Mar 28 16:28:47.582410 INFO  [GCWARE] dealVoteRpcRes node 1929510922 grant me with term 36075
Mar 28 16:28:47.585308 INFO  [GCWARE] finish save raft meta with term: 36075, voteFor:1694629898
Mar 28 16:28:47.585368 INFO  [GCWARE] dealVoteRpcRes node 1694629898 grant me with term 36075
Mar 28 16:28:47.585377 INFO  [GCWARE] node 1694629898 switch to leader with term:36075
Mar 28 16:28:47.585413 NOTIC [GCWARE] broadcast node state 1694629898 1
Mar 28 16:28:47.585538 NOTIC [GCWARE] broadcast node state 1929510922 0
Mar 28 16:28:47.585550 INFO  [GCWARE] statemachine sync with leader change from 0 to 1
Mar 28 16:28:47.585557 INFO  [GCWARE] follower state change 1929510922 from 0 to 1

第二节点启动日志

本机nodeid

Mar 28 16:28:41.005302 NOTIC [MAIN  ] local nodeid:1929510922
Mar 28 16:28:41.005307 NOTIC [MAIN  ] bind addr:10.0.2.115

vote部分

其中如下部分,节点转化角色为候选者,参与选举

Mar 28 16:28:46.648791 INFO  [GCWARE] node 1929510922 switch to candidate
Mar 28 16:28:46.649058 INFO  [GCWARE] vote req from 1929510922, term:36042, candidate:1929510922, term:36019,index:117353,mine term:36019 index:117353, previous vote:0, current vote :1929510922 grant:1
Mar 28 16:28:46.656952 INFO  [GCWARE] finish save raft meta with term: 36042, voteFor:0
Mar 28 16:28:46.661778 INFO  [GCWARE] finish save raft meta with term: 36042, voteFor:1929510922
Mar 28 16:28:46.661952 INFO  [GCWARE] dealVoteRpcRes node 1929510922 grant me with term 36042

选举结果出来了,其转成追随者。

Mar 28 16:28:47.562433 INFO  [GCWARE] node 1929510922 switch to follower with term:36075 leader:0
Mar 28 16:28:47.562463 INFO  [GCWARE] vote req from 1694629898, term:36075, candidate:1694629898, term:36019,index:117353,mine term:36019 index:117353, previous vote:0, current vote :1694629898 grant:1
Mar 28 16:28:47.571320 INFO  [GCWARE] finish save raft meta with term: 36075, voteFor:0
Mar 28 16:28:47.577133 INFO  [GCWARE] finish save raft meta with term: 36075, voteFor:1694629898

该部分日志

Mar 28 16:28:44.062948 NOTIC [GCWARE] client server begin provide service
Mar 28 16:28:44.063293 INFO  [GCWARE] open raft log file /opt/gbase/gcware/data/gcware/REDOLOG.112845 for append success, fd:24
Mar 28 16:28:44.063314 NOTIC [GCWARE] client server event loop start
Mar 28 16:28:44.063322 NOTIC [GCWARE] client server acceptor begin
Mar 28 16:28:46.648791 INFO  [GCWARE] node 1929510922 switch to candidate
Mar 28 16:28:46.649058 INFO  [GCWARE] vote req from 1929510922, term:36042, candidate:1929510922, term:36019,index:117353,mine term:36019 index:117353, previous vote:0, current vote :1929510922 grant:1
Mar 28 16:28:46.656952 INFO  [GCWARE] finish save raft meta with term: 36042, voteFor:0
Mar 28 16:28:46.661778 INFO  [GCWARE] finish save raft meta with term: 36042, voteFor:1929510922
Mar 28 16:28:46.661952 INFO  [GCWARE] dealVoteRpcRes node 1929510922 grant me with term 36042
Mar 28 16:28:47.562433 INFO  [GCWARE] node 1929510922 switch to follower with term:36075 leader:0
Mar 28 16:28:47.562463 INFO  [GCWARE] vote req from 1694629898, term:36075, candidate:1694629898, term:36019,index:117353,mine term:36019 index:117353, previous vote:0, current vote :1694629898 grant:1
Mar 28 16:28:47.571320 INFO  [GCWARE] finish save raft meta with term: 36075, voteFor:0
Mar 28 16:28:47.577133 INFO  [GCWARE] finish save raft meta with term: 36075, voteFor:1694629898
Mar 28 16:28:47.970905 INFO  [GCWARE] Showy: PN5GBCCS11TcpAcceptorE::run accepted!, fd 34 ip  port 0
Mar 28 16:28:48.134417 ERROR [GCWARE] previous log not exist index:117356, term:36075
Mar 28 16:28:48.134551 ERROR [GCWARE] previous log not exist index:117356, term:36075
Mar 28 16:28:48.135124 WARN  [GCWARE] log already exist index:117354, term:36075
Mar 28 16:28:48.149216 INFO  [GCWARE] node 1694629898 state plan change to 0
Mar 28 16:28:48.149247 INFO  [GCWARE] node 1694629898 state change to 0
Mar 28 16:28:48.149253 INFO  [GCWARE] node 1929510922 state plan change to 1
Mar 28 16:28:48.149258 INFO  [GCWARE] node 1929510922 state change to 1
Mar 28 16:28:48.149262 INFO  [GCWARE] CCS create session 31899,1,1,112850,21

当前leader节点服务停止启动

第一个节点服务停止

[gbase@gbase_rh7_001 log]$ gcluster_services gcware stop
Stopping gcware : 

第一节点日志

Mar 28 16:40:19.910979 NOTIC [CRM   ] [DEBUG]: crm_exec_exit_fn [ENTER]
Mar 28 16:40:19.910983 INFO  [CRM   ] crm switch snapshot begin!
Mar 28 16:40:19.910988 NOTIC [LCK   ] [INFO]: lck_sync_data_abort
Mar 28 16:40:19.910992 NOTIC [LCK   ] [INFO]: lck_sync_data_abort end
Mar 28 16:40:19.910995 INFO  [CRM   ] crm switch snapshot end!
Mar 28 16:40:19.910999 NOTIC [CRM   ] [DEBUG]: crm_exec_exit_fn [LEAVE]
Mar 28 16:40:19.911267 NOTIC [MAIN  ] Cluster Consist Service Engine exiting with status 0 at main.c:2381.

从第二个节点看,leader从1变成了0, 节点变化角色为选举者,并发起了投票。

Mar 28 16:40:20.839295 INFO  [GCWARE] statemachine sync with leader change from 1 to 0
Mar 28 16:40:20.839381 INFO  [GCWARE] node 1929510922 switch to candidate
Mar 28 16:40:20.839391 INFO  [GCWARE] vote req from 1929510922, term:36076, candidate:1929510922, term:36075,index:117545,mine term:36075 index:117545, previous vote:0, current vote :1929510922 grant:1
Mar 28 16:40:20.844925 INFO  [GCWARE] finish save raft meta with term: 36076, voteFor:0
Mar 28 16:40:20.849696 INFO  [GCWARE] finish save raft meta with term: 36076, voteFor:1929510922
Mar 28 16:40:20.849851 INFO  [GCWARE] dealVoteRpcRes node 1929510922 grant me with term 36076
Mar 28 16:40:23.417719 INFO  [GCWARE] node 1929510922 switch to candidate
Mar 28 16:40:23.417939 INFO  [GCWARE] vote req from 1929510922, term:36077, candidate:1929510922, term:36075,index:117545,mine term:36075 index:117545, previous vote:0, current vote :1929510922 grant:1
Mar 28 16:40:23.422567 INFO  [GCWARE] finish save raft meta with term: 36077, voteFor:0
Mar 28 16:40:23.427721 INFO  [GCWARE] finish save raft meta with term: 36077, voteFor:1929510922
Mar 28 16:40:23.427853 INFO  [GCWARE] dealVoteRpcRes node 1929510922 grant me with term 36077

第一个节点服务被monit自动拉起后,该节点成为了follower。

Mar 28 16:40:26.956760 INFO  [GCWARE] node 1694629898 switch to follower with term:36078 leader:0
Mar 28 16:40:26.956883 INFO  [GCWARE] vote req from 1929510922, term:36078, candidate:1929510922, term:36075,index:117545,mine term:36075 index:117545, previous vote:0, current vote :1929510922 grant:1
Mar 28 16:40:26.964026 INFO  [GCWARE] finish save raft meta with term: 36078, voteFor:0
Mar 28 16:40:26.971457 INFO  [GCWARE] finish save raft meta with term: 36078, voteFor:1929510922
Mar 28 16:40:26.973217 INFO  [GCWARE] Showy: PN5GBCCS11TcpAcceptorE::run accepted!, fd 32 ip  port 0
Mar 28 16:40:27.151832 INFO  [GCWARE] statemachine sync with leader change from 0 to 1

第二个节点成为了leader

Mar 28 16:40:26.957790 INFO  [GCWARE] node 1929510922 switch to candidate
Mar 28 16:40:26.958029 INFO  [GCWARE] vote req from 1929510922, term:36078, candidate:1929510922, term:36075,index:117545,mine term:36075 index:117545, previous vote:0, current vote :1929510922 grant:1
Mar 28 16:40:26.966603 INFO  [GCWARE] finish save raft meta with term: 36078, voteFor:0
Mar 28 16:40:26.974389 INFO  [GCWARE] finish save raft meta with term: 36078, voteFor:1929510922
Mar 28 16:40:26.974740 INFO  [GCWARE] dealVoteRpcRes node 1929510922 grant me with term 36078
Mar 28 16:40:26.975362 INFO  [GCWARE] dealVoteRpcRes node 1694629898 grant me with term 36078
Mar 28 16:40:26.975385 INFO  [GCWARE] node 1929510922 switch to leader with term:36078
Mar 28 16:40:26.975391 NOTIC [GCWARE] broadcast node state 1694629898 0
Mar 28 16:40:26.975395 NOTIC [GCWARE] broadcast node state 1929510922 1
Mar 28 16:40:26.975400 INFO  [GCWARE] statemachine sync with leader change from 0 to 1
Mar 28 16:40:27.172031 INFO  [GCWARE] node 1694629898 state plan change to 0
Mar 28 16:40:27.172067 INFO  [GCWARE] node 1694629898 state change to 0
Mar 28 16:40:27.172074 INFO  [GCWARE] node 1929510922 state plan change to 1

总结

当集群没有leader时,包括leader离线等,会发起选举

statemachine sync with leader change from 1 to 0

选举成功,leader会从0变成1

statemachine sync with leader change from 0 to 1