南大通用GBase 8a的V9.5版本gcware服务启动检测和停止的部分日志

本文介绍GBase 8a V9.5版本的gcware服务启动和停止的日志,其中有些标志表示服务可用或已经停止。本文也提供了一种手段,循环检测gcadmin命令,直到有节点可以对外提供服务了才返回。

启动

开始标记

Cluster Consist Service Engine 9.5.121227: started and ready to provide service.

可提供服务标记

statemachine sync with leader change from 0 to 1

完整启动日志

Dec 30 14:26:37.975666 WARN  [MAIN  ] Could not set SCHED_RR at priority 99: Operation not permitted (1)
Dec 30 14:26:37.975750 WARN  [MAIN  ] Could not lock memory of service to avoid page faults: Cannot allocate memory (12)
Dec 30 14:26:37.975754 NOTIC [MAIN  ] Cluster Consist Service Engine 9.5.121227: started and ready to provide service.
Dec 30 14:26:37.975759 INFO  [MAIN  ] Corosync built-in features:
Dec 30 14:26:37.975762 NOTIC [MAIN  ] Successfully configured gcw services to load
Dec 30 14:26:37.975766 NOTIC [MAIN  ] Successfully read main configuration file '/opt/gbase/gcware/config/gcware.conf'.
Dec 30 14:26:37.975769 WARN  [GCWARE] Warning: the to_file config paramater has been obsoleted. See corosync.conf man page to_logfile directive.
Dec 30 14:26:38.173917 INFO  [MAIN  ] file check succeed.
Dec 30 14:26:38.173939 NOTIC [MAIN  ] GBCCS parameters:
Dec 30 14:26:38.173943 NOTIC [MAIN  ] leader heart beat:200
Dec 30 14:26:38.173946 NOTIC [MAIN  ] election timeout:2000
Dec 30 14:26:38.173949 NOTIC [MAIN  ] session timeout:40000
Dec 30 14:26:38.173952 NOTIC [MAIN  ] client port:5919
Dec 30 14:26:38.173955 NOTIC [MAIN  ] server port:5918
Dec 30 14:26:38.173958 NOTIC [MAIN  ] max message size:1048576
Dec 30 14:26:38.173961 NOTIC [MAIN  ] max redolog size:536870912
Dec 30 14:26:38.173965 NOTIC [MAIN  ] data dir:/opt/gbase/gcware/data/gcware
Dec 30 14:26:38.173968 NOTIC [MAIN  ] redo log dir:/opt/gbase/gcware/data/gcware
Dec 30 14:26:38.173971 NOTIC [MAIN  ] local nodeid:1
Dec 30 14:26:38.173974 NOTIC [MAIN  ] bind addr:gbase_rh7_001
Dec 30 14:26:38.173977 INFO  [MAIN  ] add config member gbase_rh7_001 with groupid 0 nodeid 1
Dec 30 14:26:38.173980 NOTIC [LCK   ] [DEBUG]: lck_exec_init_fn
Dec 30 14:26:38.173983 INFO  [GCWARE] begin init raft root instance
Dec 30 14:26:38.173986 INFO  [GCWARE] init raft root instance successfully
Dec 30 14:26:38.173989 INFO  [LCK   ] [INFO]: init_lck_default_info local_nodeid = 1
Dec 30 14:26:39.183445 INFO  [LCK   ] [DEBUG]: lck_exec_init_fn [LEAVE]
Dec 30 14:26:39.183467 NOTIC [SERV  ] Service engine loaded: gcw distributed locking service A.01.01
Dec 30 14:26:39.183473 NOTIC [CRM   ] [DEBUG]: crm_exec_init_fn [ENTER]
Dec 30 14:26:39.183477 INFO  [CRM   ] rlim_cur: 65535  rlim_max: 65535
Dec 30 14:26:39.183481 INFO  [GCWARE] begin init raft root instance
Dec 30 14:26:39.183485 INFO  [GCWARE] init raft root instance successfully
Dec 30 14:26:39.183489 INFO  [CRM   ] Reading configuration
Dec 30 14:26:39.183492 INFO  [CRM   ] Reading coordinator
Dec 30 14:26:39.183496 INFO  [CRM   ] Reading member
Dec 30 14:26:39.183500 INFO  [CRM   ] Reading memberaddr
Dec 30 14:26:39.183504 INFO  [CRM   ] ipaddr = gbase_rh7_001
Dec 30 14:26:39.183508 INFO  [CRM   ] nodeName = coordinator1
Dec 30 14:26:39.183512 INFO  [CRM   ] Reading nodeid
Dec 30 14:26:39.183516 INFO  [CRM   ] nodeid = 1
Dec 30 14:26:39.183520 NOTIC [CRM   ] [DEBUG]: crm_exec_init_fn [LEAVE]
Dec 30 14:26:39.183524 NOTIC [SERV  ] Service engine loaded: gcw crm service 1.0.0
Dec 30 14:26:39.183532 NOTIC [CLM   ] [DEBUG]: clm_exec_init_fn [ENTER]
Dec 30 14:26:39.183536 INFO  [CLM   ] Reading configuration
Dec 30 14:26:39.183539 INFO  [CLM   ] check_interval = 30
Dec 30 14:26:39.183542 INFO  [CLM   ] cfg_whole_check_interval_num = 20
Dec 30 14:26:39.183545 INFO  [CLM   ] cfg_check_tcp_only = 1
Dec 30 14:26:39.183548 INFO  [CLM   ] cfg_check_times_judge_failure = 3
Dec 30 14:26:39.183551 INFO  [CLM   ] cfg_connect_timeout = 5000
Dec 30 14:26:39.183554 INFO  [CLM   ] gcluster_port = 5258
Dec 30 14:26:39.183557 INFO  [CLM   ] gnode_port = 5050
Dec 30 14:26:39.183560 INFO  [CLM   ] syncserver_port = 5288
Dec 30 14:26:39.183563 INFO  [CLM   ] node_ssh_port = 22
Dec 30 14:26:39.183571 INFO  [CLM   ] check_coordinator_thread_num = 1
Dec 30 14:26:39.183575 INFO  [CLM   ] check_dataserver_thread_num = 10
Dec 30 14:26:39.183578 INFO  [GCWARE] begin init raft root instance
Dec 30 14:26:39.183581 INFO  [GCWARE] init raft root instance successfully
Dec 30 14:26:39.183584 INFO  [CLM   ] [INFO]: local_nodeid = 1
Dec 30 14:26:40.186270 INFO  [CLM   ] [DEBUG]: clm_exec_init_fn [LEAVE]
Dec 30 14:26:40.186288 NOTIC [SERV  ] Service engine loaded: gcw cluster membership service A.01.01
Dec 30 14:26:40.186334 NOTIC [SERV  ] Service engine loaded: corosync extended virtual synchrony service
Dec 30 14:26:40.186344 NOTIC [SERV  ] Service engine loaded: corosync configuration service
Dec 30 14:26:40.186347 NOTIC [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01
Dec 30 14:26:40.186351 NOTIC [SERV  ] Service engine loaded: corosync cluster config database access v1.01
Dec 30 14:26:40.186354 NOTIC [SERV  ] Service engine loaded: corosync profile loading service
Dec 30 14:26:40.186357 NOTIC [SERV  ] Service engine loaded: corosync cluster quorum service v0.1
Dec 30 14:26:40.186360 NOTIC [GCWARE] begin load snapshot index:886 term:69
Dec 30 14:26:40.186363 INFO  [GCWARE] session manager load session begin
Dec 30 14:26:40.186366 INFO  [GCWARE] session manager load session finish
Dec 30 14:26:40.186369 INFO  [LCK   ] load lck snapshort begin
Dec 30 14:26:40.186372 INFO  [LCK   ] [INFO]: init_lck_default_info local_nodeid = 1
Dec 30 14:26:41.197905 INFO  [LCK   ] load all lock from file /opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine begin
Dec 30 14:26:41.197931 INFO  [LCK   ] load all lock from file /opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/QUEUE_RESOURCE end
Dec 30 14:26:41.197935 INFO  [LCK   ] load commit list from file /opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine begin
Dec 30 14:26:41.197939 INFO  [LCK   ] load commit list from file /opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/FAILOVER end
Dec 30 14:26:41.197942 INFO  [LCK   ] load recover regist from file /opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine begin
Dec 30 14:26:41.197946 INFO  [LCK   ] load recover regist from file /opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/RECOVER_REGIST end
Dec 30 14:26:41.197949 INFO  [LCK   ] load common regist list from file /opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine begin
Dec 30 14:26:41.197952 INFO  [LCK   ] load common regist list from file /opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/COMMON_REGIST end
Dec 30 14:26:41.197955 INFO  [LCK   ] load commit takeover list from file /opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine begin
Dec 30 14:26:41.197958 INFO  [LCK   ] load commit takeover list from file /opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/TAKEOVER_REGIST end
Dec 30 14:26:41.197961 INFO  [LCK   ] load lck snapshort end
Dec 30 14:26:41.197965 INFO  [CRM   ] crm load snapshot begin!
Dec 30 14:26:41.197968 INFO  [CRM   ] begin load public info snapshot path:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine
Dec 30 14:26:41.197971 INFO  [CRM   ] scn file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/SCN
Dec 30 14:26:41.197974 INFO  [CRM   ] tableid file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/TABLEID
Dec 30 14:26:41.197977 INFO  [CRM   ] gcware_mode file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/GCWARE_MODE
Dec 30 14:26:41.197981 INFO  [CRM   ] taskid file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/TASKID
Dec 30 14:26:41.197984 INFO  [CRM   ] vcid file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/VCID
Dec 30 14:26:41.197987 INFO  [CRM   ] vcmirrorid file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/VCMIRRORID
Dec 30 14:26:41.197990 INFO  [CRM   ] nodeid file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/NODEID
Dec 30 14:26:41.197993 INFO  [CRM   ] commitid file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/COMMITID
Dec 30 14:26:41.198002 INFO  [CRM   ] distributionid file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/DISTRIBUTIONID
Dec 30 14:26:41.198005 INFO  [CRM   ] freedataservernodes file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/FREEDATASERVER
Dec 30 14:26:41.198008 INFO  [CRM   ] freedataclusterstates file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/DATACLUSTERSTATE
Dec 30 14:26:41.198011 INFO  [CRM   ] coordinatorclusterstate file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/COORDINATORSTATE
Dec 30 14:26:41.198015 INFO  [CRM   ] feventlog file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/FEVENTLOG
Dec 30 14:26:41.198018 INFO  [CRM   ] incomplete ddlfevent file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/INCOMPLETE_DDLFEVENT
Dec 30 14:26:41.198021 INFO  [CRM   ] incomplete dmlfevent file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/INCOMPLETE_DMLFEVENT
Dec 30 14:26:41.198024 INFO  [CRM   ] vcmirrorlist file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/VCMIRROR
Dec 30 14:26:41.198027 INFO  [CRM   ] distributiontovc file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/DISTRIBUTION2VC
Dec 30 14:26:41.198030 INFO  [CRM   ] rcvc file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/RCVC
Dec 30 14:26:41.198033 INFO  [CRM   ] KV store file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/KVSTORE
Dec 30 14:26:41.198036 INFO  [CRM   ] finish load public info snapshot path:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine
Dec 30 14:26:41.198039 ERROR [CRM   ] nodeid(1) is repeated! ip is gbase_rh7_001
Dec 30 14:26:41.198043 INFO  [CRM   ] crm load snapshot end success!
Dec 30 14:26:41.198046 INFO  [GCWARE] session manager switch
Dec 30 14:26:41.198049 INFO  [GCWARE] session manager clear begin
Dec 30 14:26:41.198052 INFO  [GCWARE] session manager clear end
Dec 30 14:26:41.198055 INFO  [GCWARE] session manager switch finish
Dec 30 14:26:41.198058 INFO  [CRM   ] crm switch snapshot begin!
Dec 30 14:26:41.198061 NOTIC [LCK   ] [INFO]: lck_sync_data_abort
Dec 30 14:26:41.198064 NOTIC [LCK   ] [INFO]: lck_sync_data_abort end
Dec 30 14:26:41.198067 INFO  [CRM   ] crm switch snapshot end!
Dec 30 14:26:41.198070 NOTIC [GCWARE] begin load CCS meta
Dec 30 14:26:41.198073 NOTIC [GCWARE] finish load raft meta with term: 86, voteFor:1
Dec 30 14:26:41.198076 INFO  [GCWARE] open raft log file /opt/gbase/gcware/data/gcware/REDOLOG.84 for load success, fd:11
Dec 30 14:26:41.198080 INFO  [GCWARE] close raft log /opt/gbase/gcware/data/gcware/REDOLOG.84, fd:11
Dec 30 14:26:41.198083 INFO  [GCWARE] open raft log file /opt/gbase/gcware/data/gcware/REDOLOG.114 for load success, fd:11
Dec 30 14:26:41.198086 INFO  [GCWARE] close raft log /opt/gbase/gcware/data/gcware/REDOLOG.114, fd:11
Dec 30 14:26:41.198089 INFO  [GCWARE] open raft log file /opt/gbase/gcware/data/gcware/REDOLOG.888 for load success, fd:11
Dec 30 14:26:41.247864 INFO  [GCWARE] close raft log /opt/gbase/gcware/data/gcware/REDOLOG.888, fd:11
Dec 30 14:26:41.247883 INFO  [GCWARE] Showy: PN5GBCCS9TcpClientE::CreateEventWithNodeid start, ip gbase_rh7_001 port 5918 nodeid 1
Dec 30 14:26:41.247887 INFO  [GCWARE] Showy: PN5GBCCS9TcpClientE::CreateEventWithNodeid ok, ip gbase_rh7_001 port 5918 nodeid 1
Dec 30 14:26:41.247890 INFO  [GCWARE] node 1 switch to follower with term:86 leader:0
Dec 30 14:26:41.247893 NOTIC [GCWARE] client server begin provide service
Dec 30 14:26:41.258494 NOTIC [GCWARE] client server event loop start
Dec 30 14:26:41.258806 INFO  [GCWARE] open raft log file /opt/gbase/gcware/data/gcware/REDOLOG.888 for append success, fd:22
Dec 30 14:26:41.259015 WARN  [GCWARE] Imp: PN5GBCCS11TcpAcceptorE::run start accept...
Dec 30 14:26:41.259031 INFO  [GCWARE] Showy: PN5GBCCS11TcpAcceptorE::run accepted!, fd 23 ip  port 0
Dec 30 14:26:41.259275 NOTIC [GCWARE] client server acceptor begin
Dec 30 14:26:44.886762 INFO  [GCWARE] node 1 switch to candidate
Dec 30 14:26:44.886921 INFO  [GCWARE] vote req from 1, term:87, candidate:1, term:86,index:8849,mine term:86 index:8849, previous vote:0, current vote :1 grant:1
Dec 30 14:26:44.891330 INFO  [GCWARE] finish save raft meta with term: 87, voteFor:0
Dec 30 14:26:44.896039 INFO  [GCWARE] finish save raft meta with term: 87, voteFor:1
Dec 30 14:26:44.896394 INFO  [GCWARE] dealVoteRpcRes node 1 grant me with term 87
Dec 30 14:26:44.896547 INFO  [GCWARE] node 1 switch to leader with term:87
Dec 30 14:26:44.896555 NOTIC [GCWARE] broadcast node state 1 1
Dec 30 14:26:44.896558 INFO  [GCWARE] statemachine sync with leader change from 0 to 1
Dec 30 14:26:44.911619 INFO  [GCWARE] CCS destroy session 859
Dec 30 14:26:44.911645 INFO  [LCK   ] lck destroy session:859
Dec 30 14:26:44.911649 INFO  [CRM   ] CrmDestroySession,SessionID is:859!
Dec 30 14:26:44.911652 ERROR [GCWARE] server force to close session with session ID:845
Dec 30 14:26:44.911656 INFO  [GCWARE] CCS destroy session 845
Dec 30 14:26:44.911659 INFO  [LCK   ] lck destroy session:845
Dec 30 14:26:44.911662 INFO  [CRM   ] CrmDestroySession,SessionID is:845!
Dec 30 14:26:44.911665 INFO  [GCWARE] CCS destroy session 490
Dec 30 14:26:44.911668 INFO  [LCK   ] lck destroy session:490
Dec 30 14:26:44.911671 NOTIC [LCK   ] during session destroy commit take over delete regist info nodeid: 1 session:490

如何判断gcware服务是否已经可服务

目前数据库版本的gcware服务启动,只是拉起了进程,并不判断进程是否可提供服务了。

[gbase@gbase_rh7_001 ~]$ gcluster_services all start
Starting gcware :                                          [  OK  ]
Starting gcluster :                                        [  OK  ]
Starting gcrecover :                                       [  OK  ]
Starting gbase :                                           [  OK  ]
Starting syncserver :                                      [  OK  ]
Starting GCMonit success!
[gbase@gbase_rh7_001 ~]$ g

而gcware服务时,是有一些数据要处理的,包括与其它正常节点的同步,处理event信息等。如果在没有处理完成前,访问将不能提供服务。

当前版本,建议用循环的方式,进行判断,比如:

[gbase@gbase_rh7_001 ~]$ cat checkGcware.sh
for((i=1;i<=10;i++));
do
  gcadmin # > /dev/null 2>&1
  rtn=$?
  echo $rtn
  if [ $rtn -eq 0 ];
  then
    break
  fi
  sleep 10;
done

当查询失败时,进行等待10秒,然后再次执行。 最多执行10次。 通过调整参数可以做到gcware服务正常了才返回的效果。

其中当服务进程不存在时的返回码是255

[gbase@gbase_rh7_001 ~]$ sh checkGcware.sh
[gcadmin] Could not initialize CRM instance error: [122]->[can not connect to any server]
255
[gcadmin] Could not initialize CRM instance error: [122]->[can not connect to any server]
255
^C

所以也可以将判断改成

#!/bin/bash
#循环次数,和后面的等待秒数决定了最长等待时间
for((i=1;i<=10;i++));
do
  # timeout避免V952的超长时间不返回
  timeout 30 gcadmin # > /dev/null 2>&1
  rtn=$?
  echo $rtn
  # 判断返回值不等于0
  # if [ $rtn -eq 0 ];

  # 判断返回不是255, can not connect to any server
  if [ $rtn -ne 255 ];
  then
    break
  fi

  #等待秒数
  sleep 10;
done

停止

开始停止标记

Cluster Consist Service Engine begin stop with terminate command

停止完成标记

Cluster Consist Service Engine exiting with status 0 at main.c:2381.

完整停止日志

注意,本文是在没有任何连接时的日志,所以这部分日志很干净。

Dec 30 14:19:30.994880 NOTIC [MAIN  ] Cluster Consist Service Engine begin stop with terminate command
Dec 30 14:19:31.137853 WARN  [GCWARE] Imp: PN5GBCCS7UDPRecvE::run UDPRecv [END]
Dec 30 14:19:31.400630 WARN  [GCWARE] Imp: PN5GBCCS7UDPSendE::run UDPSend [END]
Dec 30 14:19:31.638575 WARN  [GCWARE] Imp: PN5GBCCS11TcpAcceptorE::run TCP.Acceptor [END]
Dec 30 14:19:32.532543 WARN  [GCWARE] Imp: PN5GBCCS9EventLoopE::StartLoop TCP.EventLoop [END]
Dec 30 14:19:32.533104 WARN  [GCWARE] Imp: PN5GBCCS11TcpIOThreadE::Stop TcpIOThread [END]
Dec 30 14:19:32.537528 WARN  [GCWARE] Imp: PN5GBCCS6IOLoopE::run CCS thread [End]
Dec 30 14:19:33.487502 INFO  [GCWARE] raft log checksum thread stop
Dec 30 14:19:33.487574 INFO  [GCWARE] raft log thread stop
Dec 30 14:19:34.401835 CRIT  [GCWARE] async worker begin stop
Dec 30 14:19:34.401920 CRIT  [GCWARE] async worker stop
Dec 30 14:19:34.767074 INFO  [GCWARE] client server acceptor end
Dec 30 14:19:35.203327 INFO  [GCWARE] client server event loop end
Dec 30 14:19:35.204539 NOTIC [LCK   ] [DEBUG]: lck_exec_exit_fn
Dec 30 14:19:35.204590 NOTIC [CRM   ] [DEBUG]: crm_exec_exit_fn [ENTER]
Dec 30 14:19:35.204605 INFO  [CRM   ] crm switch snapshot begin!
Dec 30 14:19:35.204994 NOTIC [LCK   ] [INFO]: lck_sync_data_abort
Dec 30 14:19:35.205035 NOTIC [LCK   ] [INFO]: lck_sync_data_abort end
Dec 30 14:19:35.205053 INFO  [CRM   ] crm switch snapshot end!
Dec 30 14:19:35.205066 NOTIC [CRM   ] [DEBUG]: crm_exec_exit_fn [LEAVE]
Dec 30 14:19:35.205078 NOTIC [MAIN  ] Cluster Consist Service Engine exiting with status 0 at main.c:2381.