本文介绍GBase 8a V9.5版本的gcware服务启动和停止的日志,其中有些标志表示服务可用或已经停止。本文也提供了一种手段,循环检测gcadmin命令,直到有节点可以对外提供服务了才返回。
目录导航
启动
开始标记
Cluster Consist Service Engine 9.5.121227: started and ready to provide service.
可提供服务标记
statemachine sync with leader change from 0 to 1
完整启动日志
Dec 30 14:26:37.975666 WARN [MAIN ] Could not set SCHED_RR at priority 99: Operation not permitted (1)
Dec 30 14:26:37.975750 WARN [MAIN ] Could not lock memory of service to avoid page faults: Cannot allocate memory (12)
Dec 30 14:26:37.975754 NOTIC [MAIN ] Cluster Consist Service Engine 9.5.121227: started and ready to provide service.
Dec 30 14:26:37.975759 INFO [MAIN ] Corosync built-in features:
Dec 30 14:26:37.975762 NOTIC [MAIN ] Successfully configured gcw services to load
Dec 30 14:26:37.975766 NOTIC [MAIN ] Successfully read main configuration file '/opt/gbase/gcware/config/gcware.conf'.
Dec 30 14:26:37.975769 WARN [GCWARE] Warning: the to_file config paramater has been obsoleted. See corosync.conf man page to_logfile directive.
Dec 30 14:26:38.173917 INFO [MAIN ] file check succeed.
Dec 30 14:26:38.173939 NOTIC [MAIN ] GBCCS parameters:
Dec 30 14:26:38.173943 NOTIC [MAIN ] leader heart beat:200
Dec 30 14:26:38.173946 NOTIC [MAIN ] election timeout:2000
Dec 30 14:26:38.173949 NOTIC [MAIN ] session timeout:40000
Dec 30 14:26:38.173952 NOTIC [MAIN ] client port:5919
Dec 30 14:26:38.173955 NOTIC [MAIN ] server port:5918
Dec 30 14:26:38.173958 NOTIC [MAIN ] max message size:1048576
Dec 30 14:26:38.173961 NOTIC [MAIN ] max redolog size:536870912
Dec 30 14:26:38.173965 NOTIC [MAIN ] data dir:/opt/gbase/gcware/data/gcware
Dec 30 14:26:38.173968 NOTIC [MAIN ] redo log dir:/opt/gbase/gcware/data/gcware
Dec 30 14:26:38.173971 NOTIC [MAIN ] local nodeid:1
Dec 30 14:26:38.173974 NOTIC [MAIN ] bind addr:gbase_rh7_001
Dec 30 14:26:38.173977 INFO [MAIN ] add config member gbase_rh7_001 with groupid 0 nodeid 1
Dec 30 14:26:38.173980 NOTIC [LCK ] [DEBUG]: lck_exec_init_fn
Dec 30 14:26:38.173983 INFO [GCWARE] begin init raft root instance
Dec 30 14:26:38.173986 INFO [GCWARE] init raft root instance successfully
Dec 30 14:26:38.173989 INFO [LCK ] [INFO]: init_lck_default_info local_nodeid = 1
Dec 30 14:26:39.183445 INFO [LCK ] [DEBUG]: lck_exec_init_fn [LEAVE]
Dec 30 14:26:39.183467 NOTIC [SERV ] Service engine loaded: gcw distributed locking service A.01.01
Dec 30 14:26:39.183473 NOTIC [CRM ] [DEBUG]: crm_exec_init_fn [ENTER]
Dec 30 14:26:39.183477 INFO [CRM ] rlim_cur: 65535 rlim_max: 65535
Dec 30 14:26:39.183481 INFO [GCWARE] begin init raft root instance
Dec 30 14:26:39.183485 INFO [GCWARE] init raft root instance successfully
Dec 30 14:26:39.183489 INFO [CRM ] Reading configuration
Dec 30 14:26:39.183492 INFO [CRM ] Reading coordinator
Dec 30 14:26:39.183496 INFO [CRM ] Reading member
Dec 30 14:26:39.183500 INFO [CRM ] Reading memberaddr
Dec 30 14:26:39.183504 INFO [CRM ] ipaddr = gbase_rh7_001
Dec 30 14:26:39.183508 INFO [CRM ] nodeName = coordinator1
Dec 30 14:26:39.183512 INFO [CRM ] Reading nodeid
Dec 30 14:26:39.183516 INFO [CRM ] nodeid = 1
Dec 30 14:26:39.183520 NOTIC [CRM ] [DEBUG]: crm_exec_init_fn [LEAVE]
Dec 30 14:26:39.183524 NOTIC [SERV ] Service engine loaded: gcw crm service 1.0.0
Dec 30 14:26:39.183532 NOTIC [CLM ] [DEBUG]: clm_exec_init_fn [ENTER]
Dec 30 14:26:39.183536 INFO [CLM ] Reading configuration
Dec 30 14:26:39.183539 INFO [CLM ] check_interval = 30
Dec 30 14:26:39.183542 INFO [CLM ] cfg_whole_check_interval_num = 20
Dec 30 14:26:39.183545 INFO [CLM ] cfg_check_tcp_only = 1
Dec 30 14:26:39.183548 INFO [CLM ] cfg_check_times_judge_failure = 3
Dec 30 14:26:39.183551 INFO [CLM ] cfg_connect_timeout = 5000
Dec 30 14:26:39.183554 INFO [CLM ] gcluster_port = 5258
Dec 30 14:26:39.183557 INFO [CLM ] gnode_port = 5050
Dec 30 14:26:39.183560 INFO [CLM ] syncserver_port = 5288
Dec 30 14:26:39.183563 INFO [CLM ] node_ssh_port = 22
Dec 30 14:26:39.183571 INFO [CLM ] check_coordinator_thread_num = 1
Dec 30 14:26:39.183575 INFO [CLM ] check_dataserver_thread_num = 10
Dec 30 14:26:39.183578 INFO [GCWARE] begin init raft root instance
Dec 30 14:26:39.183581 INFO [GCWARE] init raft root instance successfully
Dec 30 14:26:39.183584 INFO [CLM ] [INFO]: local_nodeid = 1
Dec 30 14:26:40.186270 INFO [CLM ] [DEBUG]: clm_exec_init_fn [LEAVE]
Dec 30 14:26:40.186288 NOTIC [SERV ] Service engine loaded: gcw cluster membership service A.01.01
Dec 30 14:26:40.186334 NOTIC [SERV ] Service engine loaded: corosync extended virtual synchrony service
Dec 30 14:26:40.186344 NOTIC [SERV ] Service engine loaded: corosync configuration service
Dec 30 14:26:40.186347 NOTIC [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
Dec 30 14:26:40.186351 NOTIC [SERV ] Service engine loaded: corosync cluster config database access v1.01
Dec 30 14:26:40.186354 NOTIC [SERV ] Service engine loaded: corosync profile loading service
Dec 30 14:26:40.186357 NOTIC [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Dec 30 14:26:40.186360 NOTIC [GCWARE] begin load snapshot index:886 term:69
Dec 30 14:26:40.186363 INFO [GCWARE] session manager load session begin
Dec 30 14:26:40.186366 INFO [GCWARE] session manager load session finish
Dec 30 14:26:40.186369 INFO [LCK ] load lck snapshort begin
Dec 30 14:26:40.186372 INFO [LCK ] [INFO]: init_lck_default_info local_nodeid = 1
Dec 30 14:26:41.197905 INFO [LCK ] load all lock from file /opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine begin
Dec 30 14:26:41.197931 INFO [LCK ] load all lock from file /opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/QUEUE_RESOURCE end
Dec 30 14:26:41.197935 INFO [LCK ] load commit list from file /opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine begin
Dec 30 14:26:41.197939 INFO [LCK ] load commit list from file /opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/FAILOVER end
Dec 30 14:26:41.197942 INFO [LCK ] load recover regist from file /opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine begin
Dec 30 14:26:41.197946 INFO [LCK ] load recover regist from file /opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/RECOVER_REGIST end
Dec 30 14:26:41.197949 INFO [LCK ] load common regist list from file /opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine begin
Dec 30 14:26:41.197952 INFO [LCK ] load common regist list from file /opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/COMMON_REGIST end
Dec 30 14:26:41.197955 INFO [LCK ] load commit takeover list from file /opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine begin
Dec 30 14:26:41.197958 INFO [LCK ] load commit takeover list from file /opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/TAKEOVER_REGIST end
Dec 30 14:26:41.197961 INFO [LCK ] load lck snapshort end
Dec 30 14:26:41.197965 INFO [CRM ] crm load snapshot begin!
Dec 30 14:26:41.197968 INFO [CRM ] begin load public info snapshot path:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine
Dec 30 14:26:41.197971 INFO [CRM ] scn file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/SCN
Dec 30 14:26:41.197974 INFO [CRM ] tableid file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/TABLEID
Dec 30 14:26:41.197977 INFO [CRM ] gcware_mode file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/GCWARE_MODE
Dec 30 14:26:41.197981 INFO [CRM ] taskid file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/TASKID
Dec 30 14:26:41.197984 INFO [CRM ] vcid file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/VCID
Dec 30 14:26:41.197987 INFO [CRM ] vcmirrorid file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/VCMIRRORID
Dec 30 14:26:41.197990 INFO [CRM ] nodeid file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/NODEID
Dec 30 14:26:41.197993 INFO [CRM ] commitid file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/COMMITID
Dec 30 14:26:41.198002 INFO [CRM ] distributionid file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/DISTRIBUTIONID
Dec 30 14:26:41.198005 INFO [CRM ] freedataservernodes file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/FREEDATASERVER
Dec 30 14:26:41.198008 INFO [CRM ] freedataclusterstates file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/DATACLUSTERSTATE
Dec 30 14:26:41.198011 INFO [CRM ] coordinatorclusterstate file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/COORDINATORSTATE
Dec 30 14:26:41.198015 INFO [CRM ] feventlog file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/FEVENTLOG
Dec 30 14:26:41.198018 INFO [CRM ] incomplete ddlfevent file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/INCOMPLETE_DDLFEVENT
Dec 30 14:26:41.198021 INFO [CRM ] incomplete dmlfevent file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/INCOMPLETE_DMLFEVENT
Dec 30 14:26:41.198024 INFO [CRM ] vcmirrorlist file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/VCMIRROR
Dec 30 14:26:41.198027 INFO [CRM ] distributiontovc file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/DISTRIBUTION2VC
Dec 30 14:26:41.198030 INFO [CRM ] rcvc file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/RCVC
Dec 30 14:26:41.198033 INFO [CRM ] KV store file name:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine/KVSTORE
Dec 30 14:26:41.198036 INFO [CRM ] finish load public info snapshot path:/opt/gbase/gcware/data/gcware/SNAPSHOT.69.886/statemachine
Dec 30 14:26:41.198039 ERROR [CRM ] nodeid(1) is repeated! ip is gbase_rh7_001
Dec 30 14:26:41.198043 INFO [CRM ] crm load snapshot end success!
Dec 30 14:26:41.198046 INFO [GCWARE] session manager switch
Dec 30 14:26:41.198049 INFO [GCWARE] session manager clear begin
Dec 30 14:26:41.198052 INFO [GCWARE] session manager clear end
Dec 30 14:26:41.198055 INFO [GCWARE] session manager switch finish
Dec 30 14:26:41.198058 INFO [CRM ] crm switch snapshot begin!
Dec 30 14:26:41.198061 NOTIC [LCK ] [INFO]: lck_sync_data_abort
Dec 30 14:26:41.198064 NOTIC [LCK ] [INFO]: lck_sync_data_abort end
Dec 30 14:26:41.198067 INFO [CRM ] crm switch snapshot end!
Dec 30 14:26:41.198070 NOTIC [GCWARE] begin load CCS meta
Dec 30 14:26:41.198073 NOTIC [GCWARE] finish load raft meta with term: 86, voteFor:1
Dec 30 14:26:41.198076 INFO [GCWARE] open raft log file /opt/gbase/gcware/data/gcware/REDOLOG.84 for load success, fd:11
Dec 30 14:26:41.198080 INFO [GCWARE] close raft log /opt/gbase/gcware/data/gcware/REDOLOG.84, fd:11
Dec 30 14:26:41.198083 INFO [GCWARE] open raft log file /opt/gbase/gcware/data/gcware/REDOLOG.114 for load success, fd:11
Dec 30 14:26:41.198086 INFO [GCWARE] close raft log /opt/gbase/gcware/data/gcware/REDOLOG.114, fd:11
Dec 30 14:26:41.198089 INFO [GCWARE] open raft log file /opt/gbase/gcware/data/gcware/REDOLOG.888 for load success, fd:11
Dec 30 14:26:41.247864 INFO [GCWARE] close raft log /opt/gbase/gcware/data/gcware/REDOLOG.888, fd:11
Dec 30 14:26:41.247883 INFO [GCWARE] Showy: PN5GBCCS9TcpClientE::CreateEventWithNodeid start, ip gbase_rh7_001 port 5918 nodeid 1
Dec 30 14:26:41.247887 INFO [GCWARE] Showy: PN5GBCCS9TcpClientE::CreateEventWithNodeid ok, ip gbase_rh7_001 port 5918 nodeid 1
Dec 30 14:26:41.247890 INFO [GCWARE] node 1 switch to follower with term:86 leader:0
Dec 30 14:26:41.247893 NOTIC [GCWARE] client server begin provide service
Dec 30 14:26:41.258494 NOTIC [GCWARE] client server event loop start
Dec 30 14:26:41.258806 INFO [GCWARE] open raft log file /opt/gbase/gcware/data/gcware/REDOLOG.888 for append success, fd:22
Dec 30 14:26:41.259015 WARN [GCWARE] Imp: PN5GBCCS11TcpAcceptorE::run start accept...
Dec 30 14:26:41.259031 INFO [GCWARE] Showy: PN5GBCCS11TcpAcceptorE::run accepted!, fd 23 ip port 0
Dec 30 14:26:41.259275 NOTIC [GCWARE] client server acceptor begin
Dec 30 14:26:44.886762 INFO [GCWARE] node 1 switch to candidate
Dec 30 14:26:44.886921 INFO [GCWARE] vote req from 1, term:87, candidate:1, term:86,index:8849,mine term:86 index:8849, previous vote:0, current vote :1 grant:1
Dec 30 14:26:44.891330 INFO [GCWARE] finish save raft meta with term: 87, voteFor:0
Dec 30 14:26:44.896039 INFO [GCWARE] finish save raft meta with term: 87, voteFor:1
Dec 30 14:26:44.896394 INFO [GCWARE] dealVoteRpcRes node 1 grant me with term 87
Dec 30 14:26:44.896547 INFO [GCWARE] node 1 switch to leader with term:87
Dec 30 14:26:44.896555 NOTIC [GCWARE] broadcast node state 1 1
Dec 30 14:26:44.896558 INFO [GCWARE] statemachine sync with leader change from 0 to 1
Dec 30 14:26:44.911619 INFO [GCWARE] CCS destroy session 859
Dec 30 14:26:44.911645 INFO [LCK ] lck destroy session:859
Dec 30 14:26:44.911649 INFO [CRM ] CrmDestroySession,SessionID is:859!
Dec 30 14:26:44.911652 ERROR [GCWARE] server force to close session with session ID:845
Dec 30 14:26:44.911656 INFO [GCWARE] CCS destroy session 845
Dec 30 14:26:44.911659 INFO [LCK ] lck destroy session:845
Dec 30 14:26:44.911662 INFO [CRM ] CrmDestroySession,SessionID is:845!
Dec 30 14:26:44.911665 INFO [GCWARE] CCS destroy session 490
Dec 30 14:26:44.911668 INFO [LCK ] lck destroy session:490
Dec 30 14:26:44.911671 NOTIC [LCK ] during session destroy commit take over delete regist info nodeid: 1 session:490
如何判断gcware服务是否已经可服务
目前数据库版本的gcware服务启动,只是拉起了进程,并不判断进程是否可提供服务了。
[gbase@gbase_rh7_001 ~]$ gcluster_services all start
Starting gcware : [ OK ]
Starting gcluster : [ OK ]
Starting gcrecover : [ OK ]
Starting gbase : [ OK ]
Starting syncserver : [ OK ]
Starting GCMonit success!
[gbase@gbase_rh7_001 ~]$ g
而gcware服务时,是有一些数据要处理的,包括与其它正常节点的同步,处理event信息等。如果在没有处理完成前,访问将不能提供服务。
当前版本,建议用循环的方式,进行判断,比如:
[gbase@gbase_rh7_001 ~]$ cat checkGcware.sh
for((i=1;i<=10;i++));
do
gcadmin # > /dev/null 2>&1
rtn=$?
echo $rtn
if [ $rtn -eq 0 ];
then
break
fi
sleep 10;
done
当查询失败时,进行等待10秒,然后再次执行。 最多执行10次。 通过调整参数可以做到gcware服务正常了才返回的效果。
其中当服务进程不存在时的返回码是255
[gbase@gbase_rh7_001 ~]$ sh checkGcware.sh
[gcadmin] Could not initialize CRM instance error: [122]->[can not connect to any server]
255
[gcadmin] Could not initialize CRM instance error: [122]->[can not connect to any server]
255
^C
所以也可以将判断改成
#!/bin/bash
#循环次数,和后面的等待秒数决定了最长等待时间
for((i=1;i<=10;i++));
do
# timeout避免V952的超长时间不返回
timeout 30 gcadmin # > /dev/null 2>&1
rtn=$?
echo $rtn
# 判断返回值不等于0
# if [ $rtn -eq 0 ];
# 判断返回不是255, can not connect to any server
if [ $rtn -ne 255 ];
then
break
fi
#等待秒数
sleep 10;
done
停止
开始停止标记
Cluster Consist Service Engine begin stop with terminate command
停止完成标记
Cluster Consist Service Engine exiting with status 0 at main.c:2381.
完整停止日志
注意,本文是在没有任何连接时的日志,所以这部分日志很干净。
Dec 30 14:19:30.994880 NOTIC [MAIN ] Cluster Consist Service Engine begin stop with terminate command
Dec 30 14:19:31.137853 WARN [GCWARE] Imp: PN5GBCCS7UDPRecvE::run UDPRecv [END]
Dec 30 14:19:31.400630 WARN [GCWARE] Imp: PN5GBCCS7UDPSendE::run UDPSend [END]
Dec 30 14:19:31.638575 WARN [GCWARE] Imp: PN5GBCCS11TcpAcceptorE::run TCP.Acceptor [END]
Dec 30 14:19:32.532543 WARN [GCWARE] Imp: PN5GBCCS9EventLoopE::StartLoop TCP.EventLoop [END]
Dec 30 14:19:32.533104 WARN [GCWARE] Imp: PN5GBCCS11TcpIOThreadE::Stop TcpIOThread [END]
Dec 30 14:19:32.537528 WARN [GCWARE] Imp: PN5GBCCS6IOLoopE::run CCS thread [End]
Dec 30 14:19:33.487502 INFO [GCWARE] raft log checksum thread stop
Dec 30 14:19:33.487574 INFO [GCWARE] raft log thread stop
Dec 30 14:19:34.401835 CRIT [GCWARE] async worker begin stop
Dec 30 14:19:34.401920 CRIT [GCWARE] async worker stop
Dec 30 14:19:34.767074 INFO [GCWARE] client server acceptor end
Dec 30 14:19:35.203327 INFO [GCWARE] client server event loop end
Dec 30 14:19:35.204539 NOTIC [LCK ] [DEBUG]: lck_exec_exit_fn
Dec 30 14:19:35.204590 NOTIC [CRM ] [DEBUG]: crm_exec_exit_fn [ENTER]
Dec 30 14:19:35.204605 INFO [CRM ] crm switch snapshot begin!
Dec 30 14:19:35.204994 NOTIC [LCK ] [INFO]: lck_sync_data_abort
Dec 30 14:19:35.205035 NOTIC [LCK ] [INFO]: lck_sync_data_abort end
Dec 30 14:19:35.205053 INFO [CRM ] crm switch snapshot end!
Dec 30 14:19:35.205066 NOTIC [CRM ] [DEBUG]: crm_exec_exit_fn [LEAVE]
Dec 30 14:19:35.205078 NOTIC [MAIN ] Cluster Consist Service Engine exiting with status 0 at main.c:2381.