存储类故障,包括数据文件丢失,文件读写报错等情况,GBase 8a会先设置一个dmlevent,尝试修复,如果因存储类故障无法修复,比如表不存在,元数据不可读等,则会自动升级到dmlstorageevent。本文通过gc_recover.log日志,分析下恢复流程。
目录导航
故障模拟
建表后,将node里的一个分片表删除,然后在集群load数据。
如下是2节点就请你(10.0.2.201和 10.0.2.202), 其中202上的计算节点t99_n1被删掉了。
集群层建表
gbase> create table t99(id int);
Query OK, 0 rows affected (Elapsed: 00:00:00.31)
gbase> insert into t99 values(1),(2);
Query OK, 2 rows affected (Elapsed: 00:00:00.18)
Records: 2 Duplicates: 0 Warnings: 0
删除gnode上的表
[gbase@rh6-2 gbase]$ gncli testdb
GBase client 8.6.2.43-R28 .125499. Copyright (c) 2004-2021, GBase. All Rights Reserved.
gbase> select * from t99_n1;
+------+
| id |
+------+
| 1 |
| 2 |
+------+
2 rows in set (Elapsed: 00:00:00.00)
gbase> drop table t99_n1;
Query OK, 0 rows affected (Elapsed: 00:00:00.00)
集群层加载数据
gbase> load data infile 'sftp://gbase:gbase1234@10.0.2.201/home/gbase/t4.txt' into table `t99`;
ERROR 1733 (HY000): (GBA-01EX-700) Gbase general error: Task 4718604 failed, [10.0.2.202:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: Table 'testdb.t99_n1' doesn't exist
SQL: LOAD /*+ TID('7602966') */ DATA INFILE 'sftp://gbase:*********@10.0.2.201/home/gbase/t4.txt#offset=0&length=4&firstblock&ffsize=4' INTO TABLE `testdb`.`t99_n1` DATA_FORMAT 3 FILE_FORMAT UNDEFINED HOST '::ffff:10.0.2.202' CURRENT_TIMESTAMP 1621214202 SCN_NUMBER 4718604 GCLUSTER_PORT 5258 INTO SERVER (HOST '::ffff:10.0.2.201, ::ffff:10.0.2.202',
日志分析
日志为调度节点的gc_recovery.log,如下是内容,分成不同的段落分析。
-- 收到了任务
2021-05-17 08:53:41.991 [INFO ] <RECOVER-INFO>: MasterAssignTask ddltid num 1.
2021-05-17 08:53:41.997 [INFO ] <RECOVER-INFO-1>: Start Recovering testdb.t-99 tid 523265
2021-05-17 08:53:41.997 [INFO ] <DATA-Recover-1>: Start dml recover testdb.t-99,tid 523265
-- 开始修复
2021-05-17 09:16:44.297 [INFO ] <RECOVER-INFO-1>: Start Recovering testdb.t99 tid 523271
2021-05-17 09:16:44.297 [INFO ] <DATA-Recover-1>: Start dml recover testdb.t99,tid 523271
-- dml的eventid是27
2021-05-17 09:16:44.297 [INFO ] <DATA-Recover-1>: Start to dml recover of eventid(27)
-- 确认有event,其中 3389128714 对应的是10.0.2.202
2021-05-17 09:16:44.312 [INFO ] <GCWare>: GetSourceNodeForDMl for nodeId(3389128714), the node is unValid
-- 尝试刷新表分片
2021-05-17 09:16:44.321 [INFO ] <DATA-Recover-1>: DoNormSqlNoResult, do(flush rollback `testdb`.`t99_n1`)
-- 刷新报错,原因是表不存在
2021-05-17 09:16:44.322 [INFO ] <DATA-Recover-1>: DoNormSqlNoResult, do(flush rollback `testdb`.`t99_n1`), error((GBA-01EX-700) Gbase general error: table doesn't exist)
-- 确认修复任务
2021-05-17 09:16:44.323 [INFO ] <DATA-Recover-1>: recoverinfo gn dml: eventid:27 DB(testdb), TABLE(t99), SLICE(1), TID(523271), src:0x3372351498(10.0.2.201) dst:0x3389128714(10.0.2.202)
-- 加锁
2021-05-17 09:16:44.323 [INFO ] <CLockResource>: Lock: testdb start
2021-05-17 09:16:44.324 [INFO ] <CLockResource>: Lock: testdb end success
2021-05-17 09:16:44.324 [INFO ] <CLockResource>: Lock: testdb.t99.rsync start
2021-05-17 09:16:44.325 [INFO ] <CLockResource>: Lock: testdb.t99.rsync end success
2021-05-17 09:16:44.325 [INFO ] <CLockResource>: Lock: testdb.t99824A6746-7C28-4B39-B1E6-D6E088F858AA start
2021-05-17 09:16:44.326 [INFO ] <CLockResource>: Lock: testdb.t99824A6746-7C28-4B39-B1E6-D6E088F858AA end success
2021-05-17 09:16:44.326 [INFO ] <CLockResource>: Lock: testdb.t99D88707C8-BB5F-41D8-AAF9-55A76780953B start
2021-05-17 09:16:44.328 [INFO ] <CLockResource>: Lock: testdb.t99D88707C8-BB5F-41D8-AAF9-55A76780953B end success
-- 加上DDL锁,因为前面flush的返回是表不存在的故障
2021-05-17 09:16:44.328 [INFO ] <DATA-Recover-1>: DO_DATA_SYNC, Lock_for_ddl
2021-05-17 09:16:44.328 [INFO ] <DATA-Recover-1>: SYNC_SLICE, DISTRIBUTED TABLE: DB: 'testdb', Table: 't99', slice: 'n1'
2021-05-17 09:16:44.328 [INFO ] <DATA-Recover-1>: SYNC_SLICE, write dml storage event first, for prevent sync fail,ddl arrived
-- 升级成dmlstorageevent, 编号为 172
2021-05-17 09:16:44.336 [INFO ] <DATA-Recover-1>: data sync write dmlstorage eventid 172
-- 调用 gc_sync_client 尝试修复
2021-05-17 09:16:44.336 [INFO ] <DATA-Recover-1>: CALL_SYNC, Syncing table slice{DB: 'testdb', Table: 't99', Slice: 'n1'} from sample gnode: '::ffff:10.0.2.201'
2021-05-17 09:16:44.336 [INFO ] <DATA-Recover-1>: CALL_SYNC, Sync command line: 'start 'gc_sync_client' '::ffff:10.0.2.201 testdb t99_n1 ::ffff:10.0.2.201 5258 2 1''
-- 返回错误60
2021-05-17 09:16:44.376 [ERROR] <DATA-Recover-1>: start_sync_by_node, sync table{start 'gc_sync_client' '::ffff:10.0.2.201 testdb t99_n1 ::ffff:10.0.2.201 5258 2 1'} returned with error(1733): (GBA-01EX-700) Gbase general error: gc_sync_client returned error code 60.
2021-05-17 09:16:44.376 [INFO ] <DATA-Recover-1>: check drop sql thread quit.
2021-05-17 09:16:44.376 [INFO ] <DATA-Recover-1>: start_sync_by_node, sync table{start 'gc_sync_client' '::ffff:10.0.2.201 testdb t99_n1 ::ffff:10.0.2.201 5258 2 1'} returned with (60)
-- 尝试刷新
2021-05-17 09:16:44.376 [INFO ] <DATA-Recover-1>: REFRESH_TABLE, Refreshing local gnode, DB: 'testdb', Table: 't99', slice: 'n1'
2021-05-17 09:16:44.376 [INFO ] <DATA-Recover-1>: REFRESH_TABLE, Refresh query: 'refresh table `testdb`.`t99_n1`;'
-- 返回错误,表不存在。 再尝试3次。
2021-05-17 09:16:44.376 [ERROR] <DATA-Recover-1>: REFRESH_TABLE, Refresh table{refresh table `testdb`.`t99_n1`;} returned with error(1733): (GBA-01EX-700) Gbase general error: refresh table not exist, or not EXPRESS table.,will retry three times
2021-05-17 09:16:44.377 [ERROR] <DATA-Recover-1>: REFRESH_TABLE, Refresh table{refresh table `testdb`.`t99_n1`;} returned with error(1733): (GBA-01EX-700) Gbase general error: refresh table not exist, or not EXPRESS table.,retry 0
2021-05-17 09:16:44.377 [ERROR] <DATA-Recover-1>: REFRESH_TABLE, Refresh table{refresh table `testdb`.`t99_n1`;} returned with error(1733): (GBA-01EX-700) Gbase general error: refresh table not exist, or not EXPRESS table.,retry 1
2021-05-17 09:16:44.377 [ERROR] <DATA-Recover-1>: REFRESH_TABLE, Refresh table{refresh table `testdb`.`t99_n1`;} returned with error(1733): (GBA-01EX-700) Gbase general error: refresh table not exist, or not EXPRESS table.,retry 2
-- 查找tableid, 523271.
2021-05-17 09:16:44.399 [ERROR] <DATA-Recover-1>: Find the tid 523271 from source gcluster node
2021-05-17 09:16:44.399 [ERROR] <DATA-Recover-1>: Find the tid by source gclusternode(inGcluster--523271:inEvent --523271), and tid exist in source gcluster
-- show create table 找到了可用的表结构,任务转成dmlstorageevent修复
2021-05-17 09:16:44.400 [ERROR] <DATA-Recover-1>: Find ER_NO_SUCH_TABLE by show create table, convert to dmlstorage event.
2021-05-17 09:16:44.400 [INFO ] <DATA-Recover-1>: Sync cl
2021-05-17 09:16:44.400 [INFO ] <DATA-Recover-1>: CALL_SYNC, Sync end, return 4
2021-05-17 09:16:44.400 [ERROR] <DATA-Recover-1>: SYNC_SLICE, detect local node ::ffff:10.0.2.202 metadata broken, record dmlstorage event 172 for it
2021-05-17 09:16:44.406 [INFO ] <DATA-Recover-1>: SET_ONLINE, Succeeded in setting local node table 'testdb.t99_n1' online, eventinfo(eventid:27,segId:1)
-- 释放锁
2021-05-17 09:16:44.407 [INFO ] <CLockResource>: UnLock: testdb unlock success
2021-05-17 09:16:44.407 [INFO ] <CLockResource>: UnLock: testdb.t99.rsync unlock success
2021-05-17 09:16:44.408 [INFO ] <CLockResource>: UnLock: testdb.t99824A6746-7C28-4B39-B1E6-D6E088F858AA unlock success
2021-05-17 09:16:44.409 [INFO ] <CLockResource>: UnLock: testdb.t99D88707C8-BB5F-41D8-AAF9-55A76780953B unlock success
-- 本次任务完成,结果为失败。
2021-05-17 09:16:44.409 [ERROR] <DATA-Recover-1>: DO_DATA_SYNC, return{6} error!
2021-05-17 09:16:44.409 [ERROR] <DATA-Recover-1>: Find do_data_sync return FAIL!
2021-05-17 09:16:44.409 [INFO ] <DATA-Recover-1>: End dml recover testdb.t99,tid 523271
2021-05-17 09:16:44.409 [INFO ] <RECOVER-INFO-1>: Finishing Recovering testdb.t99,tid 523271
-- 开始新任务
2021-05-17 09:16:44.500 [INFO ] <RECOVER-INFO>: MasterAssignTask dmlstoragetid num 1.
2021-05-17 09:16:44.603 [INFO ] <RECOVER-INFO>: MasterAssignTask dmlstoragetid num 1.
-- 开始恢复 dmlstorageevent
2021-05-17 09:16:44.606 [INFO ] <RECOVER-INFO-1>: Start Recovering . tid 523271
2021-05-17 09:16:44.606 [INFO ] <STORAGE-Recover-1>: Start DMLStorge recover .,tid 523271 eventnum 1
2021-05-17 09:16:44.606 [INFO ] <STORAGE-Recover-1>: Start to DMLStorge recover of eventid(172)
2021-05-17 09:16:44.607 [INFO ] <STORAGE-Recover-1>: recoverinfo gn(dmls): eventid=172, tablename=testdb.t99, slice:n1 src:0x3372351498(10.0.2.201) dst:0x3389128714(10.0.2.202)
2021-05-17 09:16:44.607 [INFO ] <STORAGE-Recover-1>: Revovering ..., for dmlstorage, forgcluster=false, dbname=testdb, tablename=t99, slice=n1, tid=523271, eventid=172
-- 拿到锁
2021-05-17 09:16:44.638 [INFO ] <CLockResource>: Lock: testdb start
2021-05-17 09:16:44.641 [INFO ] <CLockResource>: Lock: testdb end success
2021-05-17 09:16:44.641 [INFO ] <CLockResource>: Lock: testdb.t99.rsync start
2021-05-17 09:16:44.644 [INFO ] <CLockResource>: Lock: testdb.t99.rsync end success
2021-05-17 09:16:44.644 [INFO ] <CLockResource>: Lock: testdb.t99824A6746-7C28-4B39-B1E6-D6E088F858AA start
2021-05-17 09:16:44.648 [INFO ] <CLockResource>: Lock: testdb.t99824A6746-7C28-4B39-B1E6-D6E088F858AA end success
2021-05-17 09:16:44.648 [INFO ] <CLockResource>: Lock: testdb.t99D88707C8-BB5F-41D8-AAF9-55A76780953B start
2021-05-17 09:16:44.651 [INFO ] <CLockResource>: Lock: testdb.t99D88707C8-BB5F-41D8-AAF9-55A76780953B end success
-- 因为元数据损坏,表备份并删除成功。
-- 其中备份是指如果是部分损坏,会将数据临时mv到/userdata目录下,而不是直接删除。修复成功后会自动删除。
2021-05-17 09:16:44.654 [INFO ] <STORAGE-Recover-1>: BackUpTable, meta broken table(t99_n1:523271) has been deleted and backuped success
2021-05-17 09:16:44.709 [INFO ] <RECOVER-INFO>: MasterAssignTask dmlstoragetid num 1.
-- 创建表结构
2021-05-17 09:16:44.712 [INFO ] <STORAGE-Recover-1>: CreateFailNodeField, do "CREATE TABLE /*+ distribution(2)*/ testdb.t99_n1(
"id" int(11) DEFAULT NULL
) ENGINE=EXPRESS TID(523272) UID(1) DEFAULT CHARSET=utf8 TABLESPACE='sys_tablespace' COLUMN_IDS(0) TID(523272)" success
2021-05-17 09:16:44.712 [INFO ] <STORAGE-Recover-1>: CreateTableInfoFromSafeGroup, do CreateFailNodeField success
-- 调用gc_sync_client修复数据
2021-05-17 09:16:44.712 [INFO ] <STORAGE-Recover-1>: DO_DATA_SYNC, Lock_for_ddl
2021-05-17 09:16:44.712 [INFO ] <STORAGE-Recover-1>: SYNC_SLICE, DISTRIBUTED TABLE: DB: 'testdb', Table: 't99', slice: 'n1'
2021-05-17 09:16:44.712 [INFO ] <STORAGE-Recover-1>: CALL_SYNC, Syncing table slice{DB: 'testdb', Table: 't99', Slice: 'n1'} from sample gnode: '::ffff:10.0.2.201'
2021-05-17 09:16:44.712 [INFO ] <STORAGE-Recover-1>: CALL_SYNC, Sync command line: 'start 'gc_sync_client' '::ffff:10.0.2.201 testdb t99_n1 ::ffff:10.0.2.201 5258 3 1''
2021-05-17 09:16:44.816 [INFO ] <RECOVER-INFO>: MasterAssignTask dmlstoragetid num 1.
2021-05-17 09:16:44.898 [INFO ] <STORAGE-Recover-1>: check drop sql thread quit.
2021-05-17 09:16:44.898 [INFO ] <STORAGE-Recover-1>: start_sync_by_node, sync table{start 'gc_sync_client' '::ffff:10.0.2.201 testdb t99_n1 ::ffff:10.0.2.201 5258 3 1'} returned with (0)
-- 运行成功,刷新数据
2021-05-17 09:16:44.898 [INFO ] <STORAGE-Recover-1>: CALL_SYNC, Sync client executed successfully
2021-05-17 09:16:44.898 [INFO ] <STORAGE-Recover-1>: REFRESH_TABLE, Refreshing local gnode, DB: 'testdb', Table: 't99', slice: 'n1'
2021-05-17 09:16:44.898 [INFO ] <STORAGE-Recover-1>: REFRESH_TABLE, Refresh query: 'refresh table `testdb`.`t99_n1`;'
2021-05-17 09:16:44.899 [INFO ] <STORAGE-Recover-1>: Sync cl
-- 数据同步成功
2021-05-17 09:16:44.899 [INFO ] <STORAGE-Recover-1>: CALL_SYNC, Sync end, return 0
-- 检查数据一致性
2021-05-17 09:16:44.899 [INFO ] <CLockResource>: Lock: testdb.t99.09B5BEEC-1EF7-4FA6-9850-C4217A781E0F start
2021-05-17 09:16:44.900 [INFO ] <CLockResource>: Lock: testdb.t99.09B5BEEC-1EF7-4FA6-9850-C4217A781E0F end success
2021-05-17 09:16:44.900 [INFO ] <STORAGE-Recover-1>: CHECK_SYNC_RESULT,Checking whether SYNC is succeeded{DB: 'testdb', Table: 't99', slice: 'n1'}
2021-05-17 09:16:44.902 [INFO ] <STORAGE-Recover-1>: CHECK_SYNC_RESULT, SCN of table slice of local GNode and sample GNode(number of working nodes: 2)
2021-05-17 09:16:44.902 [INFO ] <STORAGE-Recover-1>: CHECK_SYNC_RESULT, Local GNode: 4718603
2021-05-17 09:16:44.902 [INFO ] <STORAGE-Recover-1>: CHECK_SYNC_RESULT, Sample GNode[1]: 4718603
-- 释放锁
2021-05-17 09:16:44.902 [INFO ] <CLockResource>: UnLock: testdb.t99.09B5BEEC-1EF7-4FA6-9850-C4217A781E0F unlock success
2021-05-17 09:16:44.903 [INFO ] <STORAGE-Recover-1>: CreateTableInfoFromSafeGroup, do CallSync success
2021-05-17 09:16:44.903 [INFO ] <CLockResource>: UnLock: testdb unlock success
2021-05-17 09:16:44.904 [INFO ] <CLockResource>: UnLock: testdb.t99.rsync unlock success
2021-05-17 09:16:44.904 [INFO ] <CLockResource>: UnLock: testdb.t99824A6746-7C28-4B39-B1E6-D6E088F858AA unlock success
2021-05-17 09:16:44.905 [INFO ] <CLockResource>: UnLock: testdb.t99D88707C8-BB5F-41D8-AAF9-55A76780953B unlock success
2021-05-17 09:16:44.908 [INFO ] <STORAGE-Recover-1>: CreateTableInfoFromSafeGeoup success!
-- 删除event
2021-05-17 09:16:44.908 [INFO ] <STORAGE-Recover-1>: delete DMLStorge Redo Log start, eventId=172.............
2021-05-17 09:16:44.917 [INFO ] <STORAGE-Recover-1>: delete DMLStorge Redo Log end, eventId=172.............
2021-05-17 09:16:44.917 [INFO ] <RECOVER-INFO-1>: Finishing Recovering .,tid 523271
部分metadata文件故障的恢复日志
复现方案
将metadata下GED目录下的部分文件删掉
-rw------- 1 gbase gbase 8558 May 17 09:16 t99_n2.frm
drwx------ 2 gbase gbase 4096 May 17 09:16 t99_n2.GED
drwx------ 2 gbase gbase 4096 May 17 09:16 t99_n1.GED
-rw------- 1 gbase gbase 8558 May 17 09:16 t99_n1.frm
[gbase@rh6-2 metadata]$ cd t99_n1.GED
[gbase@rh6-2 t99_n1.GED]$ ll
total 28
-rw------- 1 gbase gbase 129 May 17 09:16 C00000.ctl.A
-rw------- 1 gbase gbase 129 May 17 09:16 C00000.ctl.B
-rw------- 1 gbase gbase 106 May 17 09:16 C00000.map
-rw------- 1 gbase gbase 233 May 17 09:16 table.des.A
-rw------- 1 gbase gbase 233 May 17 09:16 table.des.B
-rw------- 1 gbase gbase 52 May 17 09:16 table.state.A
-rw------- 1 gbase gbase 52 May 17 09:16 table.state.B
[gbase@rh6-2 t99_n1.GED]$ rm C00000.*
[gbase@rh6-2 t99_n1.GED]$ ll
total 16
-rw------- 1 gbase gbase 233 May 17 09:16 table.des.A
-rw------- 1 gbase gbase 233 May 17 09:16 table.des.B
-rw------- 1 gbase gbase 52 May 17 09:16 table.state.A
-rw------- 1 gbase gbase 52 May 17 09:16 table.state.B
[gbase@rh6-2 t99_n1.GED]$
日志
2021-05-17 10:09:15.075 [INFO ] <RECOVER-INFO>: MasterAssignTask dmlstoragetid num 1.
2021-05-17 10:09:15.079 [INFO ] <RECOVER-INFO-1>: Start Recovering testdb.t99 tid 523271
2021-05-17 10:09:15.079 [INFO ] <DATA-Recover-1>: Start dml recover testdb.t99,tid 523271
2021-05-17 10:09:15.079 [INFO ] <DATA-Recover-1>: Start to dml recover of eventid(28)
2021-05-17 10:09:15.092 [INFO ] <GCWare>: GetSourceNodeForDMl for nodeId(3389128714), the node is unValid
2021-05-17 10:09:15.095 [INFO ] <DATA-Recover-1>: DoNormSqlNoResult, do(flush rollback `testdb`.`t99_n1`)
2021-05-17 10:09:15.098 [INFO ] <DATA-Recover-1>: recoverinfo gn dml: eventid:28 DB(testdb), TABLE(t99), SLICE(1), TID(523271), src:0x3372351498(10.0.2.201) dst:0x3389128714(10.0.2.202)
2021-05-17 10:09:15.098 [INFO ] <CLockResource>: Lock: testdb start
2021-05-17 10:09:15.101 [INFO ] <CLockResource>: Lock: testdb end success
2021-05-17 10:09:15.101 [INFO ] <CLockResource>: Lock: testdb.t99.rsync start
2021-05-17 10:09:15.103 [INFO ] <CLockResource>: Lock: testdb.t99.rsync end success
2021-05-17 10:09:15.103 [INFO ] <CLockResource>: Lock: testdb.t99824A6746-7C28-4B39-B1E6-D6E088F858AA start
2021-05-17 10:09:15.104 [INFO ] <CLockResource>: Lock: testdb.t99824A6746-7C28-4B39-B1E6-D6E088F858AA end success
2021-05-17 10:09:15.105 [INFO ] <CLockResource>: Lock: testdb.t99D88707C8-BB5F-41D8-AAF9-55A76780953B start
2021-05-17 10:09:15.106 [INFO ] <CLockResource>: Lock: testdb.t99D88707C8-BB5F-41D8-AAF9-55A76780953B end success
2021-05-17 10:09:15.106 [INFO ] <DATA-Recover-1>: DO_DATA_SYNC, Lock_for_ddl
2021-05-17 10:09:15.106 [INFO ] <DATA-Recover-1>: SYNC_SLICE, DISTRIBUTED TABLE: DB: 'testdb', Table: 't99', slice: 'n1'
2021-05-17 10:09:15.106 [INFO ] <GCWare>: testdb.t99 nodeid: 3389128714, segid: 1, tableid 523271 have dmlstorageevent,total num 1,lasteventid: 173
2021-05-17 10:09:15.106 [INFO ] <DATA-Recover-1>: SYNC_SLICE, Find it is a meta broken table testdb.t99,and the event will be given up,event id 28
2021-05-17 10:09:15.117 [INFO ] <DATA-Recover-1>: SET_ONLINE, Succeeded in setting local node table 'testdb.t99_n1' online, eventinfo(eventid:28,segId:1)
2021-05-17 10:09:15.117 [INFO ] <CLockResource>: UnLock: testdb unlock success
2021-05-17 10:09:15.118 [INFO ] <CLockResource>: UnLock: testdb.t99.rsync unlock success
2021-05-17 10:09:15.119 [INFO ] <CLockResource>: UnLock: testdb.t99824A6746-7C28-4B39-B1E6-D6E088F858AA unlock success
2021-05-17 10:09:15.121 [INFO ] <CLockResource>: UnLock: testdb.t99D88707C8-BB5F-41D8-AAF9-55A76780953B unlock success
2021-05-17 10:09:15.121 [INFO ] <DATA-Recover-1>: End dml recover testdb.t99,tid 523271
2021-05-17 10:09:15.121 [INFO ] <STORAGE-Recover-1>: Start DMLStorge recover testdb.t99,tid 523271 eventnum 1
2021-05-17 10:09:15.122 [INFO ] <STORAGE-Recover-1>: Start to DMLStorge recover of eventid(173)
2021-05-17 10:09:15.122 [INFO ] <STORAGE-Recover-1>: recoverinfo gn(dmls): eventid=173, tablename=testdb.t99, slice:n1 src:0x3372351498(10.0.2.201) dst:0x3389128714(10.0.2.202)
2021-05-17 10:09:15.122 [INFO ] <STORAGE-Recover-1>: Revovering ..., for dmlstorage, forgcluster=false, dbname=testdb, tablename=t99, slice=n1, tid=523271, eventid=173
2021-05-17 10:09:15.131 [INFO ] <CLockResource>: Lock: testdb start
2021-05-17 10:09:15.132 [INFO ] <CLockResource>: Lock: testdb end success
2021-05-17 10:09:15.132 [INFO ] <CLockResource>: Lock: testdb.t99.rsync start
2021-05-17 10:09:15.133 [INFO ] <CLockResource>: Lock: testdb.t99.rsync end success
2021-05-17 10:09:15.133 [INFO ] <CLockResource>: Lock: testdb.t99824A6746-7C28-4B39-B1E6-D6E088F858AA start
2021-05-17 10:09:15.134 [INFO ] <CLockResource>: Lock: testdb.t99824A6746-7C28-4B39-B1E6-D6E088F858AA end success
2021-05-17 10:09:15.134 [INFO ] <CLockResource>: Lock: testdb.t99D88707C8-BB5F-41D8-AAF9-55A76780953B start
2021-05-17 10:09:15.136 [INFO ] <CLockResource>: Lock: testdb.t99D88707C8-BB5F-41D8-AAF9-55A76780953B end success
2021-05-17 10:09:15.140 [INFO ] <STORAGE-Recover-1>: BackUpTable, meta broken table(t99_n1:523271) has been deleted and backuped success
2021-05-17 10:09:15.182 [INFO ] <RECOVER-INFO>: MasterAssignTask dmlstoragetid num 1.
2021-05-17 10:09:15.229 [INFO ] <STORAGE-Recover-1>: CreateFailNodeField, do "CREATE TABLE /*+ distribution(2)*/ testdb.t99_n1(
"id" int(11) DEFAULT NULL
) ENGINE=EXPRESS TID(523272) UID(1) DEFAULT CHARSET=utf8 TABLESPACE='sys_tablespace' COLUMN_IDS(0) TID(523272)" success
2021-05-17 10:09:15.229 [INFO ] <STORAGE-Recover-1>: CreateTableInfoFromSafeGroup, do CreateFailNodeField success
2021-05-17 10:09:15.229 [INFO ] <STORAGE-Recover-1>: DO_DATA_SYNC, Lock_for_ddl
2021-05-17 10:09:15.229 [INFO ] <STORAGE-Recover-1>: SYNC_SLICE, DISTRIBUTED TABLE: DB: 'testdb', Table: 't99', slice: 'n1'
2021-05-17 10:09:15.229 [INFO ] <STORAGE-Recover-1>: CALL_SYNC, Syncing table slice{DB: 'testdb', Table: 't99', Slice: 'n1'} from sample gnode: '::ffff:10.0.2.201'
2021-05-17 10:09:15.229 [INFO ] <STORAGE-Recover-1>: CALL_SYNC, Sync command line: 'start 'gc_sync_client' '::ffff:10.0.2.201 testdb t99_n1 ::ffff:10.0.2.201 5258 3 1''
2021-05-17 10:09:15.283 [INFO ] <RECOVER-INFO>: MasterAssignTask dmlstoragetid num 1.
2021-05-17 10:09:15.384 [INFO ] <RECOVER-INFO>: MasterAssignTask dmlstoragetid num 1.
2021-05-17 10:09:15.399 [INFO ] <STORAGE-Recover-1>: check drop sql thread quit.
2021-05-17 10:09:15.400 [INFO ] <STORAGE-Recover-1>: start_sync_by_node, sync table{start 'gc_sync_client' '::ffff:10.0.2.201 testdb t99_n1 ::ffff:10.0.2.201 5258 3 1'} returned with (0)
2021-05-17 10:09:15.400 [INFO ] <STORAGE-Recover-1>: CALL_SYNC, Sync client executed successfully
2021-05-17 10:09:15.400 [INFO ] <STORAGE-Recover-1>: REFRESH_TABLE, Refreshing local gnode, DB: 'testdb', Table: 't99', slice: 'n1'
2021-05-17 10:09:15.400 [INFO ] <STORAGE-Recover-1>: REFRESH_TABLE, Refresh query: 'refresh table `testdb`.`t99_n1`;'
2021-05-17 10:09:15.400 [INFO ] <STORAGE-Recover-1>: Sync cl
2021-05-17 10:09:15.400 [INFO ] <STORAGE-Recover-1>: CALL_SYNC, Sync end, return 0
2021-05-17 10:09:15.400 [INFO ] <CLockResource>: Lock: testdb.t99.09B5BEEC-1EF7-4FA6-9850-C4217A781E0F start
2021-05-17 10:09:15.401 [INFO ] <CLockResource>: Lock: testdb.t99.09B5BEEC-1EF7-4FA6-9850-C4217A781E0F end success
2021-05-17 10:09:15.401 [INFO ] <STORAGE-Recover-1>: CHECK_SYNC_RESULT,Checking whether SYNC is succeeded{DB: 'testdb', Table: 't99', slice: 'n1'}
2021-05-17 10:09:15.403 [INFO ] <STORAGE-Recover-1>: CHECK_SYNC_RESULT, SCN of table slice of local GNode and sample GNode(number of working nodes: 2)
2021-05-17 10:09:15.403 [INFO ] <STORAGE-Recover-1>: CHECK_SYNC_RESULT, Local GNode: 4718603
2021-05-17 10:09:15.403 [INFO ] <STORAGE-Recover-1>: CHECK_SYNC_RESULT, Sample GNode[1]: 4718603
2021-05-17 10:09:15.403 [INFO ] <CLockResource>: UnLock: testdb.t99.09B5BEEC-1EF7-4FA6-9850-C4217A781E0F unlock success
2021-05-17 10:09:15.404 [INFO ] <STORAGE-Recover-1>: CreateTableInfoFromSafeGroup, do CallSync success
2021-05-17 10:09:15.404 [INFO ] <CLockResource>: UnLock: testdb unlock success
2021-05-17 10:09:15.405 [INFO ] <CLockResource>: UnLock: testdb.t99.rsync unlock success
2021-05-17 10:09:15.406 [INFO ] <CLockResource>: UnLock: testdb.t99824A6746-7C28-4B39-B1E6-D6E088F858AA unlock success
2021-05-17 10:09:15.406 [INFO ] <CLockResource>: UnLock: testdb.t99D88707C8-BB5F-41D8-AAF9-55A76780953B unlock success
2021-05-17 10:09:15.409 [INFO ] <STORAGE-Recover-1>: CreateTableInfoFromSafeGeoup success!
2021-05-17 10:09:15.409 [INFO ] <STORAGE-Recover-1>: delete DMLStorge Redo Log start, eventId=173.............
2021-05-17 10:09:15.416 [INFO ] <STORAGE-Recover-1>: delete DMLStorge Redo Log end, eventId=173.............
2021-05-17 10:09:15.416 [INFO ] <RECOVER-INFO-1>: Finishing Recovering testdb.t99,tid 523271