南大通用GBase 8a查看和强行释放SQL持有锁的方法

GBase 8a在执行时。为避免并发冲突保证一致性,会持有一些锁来保证自己需要的资源在执行期间不会出现问题。锁在SQL执行完毕后会自动释放掉。在某些特殊场景下,特别是一些老版本集群,出现需要强行释放掉锁的需求,比如SQL长时间无法结束,而该SQL持有的锁又导致其它的SQL无法正常运行,同时环境又不能重启节点服务时,可以考虑本文的方法强行释放SQL持有的锁。

对于DDL,DML类的SQL,一般要配合failover清理:GBase 8a查看和清理故障恢复状态Failover的方法

参考

GBase8a MPP Cluster查看集群锁 showlock
GBase 8a查看和清理故障恢复状态Failover的方法

构造锁的测试场景

如下通过lock table语句构造了一个持有锁的SQL场景,连接节点为201。

集群版本:8.6.2.43-R33.129391

[root@rh6-1 ~]# gcadmin
CLUSTER STATE:  ACTIVE
CLUSTER MODE:   NORMAL

=================================================================
|             GBASE COORDINATOR CLUSTER INFORMATION             |
=================================================================
|   NodeName   |     IpAddress     |gcware |gcluster |DataState |
-----------------------------------------------------------------
| coordinator1 |    10.0.2.201     | OPEN  |  OPEN   |    0     |
-----------------------------------------------------------------
=============================================================
|              GBASE DATA CLUSTER INFORMATION               |
=============================================================
|NodeName |     IpAddress     |gnode |syncserver |DataState |
-------------------------------------------------------------
|  node1  |    10.0.2.201     | OPEN |   OPEN    |    0     |
-------------------------------------------------------------
|  node2  |    10.0.2.202     | OPEN |   OPEN    |    0     |
-------------------------------------------------------------
gbase> lock table t1 write;
Query OK, 0 rows affected (Elapsed: 00:00:00.01)

gbase> show processlist;
+-----+-----------------+-----------+--------+---------+-------+-----------------------------+------------------+
| Id  | User            | Host      | db     | Command | Time  | State                       | Info             |
+-----+-----------------+-----------+--------+---------+-------+-----------------------------+------------------+
|   1 | event_scheduler | localhost | NULL   | Daemon  | 42262 | Waiting for next activation | NULL             |
| 258 | root            | localhost | testdb | Query   |     0 | NULL                        | show processlist |
+-----+-----------------+-----------+--------+---------+-------+-----------------------------+------------------+
2 rows in set (Elapsed: 00:00:00.00)

查看SQL持有的锁

通过SQL查看锁

可以通过show detail processlist查看SQL当前持有的锁,需要的锁等信息。详情请参考

GBase8a 显示集群正在跑的SQL进程show [full | detail] processlist

可以看到ID=258的连接持有的3个锁,用分号分割:testdb;testdb.t1;testdb.t1580D5F90-B287-4199-B057-E6FBD44B5BFA

gbase> show detail processlist;
+-----+-------+-----------------+-----------+--------+---------+-------+-----------------------------+----------------------------------------------------------------+------+-------------------------+
| Id  | Tid   | User            | Host      | db     | Command | Time  | State                       | Lock                                                           | Wait | Info                    |
+-----+-------+-----------------+-----------+--------+---------+-------+-----------------------------+----------------------------------------------------------------+------+-------------------------+
|   1 | 18648 | event_scheduler | localhost | NULL   | Daemon  | 42525 | Waiting for next activation | NULL                                                           | NULL | NULL                    |
| 259 |  9985 | root            | localhost | testdb | Query   |     0 | NULL                        | testdb;testdb.t1;testdb.t1580D5F90-B287-4199-B057-E6FBD44B5BFA | NULL | show detail processlist |
+-----+-------+-----------------+-----------+--------+---------+-------+-----------------------------+----------------------------------------------------------------+------+-------------------------+
2 rows in set (Elapsed: 00:00:00.00)

通过gcadmin查看锁

通过命令行的gcadmin showlock查看锁,详情请参考

GBase8a MPP Cluster查看集群锁 showlock

[root@rh6-1 ~]# gcadmin showlock
 +===================================================================================================================+
 |                                                   GCLUSTER LOCK                                                   |
 +===================================================================================================================+
 +---------------------------------------------+----------+-------------------------------+--------------+------+----+
 |                  Lock name                  |  owner   |            content            | create time  |orphan|type|
 +---------------------------------------------+----------+-------------------------------+--------------+------+----+
 |                   testdb                    |10.0.2.201|LOCK_Lock_Tables: 259(LWP:9985)|20220223090419|FALSE | S  |
 +---------------------------------------------+----------+-------------------------------+--------------+------+----+
 |testdb.t1580d5f90-b287-4199-b057-e6fbd44b5bfa|10.0.2.201|LOCK_Lock_Tables: 259(LWP:9985)|20220223090419|FALSE | E  |
 +---------------------------------------------+----------+-------------------------------+--------------+------+----+
 |                  testdb.t1                  |10.0.2.201|LOCK_Lock_Tables: 259(LWP:9985)|20220223090419|FALSE | S  |
 +---------------------------------------------+----------+-------------------------------+--------------+------+----+
 Total : 3
[root@rh6-1 ~]#

通过python查看锁

通过python接口,从gcware.getlocks拿到锁。前面gcadmin showlock的信息就是从这里获得然后格式化输出的。请关注其中的

name:锁名字
lockid:锁ID,这个在showlock里没有显示出来。

[gbase@rh6-1 ubas]$ python
Python 2.6.6 (r266:84292, Oct 12 2012, 14:23:48)
[GCC 4.4.6 20120305 (Red Hat 4.4.6-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import gcware
>>> help('gcware')

>>> gcware.getlocks()
({'create time': '20220223090419', 'name': 'testdb', 'lockid': 1612718352306798592, 'orphan': 'FALSE', 'content': 'LOCK_Lock_Tables: 259(LWP:9985)', 'owner': '10.0.2.201', 'type': 'SHARE'}, {'create time': '20220223090419', 'name': 'testdb.t1580d5f90-b287-4199-b057-e6fbd44b5bfa', 'lockid': 8587815309107265537, 'orphan': 'FALSE', 'content': 'LOCK_Lock_Tables: 259(LWP:9985)', 'owner': '10.0.2.201', 'type': 'EXCLUSIVE'}, {'create time': '20220223090419', 'name': 'testdb.t1', 'lockid': 7728385150517510147, 'orphan': 'FALSE', 'content': 'LOCK_Lock_Tables: 259(LWP:9985)', 'owner': '10.0.2.201', 'type': 'SHARE'})
>>>

根据锁名字,从里面找到对应的lockid, 用于清理锁的参数。先找工具格式化一下,方便观看

{
	'create time': '20220223090419',
	'name': 'testdb',
	'lockid': 1612718352306798592,
	'orphan': 'FALSE',
	'content': 'LOCK_Lock_Tables: 259(LWP:9985)',
	'owner': '10.0.2.201',
	'type': 'SHARE'
}, {
	'create time': '20220223090419',
	'name': 'testdb.t1580d5f90-b287-4199-b057-e6fbd44b5bfa',
	'lockid': 8587815309107265537,
	'orphan': 'FALSE',
	'content': 'LOCK_Lock_Tables: 259(LWP:9985)',
	'owner': '10.0.2.201',
	'type': 'EXCLUSIVE'
}, {
	'create time': '20220223090419',
	'name': 'testdb.t1',
	'lockid': 7728385150517510147,
	'orphan': 'FALSE',
	'content': 'LOCK_Lock_Tables: 259(LWP:9985)',
	'owner': '10.0.2.201',
	'type': 'SHARE'
}

通过python清理锁的方法

我们尝试释放一个排它锁

	'create time': '20220223090419',
	'name': 'testdb.t1580d5f90-b287-4199-b057-e6fbd44b5bfa',
	'lockid': 8587815309107265537,
	'orphan': 'FALSE',
	'content': 'LOCK_Lock_Tables: 259(LWP:9985)',
	'owner': '10.0.2.201',
	'type': 'EXCLUSIVE'

接口

gcware.unlocktableforce(name,lockid)

执行效果

>>> gcware.unlocktableforce('testdb.t1580d5f90-b287-4199-b057-e6fbd44b5bfa',8587815309107265537);
0
>>> 

查看锁确实少了一个

[root@rh6-1 ~]# gcadmin showlock
 +===============================================================================+
 |                                 GCLUSTER LOCK                                 |
 +===============================================================================+
 +---------+----------+-------------------------------+--------------+------+----+
 |Lock name|  owner   |            content            | create time  |orphan|type|
 +---------+----------+-------------------------------+--------------+------+----+
 | testdb  |10.0.2.201|LOCK_Lock_Tables: 259(LWP:9985)|20220223090419|FALSE | S  |
 +---------+----------+-------------------------------+--------------+------+----+
 |testdb.t1|10.0.2.201|LOCK_Lock_Tables: 259(LWP:9985)|20220223090419|FALSE | S  |
 +---------+----------+-------------------------------+--------------+------+----+
 Total : 2

再开一个连接,尝试拿到锁。发现可以正常拿到Lock table write, 但show detail processlist的持有锁信息并没有刷新,怀疑....

[gbase@rh6-1 ubas]$ gccli testdb

GBase client 8.6.2.43-R33.129391. Copyright (c) 2004-2022, GBase.  All Rights Reserved.

gbase> lock table t1 write;
Query OK, 0 rows affected (Elapsed: 00:00:00.01)

gbase> show detail processlist;
+-----+-------+-----------------+-----------+--------+---------+-------+-----------------------------+----------------------------------------------------------------+------+-------------------------+
| Id  | Tid   | User            | Host      | db     | Command | Time  | State                       | Lock                                                           | Wait | Info                    |
+-----+-------+-----------------+-----------+--------+---------+-------+-----------------------------+----------------------------------------------------------------+------+-------------------------+
|   1 | 18648 | event_scheduler | localhost | NULL   | Daemon  | 46724 | Waiting for next activation | NULL                                                           | NULL | NULL                    |
| 259 |  9985 | root            | localhost | testdb | Sleep   |    74 |                             | testdb;testdb.t1;testdb.t1580D5F90-B287-4199-B057-E6FBD44B5BFA | NULL | NULL                    |
| 267 | 10695 | root            | localhost | testdb | Query   |     0 | NULL                        | testdb;testdb.t1;testdb.t1580D5F90-B287-4199-B057-E6FBD44B5BFA | NULL | show detail processlist |
+-----+-------+-----------------+-----------+--------+---------+-------+-----------------------------+----------------------------------------------------------------+------+-------------------------+
3 rows in set (Elapsed: 00:00:00.00)

如果锁没有释放,会一直卡在这里

gbase> lock table t1 write;
^CQuery aborted by Ctrl+C
ERROR 1100 (HY000): Table 'testdb.t1' was not locked with LOCK TABLES
gbase>

查看实际的锁情况,已经更新

[root@rh6-1 ~]# gcadmin showlock
 +====================================================================================================================+
 |                                                   GCLUSTER LOCK                                                    |
 +====================================================================================================================+
 +---------------------------------------------+----------+--------------------------------+--------------+------+----+
 |                  Lock name                  |  owner   |            content             | create time  |orphan|type|
 +---------------------------------------------+----------+--------------------------------+--------------+------+----+
 |                   testdb                    |10.0.2.201|LOCK_Lock_Tables: 259(LWP:9985) |20220223090419|FALSE | S  |
 +---------------------------------------------+----------+--------------------------------+--------------+------+----+
 |                   testdb                    |10.0.2.201|LOCK_Lock_Tables: 267(LWP:10695)|20220223101414|FALSE | S  |
 +---------------------------------------------+----------+--------------------------------+--------------+------+----+
 |testdb.t1580d5f90-b287-4199-b057-e6fbd44b5bfa|10.0.2.201|LOCK_Lock_Tables: 267(LWP:10695)|20220223101414|FALSE | E  |
 +---------------------------------------------+----------+--------------------------------+--------------+------+----+
 |                  testdb.t1                  |10.0.2.201|LOCK_Lock_Tables: 259(LWP:9985) |20220223090419|FALSE | S  |
 +---------------------------------------------+----------+--------------------------------+--------------+------+----+
 |                  testdb.t1                  |10.0.2.201|LOCK_Lock_Tables: 267(LWP:10695)|20220223101414|FALSE | S  |
 +---------------------------------------------+----------+--------------------------------+--------------+------+----+
 Total : 5

总结

SQL持有的锁是为了一致性,强行清理锁可能导致不可预期的结果,比如数据错误,所以只有在没有其它方法解决,且能通过其它手段确保数据一致性的情况下(比如手工检查和同步数据),才能强行清理锁。

请慎重。