GBASE (南大通用) GBase 8a Cluster: Viewing the SQL Running on All Nodes

Posted on June 9, 2020 (updated November 6, 2020) by laozizhu

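Note: this feature was removed in some early V95 releases. If your version does not support it, upgrade the cluster, or connect to each node manually and check it there with show processlist or similar commands.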
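The manual fallback in the note can be scripted. Below is a minimal sketch, assuming that gccli can reach every node you list with the same credentials (the gbase / gbase20110531 pair is taken from the monitoring script later in this article). The node addresses are passed on the command line; CLIENT, DB_USER and DB_PASS are hypothetical environment overrides. Reaching a data node may require a different client or port than gccli's default, which you should check against your own installation and set through CLIENT.

```shell
#!/bin/sh
# Sketch: run "show processlist" on every node given as an argument,
# for versions where COORDINATORS_TASK_INFORMATION is not available.
# Assumption: the same client and credentials work on every node.
# CLIENT, DB_USER and DB_PASS are hypothetical overrides via the environment.
CLIENT=${CLIENT:-gccli}
DB_USER=${DB_USER:-gbase}
DB_PASS=${DB_PASS:-gbase20110531}

show_all_processlists() {
  for node in "$@"; do
    echo "===== $node ====="
    # If one node cannot be queried, report it and continue with the rest.
    "$CLIENT" -h"$node" -u"$DB_USER" -p"$DB_PASS" -e "show processlist" \
      || echo "failed to query $node" >&2
  done
}

# With no arguments the loop does nothing.
show_all_processlists "$@"
```

For example, `sh show_all_processlists.sh 192.168.0.101 192.168.0.102` (file name and addresses are hypothetical) prints one processlist per node, each under its own header.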


1.1.1. Purpose

Shows every task currently running in the cluster, especially the ones that have been running the longest.

1.1.2. Usage

Coordinator (management) nodes

select * from information_schema.COORDINATORS_TASK_INFORMATION;

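To leave out the less important columns, select only the useful ones and add the following conditions. They skip rows with no SQL text and the cluster's own processlist queries, and sort the longest-running tasks first.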

select COORDINATOR_NAME, ID, user, host, command, start_time, time, state, substring(info,1,100) info from information_schema.COORDINATORS_TASK_INFORMATION where info is not null and info not like '%information_schema.processlist%' order by time desc;
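When you only care about long-running tasks, the same query can take a minimum running time. Below is a sketch that wraps it in shell functions, assuming TIME is measured in seconds as in MySQL's processlist. The function names, the threshold parameter and the CLIENT/DB_USER/DB_PASS overrides are hypothetical, and the credentials are placeholders.

```shell
#!/bin/sh
# Sketch: list coordinator-node tasks that have run for at least N seconds.
# Assumption: TIME is in seconds; credentials are placeholders.
CLIENT=${CLIENT:-gccli}
DB_USER=${DB_USER:-gbase}
DB_PASS=${DB_PASS:-gbase20110531}

# Print the query; $1 is the minimum running time in seconds (default 0).
coordinator_sql() {
  min_secs=${1:-0}
  case $min_secs in
    ''|*[!0-9]*) echo "min_secs must be a non-negative integer" >&2; return 1 ;;
  esac
  printf '%s\n' "select COORDINATOR_NAME, ID, user, host, command, start_time, time, state, substring(info,1,100) info from information_schema.COORDINATORS_TASK_INFORMATION where info is not null and info not like '%information_schema.processlist%' and time >= $min_secs order by time desc"
}

# Build the query first so that an invalid threshold never reaches the client.
long_running_tasks() {
  sql=$(coordinator_sql "$1") || return 1
  "$CLIENT" -h127.0.0.1 -u"$DB_USER" -p"$DB_PASS" -e "$sql"
}
```

After sourcing the file (`. ./long_running_tasks.sh`), `long_running_tasks 60` would show only tasks that have been running for a minute or more.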

Data (compute) nodes

select * from information_schema.GNODES_TASK_INFORMATION;

select NODE_NAME, ID, user, host, command, start_time, time, state, substring(info,1,100) info from information_schema.GNODES_TASK_INFORMATION where info is not null and info not like '%information_schema.processlist%' order by time desc;
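On the data nodes it can help to see at a glance whether one node lags behind the others. Below is a sketch of a per-node summary (task count and longest TIME per NODE_NAME), using only columns from the GNODES_TASK_INFORMATION table described in this article. The function names and the CLIENT/DB_USER/DB_PASS overrides are hypothetical, and the credentials are placeholders.

```shell
#!/bin/sh
# Sketch: summarise running tasks per data node to spot a slow node.
# Credentials are placeholders; override them via the environment.
CLIENT=${CLIENT:-gccli}
DB_USER=${DB_USER:-gbase}
DB_PASS=${DB_PASS:-gbase20110531}

# One row per data node: how many tasks it runs and the longest TIME among them.
gnode_summary_sql() {
  printf '%s\n' "select NODE_NAME, count(*) tasks, max(time) longest from information_schema.GNODES_TASK_INFORMATION where info is not null and info not like '%information_schema.processlist%' group by NODE_NAME order by longest desc"
}

gnode_summary() {
  "$CLIENT" -h127.0.0.1 -u"$DB_USER" -p"$DB_PASS" -e "$(gnode_summary_sql)"
}
```

A node whose longest value stands far above the rest is a reasonable place to start, using the detailed query above filtered on that NODE_NAME.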

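Fields of the coordinator-node table are listed below. Not every field matters for troubleshooting; the less important ones can be ignored.

Field	Type	Description
COORDINATOR_NAME	varchar(64)	Coordinator node name
ID	bigint(4)	Process ID
TASKID	bigint(4)	Internal task number, incremented each time a command is executed
SUBTASKID	bigint(4)	Not seen in use
THREADID	bigint(4)	Internal thread number; normally does not change within a session
USER	varchar(16)	Login user
HOST	varchar(64)	Client host of the connection
DB	varchar(64)	Default database of the connection
COMMAND	varchar(16)	Command type
START_TIME	timestamp	Start time
TIME	int(7)	Execution time
STATE	varchar(64)	State
RESOURCE_POOL_NAME	varchar(64)
RESOURCE_POOL_ID	bigint(21) unsigned
RESOURCE_POOL_PRIORITY	bigint(21) unsigned
WAITING_TIME	bigint(21) unsigned	Waiting time
RUNNING_TIME	bigint(21) unsigned	Running time
LOCK	longtext	Locks already acquired
WAIT	longtext	Locks being waited for
INFO	longtext	SQL being executed
TRACE	longtext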

Data (compute) node table

Field	Type	Description
NODE_NAME	varchar(64)	Data node name
ID	bigint(4)	Process ID
TASKID	bigint(4)	Internal task number
SUBTASKID	bigint(4)	Internal subtask number
THREADID	bigint(4)	Internal thread number
USER	varchar(16)	Login user
HOST	varchar(64)	Client host of the connection
DB	varchar(64)	Default database of the connection
COMMAND	varchar(16)	Command type
START_TIME	timestamp	Start time
TIME	int(7)	Execution time
STATE	varchar(64)	State
RESOURCE_POOL_NAME	varchar(64)
RESOURCE_POOL_ID	bigint(21) unsigned
RESOURCE_POOL_PRIORITY	bigint(21) unsigned
WAITING_TIME	bigint(21) unsigned
RUNNING_TIME	bigint(21) unsigned	Running time
PARALLEL_DEGREE	bigint(21) unsigned	Number of parallel threads
CPU_USAGE	bigint(21) unsigned
MEM_USAGE	bigint(21) unsigned
TEMP_DISKSPACE_SORT	bigint(21) unsigned
TEMP_DISKSPACE_JOIN	bigint(21) unsigned
TEMP_DISKSPACE_AGGR	bigint(21) unsigned
INFO	longtext	SQL being executed
TRACE	longtext

The CPU usage and related resource columns are only populated when the resource management feature is enabled.
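With resource management enabled, the resource columns of the data-node table can point at the heaviest tasks. Below is a sketch that lists the top N data-node tasks ordered by MEM_USAGE. It assumes resource management is on (otherwise these columns carry no useful values), and the units of CPU_USAGE, MEM_USAGE and the TEMP_DISKSPACE_* columns are not stated in this article. The function names, the row-count parameter and the environment overrides are hypothetical.

```shell
#!/bin/sh
# Sketch: top data-node tasks by memory usage (needs resource management).
# Units of the resource columns are not documented here; credentials are placeholders.
CLIENT=${CLIENT:-gccli}
DB_USER=${DB_USER:-gbase}
DB_PASS=${DB_PASS:-gbase20110531}

# Print the query; $1 is how many rows to return (default 10).
gnode_resource_sql() {
  n=${1:-10}
  case $n in
    ''|0|*[!0-9]*) echo "row count must be a positive integer" >&2; return 1 ;;
  esac
  printf '%s\n' "select NODE_NAME, ID, time, PARALLEL_DEGREE, CPU_USAGE, MEM_USAGE, TEMP_DISKSPACE_SORT, TEMP_DISKSPACE_JOIN, TEMP_DISKSPACE_AGGR, substring(info,1,100) info from information_schema.GNODES_TASK_INFORMATION where info is not null order by MEM_USAGE desc limit $n"
}

# Validate the row count before calling the client.
top_resource_tasks() {
  sql=$(gnode_resource_sql "$1") || return 1
  "$CLIENT" -h127.0.0.1 -u"$DB_USER" -p"$DB_PASS" -e "$sql"
}
```

Ordering by CPU_USAGE or one of the TEMP_DISKSPACE_* columns instead is a one-word change in the order by clause.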

1.1.3. Examples

For readability, the example output was printed in \G (vertical) format; otherwise each row is hard to read on a single line.

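The script below runs continuously. Unlike the queries above, it does not truncate info with substring(), so the full SQL text is kept for later troubleshooting. It sleeps 10 seconds between snapshots; adjust this to your situation. For troubleshooting, consider lowering it to 5 seconds. For routine tracking, raise it to 30 to 60 seconds, since what matters then is SQL that runs for a long time.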

[root@gbase101 ~]# cat gbase_monit_sql.sh
#!/bin/sh
# Loop forever
while true
do
  # Print the current date and time
  date
  # Get the SQL currently running anywhere in the GBase 8a cluster
  gccli -h127.0.0.1 -ugbase -pgbase20110531 -vvv -e"select COORDINATOR_NAME, ID, user, host, command, start_time, time, state, info from information_schema.COORDINATORS_TASK_INFORMATION where info is not null and info not like '%information_schema.processlist%' order by time desc"
  # Wait 10 seconds before the next snapshot
  sleep 10
done

Then run it in the background with nohup; its output is appended to nohup.out in the current directory:

nohup sh gbase_monit_sql.sh &
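If nohup.out grows too large, a variant can take the interval as an argument and append each snapshot to a per-day log file. Below is a sketch under those assumptions. The LOG_DIR default, the log file naming, the function names and the environment overrides are all hypothetical, and the credentials are the same placeholders as in the script above.

```shell
#!/bin/sh
# Sketch: variant of gbase_monit_sql.sh that writes one log file per day.
# LOG_DIR, file names and credentials are placeholders; override via env.
CLIENT=${CLIENT:-gccli}
DB_USER=${DB_USER:-gbase}
DB_PASS=${DB_PASS:-gbase20110531}
LOG_DIR=${LOG_DIR:-/tmp/gbase_monit}

SQL="select COORDINATOR_NAME, ID, user, host, command, start_time, time, state, info from information_schema.COORDINATORS_TASK_INFORMATION where info is not null and info not like '%information_schema.processlist%' order by time desc"

# Take one snapshot and append it, with a timestamp, to today's log file.
monitor_once() {
  mkdir -p "$LOG_DIR" || return 1
  log="$LOG_DIR/running_sql_$(date +%Y%m%d).log"
  {
    date '+%Y-%m-%d %H:%M:%S'
    "$CLIENT" -h127.0.0.1 -u"$DB_USER" -p"$DB_PASS" -vvv -e "$SQL"
  } >>"$log" 2>&1
}

# Loop forever; $1 is the interval in seconds (default 10).
monitor_loop() {
  interval=${1:-10}
  while true; do
    monitor_once
    sleep "$interval"
  done
}

# Start the loop only when an interval is given on the command line.
if [ $# -gt 0 ]; then
  monitor_loop "$1"
fi
```

For example, `nohup sh gbase_monit_sql2.sh 30 >/dev/null 2>&1 &` (file name hypothetical) takes a snapshot every 30 seconds, and the output lands in the dated files under LOG_DIR rather than in nohup.out.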
