南大通用GBase 统计表行数的自定义存储过程样例

GBase的每张表,可以通过count来统计行数,本文提供了一个自定义存储过程来统计符合条件的一批表的行数。

提示

本文提供的自定义存储过程,采用的也是循环查询每个表,虽然方便,但同样存在表数量过多时,系统资源占用时间长的问题,所以请尽量通过参数,减少统计范围。

另外,为提升性能,采用了多个count一次执行的方式,所以需要一次性打开多个表。这涉及到数据库的同时可打开的表数量的参数

GBase 8a表打开缓冲参数table_open_cache和表定义缓冲参数table_definition_cache

所以在存储过程中,判断了如果超过3000字符,则先执行一批。如果调高了数据库的参数,可以适当增加3000到10000长度。

存储过程

表类型

默认只查询Express引擎的表,对性能影响小,省去了目录扫描部分的开销,其SQL语句如下

select tbname table_name,isreplicate 
from gbase.table_distribution b 
where b.dbname=DB  and tbname like TBL order by tbname ;

如果要查询当前节点所有的表,则使用下面的SQL,注意information_schema.tables,其like操作将做整个目录的扫描,如果表特别多,耗时长,资源影响高,非特殊场景不建议使用。

select table_name,isreplicate 
from information_schema.tables a
left join gbase.table_distribution b on b.tbname=a.table_name
where table_schema=DB and table_name like TBL
order by table_name;

返回结果

该存储过程最后返回时,可以自行指定排序,默认是中间表的表名字。

select * from COUNT_TABLE;

该表为临时比表,执行完存储过程后,可以重复查询,比如

select * from COUNT_TABLE order by countValue  desc;

完整存储过程

-- 调用information的table,比较方便,但性能不好 
-- 自己每个表count,性能也不好,必须多个表count union在一起才可以。
-- 调用样例如下,参数为db的名字
-- call countDB('dap_model');
-- call countDB('dap_model.asc%');

delimiter ;
drop procedure if exists countDB;
DELIMITER //

CREATE DEFINER="root"@"%" PROCEDURE "countDB"(
  in dbname varchar(64)
 )
BEGIN
 DECLARE TBL varchar(64);
 DECLARE DB varchar(64);
 DECLARE idx int;

 select instr(dbname,'.') into idx;
 if idx > 0 then
  set DB=substr(dbname,0,instr(dbname,'.')-1);
  set TBL=substr(dbname,instr(dbname,'.')+1,length(dbname));
 else
  set DB=dbname;
  set TBL='%';
 end if;
 call _countDB(DB,TBL);
END //

DELIMITER ;

drop procedure if exists _countDB;
DELIMITER //

CREATE DEFINER="root"@"%" PROCEDURE "_countDB"(
  in DB varchar(64),
  in TBL varchar(64)
 )
BEGIN

 DECLARE tableName varchar(64);
 DECLARE isReplicate varchar(64);
 DECLARE SQL_TMP varchar(21000);
 DECLARE done INT DEFAULT 0;
 DECLARE CUR_PM_POINFO CURSOR FOR
  select tbname table_name,isreplicate from gbase.table_distribution b where b.dbname=DB  and tbname like TBL order by tbname ;
 DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done = 1;

 drop temporary table if exists COUNT_TABLE;  
 create temporary table COUNT_TABLE(
  tableName varchar(64),
  isReplicate varchar(64),
  countValue bigint
 );

 OPEN CUR_PM_POINFO;
 FETCH CUR_PM_POINFO INTO tableName, isReplicate;
 set SQL_TMP='';
    while ( done = 0 ) do
--      select tableName,isReplicate;
     if(SQL_TMP='')then
      set SQL_TMP=concat('select ''',tableName,''',''',nvl(isReplicate,'NO'),''',count(*) from ',DB,'.',tableName);
     else
      set SQL_TMP=concat(SQL_TMP,' \nunion all select ''',tableName,''',''',nvl(isReplicate,'NO'),''',count(*) from ',DB,'.',tableName);
     end if;
     if(length(SQL_TMP)>=3000) then
      set SQL_TMP=concat('insert into COUNT_TABLE select * from (',SQL_TMP,')a');
      set @sql = SQL_TMP;
        PREPARE s1 FROM @sql;
       EXECUTE s1;
       DEALLOCATE PREPARE s1;
      set SQL_TMP='';
     end if;
     FETCH CUR_PM_POINFO INTO tableName, isReplicate;    
    end while;
    if(SQL_TMP <> '') then
     set SQL_TMP=concat('insert into COUNT_TABLE select * from (',SQL_TMP,')a');
  set @sql = SQL_TMP;
--  select @sql;
  PREPARE s1 FROM @sql;
  EXECUTE s1;
  DEALLOCATE PREPARE s1;
  set SQL_TMP='';
 end if;
 select * from COUNT_TABLE;
-- drop temporary table COUNT_TABLE;    
END //
delimiter ;

执行样例

查询某个库下的所有表的行数

再次警告:如果符合条件的表非常多,则查询时间长,且系统资源占用也长。

gbase> call countdb('testdb');
+-----------+-------------+------------+
| tableName | isReplicate | countValue |
+-----------+-------------+------------+
| d1        | NO          |    1000001 |
| t1        | NO          |         14 |
| t2        | NO          |          1 |
| t3        | NO          |          1 |
| t4        | NO          |          3 |
| test20c   | NO          |    1000001 |
| test20c_e | NO          |    1000000 |
+-----------+-------------+------------+
7 rows in set (Elapsed: 00:00:00.41)

gbase> call countdb('gclusterdb');
+----------------------+-------------+------------+
| tableName            | isReplicate | countValue |
+----------------------+-------------+------------+
| dual                 | NO          |          1 |
| gcs_1694629898_140_7 | NO          |          0 |
| nodedatamap          | NO          |      65536 |
| rebalancing_status   | NO          |          0 |
+----------------------+-------------+------------+
4 rows in set (Elapsed: 00:00:00.32)

Query OK, 0 rows affected (Elapsed: 00:00:00.32)

统计某个库下,匹配的表行数

匹配的方式和like相同,库名和表名间用点分割,具体算法看countDB的代码。

gbase> call countdb('testdb.t%');
+-----------+-------------+------------+
| tableName | isReplicate | countValue |
+-----------+-------------+------------+
| t1        | NO          |         14 |
| t2        | NO          |          1 |
| t3        | NO          |          1 |
| t4        | NO          |          3 |
| test20c   | NO          |    1000001 |
| test20c_e | NO          |    1000000 |
+-----------+-------------+------------+
6 rows in set (Elapsed: 00:00:00.38)

Query OK, 0 rows affected (Elapsed: 00:00:00.38)

gbase> call countdb('testdb.%20%');
+-----------+-------------+------------+
| tableName | isReplicate | countValue |
+-----------+-------------+------------+
| test20c   | NO          |    1000001 |
| test20c_e | NO          |    1000000 |
+-----------+-------------+------------+
2 rows in set (Elapsed: 00:00:00.30)

Query OK, 0 rows affected (Elapsed: 00:00:00.30)