南大通用GBase 8a的replace等函数执行结果太宽导致内存参数不足而报错

replace作为GBase 8a的字符替换函数,用法简单,但如果参数设置不当,也会出现使用故障。特别是目标字符串的长度如果比较大时,容易出现内存参数不足,从而报错的问题。一些其它函数也有类似问题,包括replace,concat,concat_ws,insert,repeat,lpad,rpad,hex,make_set,export_set等。

注意:在9.5.3版本里,修改了内存评估方法,本文所描述的情况只出现在8.6和9.5.2版本。

参考

GBase 8a集群group_concat长度限制

Replace 函数的参数

  • str 是原始字符串
  • src是要别替换的字符
  • dest是将src替换成这个字符

如下是最常见的例子,将字符F替换成G

gbase> create table t3(id int, name varchar(200),address varchar(200));
Query OK, 0 rows affected (Elapsed: 00:00:00.64)

gbase> insert into t3 values (1,'First','111111111111');
Query OK, 1 row affected (Elapsed: 00:00:00.32)

gbase> select * from t3;
+------+-------+--------------+
| id   | name  | address      |
+------+-------+--------------+
|    1 | First | 111111111111 |
+------+-------+--------------+
1 row in set (Elapsed: 00:00:00.00)

gbase> select name,replace(name,'F','G') from t3;
+-------+-----------------------+
| name  | replace(name,'F','G') |
+-------+-----------------------+
| First | Girst                 |
+-------+-----------------------+
1 row in set (Elapsed: 00:00:00.00)

替换内存评估

在替换时,所需的内存是 原始字符串str的长度 * 替换后字符串dest的长度。

因为原始字符串里面的每个字符,都有可能被替换,而内存评估是提前做的,所以只能用最悲观的场景进行。

报错场景

如果替换后的字符串是一个字段,且长度较长,那么在评估是就可能超过32K, 从而从一个varchar替换变成一个longtext的替换。而后者的最大长度为64M,当做group,order时,所需要的内存最少时4G以上,如果用户没有配置这类参数,有可能出现所需内存超过阐述从而报错的情况。

Express out of resources error:too large gbase_buffer, has exceeded the gbase_buffer_hgrby. Total_width:67108888 no_rows:67 Gbase_buffer_hgrby: 33554432

比如

CREATE TABLE "test1" (
  "id" int(11) DEFAULT NULL,
  "name" varchar(200) DEFAULT NULL,
  "address" varchar(1000) DEFAULT NULL
) 
随便insert点数据

gbase> insert into test1 values(1,'123445','address1');
Query OK, 1 row affected (Elapsed: 00:00:00.40)

gbase> select id,name,replace(name,'4',address) c1 from test1 group by id,name,address;
+------+--------+----------------------+
| id   | name   | c1                   |
+------+--------+----------------------+
|    1 | 123445 | 123address1address15 |
+------+--------+----------------------+
1 row in set (Elapsed: 00:00:00.06)

gbase> insert into test1 values(2,'123445645656','address2');
Query OK, 1 row affected (Elapsed: 00:00:00.45)

gbase> select id,name,replace(name,'4',address) c1 from test1 group by id,name,address;
+------+--------------+-----------------------------------+
| id   | name         | c1                                |
+------+--------------+-----------------------------------+
|    1 | 123445       | 123address1address15              |
|    2 | 123445645656 | 123address2address256address25656 |
+------+--------------+-----------------------------------+
2 rows in set (Elapsed: 00:00:00.07)

gbase> insert into test1 values(3,'12344564565633','ad212dress2');
Query OK, 1 row affected (Elapsed: 00:00:00.33)


gbase> select id,name,replace(name,'4',address) c1 from test1 group by id,name,address;
ERROR 1708 (HY000): [2001::201:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: (GBA-01EX-0006) Express out of resources error:too large gbase_buffer, has exceeded the gbase_buffer_hgrby. Total_width:600088 no_rows:67 Gbase_buffer_hgrby: 33554432
SQL: SELECT /*2001::201_353_42_2023-01-18_11:49:28*/ /*+ TID('2048512') */ `_tmp_rht_3372351498_353_t93_1_1673851711_s_n1`.`c2` AS `id`, `_tmp_rht_3372351498_353_t93_1_1673851711_s_n1`.`c3` AS `name`, replace(`_tmp_rht_3372351498_353_t93_1_1673851711_s_n1`.`c3`, '4', `_tmp_rht_3
gbase> select * from test1;
+------+----------------+-------------+
| id   | name           | address     |
+------+----------------+-------------+
|    1 | 123445         | address1    |
|    2 | 123445645656   | address2    |
|    3 | 12344564565633 | ad212dress2 |
+------+----------------+-------------+
3 rows in set (Elapsed: 00:00:00.00)

解决方案

由于参与运算的字段定义宽度评估结果超过了32K, 所以可以通过提高参数值,限制结果宽度,以及新版本的参数控制

提高算子参数的值,比如设置为6G

set gbase_buffer_hgrby=6*1024*1024*1024;

限制结果宽度

业务上确认结果不会超过1000, 那么可以用过substr函数,让执行结果不会到32K以上。

gbase> select id,name,substr(replace(name,'4',address),0,1000) c1 from test1 group by id,name,address;
+------+----------------+----------------------------------------------+
| id   | name           | c1                                           |
+------+----------------+----------------------------------------------+
|    1 | 123445         | 123address1address15                         |
|    3 | 12344564565633 | 123ad212dress2ad212dress256ad212dress2565633 |
|    2 | 123445645656   | 123address2address256address25656            |
+------+----------------+----------------------------------------------+
3 rows in set (Elapsed: 00:00:00.08)

gbase>

新版本限制参数

该参数在2023年3月份的版本可能会支持,老版本还没有这个。其功能与 group_concat_max_len 参数完全一样,限制函数执行结果的宽度。

estimate_func_max_len

默认值为0,不限制,等同与现有longblob的最大值64M。

其它值为限制字节宽度,与字符集有关,如果是utf8,则实际宽度为参数/3, utf8mb4则是 参数/4

gbase> set estimate_func_max_len=1000;
Query OK, 0 rows affected (Elapsed: 00:00:00.02)

gbase> select id,name,replace(name,'4',address) c1 from test1 group by id,name,address;
+------+----------------+----------------------------------------------+
| id   | name           | c1                                           |
+------+----------------+----------------------------------------------+
|    1 | 123445         | 123address1address15                         |
|    3 | 12344564565633 | 123ad212dress2ad212dress256ad212dress2565633 |
|    2 | 123445645656   | 123address2address256address25656            |
+------+----------------+----------------------------------------------+
3 rows in set (Elapsed: 00:00:00.10)

如果执行过程中,结果超过了参数值,则报错

gbase> set estimate_func_max_len=10;
Query OK, 0 rows affected (Elapsed: 00:00:00.01)

gbase> select id,name,replace(name,'4',address) c1 from test1 group by id,name,address;
ERROR 1708 (HY000): [2001::201:5050](GBA-02AD-0005)Failed to query in gnode:
DETAIL: (GBA-01EX-700) Gbase general error: The current length of the replace function result is (13), which exceeds the value of parameter estimate_func_max_len by (10)
SQL: SELECT /*2001::201_14_111_2023-02-08_11:01:39*/ /*+ TID('2293968') */ `_tmp_rht_3372351498_14_t66_1_1675819619_s_n1`.`c2` AS `id`, `_tmp_rht_3372351498_14_t66_1_1675819619_s_n1`.`c3` AS `name`, replace(`_tmp_rht_3372351498_14_t66_1_1675819619_s_n1`.`c3`, '4', `_tmp_rht_337235149
gbase>