南大通用GBase 8a大量单条update的批量优化方案

当有大量的单条update时,本文介绍几种优化方案来提升性能。

单条update场景

比如不同的id,要更新成一个新的数值,且数量很多,比如,几百,几万甚至更多。如果采用逐行的方案,则gbase每次update都要做数据库内部的版本切换,其性能是非常差的。普通的1秒能执行2-3行就很好了。

如下最简单的场景,每次一个条件,大约更新20-40行记录,需要0.5秒。

gbase> desc t1;
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| id    | int(11)      | YES  |     | NULL    |       |
| name  | varchar(100) | YES  |     | NULL    |       |
+-------+--------------+------+-----+---------+-------+
2 rows in set (Elapsed: 00:00:00.00)

gbase> select count(*) from t1;
+----------+
| count(*) |
+----------+
|  1000001 |
+----------+
1 row in set (Elapsed: 00:00:00.01)


gbase> update t1 set name=concat('NAME',id) where id=856;
Query OK, 28 rows affected (Elapsed: 00:00:00.62)
Rows matched: 28  Changed: 28  Warnings: 0

gbase> update t1 set name=concat('NAME',id) where id=857;
Query OK, 29 rows affected (Elapsed: 00:00:00.63)
Rows matched: 29  Changed: 29  Warnings: 0

gbase> update t1 set name=concat('NAME',id) where id=858;
Query OK, 37 rows affected (Elapsed: 00:00:00.52)
Rows matched: 37  Changed: 37  Warnings: 0

批量更新有规律的值

如果数值和其它字段有计算关系,可以一次性更新多行,将条件用in或子查询的形式做过滤条件,避免一行一行的操作。

如下通过in一次性更新3个ip,其性能和单个的相当,等同于提升了3倍。

gbase> update t1 set name=concat('NAME',id) where id in(856,857,858);
Query OK, 94 rows affected (Elapsed: 00:00:00.48)
Rows matched: 94  Changed: 94  Warnings: 0

批量更新无规律的值

数值是通过某些算法得到的,比如余额,其数值无法规律性的得到,则建议先建立一个临时表,将要更新的id和值保存到这个临时表里,建议用复制表。然后通过关联更新来一次性更新所有的值。

如下通过临时表来关联更新,虽然insert时浪费了一些时间,但整体上更新3行耗时1.6秒 (0.42+0.27+1.01), 也基本和逐行相当,但如果一次更新1000个,其update的耗时并没有成比例增加,最终比逐条提升了接近1000倍。

gbase> create table tmpab(id int, name varchar(100));
Query OK, 0 rows affected (Elapsed: 00:00:00.42)
gbase> insert into tmpab values(856,'I AM A 856'),(857,'I AM B 857'),(858,'I AM C 858');
Query OK, 3 rows affected (Elapsed: 00:00:00.27)
Records: 3  Duplicates: 0  Warnings: 0

gbase> select distinct * from t1 where id in (856,857,858);
+------+---------+
| id   | name    |
+------+---------+
|  856 | NAME856 |
|  857 | NAME857 |
|  858 | NAME858 |
+------+---------+
3 rows in set (Elapsed: 00:00:00.05)

gbase> update t1 ,tmpab set t1.name=tmpab.name where t1.id=tmpab.id;
Query OK, 94 rows affected (Elapsed: 00:00:01.01)
Rows matched: 94  Changed: 94  Warnings: 0

base> select distinct * from t1 where id in (856,857,858);
+------+------------+
| id   | name       |
+------+------------+
|  856 | I AM A 856 |
|  857 | I AM B 857 |
|  858 | I AM C 858 |
+------+------------+
3 rows in set (Elapsed: 00:00:00.09)



gbase> truncate table tmpab;
Query OK, 3 rows affected (Elapsed: 00:00:00.38)


gbase> insert into tmpab select distinct id,concat('HAHA',id) from t1 limit 1000;
Query OK, 1000 rows affected (Elapsed: 00:00:00.34)
Records: 1000  Duplicates: 0  Warnings: 0

gbase> update t1 ,tmpab set t1.name=tmpab.name where t1.id=tmpab.id;
Query OK, 31513 rows affected (Elapsed: 00:00:00.58)
Rows matched: 31513  Changed: 31513  Warnings: 0

参考