南大通用GBase 8a不同数据类型比如字符串和数字做比较时的类型转换规则验证

GBase 8a在做比较时,如果数据类型不一致,则会根据一些规则进行自动转换。本文通过一些实际样例来探讨这些自动转换的规则是什么,本文关注点放在字符串和数字,其它类型的单独文章讨论。

如下分三种情况进行探讨,连个常量比较、两个字段比较,以及最常见的一个字段一个常量比较。

环境

2节点集群,一个包含int和varchar字段的表,以及几行数据。 集群版本9.5.2.44

集群

[gbase@gbase_rh7_001 ~]$ gcadmin
CLUSTER STATE:         ACTIVE
VIRTUAL CLUSTER MODE:  NORMAL

=============================================================
|           GBASE COORDINATOR CLUSTER INFORMATION           |
=============================================================
|   NodeName   | IpAddress  | gcware | gcluster | DataState |
-------------------------------------------------------------
| coordinator1 | 10.0.2.101 |  OPEN  |   OPEN   |     0     |
-------------------------------------------------------------
| coordinator2 | 10.0.2.115 |  OPEN  |   OPEN   |     0     |
-------------------------------------------------------------
=========================================================================================================
|                                    GBASE DATA CLUSTER INFORMATION                                     |
=========================================================================================================
| NodeName |                IpAddress                 | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------------------------------------
|  node1   |                10.0.2.101                |       1        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
|  node2   |                10.0.2.115                |       1        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------

表和数据

gbase> desc b1;
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| id    | int(11)      | YES  | MUL | NULL    |       |
| name  | varchar(100) | YES  |     | NULL    |       |
+-------+--------------+------+-----+---------+-------+
2 rows in set (Elapsed: 00:00:00.00)

gbase> select * from b1;
+------+-------+
| id   | name  |
+------+-------+
|    2 | 2222  |
|    4 | 44444 |
|    1 | 111   |
|    3 | 3333  |
|   99 | First |
+------+-------+
5 rows in set (Elapsed: 00:00:00.02)

2个常量比较

测试结果:类型转化成优先级高的类型后,再进行比较。

2个字符串比较

完全按照ASCII的先后顺序进行比较。但同时注意到,对于字母是默认不区分大小写的,也就是'A'=='a'。如果需要区分,要加上binary关键字。

gbase> select count(*) from b1 where '3'>'111';
+----------+
| count(*) |
+----------+
|        5 |
+----------+
1 row in set (Elapsed: 00:00:00.02)

gbase> select count(*) from b1 where 'A'> 'a';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (Elapsed: 00:00:00.04)

gbase> select count(*) from b1 where 'A'>= 'a';
+----------+
| count(*) |
+----------+
|        5 |
+----------+
1 row in set (Elapsed: 00:00:00.02)

gbase> select count(*) from b1 where 'A'='a';
+----------+
| count(*) |
+----------+
|        5 |
+----------+
1 row in set (Elapsed: 00:00:00.03)

带binary的区分大小写的字符串比较

gbase> select count(*) from b1 where binary 'A' >= binary 'a';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (Elapsed: 00:00:00.07)

gbase> select count(*) from b1 where binary 'A' >= 'a';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (Elapsed: 00:00:00.01)

gbase> select count(*) from b1 where 'A' >= 'a';
+----------+
| count(*) |
+----------+
|        5 |
+----------+
1 row in set (Elapsed: 00:00:00.01)

gbase> select count(*) from b1 where 'A' >= binary 'a';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (Elapsed: 00:00:00.02)

gbase>

2个数字比较

这个不测试了,按数字大小给结果就对了。

1个数字和字符串比较

2个常量比较,要看优先级了。常见类型里,

datetime>date>time>decimal>bigint>int>timestamp>varchar>char

如下的数字和字符串常量比较,会都转成优先级更高的数字,然后比较。

gbase> select count(*) from b1 where 3>'111';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (Elapsed: 00:00:00.01)

gbase> select count(*) from b1 where '3'>111;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (Elapsed: 00:00:00.02)

2个字段比较

测试结果:数据先转成左侧字段的类型,然后比较。

如下最典型的就是数字11和字符串'9'. 如果以数字为基准,11大于9, 如果以字符串为基准,'9'大于'11'

重新生成了数据
gbase> truncate table t;
Query OK, 8 rows affected (Elapsed: 00:00:00.46)

gbase> insert into t values(1,9),(11,9),(9,1),(9,11);
Query OK, 4 rows affected (Elapsed: 00:00:00.18)
Records: 4  Duplicates: 0  Warnings: 0

gbase> select * from t;
+------+------+
| id   | name |
+------+------+
|    1 | 9    |
|   11 | 9    |
|    9 | 1    |
|    9 | 11   |
+------+------+
4 rows in set (Elapsed: 00:00:00.02)

gbase>
gbase> select * from t where id>name;
+------+------+
| id   | name |
+------+------+
|   11 | 9    |
|    9 | 1    |
+------+------+
2 rows in set (Elapsed: 00:00:00.01)

gbase> select * from t where name>id;
+------+------+
| id   | name |
+------+------+
|    1 | 9    |
|   11 | 9    |
+------+------+
2 rows in set (Elapsed: 00:00:00.02)

1个常量一个字段比较

最常见的场景,字段等于某个数值。如下2个字段分别是int和varchar。

测试结果,常量和字段比较,都是转换成字段的类型后再比较。

测试表

gbase> create table t(id int, name varchar(10));
Query OK, 0 rows affected (Elapsed: 00:00:00.51)

gbase> insert into t values(1,1),(100,100);
Query OK, 2 rows affected (Elapsed: 00:00:00.25)
Records: 2  Duplicates: 0  Warnings: 0

gbase> select * from t;
+------+------+
| id   | name |
+------+------+
|    1 | 1    |
|  100 | 100  |
+------+------+
2 rows in set (Elapsed: 00:00:00.05)

数字字段对比常量

如下测试结果看,无论常量是数字还是字符串,无论是比较符的左面还是右面,最终都是以字段的数字类型为准。

gbase> select * from t where id>9;
+------+------+
| id   | name |
+------+------+
|  100 | 100  |
+------+------+
1 row in set (Elapsed: 00:00:00.07)

gbase> select * from t where id>'9';
+------+------+
| id   | name |
+------+------+
|  100 | 100  |
+------+------+
1 row in set (Elapsed: 00:00:00.02)

gbase> select * from t where 9<id;
+------+------+
| id   | name |
+------+------+
|  100 | 100  |
+------+------+
1 row in set (Elapsed: 00:00:00.02)

gbase> select * from t where '9'<id;
+------+------+
| id   | name |
+------+------+
|  100 | 100  |
+------+------+
1 row in set (Elapsed: 00:00:00.02)

字符串字段对比常量

gbase> select * from t where name>9;
Empty set (Elapsed: 00:00:00.01)

gbase> select * from t where name>'9';
Empty set (Elapsed: 00:00:00.05)

gbase> select * from t where 9<name;
Empty set (Elapsed: 00:00:00.03)

gbase> select * from t where '9'<name;
Empty set (Elapsed: 00:00:00.02)

总结

  • 两个常量比较时,以优先级高的类型为准转换
  • 字段和常量比较时,以字段类型为基准转换
  • 两个字段比较时,以左侧字段类型为基准转换