南大通用GBase 8a对加载数据中的保留空格处理方案

GBase 8a的加载,提供了对数据前后空格的处理方式,包括不保留,全部保留,保留前置和保留后置多个参数选项,本文介绍其使用方法。

参考

GBase 8a 集群加载数据LOAD的方法 完整的LOAD语法

环境

版本

gbase> select version();
+-----------------+
| version()       |
+-----------------+
| 9.5.2.36.125743 |
+-----------------+
1 row in set (Elapsed: 00:00:00.01)

其中的commons是用来设置备注信息,方便区分用的。

gbase> desc t2;
+----------+--------------+------+-----+---------+-------+
| Field    | Type         | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+-------+
| id       | int(11)      | YES  |     | NULL    |       |
| name     | varchar(100) | YES  |     | noname  |       |
| birthday | date         | YES  |     | NULL    |       |
| commons  | varchar(100) | YES  |     | NULL    |       |
+----------+--------------+------+-----+---------+-------+
4 rows in set (Elapsed: 00:00:00.00)

数据

一共三行数据,包含三种空格的状态。

[gbase@gbase_rh7_001 ~]$ cat t2.txt
1,  前面2个空格,2021-01-01
2,后面2个空格  ,2021-02-02
3, 前后1个空格 ,2021-03-03
[gbase@gbase_rh7_001 ~]$

默认加载不保留空格

可以看到字段name的长度16,正好是5个汉字的长度(UTF8字符集,5*3=15个字节)加1个数字的长度。

gbase> load data infile 'sftp://gbase:gbase1234@10.0.2.101/home/gbase/t2.txt' into table t2 fields terminated by ',' set commons='无空格参数加载';
Query OK, 3 rows affected (Elapsed: 00:00:01.14)
Task 9237 finished, Loaded 3 records, Skipped 0 records

gbase> select t2.*,length(name) from t2;
+------+------------------+------------+-----------------------+--------------+
| id   | name             | birthday   | commons               | length(name) |
+------+------------------+------------+-----------------------+--------------+
|    2 | 后面2个空格      | 2021-02-02 | 无空格参数加载        |           16 |
|    1 | 前面2个空格      | 2021-01-01 | 无空格参数加载        |           16 |
|    3 | 前后1个空格      | 2021-03-03 | 无空格参数加载        |           16 |
+------+------------------+------------+-----------------------+--------------+

保留全部空格

长度变成了18,都多了个2个空格。

gbase> load data infile 'sftp://gbase:gbase1234@10.0.2.101/home/gbase/t2.txt' into table t2 fields terminated by ',' PRESERVE BLANKS set commons='PRESERVE BLANKS参数加载';
Query OK, 3 rows affected (Elapsed: 00:00:01.05)
Task 9238 finished, Loaded 3 records, Skipped 0 records

gbase> select t2.*,length(name) from t2 order by commons,id;
+------+--------------------+------------+-----------------------------+--------------+
| id   | name               | birthday   | commons                     | length(name) |
+------+--------------------+------------+-----------------------------+--------------+
|    1 |   前面2个空格      | 2021-01-01 | PRESERVE BLANKS参数加载     |           18 |
|    2 | 后面2个空格        | 2021-02-02 | PRESERVE BLANKS参数加载     |           18 |
|    3 |  前后1个空格       | 2021-03-03 | PRESERVE BLANKS参数加载     |           18 |
|    1 | 前面2个空格        | 2021-01-01 | 无空格参数加载              |           16 |
|    2 | 后面2个空格        | 2021-02-02 | 无空格参数加载              |           16 |
|    3 | 前后1个空格        | 2021-03-03 | 无空格参数加载              |           16 |
+------+--------------------+------------+-----------------------------+--------------+
6 rows in set (Elapsed: 00:00:00.04)

保留前置空格

只有前面的空格被保留了。

gbase> load data infile 'sftp://gbase:gbase1234@10.0.2.101/home/gbase/t2.txt' into table t2 fields terminated by ',' PRESERVE LEADING BLANKS set commons='PRESERVE LEADING BLANKS参数加载';
Query OK, 3 rows affected (Elapsed: 00:00:01.14)
Task 9239 finished, Loaded 3 records, Skipped 0 records

gbase> select t2.*,length(name) from t2 order by commons,id;                                                                                                         +------+--------------------+------------+-------------------------------------+--------------+
| id   | name               | birthday   | commons                             | length(name) |
+------+--------------------+------------+-------------------------------------+--------------+
|    1 |   前面2个空格      | 2021-01-01 | PRESERVE BLANKS参数加载             |           18 |
|    2 | 后面2个空格        | 2021-02-02 | PRESERVE BLANKS参数加载             |           18 |
|    3 |  前后1个空格       | 2021-03-03 | PRESERVE BLANKS参数加载             |           18 |
|    1 |   前面2个空格      | 2021-01-01 | PRESERVE LEADING BLANKS参数加载     |           18 |
|    2 | 后面2个空格        | 2021-02-02 | PRESERVE LEADING BLANKS参数加载     |           16 |
|    3 |  前后1个空格       | 2021-03-03 | PRESERVE LEADING BLANKS参数加载     |           17 |
|    1 | 前面2个空格        | 2021-01-01 | 无空格参数加载                      |           16 |
|    2 | 后面2个空格        | 2021-02-02 | 无空格参数加载                      |           16 |
|    3 | 前后1个空格        | 2021-03-03 | 无空格参数加载                      |           16 |
+------+--------------------+------------+-------------------------------------+--------------+
9 rows in set (Elapsed: 00:00:00.04)

保留后置空格

只有后面的空格被保留了。

gbase> load data infile 'sftp://gbase:gbase1234@10.0.2.101/home/gbase/t2.txt' into table t2 fields terminated by ',' PRESERVE TRAILING BLANKS set commons='PRESERVE TRAILING BLANKS参数加载';
Query OK, 3 rows affected (Elapsed: 00:00:01.15)
Task 9240 finished, Loaded 3 records, Skipped 0 records

gbase> select t2.*,length(name) from t2 order by commons,id;                                                                                                         +------+--------------------+------------+--------------------------------------+--------------+
| id   | name               | birthday   | commons                              | length(name) |
+------+--------------------+------------+--------------------------------------+--------------+
|    1 |   前面2个空格      | 2021-01-01 | PRESERVE BLANKS参数加载              |           18 |
|    2 | 后面2个空格        | 2021-02-02 | PRESERVE BLANKS参数加载              |           18 |
|    3 |  前后1个空格       | 2021-03-03 | PRESERVE BLANKS参数加载              |           18 |
|    1 |   前面2个空格      | 2021-01-01 | PRESERVE LEADING BLANKS参数加载      |           18 |
|    2 | 后面2个空格        | 2021-02-02 | PRESERVE LEADING BLANKS参数加载      |           16 |
|    3 |  前后1个空格       | 2021-03-03 | PRESERVE LEADING BLANKS参数加载      |           17 |
|    1 | 前面2个空格        | 2021-01-01 | PRESERVE TRAILING BLANKS参数加载     |           16 |
|    2 | 后面2个空格        | 2021-02-02 | PRESERVE TRAILING BLANKS参数加载     |           18 |
|    3 | 前后1个空格        | 2021-03-03 | PRESERVE TRAILING BLANKS参数加载     |           17 |
|    1 | 前面2个空格        | 2021-01-01 | 无空格参数加载                       |           16 |
|    2 | 后面2个空格        | 2021-02-02 | 无空格参数加载                       |           16 |
|    3 | 前后1个空格        | 2021-03-03 | 无空格参数加载                       |           16 |
+------+--------------------+------------+--------------------------------------+--------------+
12 rows in set (Elapsed: 00:00:00.04)