南大通用GBase 8a加载LOAD定长格式的数据方法

本文介绍GBase 8a加载定长数据的方法。定长数据指原始数据里,一个字段占用的数据长度是相同的,比如指定100个字节,则原始数据必须是100个字节,不足按照约定补上,比如空格。

参考

GBase 8a 集群加载数据LOAD的方法

环境数据

gbase> desc t2;
+----------+--------------+------+-----+-------------------+-------+
| Field    | Type         | Null | Key | Default           | Extra |
+----------+--------------+------+-----+-------------------+-------+
| id       | int(11)      | YES  |     | NULL              |       |
| name     | varchar(100) | YES  |     | noname            |       |
| birthday | date         | YES  |     | NULL              |       |
| commons  | varchar(100) | YES  |     | NULL              |       |
| loadtime | datetime     | YES  |     | CURRENT_TIMESTAMP |       |
+----------+--------------+------+-----+-------------------+-------+
5 rows in set (Elapsed: 00:00:00.00)

数据

[root@gbase_rh7_001 gbase]# cat t2_3.txt
500ABCDEFG2021-07-08AAA2021-07-08 11:12:13
50112345672021-07-08BBB2021-07-08 12:13:14
502张三李四王五等2021-07-09大中国2021-07-08 13:14:15
502张三72021-07-09大2021-07-08 13:14:15

加载效果

注意data_format 4是定长的,后面的fields length部分,指定每个字段数据的固定长度。单位是字节。

注意原始数据1个汉字UTF8是占用3个字节的,但只对应数据库内1个字符。所以在构造定长数据时,要考虑到这个因素。比如varchar(100),最长的字节数是300,而不是100。

如下例子中的一行错误数据,就是因为张三李四王五等的长度超过了规定的7个字节,导致被匹配到后面的日期字段,从而格式不符。

gbase> load data infile 'sftp://gbase:gbase1234@10.0.2.101/home/gbase/t2_3.txt' into table t2 data_format 4 fields length '3,7,10,3,19';
Query OK, 3 rows affected (Elapsed: 00:00:01.22)
Task 9270 finished, Loaded 3 records, Skipped 1 records

gbase> select * from t2;
+------+---------+------------+---------+---------------------+
| id   | name    | birthday   | commons | loadtime            |
+------+---------+------------+---------+---------------------+
|  500 | ABCDEFG | 2021-07-08 | AAA     | 2021-07-08 11:12:13 |
|  502 | 张三7   | 2021-07-09 | 大      | 2021-07-08 13:14:15 |
|  501 | 1234567 | 2021-07-08 | BBB     | 2021-07-08 12:13:14 |
+------+---------+------------+---------+---------------------+
3 rows in set (Elapsed: 00:00:00.01)

gbase> show load logs limit 1;
+---------+---------+---------+-------------+-------------------------------------------------------+-------------+------------+------------+---------------------+-----------------------------------------------------------------+
| task_id | db_name | tb_name | err_data_ip | file_name                                             | file_offset | record_len | err_column | err_reason          | err_data                                                        |
+---------+---------+---------+-------------+-------------------------------------------------------+-------------+------------+------------+---------------------+-----------------------------------------------------------------+
| 9270    | testdb  | t2      | 10.0.2.101  | sftp://gbase:*********@10.0.2.101/home/gbase/t2_3.txt | 86          | 63         | 3          | Out of range value
 | 502张三李四王五等2021-07-09大中国2021-07-08 13:14:15
           |
+---------+---------+---------+-------------+-------------------------------------------------------+-------------+------------+------------+---------------------+-----------------------------------------------------------------+
1 row in set (Elapsed: 00:00:00.00)