本文介绍GBase 8a加载定长数据的方法。定长数据指原始数据里,一个字段占用的数据长度是相同的,比如指定100个字节,则原始数据必须是100个字节,不足按照约定补上,比如空格。
目录导航
参考
环境数据
表
gbase> desc t2;
+----------+--------------+------+-----+-------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+--------------+------+-----+-------------------+-------+
| id | int(11) | YES | | NULL | |
| name | varchar(100) | YES | | noname | |
| birthday | date | YES | | NULL | |
| commons | varchar(100) | YES | | NULL | |
| loadtime | datetime | YES | | CURRENT_TIMESTAMP | |
+----------+--------------+------+-----+-------------------+-------+
5 rows in set (Elapsed: 00:00:00.00)
数据
[root@gbase_rh7_001 gbase]# cat t2_3.txt
500ABCDEFG2021-07-08AAA2021-07-08 11:12:13
50112345672021-07-08BBB2021-07-08 12:13:14
502张三李四王五等2021-07-09大中国2021-07-08 13:14:15
502张三72021-07-09大2021-07-08 13:14:15
加载效果
注意data_format 4是定长的,后面的fields length部分,指定每个字段数据的固定长度。单位是字节。
注意原始数据1个汉字UTF8是占用3个字节的,但只对应数据库内1个字符。所以在构造定长数据时,要考虑到这个因素。比如varchar(100),最长的字节数是300,而不是100。
如下例子中的一行错误数据,就是因为张三李四王五等的长度超过了规定的7个字节,导致被匹配到后面的日期字段,从而格式不符。
gbase> load data infile 'sftp://gbase:gbase1234@10.0.2.101/home/gbase/t2_3.txt' into table t2 data_format 4 fields length '3,7,10,3,19';
Query OK, 3 rows affected (Elapsed: 00:00:01.22)
Task 9270 finished, Loaded 3 records, Skipped 1 records
gbase> select * from t2;
+------+---------+------------+---------+---------------------+
| id | name | birthday | commons | loadtime |
+------+---------+------------+---------+---------------------+
| 500 | ABCDEFG | 2021-07-08 | AAA | 2021-07-08 11:12:13 |
| 502 | 张三7 | 2021-07-09 | 大 | 2021-07-08 13:14:15 |
| 501 | 1234567 | 2021-07-08 | BBB | 2021-07-08 12:13:14 |
+------+---------+------------+---------+---------------------+
3 rows in set (Elapsed: 00:00:00.01)
gbase> show load logs limit 1;
+---------+---------+---------+-------------+-------------------------------------------------------+-------------+------------+------------+---------------------+-----------------------------------------------------------------+
| task_id | db_name | tb_name | err_data_ip | file_name | file_offset | record_len | err_column | err_reason | err_data |
+---------+---------+---------+-------------+-------------------------------------------------------+-------------+------------+------------+---------------------+-----------------------------------------------------------------+
| 9270 | testdb | t2 | 10.0.2.101 | sftp://gbase:*********@10.0.2.101/home/gbase/t2_3.txt | 86 | 63 | 3 | Out of range value
| 502张三李四王五等2021-07-09大中国2021-07-08 13:14:15
|
+---------+---------+---------+-------------+-------------------------------------------------------+-------------+------------+------------+---------------------+-----------------------------------------------------------------+
1 row in set (Elapsed: 00:00:00.00)