南大通用GBase 8a 加载LOAD数据无法入库错误 Validate error

GBase 8a 数据库集群通过load方式将数据加载入库到一张表里,当数据某一列的值,无法转换为表对应字段类型时,比如带字母的字符串转化为数字,则会报Validate error这类错误。原因可能是列的顺序匹配错误,或原始数据采集问题。

测试样例

一个3列的表

gbase> create table t_load5(name varchar(100),id int, birth date);
Query OK, 0 rows affected (Elapsed: 00:00:00.10)

样例数据

制造了可能引起Validate error错误的异常数据,第二列是包含字母的字符串,而表定义为数字int类型

[gbase@localhost ~]$ cat /home/gbase/tload.txt
3,Third,2020-01-02
4,Fouth,2020-14-16
9999999999,MAX,2020-12-13

执行加载

有3行错误数据,注意本次加载任务的编号task ID是 1572912

gbase> load data infile 'sftp://gbase:gbase1234@10.0.2.107/home/gbase/tload.txt' into table t_load4 fields terminated by ',';
Query OK, 0 rows affected (Elapsed: 00:00:00.63)
Task 1572912 finished, Loaded 0 records, Skipped 3 records

查看表数据

无数据入库


gbase> select * from t_load4;
Empty set (Elapsed: 00:00:00.00)

查看错误日志和数据

此处为了方便,直接用SQL方式读取。可以看到报错信息 Validate error。注意用SQL方式,可以从结果里直接看到错误数据。如果从文件查看,是分成2个文件保存的。

gbase> show load logs limit 3;
+---------+---------+---------+-------------------+--------------------------------------------------------+-------------+------------+------------+-----------------+----------------------------+
| task_id | db_name | tb_name | err_data_ip       | file_name                                              | file_offset | record_len | err_column | err_reason      | err_data                   |
+---------+---------+---------+-------------------+--------------------------------------------------------+-------------+------------+------------+-----------------+----------------------------+
| 1572914 | testdb  | t_load5 | ::ffff:10.0.2.107 | sftp://gbase:*********@10.0.2.107/home/gbase/tload.txt | 0           | 19         | 2          | Validate error
 | 3,Third,2020-01-02
        |
| 1572914 | testdb  | t_load5 | ::ffff:10.0.2.107 | sftp://gbase:*********@10.0.2.107/home/gbase/tload.txt | 19          | 19         | 2          | Validate error
 | 4,Fouth,2020-14-16
        |
| 1572914 | testdb  | t_load5 | ::ffff:10.0.2.107 | sftp://gbase:*********@10.0.2.107/home/gbase/tload.txt | 38          | 26         | 2          | Validate error
 | 9999999999,MAX,2020-12-13
 |
+---------+---------+---------+-------------------+--------------------------------------------------------+-------------+------------+------------+-----------------+----------------------------+
3 rows in set (Elapsed: 00:00:00.00)

通过文件查看错误日志和数据,其中1572914是加载完成后,返回的taskid.可以看到其中的 Validate error 报错信息。

[root@localhost ~]# cd /opt/gcluster/log/gcluster/loader_logs/1572914
[root@localhost 1572914]# ll
total 12
-rw------- 1 gbase gbase 245 Sep  4 04:46 1572914_loader_result.log
-rw------- 1 gbase gbase  64 Sep  4 04:46 1572914_testdb_t_load5_n1_::ffff:10.0.2.107_20200904044616.err
-rw------- 1 gbase gbase 308 Sep  4 04:46 1572914_testdb_t_load5_n1_::ffff:10.0.2.107_20200904044616.trc
[root@localhost 1572914]# cat 1572914_loader_result.log
1572914|testdb|t_load5|root|::ffff:10.0.2.107|::ffff:10.0.2.107|2020-09-04 04:46:16|2020-09-04 04:46:16|0|64|64|0|3|0|SUCCESS|load data infile 'sftp://gbase:*********@10.0.2.107/home/gbase/tload.txt' into table t_load5 fields terminated by ','|
[root@localhost 1572914]# cat 1572914_testdb_t_load5_n1_\:\:ffff\:10.0.2.107_20200904044616.trc
file_name    |   file_offset    |   record_len   |   column    |    reason
sftp://gbase:*********@10.0.2.107/home/gbase/tload.txt|0|19|2|Validate error
sftp://gbase:*********@10.0.2.107/home/gbase/tload.txt|19|19|2|Validate error
sftp://gbase:*********@10.0.2.107/home/gbase/tload.txt|38|26|2|Validate error
[root@localhost 1572914]# cat 1572914_testdb_t_load5_n1_\:\:ffff\:10.0.2.107_20200904044616.err
3,Third,2020-01-02
4,Fouth,2020-14-16
9999999999,MAX,2020-12-13
[root@localhost 1572914]#

解决方案

修复错误数据,或者调整数据库表字段类型

其它加载错误,请参考

GBase 8a数据库加载LOAD报错信息分析和解决文章汇总

GBase 8a集群通过SQL查询加载历史记录