GBase 8a 在加载LOAD时,如果发现数据和目标表不匹配,会将错误数据和原因,写入到日志里。 其中The number of data columns is less than the number of columns defined 错误是因为文件列数少于表的指定列数。数据按照分隔符切分的字段数量和表或指定的字段数量不匹配导致。
错误原因日志是trc结尾的,错误数据是err结尾的。
如下是测试用例
表结构
目录导航
测试样例
一个2列的表
gbase> create table tload_2(id int, name varchar(200));
Query OK, 0 rows affected (Elapsed: 00:00:00.13)
gbase> desc tload_2;
+-------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| id | int(11) | YES | | NULL | |
| name | varchar(200) | YES | | NULL | |
+-------+--------------+------+-----+---------+-------+
2 rows in set (Elapsed: 00:00:00.00)
样例数据
制造了异常数据,有字段少,也有字段多
[gbase@localhost ~]$ cat /home/gbase/tload.txt
1
2,Second
3,Third,2020-01-02
4,Fouth,2020-14-16
[gbase@localhost ~]$
执行加载
有1行正常入库,3行错误数据
gbase> load data infile 'sftp://gbase:gbase1234@10.0.2.107/home/gbase/tload.txt' into table tload_2;
Query OK, 1 row affected (Elapsed: 00:00:00.68)
Task 1572889 finished, Loaded 1 records, Skipped 3 records
查看表数据
确实有1行
gbase> select * from tload_2;
+------+--------+
| id | name |
+------+--------+
| 2 | Second |
+------+--------+
1 row in set (Elapsed: 00:00:00.00)
查看错误日志和数据
此处为了方便,直接用SQL方式读取。可以看到报错信息 The number of data columns is less than the number of columns defined. 还有text parser error。注意用SQL方式,可以从结果里直接看到错误数据。如果从文件查看,是分成2个文件保存的。
gbase> show load logs limit 3;
+---------+---------+---------+-------------------+--------------------------------------------------------+-------------+------------+------------+------------------------------------------------------------------------+---------------------+
| task_id | db_name | tb_name | err_data_ip | file_name | file_offset | record_len | err_column | err_reason | err_data |
+---------+---------+---------+-------------------+--------------------------------------------------------+-------------+------------+------------+------------------------------------------------------------------------+---------------------+
| 1572889 | testdb | tload_2 | ::ffff:10.0.2.107 | sftp://gbase:*********@10.0.2.107/home/gbase/tload.txt | 0 | 2 | 1 | The number of data columns is less than the number of columns defined
| 1
|
| 1572889 | testdb | tload_2 | ::ffff:10.0.2.107 | sftp://gbase:*********@10.0.2.107/home/gbase/tload.txt | 11 | 19 | 3 | text parser error
| 3,Third,2020-01-02
|
| 1572889 | testdb | tload_2 | ::ffff:10.0.2.107 | sftp://gbase:*********@10.0.2.107/home/gbase/tload.txt | 30 | 19 | 3 | text parser error
| 4,Fouth,2020-14-16
|
+---------+---------+---------+-------------------+--------------------------------------------------------+-------------+------------+------------+------------------------------------------------------------------------+---------------------+
通过文件查看错误日志和数据
[gbase@localhost 1572883]$ cd /opt/gcluster/log/gcluster/loader_logs/1572889
[gbase@localhost 1572889]$ ll
total 12
-rw------- 1 gbase gbase 220 Sep 3 14:16 1572889_loader_result.log
-rw------- 1 gbase gbase 40 Sep 3 14:16 1572889_testdb_tload_2_n1_::ffff:10.0.2.107_20200903141603.err
-rw------- 1 gbase gbase 368 Sep 3 14:16 1572889_testdb_tload_2_n1_::ffff:10.0.2.107_20200903141603.trc
[gbase@localhost 1572889]$ cat 1572889_testdb_tload_2_n1_\:\:ffff\:10.0.2.107_20200903141603.trc
file_name | file_offset | record_len | column | reason
sftp://gbase:*********@10.0.2.107/home/gbase/tload.txt|0|2|1|The number of data columns is less than the number of columns defined
sftp://gbase:*********@10.0.2.107/home/gbase/tload.txt|11|19|3|text parser error
sftp://gbase:*********@10.0.2.107/home/gbase/tload.txt|30|19|3|text parser error
[gbase@localhost 1572889]$
[gbase@localhost 1572889]$ cat 1572889_testdb_tload_2_n1_\:\:ffff\:10.0.2.107_20200903141603.err
1
3,Third,2020-01-02
4,Fouth,2020-14-16
解决方案
对于数据字段比表字段多的情况,可以通过table_fields指定来规避。请参考
GBase 8a 集群加载数据LOAD数据和字段不匹配的处理方案
其它加载错误,请参考
《南大通用GBase 8a 加载LOAD数据无法入库错误 skipped records,The number of data columns is less than the number of columns defined》有1条评论
评论已关闭。