当原始数据的字符集和GBase 8a数据库表的字符集不同时,可以在加载LOAD时指定原始数据的字符集类型,由加载程序自动进行编码转换。
当然,转码必然有些许的CPU消耗,如果数据源能匹配,性能更好。
目录导航
参考
环境
表格
gbase> desc t2;
+----------+--------------+------+-----+-------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+--------------+------+-----+-------------------+-------+
| id | int(11) | YES | | NULL | |
| name | varchar(100) | YES | | noname | |
| birthday | date | YES | | NULL | |
| commons | varchar(100) | YES | | NULL | |
| loadtime | datetime | YES | | CURRENT_TIMESTAMP | |
+----------+--------------+------+-----+-------------------+-------+
5 rows in set (Elapsed: 00:00:00.01)
数据
[root@gbase_rh7_001 gbase]# cat t2_3.txt
502,张三李四王五等,2021-07-09,大中国,2021-07-08 13:14:15
502,张三,2021-07-09,大,2021-07-08 13:14:15
[root@gbase_rh7_001 gbase]# iconv -f utf8 -t gbk t2_3.txt -o t2_4.txt
[root@gbase_rh7_001 gbase]# cat t2_4.txt
502,▒▒▒▒▒▒▒▒▒▒▒▒▒,2021-07-09,▒▒▒й▒,2021-07-08 13:14:15
502,▒▒▒▒,2021-07-09,▒▒,2021-07-08 13:14:15
[root@gbase_rh7_001 gbase]# file t2_3.txt
t2_3.txt: UTF-8 Unicode text
[root@gbase_rh7_001 gbase]# file t2_4.txt
t2_4.txt: ISO-8859 text
[root@gbase_rh7_001 gbase]#
样例
普通加载出现乱码
由于字符集不匹配,入库的数据是乱码。
gbase> load data infile 'sftp://gbase:gbase1234@10.0.2.101/home/gbase/t2_4.txt' into table t2 fields terminated by ',';
Query OK, 2 rows affected (Elapsed: 00:00:01.32)
Task 9280 finished, Loaded 2 records, Skipped 0 records
gbase> select * from t2;
+------+----------------+------------+---------+---------------------+
| id | name | birthday | commons | loadtime |
+------+----------------+------------+---------+---------------------+
| 502 | ▒▒▒▒ | 2021-07-09 | ▒▒ | 2021-07-08 13:14:15 |
| 502 | ▒▒▒▒▒▒▒▒▒▒▒▒▒ | 2021-07-09 | ▒▒▒й▒ | 2021-07-08 13:14:15 |
+------+----------------+------------+---------+---------------------+
2 rows in set (Elapsed: 00:00:00.01)
指定原始数据字符集加载
通过CHARACTER SET GBK参数指定原始数据的字符集是GBK, 加载结果正确。
当前GBase 8a支持的字符集为GB系列(GB2312,GBK,GB18030)和UTF8系列(utf8,utf8mb4)。
gbase> load data infile 'sftp://gbase:gbase1234@10.0.2.101/home/gbase/t2_4.txt' into table t2 CHARACTER SET GBK fields terminated by ',';
Query OK, 2 rows affected (Elapsed: 00:00:01.23)
Task 9281 finished, Loaded 2 records, Skipped 0 records
gbase> select * from t2;
+------+-----------------------+------------+-----------+---------------------+
| id | name | birthday | commons | loadtime |
+------+-----------------------+------------+-----------+---------------------+
| 502 | ▒▒▒▒ | 2021-07-09 | ▒▒ | 2021-07-08 13:14:15 |
| 502 | 张三 | 2021-07-09 | 大 | 2021-07-08 13:14:15 |
| 502 | ▒▒▒▒▒▒▒▒▒▒▒▒▒ | 2021-07-09 | ▒▒▒й▒ | 2021-07-08 13:14:15 |
| 502 | 张三李四王五等 | 2021-07-09 | 大中国 | 2021-07-08 13:14:15 |
+------+-----------------------+------------+-----------+---------------------+
4 rows in set (Elapsed: 00:00:00.01)