本文介绍GBase 8a里加载json格式的数据文件的样例和注意事项。
该功能需要新的版本,本文试用的是2023年4月份的9.5.3.28。老版本可能不支持。
目录导航
json加载参数
使用data_format 6来指示加载,数据格式为json。
单行,单个数据加载
数据就一行,只包含单个数据。
gbase> system cat /home/gbase/json1.txt
{"id":"1","name":"First"}
gbase> load data infile 'sftp://gbase:gbase1234@10.0.2.103/home/gbase/json1.txt' into table json data_format 6;
Query OK, 1 row affected (Elapsed: 00:00:00.77)
Task 11 finished, Loaded 1 records, Skipped 0 records
gbase> select * from json;
+------+-------+
| id | name |
+------+-------+
| 1 | First |
+------+-------+
1 row in set (Elapsed: 00:00:00.01)
每行单个数据加载
每行一个数据,有多行。
gbase> system cat /home/gbase/json2.txt
{"id":"1","name":"First"}
{"id":"2","name":"second no comer between line"}
gbase> truncate table json;
Query OK, 5 rows affected (Elapsed: 00:00:00.13)
gbase> load data infile 'sftp://gbase:gbase1234@10.0.2.103/home/gbase/json2.txt' into table json data_format 6;
Query OK, 2 rows affected (Elapsed: 00:00:00.73)
Task 28 finished, Loaded 2 records, Skipped 0 records
gbase> select * from json;
+------+------------------------------+
| id | name |
+------+------------------------------+
| 2 | second no comer between line |
| 1 | First |
+------+------------------------------+
2 rows in set (Elapsed: 00:00:00.00)
多行之间,如果末尾有逗号,不影响结果,有和没有都可以。解析的时候,是以行为基本单位。
{"id":"1","name":"First"},
{"id":"2","name":"second no comer between line"}
gbase> truncate table json;
Query OK, 2 rows affected (Elapsed: 00:00:00.12)
gbase> load data infile 'sftp://gbase:gbase1234@10.0.2.103/home/gbase/json2_2.txt' into table json data_format 6;
Query OK, 2 rows affected (Elapsed: 00:00:00.68)
Task 30 finished, Loaded 2 records, Skipped 1 records
gbase> select * from json;
+------+------------------------------+
| id | name |
+------+------------------------------+
| 1 | First |
| 2 | second no comer between line |
+------+------------------------------+
2 rows in set (Elapsed: 00:00:00.01)
一行里有多个数据
多个数据紧凑在一起,不要有分隔符
gbase> system cat /home/gbase/json2_3.txt
{"id":"1","name":"First"}{"id":"2","name":"second no comer between line"}
gbase> truncate table json;
Query OK, 4 rows affected (Elapsed: 00:00:00.12)
gbase> load data infile 'sftp://gbase:gbase1234@10.0.2.103/home/gbase/json2_3.txt' into table json data_format 6;
Query OK, 2 rows affected (Elapsed: 00:00:00.83)
Task 33 finished, Loaded 2 records, Skipped 0 records
gbase> select * from json;
+------+------------------------------+
| id | name |
+------+------------------------------+
| 1 | First |
| 2 | second no comer between line |
+------+------------------------------+
2 rows in set (Elapsed: 00:00:00.00)
如果中间包含了分隔符,比如逗号,只有第一个数据入库了,其余的报错。
gbase> system cat /home/gbase/json2_4.txt
{"id":"1","name":"First"},{"id":"2","name":"second no comer between line"}
gbase> truncate table json;
Query OK, 2 rows affected (Elapsed: 00:00:00.13)
gbase> load data infile 'sftp://gbase:gbase1234@10.0.2.103/home/gbase/json2_4.txt' into table json data_format 6;
Query OK, 1 row affected (Elapsed: 00:00:00.59)
Task 35 finished, Loaded 1 records, Skipped 1 records
gbase> select * from json;
+------+-------+
| id | name |
+------+-------+
| 1 | First |
+------+-------+
1 row in set (Elapsed: 00:00:00.00)
用json数组,用逗号分割
前后用中括号,数据之间用逗号分割。
[{"id":"1","name":"First"},{"id":"2","name":"second no comer between line"}]
gbase> truncate table json;
Query OK, 1 row affected (Elapsed: 00:00:00.12)
gbase> load data infile 'sftp://gbase:gbase1234@10.0.2.103/home/gbase/json2_5.txt' into table json data_format 6;
Query OK, 2 rows affected (Elapsed: 00:00:02.53)
Task 37 finished, Loaded 2 records, Skipped 0 records
gbase> select * from json;
+------+------------------------------+
| id | name |
+------+------------------------------+
| 1 | First |
| 2 | second no comer between line |
+------+------------------------------+
2 rows in set (Elapsed: 00:00:00.01)
如果用了数组,但中间不包含逗号,结果错误。
gbase> load data infile 'sftp://gbase:gbase1234@10.0.2.103/home/gbase/json2_5.txt' into table json data_format 6;
Query OK, 0 rows affected (Elapsed: 00:00:00.67)
Task 38 finished, Loaded 0 records, Skipped 1 records
gbase> system cat /home/gbase/json2_5.txt
[{"id":"1","name":"First"}{"id":"2","name":"second no comer between line"}]
gbase>
多行,每行多个数据
每一行,依然按照前面的单行处理逻辑。单行里如果包含多个数据,可以紧凑的,也可用数组格式。每一行的格式可以不同。
gbase> system cat /home/gbase/json2_6.txt
[{"id":"1","name":"First"},{"id":"2","name":"second no comer between line"}]
{"id":"3","name":"Third"}
gbase> truncate table json;
Query OK, 2 rows affected (Elapsed: 00:00:00.11)
gbase> load data infile 'sftp://gbase:gbase1234@10.0.2.103/home/gbase/json2_6.txt' into table json data_format 6;
Query OK, 3 rows affected (Elapsed: 00:00:00.71)
Task 40 finished, Loaded 3 records, Skipped 0 records
gbase> select * from json;
+------+------------------------------+
| id | name |
+------+------------------------------+
| 1 | First |
| 3 | Third |
| 2 | second no comer between line |
+------+------------------------------+
3 rows in set (Elapsed: 00:00:00.01)
注意事项
嵌套
不支持json的值又嵌套了另一个json格式。比如 {"id":{"id1":"1","id2":"2"}}
数据切块
一个大文件,可以切块,由不同的节点处理,来提高处理速度。但如果有如下情况,不支持切块。此时整个数据文件视同一个json对象整体处理,不可拆分。同时,可以突破单行数据 gbase_loader_max_line_length的参数上限。 但要求所有的数据格式完全正确,不允许出现格式错误。
数据连续存放
也就是不包含分隔符的,且指定line terminated by '' 为空。
数据值中包含了行分隔符的
指定了 having lines separator参数。
常量固定值true,false,null的要求
按照json规定,需要写成false,true和null, 必须是小写,且不能用双引号包围。