Commit ca0c414

Merge pull request #261 from bigplaice/support-pgbulkload-pgbigm
support pg_bulkload and pg_bigm extension
2 parents 598d158 + 0abd304 commit ca0c414

7 files changed

Lines changed: 297 additions & 1 deletion

CN/modules/ROOT/pages/master/ecosystem_components/ecosystem_overview.adoc

Lines changed: 2 additions & 0 deletions
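@@ -31,6 +31,8 @@ IvorySQL, as an advanced open-source database compatible with Oracle and based on PostgreSQL
 | 18 | xref:master/ecosystem_components/pg_hint_plan.adoc[pg_hint_plan] | PG18 | Controls execution plans through hints in SQL comments, optimizing query performance without changing SQL logic | Query performance optimization, execution plan tuning, database performance analysis
 | 19 | xref:master/ecosystem_components/redis_fdw.adoc[redis_fdw] | PG18 | Maps Redis data to PostgreSQL foreign tables, supporting reads and writes to Redis through standard SELECT/INSERT/UPDATE/DELETE statements | Unified SQL querying, lightweight data synchronization, transparent cache reads and writes, and cross-database data analysis
 | 20 | xref:master/ecosystem_components/age.adoc[Apache AGE] | 1.7.0 | Provides graph database capabilities for IvorySQL and supports the openCypher query language, enabling hybrid use of relational and graph databases | Social network analysis, knowledge graphs, fraud detection, recommendation systems, route planning
+| 21 | xref:master/ecosystem_components/pg_bulkload.adoc[pg_bulkload] | 3.1.23 | Provides a high-speed data loading tool for IvorySQL that can bypass PostgreSQL's shared buffers and load data directly into tables | Initial loading of massive data sets, historical data archiving, cross-database migration
+| 22 | xref:master/ecosystem_components/pg_bigm.adoc[pg_bigm] | 1.2 | Provides bigram-based full-text search for IvorySQL, suited to Chinese, Japanese, and Korean text, for fast fuzzy search and similarity queries | Text search over CJK content such as articles, products, and addresses
 |====
 
 These plugins have all been tested and adapted by the IvorySQL team to ensure stable operation in the IvorySQL environment. Users can choose suitable plugins according to business needs to further enhance the capabilities and flexibility of the database system.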
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+
+:sectnums:
+:sectnumlevels: 5
+
+= pg_bigm
+
+== Overview
+pg_bigm provides bigram-based full-text search for IvorySQL. It is suited to Chinese, Japanese, and Korean text and enables fast fuzzy search and similarity queries.
+
+== Installation
+The IvorySQL installation package already bundles the pg_bigm extension, so if you installed IvorySQL from the package you usually do not need to install pg_bigm manually. For other installation methods, follow the source installation steps below.
+
+[TIP]
+The source installation environment is Ubuntu 24.04 (x86_64), with IvorySQL 5 or later already installed at /usr/ivory-5
+
+=== Source Code Installation
+Download the pg_bigm v1.2 source code from https://github.com/pgbigm/pg_bigm/releases/tag/v1.2-20250903 .
+
+** Install pg_bigm
+[literal]
+----
+# Extract the source code and enter its directory
+cd pg_bigm-1.2-20250903
+
+# Build the code, passing the pg_config path as the PG_CONFIG make variable, e.g. /usr/ivory-5/bin/pg_config
+make USE_PGXS=1 PG_CONFIG=/usr/ivory-5/bin/pg_config
+sudo make USE_PGXS=1 PG_CONFIG=/usr/ivory-5/bin/pg_config install
+----
+
+== Create Extension
+Connect to the database with psql and run the following commands:
+[literal]
+----
+ivorysql=# CREATE EXTENSION pg_bigm;
+CREATE EXTENSION
+
+ivorysql=# SELECT * FROM pg_available_extensions WHERE name = 'pg_bigm';
+  name   | default_version | installed_version |                             comment
+---------+-----------------+-------------------+------------------------------------------------------------------
+ pg_bigm | 1.2             | 1.2               | text similarity measurement and index searching based on bigrams
+(1 row)
+----
+
+== Usage
+[literal]
+----
+-- Create the table, insert sample data, then build the index
+CREATE TABLE pg_tools (tool text, description text);
+
+INSERT INTO pg_tools VALUES ('pg_hint_plan', 'Tool that allows a user to specify an optimizer HINT to PostgreSQL');
+INSERT INTO pg_tools VALUES ('pg_dbms_stats', 'Tool that allows a user to stabilize planner statistics in PostgreSQL');
+INSERT INTO pg_tools VALUES ('pg_bigm', 'Tool that provides 2-gram full text search capability in PostgreSQL');
+INSERT INTO pg_tools VALUES ('pg_trgm', 'Tool that provides 3-gram full text search capability in PostgreSQL');
+
+CREATE INDEX pg_tools_idx ON pg_tools USING gin (description gin_bigm_ops);
+
+-- Run a full-text search
+SELECT * FROM pg_tools WHERE description LIKE '%search%';
+
+  tool   |                             description
+---------+---------------------------------------------------------------------
+ pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL
+ pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL
+(2 rows)
+----
Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
+
+:sectnums:
+:sectnumlevels: 5
+
+= pg_bulkload
+
+== Overview
+pg_bulkload is a high-performance bulk data loading extension for PostgreSQL. By bypassing the regular read/write path, it greatly improves the efficiency of loading text data into the database. It supports data filtering, error tolerance, parallel loading, and checkpoint recovery, and is suited to large-batch loading scenarios such as migrating massive historical data, batch synchronization for data warehouses, and offline log ingestion. Its import speed is significantly higher than that of the native COPY command.
+
+== Installation
+The IvorySQL installation package already bundles the pg_bulkload extension, so if you installed IvorySQL from the package you usually do not need to install pg_bulkload manually. For other installation methods, follow the source installation steps below.
+
+[TIP]
+The source installation environment is Ubuntu 24.04 (x86_64), with IvorySQL 5 or later already installed at /usr/ivory-5
+
+=== Source Code Installation
+Download the pg_bulkload v3.1.23 source code from https://github.com/ossc-db/pg_bulkload/releases/tag/VERSION3_1_23 .
+
+** Install Dependencies
+[literal]
+----
+sudo apt install gawk
+----
+
+** Install pg_bulkload
+[literal]
+----
+# Extract the source code and enter its directory
+cd pg_bulkload-VERSION3_1_23
+
+# Modify the Makefile for IvorySQL's non-PIE static libraries: after the existing LDFLAGS line, add the lines marked with + (without the leading +)
+
+LDFLAGS+=-Wl,--build-id
++
++# Workaround for non-PIE static libraries (e.g., IvorySQL's libpgcommon.a)
++# Some distributions build libpgcommon.a without -fPIE, causing link failures
++# when the system defaults to PIE executables.
++ifdef DISABLE_PIE
++CFLAGS+=-no-pie
++LDFLAGS+=-no-pie
++endif
+
+# Build the code, passing the pg_config path as the PG_CONFIG make variable, e.g. /usr/ivory-5/bin/pg_config
+make PG_CONFIG=/usr/ivory-5/bin/pg_config clean
+make PG_CONFIG=/usr/ivory-5/bin/pg_config DISABLE_PIE=1
+sudo make PG_CONFIG=/usr/ivory-5/bin/pg_config DISABLE_PIE=1 install
+----
+
+== Create Extension
+Connect to the database with psql and run the following commands:
+[literal]
+----
+ivorysql=# CREATE EXTENSION pg_bulkload;
+CREATE EXTENSION
+
+ivorysql=# SELECT * FROM pg_available_extensions WHERE name = 'pg_bulkload';
+    name     | default_version | installed_version |                             comment
+-------------+-----------------+-------------------+-----------------------------------------------------------------
+ pg_bulkload | 3.1.23          | 3.1.23            | pg_bulkload is a high speed data loading utility for PostgreSQL
+(1 row)
+----
+
+== Usage
+[literal]
+----
+ivorysql=# create database testdb;
+CREATE DATABASE
+
+ivorysql=# \c testdb
+You are now connected to database "testdb" as user "highgo".
+
+testdb=# create table tb_asher (id int,name text);
+CREATE TABLE
+testdb=# \q
+
+# Generate a sample pipe-delimited file (run in the shell, not in psql)
+seq 100000| awk '{print $0"|asher"}' > bulk_asher.txt
+
+# Load the data from bulk_asher.txt into the tb_asher table in the testdb database
+/usr/ivory-5/bin/pg_bulkload -i ./bulk_asher.txt -O tb_asher -l ./tb_asher_output.log -P ./tb_asher_bad.txt -o "TYPE=CSV" -o "DELIMITER=|" -d testdb -U highgo -h 127.0.0.1
+----

CN/modules/ROOT/pages/master/ecosystem_components/pgrouting.adoc

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ make
 sudo make install
 ```
 
-== Create the Extension and Confirm the ddlx Version
+== Create the Extension and Confirm the Version
 
 Connect to the database with psql and run the following commands:
 ```

EN/modules/ROOT/pages/master/ecosystem_components/ecosystem_overview.adoc

Lines changed: 2 additions & 0 deletions
@@ -32,6 +32,8 @@ IvorySQL, as an advanced open-source database compatible with Oracle and based o
 |*18*| xref:master/ecosystem_components/pg_hint_plan.adoc[pg_hint_plan] | PG18 | Controls execution plans via SQL comment hints, optimizing query performance without modifying SQL logic | Query performance optimization, execution plan tuning, database performance analysis
 |*19*| xref:master/ecosystem_components/redis_fdw.adoc[redis_fdw] | PG18 | connects PostgreSQL with Redis, supporting standard SQL operations including SELECT, INSERT, UPDATE, and DELETE | unified SQL querying, lightweight data synchronization, transparent cache access, and cross-database data analysis
 |*20*| xref:master/ecosystem_components/age.adoc[Apache AGE] | 1.7.0 | Provides graph database capabilities to IvorySQL, supports openCypher query language, enabling hybrid use of relational and graph databases | Social network analysis, knowledge graphs, fraud detection, recommendation systems, route planning
+|*21*| xref:master/ecosystem_components/pg_bulkload.adoc[pg_bulkload] | 3.1.23 | Provides a high-speed data loading tool for IvorySQL that loads data directly into tables, bypassing PostgreSQL shared buffers | Initial loading of massive data sets, historical data archiving, and cross-database migration
+|*22*| xref:master/ecosystem_components/pg_bigm.adoc[pg_bigm] | 1.2 | Equips IvorySQL with bigram-based full-text search that supports Chinese, Japanese, and Korean text, for efficient fuzzy search and similarity queries | Text search over articles, products, and addresses in CJK languages
 |====
 
 These plugins have all been tested and adapted by the IvorySQL team to ensure stable operation in the IvorySQL environment. Users can select appropriate plugins based on business needs to further enhance the capabilities and flexibility of the database system.
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+
+:sectnums:
+:sectnumlevels: 5
+
+= pg_bigm
+
+== Overview
+pg_bigm provides bigram-based (2-gram) full-text search for IvorySQL. It is well suited to Chinese, Japanese, and Korean (CJK) text and supports fuzzy search and similarity queries.
+
+== Installation
+The IvorySQL installation package already includes the pg_bigm extension. If you installed IvorySQL using the installation package, you typically do not need to manually install pg_bigm. For other installation methods, refer to the source code installation steps below.
+
+[TIP]
+The source code installation environment is Ubuntu 24.04 (x86_64), with IvorySQL 5 or later already installed at `/usr/ivory-5`.
+
+=== Source Code Installation
+Download the pg_bigm v1.2 source code from https://github.com/pgbigm/pg_bigm/releases/tag/v1.2-20250903.
+
+** Install pg_bigm
+[literal]
+----
+# Extract the source code and enter its directory
+cd pg_bigm-1.2-20250903
+
+# Build the code, passing the pg_config path as the PG_CONFIG make variable, e.g., /usr/ivory-5/bin/pg_config
+make USE_PGXS=1 PG_CONFIG=/usr/ivory-5/bin/pg_config
+sudo make USE_PGXS=1 PG_CONFIG=/usr/ivory-5/bin/pg_config install
+----
+
+== Create Extension
+Connect to the database via psql and execute the following commands:
+[literal]
+----
+ivorysql=# CREATE EXTENSION pg_bigm;
+CREATE EXTENSION
+
+ivorysql=# SELECT * FROM pg_available_extensions WHERE name = 'pg_bigm';
+  name   | default_version | installed_version |                             comment
+---------+-----------------+-------------------+------------------------------------------------------------------
+ pg_bigm | 1.2             | 1.2               | text similarity measurement and index searching based on bigrams
+(1 row)
+----
+
+== Usage
+[literal]
+----
+-- Create the table, insert sample data, then build the index
+CREATE TABLE pg_tools (tool text, description text);
+
+INSERT INTO pg_tools VALUES ('pg_hint_plan', 'Tool that allows a user to specify an optimizer HINT to PostgreSQL');
+INSERT INTO pg_tools VALUES ('pg_dbms_stats', 'Tool that allows a user to stabilize planner statistics in PostgreSQL');
+INSERT INTO pg_tools VALUES ('pg_bigm', 'Tool that provides 2-gram full text search capability in PostgreSQL');
+INSERT INTO pg_tools VALUES ('pg_trgm', 'Tool that provides 3-gram full text search capability in PostgreSQL');
+
+CREATE INDEX pg_tools_idx ON pg_tools USING gin (description gin_bigm_ops);
+
+-- Execute full-text search
+SELECT * FROM pg_tools WHERE description LIKE '%search%';
+
+  tool   |                             description
+---------+---------------------------------------------------------------------
+ pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL
+ pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL
+(2 rows)
+----
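The usage section of the pg_bigm page only shows a plain `LIKE` query. As a hedged sketch of what else the extension offers, the script below writes a few follow-up queries to a file and optionally runs them with psql. `show_bigm`, `likequery`, `bigm_similarity`, the `=%` operator, and `pg_bigm.similarity_limit` come from the upstream pg_bigm documentation; the file name `bigm_demo.sql`, the `RUN_PSQL` switch, and the default database name are assumptions made for this illustration.

```shell
#!/bin/sh
# Sketch: extra pg_bigm queries against the pg_tools table from the usage example.
# Assumptions: pg_bigm is already created in the target database and pg_tools
# is populated as shown in the documentation page.
set -eu

SQL_FILE="${SQL_FILE:-bigm_demo.sql}"   # hypothetical file name

cat > "$SQL_FILE" <<'SQL'
-- Show the 2-grams that pg_bigm extracts from a string.
SELECT show_bigm('full text search');

-- likequery() escapes LIKE wildcards in the keyword and wraps it in %...%.
SELECT tool FROM pg_tools WHERE description LIKE likequery('search');

-- Similarity search: =% keeps rows whose bigm_similarity() with the keyword
-- is at least pg_bigm.similarity_limit.
SET pg_bigm.similarity_limit = 0.2;
SELECT tool, bigm_similarity(tool, 'bigm') AS score
FROM pg_tools
WHERE tool =% 'bigm'
ORDER BY score DESC;

-- Inspect the plan of the LIKE query from the usage example.
EXPLAIN SELECT * FROM pg_tools WHERE description LIKE '%search%';
SQL

# Only talk to a database when explicitly asked to (RUN_PSQL=1).
if [ "${RUN_PSQL:-0}" = "1" ] && command -v psql >/dev/null 2>&1; then
    psql -d "${PGDATABASE:-ivorysql}" -f "$SQL_FILE"
else
    echo "wrote $SQL_FILE; run it with: psql -d <database> -f $SQL_FILE"
fi
```

On a table as small as `pg_tools` the planner may still choose a sequential scan, so the `EXPLAIN` output is not guaranteed to show `pg_tools_idx`; the index tends to matter only once the table grows.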
Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
+
+:sectnums:
+:sectnumlevels: 5
+
+= pg_bulkload
+
+== Overview
+pg_bulkload is a high-performance bulk data loading extension for PostgreSQL. It bypasses the conventional read/write path to significantly improve the efficiency of importing text data into the database. It supports data filtering, error tolerance, parallel loading, and checkpoint recovery, making it suitable for scenarios such as large-scale historical data migration, data warehouse batch synchronization, and offline log ingestion. Compared to the native COPY command, pg_bulkload delivers a notable improvement in import speed.
+
+== Installation
+The IvorySQL installation package already includes the pg_bulkload extension. If you installed IvorySQL using the installation package, you typically do not need to manually install pg_bulkload. For other installation methods, refer to the source code installation steps below.
+
+[TIP]
+The source code installation environment is Ubuntu 24.04 (x86_64), with IvorySQL 5 or later already installed at `/usr/ivory-5`.
+
+=== Source Code Installation
+Download the pg_bulkload v3.1.23 source code from https://github.com/ossc-db/pg_bulkload/releases/tag/VERSION3_1_23.
+
+** Install Dependencies
+[literal]
+----
+sudo apt install gawk
+----
+
+** Install pg_bulkload
+[literal]
+----
+# Extract the source code and enter its directory
+cd pg_bulkload-VERSION3_1_23
+
+# Modify the Makefile for IvorySQL's non-PIE static libraries: after the existing LDFLAGS line, add the lines marked with + (without the leading +)
+
+LDFLAGS+=-Wl,--build-id
++
++# Workaround for non-PIE static libraries (e.g., IvorySQL's libpgcommon.a)
++# Some distributions build libpgcommon.a without -fPIE, causing link failures
++# when the system defaults to PIE executables.
++ifdef DISABLE_PIE
++CFLAGS+=-no-pie
++LDFLAGS+=-no-pie
++endif
+
+# Build the code, passing the pg_config path as the PG_CONFIG make variable, e.g., /usr/ivory-5/bin/pg_config
+make PG_CONFIG=/usr/ivory-5/bin/pg_config clean
+make PG_CONFIG=/usr/ivory-5/bin/pg_config DISABLE_PIE=1
+sudo make PG_CONFIG=/usr/ivory-5/bin/pg_config DISABLE_PIE=1 install
+----
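After the build and install step it can help to confirm that the files ended up where the server looks for them. The following is a minimal sketch, not part of the committed page: `check_extension_files` is a hypothetical helper, and the assumption that the extension ships `pg_bulkload.control` and `pg_bulkload.so` reflects the usual upstream layout rather than anything stated in this commit. The `--sharedir`, `--pkglibdir`, and `--bindir` options are standard `pg_config` flags.

```shell
#!/bin/sh
# Sketch: check that an extension's files landed where pg_config says they should.
set -eu

PG_CONFIG="${PG_CONFIG:-/usr/ivory-5/bin/pg_config}"

# Hypothetical helper: prints "ok" or "missing" for the control file and the
# shared library of extension $1, given a share directory $2 and library directory $3.
check_extension_files() {
    ext="$1"
    sharedir="$2"
    pkglibdir="$3"
    for f in "$sharedir/extension/$ext.control" "$pkglibdir/$ext.so"; do
        if [ -f "$f" ]; then
            echo "ok      $f"
        else
            echo "missing $f"
        fi
    done
}

if [ -x "$PG_CONFIG" ]; then
    check_extension_files pg_bulkload \
        "$("$PG_CONFIG" --sharedir)" "$("$PG_CONFIG" --pkglibdir)"
    # The usage example calls the pg_bulkload binary from the same bin directory.
    ls -l "$("$PG_CONFIG" --bindir)/pg_bulkload" || true
else
    echo "pg_config not found at $PG_CONFIG; set PG_CONFIG to your installation"
fi
```

A "missing" line usually means the install target was not run, or was run against a different `PG_CONFIG` than the server in use.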
+
+== Create Extension
+Connect to the database via psql and execute the following commands:
+[literal]
+----
+ivorysql=# CREATE EXTENSION pg_bulkload;
+CREATE EXTENSION
+
+ivorysql=# SELECT * FROM pg_available_extensions WHERE name = 'pg_bulkload';
+    name     | default_version | installed_version |                             comment
+-------------+-----------------+-------------------+-----------------------------------------------------------------
+ pg_bulkload | 3.1.23          | 3.1.23            | pg_bulkload is a high speed data loading utility for PostgreSQL
+(1 row)
+----
+
+== Usage
+[literal]
+----
+ivorysql=# create database testdb;
+CREATE DATABASE
+
+ivorysql=# \c testdb
+You are now connected to database "testdb" as user "highgo".
+
+testdb=# create table tb_asher (id int,name text);
+CREATE TABLE
+testdb=# \q
+
+# Generate a sample pipe-delimited file (run in the shell, not in psql)
+seq 100000| awk '{print $0"|asher"}' > bulk_asher.txt
+
+# Load the data from bulk_asher.txt into the tb_asher table in the testdb database
+/usr/ivory-5/bin/pg_bulkload -i ./bulk_asher.txt -O tb_asher -l ./tb_asher_output.log -P ./tb_asher_bad.txt -o "TYPE=CSV" -o "DELIMITER=|" -d testdb -U highgo -h 127.0.0.1
+----
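The load command in the usage section passes every option on the command line. pg_bulkload can also read the same settings from a control file, which is easier to keep under version control. The sketch below is an illustration under assumptions: the file names `bulk_asher_sample.txt` and `tb_asher.ctl`, the `WORKDIR`, `BULKLOAD`, and `RUN_LOAD` variables, and the 1000-row sample size are all invented here; the control-file keys (`OUTPUT`, `INPUT`, `TYPE`, `DELIMITER`, `LOGFILE`, `PARSE_BADFILE`) mirror the `-O`, `-i`, `-o`, `-l`, and `-P` options used in the page.

```shell
#!/bin/sh
# Sketch: the same load expressed as a pg_bulkload control file.
set -eu

WORKDIR="${WORKDIR:-$(pwd)}"
DATA="$WORKDIR/bulk_asher_sample.txt"   # hypothetical name, avoids clobbering bulk_asher.txt
CTL="$WORKDIR/tb_asher.ctl"             # hypothetical control file name

# Regenerate a small sample so the script stands alone (1000 rows here).
seq 1000 | awk '{print $0 "|asher"}' > "$DATA"

# Absolute paths are used so the file works from any directory.
cat > "$CTL" <<EOF
OUTPUT = tb_asher
INPUT = $DATA
TYPE = CSV
DELIMITER = "|"
LOGFILE = $WORKDIR/tb_asher_output.log
PARSE_BADFILE = $WORKDIR/tb_asher_bad.txt
EOF

# Sanity-check the input before loading: every line must have exactly two fields.
rows=$(wc -l < "$DATA" | tr -d ' ')
bad=$(awk -F'|' 'NF != 2' "$DATA" | wc -l | tr -d ' ')
echo "rows: $rows, malformed: $bad"

# Only load when explicitly asked to (RUN_LOAD=1) and the binary is present.
BULKLOAD="${BULKLOAD:-/usr/ivory-5/bin/pg_bulkload}"
if [ "${RUN_LOAD:-0}" = "1" ] && [ -x "$BULKLOAD" ]; then
    "$BULKLOAD" -d testdb -U highgo -h 127.0.0.1 "$CTL"
    # Compare the loaded row count with the input file.
    psql -d testdb -U highgo -h 127.0.0.1 -c 'SELECT count(*) FROM tb_asher;'
fi
```

Comparing the `count(*)` result with the reported row count, and checking that the bad file stays empty, is a quick way to confirm that nothing was rejected during parsing.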
