Skip to content

Commit 61633d2

Browse files
Fix: FDW OPTIONS encoding accepts symbolic names (issue #1726) (#1727)
* Fix: FDW OPTIONS encoding accepts symbolic names (issue #1726) Both the FDW catalog reader (src/backend/access/external/external.c) and the gp_exttable_fdw option validator (gpcontrib/gp_exttable_fdw/option.c) parsed the "encoding" OPTIONS value with atoi(). atoi("UTF8") returns 0 (PG_SQL_ASCII) and PG_VALID_ENCODING(0) is true, so symbolic names like 'UTF8', 'utf-8', 'GBK' silently fell through validation and were stored as SQL_ASCII at read time. By contrast, the legacy CREATE EXTERNAL TABLE ... ENCODING ... path resolves names via pg_char_to_encoding() and persists a numeric form into OPTIONS — only the FDW OPTIONS entry point bypassed that translation. Add a small shared helper parse_fdw_encoding_option(const char *) in src/backend/access/external/external.c (declared in src/include/access/external.h): - first try pg_char_to_encoding(name) — same logic as the legacy path; - otherwise try a strict numeric form via strtol() with end-of-string and PG_VALID_ENCODING() checks (atoi is intentionally avoided, since atoi("UTF8")==0 is the bug being fixed); - otherwise ereport(ERROR). Both the validator and GetExtFromForeignTableOptions() call this helper. On-disk values in pg_foreign_table.ftoptions are stored verbatim as the user wrote them; correctness is established at read time. This avoids a ProcessUtility_hook approach, which is unworkable here because the extension's _PG_init runs lazily on the first dlopen, after the current statement's hook check has already passed. Affected scope: gp_exttable_fdw (used by gp_exttable_server). The standalone pxf_fdw is unaffected — its validator already routes encoding through ProcessCopyOptions, which is name-aware. Behavior change on upgrade: existing rows whose ftoptions literally contain encoding=<name> have, until now, been silently interpreted as SQL_ASCII. After this fix they are interpreted as the named encoding. This will be called out in the release notes; a detection query is provided in the PR description for operators who wish to pin specific tables to numeric form before upgrade. Tests added in gpcontrib/gp_exttable_fdw/{input,output}/gp_exttable_fdw.source cover encoding '6' / 'UTF8' / 'utf-8' / 'GBK' / 'bogus' and an ALTER FOREIGN TABLE ... OPTIONS (SET encoding 'UTF8') path. The pre-existing encoding '-1' error case has its expected error message updated to match the new helper's wording. * test: pad expected output headers to match psql separator widths The new tests added in the previous commit had column header lines without the trailing-space padding that psql's aligned output emits to match the separator. The pre-existing ext_special_uri header (' a | b') was also unintentionally stripped of its trailing space during the same edit. Pure whitespace fix. No behavior change. * test: drop trailing blank line in gp_exttable_fdw expected output pg_regress diffs the expected and actual .out files strictly, including the final newline count. The new encoding test block ended with a stray empty line (";\n\n") while psql produces ";\n", causing a 1-line diff at end-of-file. Pure whitespace fix. * test: reject mixed numeric+letters in FDW encoding option Add a regression case for `encoding '6abc'`. atoi("6abc") would have silently returned 6 (= UTF8), which is the class of bug that motivated moving the FDW encoding option parser off atoi() and onto a strict strtol() form in parse_fdw_encoding_option(). Without this test, the strictness of the numeric path was not directly exercised — only the "unknown name" path ('bogus') was. Pure test addition; no code change. Lands the third of the reviewer's suggestions on issue #1726 (the first two — strict strtol parsing and a single shared helper between the validator and the read path — were already in place in the original fix commit). * ci: retrigger to clear flaky alter_distribution_policy --------- Co-authored-by: chenqiang <chenqiang@hashdata.cn>
1 parent cd3c88f commit 61633d2

5 files changed

Lines changed: 220 additions & 9 deletions

File tree

gpcontrib/gp_exttable_fdw/input/gp_exttable_fdw.source

Lines changed: 71 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -53,12 +53,26 @@ OPTIONS (format_type 'c', delimiter ',',
5353
location_uris 'file://@hostname@@abs_srcdir@/data/tableless.csv',
5454
reject_limit_type 'p', reject_limit '120');
5555

56-
-- Error, invalid encoding
56+
-- Error, invalid encoding (negative numeric ID)
5757
CREATE FOREIGN TABLE tableless_ext_fdw(a int, b int)
5858
SERVER gp_exttable_server
5959
OPTIONS (format_type 'c', delimiter ',', encoding '-1',
6060
location_uris 'file://@hostname@@abs_srcdir@/data/tableless.csv');
6161

62+
-- Error, invalid encoding (unknown name)
63+
CREATE FOREIGN TABLE tableless_ext_fdw(a int, b int)
64+
SERVER gp_exttable_server
65+
OPTIONS (format_type 'c', delimiter ',', encoding 'bogus',
66+
location_uris 'file://@hostname@@abs_srcdir@/data/tableless.csv');
67+
68+
-- Error, mixed numeric+letters must not be silently truncated to a
69+
-- valid prefix (atoi('6abc') would return 6 = UTF8; strict parsing
70+
-- in parse_fdw_encoding_option() rejects it).
71+
CREATE FOREIGN TABLE tableless_ext_fdw(a int, b int)
72+
SERVER gp_exttable_server
73+
OPTIONS (format_type 'c', delimiter ',', encoding '6abc',
74+
location_uris 'file://@hostname@@abs_srcdir@/data/tableless.csv');
75+
6276
-- OK, no execute_on | log_errors | encoding | is_writable option
6377
CREATE FOREIGN TABLE tableless_ext_fdw(a int, b int)
6478
SERVER gp_exttable_server
@@ -79,3 +93,59 @@ SELECT urilocation FROM pg_exttable WHERE reloid = 'public.ext_special_uri'::reg
7993
SELECT ftoptions FROM pg_foreign_table WHERE ftrelid='public.ext_special_uri'::regclass;
8094
\a
8195
SELECT * FROM ext_special_uri ORDER BY a;
96+
97+
-- ===================================================================
98+
-- Tests for issue #1726: FDW OPTIONS encoding accepts both numeric IDs
99+
-- and symbolic names (UTF8, utf-8, GBK, ...). Names previously parsed
100+
-- via atoi() and silently degraded to SQL_ASCII.
101+
-- ===================================================================
102+
103+
-- Numeric form (baseline; worked before the fix as well).
104+
CREATE FOREIGN TABLE ext_enc_num (a int) SERVER gp_exttable_server
105+
OPTIONS (format_type 'c', delimiter ',',
106+
location_uris 'file:///tmp/ext_enc_ignored.csv',
107+
encoding '6');
108+
SELECT pg_encoding_to_char(encoding) FROM pg_exttable
109+
WHERE reloid = 'ext_enc_num'::regclass;
110+
111+
-- Symbolic name 'UTF8' — used to be silently SQL_ASCII (the bug).
112+
CREATE FOREIGN TABLE ext_enc_utf8 (a int) SERVER gp_exttable_server
113+
OPTIONS (format_type 'c', delimiter ',',
114+
location_uris 'file:///tmp/ext_enc_ignored.csv',
115+
encoding 'UTF8');
116+
SELECT pg_encoding_to_char(encoding) FROM pg_exttable
117+
WHERE reloid = 'ext_enc_utf8'::regclass;
118+
119+
-- Case + dash variant resolved by pg_char_to_encoding().
120+
CREATE FOREIGN TABLE ext_enc_utf8_dash (a int) SERVER gp_exttable_server
121+
OPTIONS (format_type 'c', delimiter ',',
122+
location_uris 'file:///tmp/ext_enc_ignored.csv',
123+
encoding 'utf-8');
124+
SELECT pg_encoding_to_char(encoding) FROM pg_exttable
125+
WHERE reloid = 'ext_enc_utf8_dash'::regclass;
126+
127+
-- Non-UTF8 symbolic name.
128+
CREATE FOREIGN TABLE ext_enc_gbk (a int) SERVER gp_exttable_server
129+
OPTIONS (format_type 'c', delimiter ',',
130+
location_uris 'file:///tmp/ext_enc_ignored.csv',
131+
encoding 'GBK');
132+
SELECT pg_encoding_to_char(encoding) FROM pg_exttable
133+
WHERE reloid = 'ext_enc_gbk'::regclass;
134+
135+
-- ALTER FOREIGN TABLE ... OPTIONS (SET encoding 'UTF8') — same code
136+
-- path, this proves the read-side resolution works after an ALTER too.
137+
CREATE FOREIGN TABLE ext_enc_alter (a int) SERVER gp_exttable_server
138+
OPTIONS (format_type 'c', delimiter ',',
139+
location_uris 'file:///tmp/ext_enc_ignored.csv',
140+
encoding '0');
141+
SELECT pg_encoding_to_char(encoding) FROM pg_exttable
142+
WHERE reloid = 'ext_enc_alter'::regclass;
143+
ALTER FOREIGN TABLE ext_enc_alter OPTIONS (SET encoding 'UTF8');
144+
SELECT pg_encoding_to_char(encoding) FROM pg_exttable
145+
WHERE reloid = 'ext_enc_alter'::regclass;
146+
147+
DROP FOREIGN TABLE ext_enc_num;
148+
DROP FOREIGN TABLE ext_enc_utf8;
149+
DROP FOREIGN TABLE ext_enc_utf8_dash;
150+
DROP FOREIGN TABLE ext_enc_gbk;
151+
DROP FOREIGN TABLE ext_enc_alter;

gpcontrib/gp_exttable_fdw/option.c

Lines changed: 7 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -135,11 +135,13 @@ gp_exttable_permission_check(PG_FUNCTION_ARGS)
135135
}
136136
else if(pg_strcasecmp(def->defname, "encoding") == 0)
137137
{
138-
char *encoding = (char *) defGetString(def);
139-
if (!PG_VALID_ENCODING(atoi(encoding)))
140-
ereport(ERROR,
141-
(errcode(ERRCODE_FDW_INVALID_ATTRIBUTE_VALUE),
142-
errmsg("%s is not a valid encoding code", encoding)));
138+
/*
139+
* Accept either a symbolic encoding name (e.g. 'UTF8', 'GBK')
140+
* or a numeric encoding ID. Reject anything else explicitly,
141+
* rather than letting atoi() silently mistranslate non-numeric
142+
* names to SQL_ASCII.
143+
*/
144+
(void) parse_fdw_encoding_option((char *) defGetString(def));
143145
}
144146
}
145147

gpcontrib/gp_exttable_fdw/output/gp_exttable_fdw.source

Lines changed: 95 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -52,12 +52,26 @@ OPTIONS (format_type 'c', delimiter ',',
5252
location_uris 'file://@hostname@@abs_srcdir@/data/tableless.csv',
5353
reject_limit_type 'p', reject_limit '120');
5454
ERROR: segment reject limit in PERCENT must be between 1 and 100 (got 120) (seg1 127.0.0.1:7003 pid=5173)
55-
-- Error, invalid encoding
55+
-- Error, invalid encoding (negative numeric ID)
5656
CREATE FOREIGN TABLE tableless_ext_fdw(a int, b int)
5757
SERVER gp_exttable_server
5858
OPTIONS (format_type 'c', delimiter ',', encoding '-1',
5959
location_uris 'file://@hostname@@abs_srcdir@/data/tableless.csv');
60-
ERROR: -1 is not a valid encoding code (seg0 127.0.0.1:7002 pid=8289)
60+
ERROR: "-1" is not a valid encoding name or code
61+
-- Error, invalid encoding (unknown name)
62+
CREATE FOREIGN TABLE tableless_ext_fdw(a int, b int)
63+
SERVER gp_exttable_server
64+
OPTIONS (format_type 'c', delimiter ',', encoding 'bogus',
65+
location_uris 'file://@hostname@@abs_srcdir@/data/tableless.csv');
66+
ERROR: "bogus" is not a valid encoding name or code
67+
-- Error, mixed numeric+letters must not be silently truncated to a
68+
-- valid prefix (atoi('6abc') would return 6 = UTF8; strict parsing
69+
-- in parse_fdw_encoding_option() rejects it).
70+
CREATE FOREIGN TABLE tableless_ext_fdw(a int, b int)
71+
SERVER gp_exttable_server
72+
OPTIONS (format_type 'c', delimiter ',', encoding '6abc',
73+
location_uris 'file://@hostname@@abs_srcdir@/data/tableless.csv');
74+
ERROR: "6abc" is not a valid encoding name or code
6175
-- OK, no execute_on | log_errors | encoding | is_writable option
6276
CREATE FOREIGN TABLE tableless_ext_fdw(a int, b int)
6377
SERVER gp_exttable_server
@@ -96,3 +110,82 @@ SELECT * FROM ext_special_uri ORDER BY a;
96110
3 | 3
97111
(3 rows)
98112

113+
-- ===================================================================
114+
-- Tests for issue #1726: FDW OPTIONS encoding accepts both numeric IDs
115+
-- and symbolic names (UTF8, utf-8, GBK, ...). Names previously parsed
116+
-- via atoi() and silently degraded to SQL_ASCII.
117+
-- ===================================================================
118+
-- Numeric form (baseline; worked before the fix as well).
119+
CREATE FOREIGN TABLE ext_enc_num (a int) SERVER gp_exttable_server
120+
OPTIONS (format_type 'c', delimiter ',',
121+
location_uris 'file:///tmp/ext_enc_ignored.csv',
122+
encoding '6');
123+
SELECT pg_encoding_to_char(encoding) FROM pg_exttable
124+
WHERE reloid = 'ext_enc_num'::regclass;
125+
pg_encoding_to_char
126+
---------------------
127+
UTF8
128+
(1 row)
129+
130+
-- Symbolic name 'UTF8' — used to be silently SQL_ASCII (the bug).
131+
CREATE FOREIGN TABLE ext_enc_utf8 (a int) SERVER gp_exttable_server
132+
OPTIONS (format_type 'c', delimiter ',',
133+
location_uris 'file:///tmp/ext_enc_ignored.csv',
134+
encoding 'UTF8');
135+
SELECT pg_encoding_to_char(encoding) FROM pg_exttable
136+
WHERE reloid = 'ext_enc_utf8'::regclass;
137+
pg_encoding_to_char
138+
---------------------
139+
UTF8
140+
(1 row)
141+
142+
-- Case + dash variant resolved by pg_char_to_encoding().
143+
CREATE FOREIGN TABLE ext_enc_utf8_dash (a int) SERVER gp_exttable_server
144+
OPTIONS (format_type 'c', delimiter ',',
145+
location_uris 'file:///tmp/ext_enc_ignored.csv',
146+
encoding 'utf-8');
147+
SELECT pg_encoding_to_char(encoding) FROM pg_exttable
148+
WHERE reloid = 'ext_enc_utf8_dash'::regclass;
149+
pg_encoding_to_char
150+
---------------------
151+
UTF8
152+
(1 row)
153+
154+
-- Non-UTF8 symbolic name.
155+
CREATE FOREIGN TABLE ext_enc_gbk (a int) SERVER gp_exttable_server
156+
OPTIONS (format_type 'c', delimiter ',',
157+
location_uris 'file:///tmp/ext_enc_ignored.csv',
158+
encoding 'GBK');
159+
SELECT pg_encoding_to_char(encoding) FROM pg_exttable
160+
WHERE reloid = 'ext_enc_gbk'::regclass;
161+
pg_encoding_to_char
162+
---------------------
163+
GBK
164+
(1 row)
165+
166+
-- ALTER FOREIGN TABLE ... OPTIONS (SET encoding 'UTF8') — same code
167+
-- path, this proves the read-side resolution works after an ALTER too.
168+
CREATE FOREIGN TABLE ext_enc_alter (a int) SERVER gp_exttable_server
169+
OPTIONS (format_type 'c', delimiter ',',
170+
location_uris 'file:///tmp/ext_enc_ignored.csv',
171+
encoding '0');
172+
SELECT pg_encoding_to_char(encoding) FROM pg_exttable
173+
WHERE reloid = 'ext_enc_alter'::regclass;
174+
pg_encoding_to_char
175+
---------------------
176+
SQL_ASCII
177+
(1 row)
178+
179+
ALTER FOREIGN TABLE ext_enc_alter OPTIONS (SET encoding 'UTF8');
180+
SELECT pg_encoding_to_char(encoding) FROM pg_exttable
181+
WHERE reloid = 'ext_enc_alter'::regclass;
182+
pg_encoding_to_char
183+
---------------------
184+
UTF8
185+
(1 row)
186+
187+
DROP FOREIGN TABLE ext_enc_num;
188+
DROP FOREIGN TABLE ext_enc_utf8;
189+
DROP FOREIGN TABLE ext_enc_utf8_dash;
190+
DROP FOREIGN TABLE ext_enc_gbk;
191+
DROP FOREIGN TABLE ext_enc_alter;

src/backend/access/external/external.c

Lines changed: 42 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -34,6 +34,47 @@
3434

3535
static List *create_external_scan_uri_list(ExtTableEntry *ext, bool *ismasteronly);
3636

37+
/*
38+
* parse_fdw_encoding_option
39+
*
40+
* Parse the value of an "encoding" FDW OPTIONS entry (whether on creation,
41+
* during validation, or when reading back stored ftoptions) into a numeric
42+
* encoding ID. Accepts a symbolic encoding name (e.g. "UTF8", "utf-8", "GBK")
43+
* resolved via pg_char_to_encoding(), or a strictly numeric string (e.g. "6")
44+
* validated via PG_VALID_ENCODING(). Anything else raises ERROR.
45+
*
46+
* Note: atoi() is intentionally avoided in the numeric fallback. atoi("UTF8")
47+
* silently returns 0 (= SQL_ASCII), which is exactly the bug this helper
48+
* exists to fix. strtol() with end-of-string and range checks is strict.
49+
*/
50+
int
51+
parse_fdw_encoding_option(const char *value)
52+
{
53+
int encoding;
54+
char *endptr;
55+
long n;
56+
57+
if (value == NULL || *value == '\0')
58+
ereport(ERROR,
59+
(errcode(ERRCODE_FDW_INVALID_ATTRIBUTE_VALUE),
60+
errmsg("encoding option must not be empty")));
61+
62+
encoding = pg_char_to_encoding(value);
63+
if (encoding >= 0)
64+
return encoding;
65+
66+
errno = 0;
67+
n = strtol(value, &endptr, 10);
68+
if (endptr != value && *endptr == '\0' && errno == 0 &&
69+
n >= 0 && PG_VALID_ENCODING((int) n))
70+
return (int) n;
71+
72+
ereport(ERROR,
73+
(errcode(ERRCODE_FDW_INVALID_ATTRIBUTE_VALUE),
74+
errmsg("\"%s\" is not a valid encoding name or code", value)));
75+
return -1; /* unreachable, keeps compiler happy */
76+
}
77+
3778
void
3879
gfile_printf_then_putc_newline(const char *format,...)
3980
{
@@ -277,7 +318,7 @@ GetExtFromForeignTableOptions(List *ftoptons, Oid relid)
277318

278319
if (pg_strcasecmp(def->defname, "encoding") == 0)
279320
{
280-
extentry->encoding = atoi(defGetString(def));
321+
extentry->encoding = parse_fdw_encoding_option(defGetString(def));
281322
encoding_found = true;
282323
continue;
283324
}

src/include/access/external.h

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -48,5 +48,10 @@ extern ExtTableEntry *GetExtFromForeignTableOptions(List *ftoptons, Oid relid);
4848

4949
extern ExternalScanInfo *MakeExternalScanInfo(ExtTableEntry *extEntry);
5050

51+
/*
52+
* Parse an "encoding" FDW OPTIONS value (symbolic name or numeric string)
53+
* into a numeric encoding ID. ereports on invalid input.
54+
*/
55+
extern int parse_fdw_encoding_option(const char *value);
5156

5257
#endif /* EXTERNAL_H */

0 commit comments

Comments
 (0)