5370 sync upstream sql parser#5372
Conversation
Re-import the entire vendored sql-parser tree from upstream hyrise/sql-parser at commit ccd3f68. Brings in 13 upstream commits since the previous baseline c247124: - 9ca19ce Correctly parse named columns for joins - 1969a3c Allow more expressions for selected statements - 57c763a Make COPY statement options compliant to Postgres - 87e6eac Add FOREIGN KEY constraints to CREATE TABLE - 0663319 Support NULLS FIRST and NULLS LAST in ORDER BY - 3456f97 CSV import options for DELIMITER, NULL and QUOTE - ccd3f68 Add schema to functions - 139375e Include cstdint for GCC-13 - 0ede821 Add some newlines at the end of files - plus four CI-only commits (no effect on vendored code) Adds new file src/sql/ImportExportOptions.h. Drops the local sqlparser_win.h header and all Poco-side patches (export macro, SQLParserResult pointerization, MSVC pragmas, Makefile tweaks), which are re-applied in the follow-up commit.
Restores Poco-side modifications dropped by the upstream sync:
- src/sqlparser_win.h: Windows DLL export macros (SQLParser_API
driven by Data_API when included from Poco, or SQLParser_EXPORTS
standalone), _WIN32 strncasecmp/strcasecmp aliases, GCC visibility
fallback.
- SQLParser_API annotation on every public class/struct/free function
in src/SQLParser.h, src/SQLParserResult.h, all of src/sql/*.h
(including new upstream types CsvOptions, ImportExportOptions,
ReferencesSpecification, ForeignKeyConstraint, ColumnConstraints),
and src/util/sqlhelper.h.
- sqlparser_win.h includes in SQLParserResult.h, sql/ColumnType.h,
sql/SQLStatement.h, util/sqlhelper.h.
- SQLParser.h: [[deprecated(Use parse())]] attributes on the legacy
parseSQLString overloads.
- SQLParserResult.{h,cpp}: pointerize statements_ and parameters_ as
pointer-to-vector with mutable lazy allocation; reset(bool mv)
supports move-construct case; setErrorDetails frees previous
errorMsg_ to plug a leak. Upstream did not touch this file in the
imported range so the patch applied cleanly.
- src/parser/bison_parser.cpp: MSVC pragma (disable: 4996, 4267)
around generated code.
- src/parser/flex_lexer.{cpp,h}: MSVC pragma around generated code,
Windows io.h replacement for unistd.h, __has_include(<stdint.h>)
guard on flex INT*_MIN/MAX fallbacks.
- src/sql/Expr.cpp: MSVC pragma around strncpy use in substr().
- Makefile: add -I. -Isrc/ to LIB_CFLAGS and TEST_CFLAGS, normalize
TEST_CFLAGS to -std=c++17.
Verified: standalone make + make test pass (sql_tests success, no
leaks). Poco Data shared library (debug and release) builds and
links cleanly.
…er.h #5370 Restored fork-local removal of #include sql/statements.h from SQLParser.h. The include was originally pulled in transitively by SQLParser.h, but it dragged the full per-statement type set into every translation unit that included SQLParser.h, which created ambiguity issues elsewhere in the Poco tree. Consumers that need the concrete statement types include sql/statements.h directly. SQLParserResult.h still pulls in SQLStatement.h, which is enough for Poco::Data::Statement which only touches SQLStatement::type().
…5370 Adds a Linux job that builds the vendored sql-parser sources with its own Makefile and runs the upstream test suite. Closes the coverage gap where Poco CI compiles SQLParser into libPocoData but only exercises it indirectly through Poco::Data::Statement smoke tests in DataTest. The new job actually parses queries-good.sql and queries-bad.sql, runs valgrind for leaks, and re-runs bison to check the grammar is conflict-free. Triggered by the new sqlparser path filter (Data/SQLParser only), so it does not run on every Data change. bison, flex, and valgrind are installed via apt; ubuntu-24.04 ships versions well above the 3.0 / 2.6 minimums the Makefile requires.
…5370 Extends the vendored hyrise sql-parser to recognize two placeholder syntaxes in addition to the existing standard ?: - $N (postgresql 1-based explicit numbering, e.g. WHERE a = $1) - :name (sqlite named, e.g. WHERE a = :user) Implementation: - New lexer rules for the two patterns, placed before the punctuation catch-all. $ was previously unused. : remained a punctuation token but had no grammar references, so the longer :identifier match wins without conflict. - New bison tokens DOLLAR_PARAM (carries the int N) and NAMED_PARAM (carries the identifier). param_expr now accepts all three forms. $0 is rejected via YYERROR at parse time. - Three ExprType values: kExprParameter for ?, kExprParameterDollar for $N (ival = N), kExprParameterNamed for :name (name = identifier). This keeps consumer switch statements explicit instead of overloading a single enum with discriminator fields. - Factory functions Expr::makeDollarParameter and Expr::makeNamedParameter. isLiteral returns true for all three. The destructor already frees name so no further changes are needed. - The top-level input rule only renumbers ? placeholders sequentially; $N keeps its explicit ival, :name keeps its name. addParameter still sorts by ival, which now orders $N by explicit position and clusters named placeholders. - sqlhelper print case extended to handle the two new ExprType values. - sql_asserts.h now pulls sql/statements.h, restoring the include that prepare_tests.cpp and friends used to get transitively through SQLParser.h before the earlier sync removed it. Verified: standalone make + make test pass cleanly. All three checks green (sql tests including three new placeholder cases, valgrind, and bison grammar conflict). Poco Data lib (debug and release shared) builds and links cleanly via the poco_env.bash flow.
Imports Poco::Data::Utility from macchina.io: - Utility::renderValue<T> renders a single C++ value as a SQL literal. Handles nullptr, Poco::Nullable, std::optional, bool, integral and floating types, Date/Time/DateTime/LocalDateTime, BLOB (X'...' hex), CLOB, Poco::Dynamic::Var (recursive), and anything string-view-like (single-quoted with internal ' doubled). Unknown types fall back to ? so the rendered text stays valid-shaped. - Utility::boundSQL<...> renders a parameterised SQL template by substituting placeholders with stringified bindings. Currently a hand-rolled state-machine scanner supporting ? and $N styles with literal-awareness. (Switch to parser-backed substitution in a follow-up commit.) - Utility::executeSQL<...> wraps Statement execution, captures stmt, bindings, timing, rows-affected/error and reports the outcome via a structured ExecResult callback. - Utility::toJSON serialises an ExecResult as a JSON object with RFC 8259 string escaping. Adds 22 UtilityTest cases under Data/SQLite/testsuite covering renderValue dispatch, boundSQL placeholder handling and arity errors, executeSQL success/constraint/malformed/null paths, and toJSON formatting. The testsuite passes against the in-tree hand-rolled scanner; the follow-up parser refactor must keep it green.
The placeholder param_expr actions now set ival2 to the source column (0-based) of the placeholder's first character. For ? the column is yylloc.total_column - 1, for $N total_column - (1 + digits of N), for :name total_column - 1 - strlen(name). ival2 was previously written once for ? (yyloc.param_list.size(), which duplicated the sequential id later stored in ival by the top-level renumber loop) and never read anywhere in Poco. Repurposing it for the source column gives all three placeholder kinds a consistent meaning: ival -> id (sequential for ?, explicit N for $N, 0 for :name); ival2 -> source column. This lets consumers walk parameters() sorted by ival2 and substitute placeholders against the original SQL text without having to re-scan it themselves. prepare_tests.cpp gains ival2 assertions on the four placeholder cases (? / $N in-order / $N out-of-order / :name) validating the computed column positions match hand-counted offsets. The top-level input rule's renumber loop is unchanged - it only writes ival for kExprParameter, leaving ival2 alone. Verified: standalone make test passes all three checks.
…y::boundSQL #5371 Utility::boundSQL now parses the input SQL with hsql::SQLParser, walks result.parameters() sorted by Expr::ival2 (source column), and substitutes by position. Replaces the hand-rolled state-machine scanner that duplicated lexer responsibilities (literal-awareness, $N parsing, mixing detection). The parser handles single-quoted literals natively, so placeholders inside strings are inert. - ? and $N substitute as before; arity and mixing errors throw Poco::InvalidArgumentException with the same call sites/messages. - :name placeholders are now pass-through (left verbatim in the output). The renderer does not bind by name; future Statement-level work can complete that. - Unparseable SQL throws InvalidArgumentException; executeSQL catches and falls back to the raw template string, matching the pre-existing contract that callbacks always get a populated sql. - The POCO_DATA_NO_SQL_PARSER build retains the original hand-rolled scanner as a fallback so the no-parser CI matrix keeps working. Also lifts the upstream parser's ban on ? in extended_literal contexts (INSERT VALUES, EXECUTE args). The rejection at bison_parser.y:1098 ("Parameter ? is not a valid literal.") was inconsistent with the newer $N / :name placeholders, which the same rule accepts. Common INSERT INTO t VALUES(?, ?, ?) patterns now parse natively. Two queries-bad.sql cases that asserted the rejection move to queries-good.sql. UtilityTest SQL strings updated to wrap VALUES(...) fragments inside INSERT INTO t — the parser-based path validates structure, so bare VALUES() fragments are no longer accepted (they were a quirk of the text-only scanner). testExecuteSQLMalformed adjusted to assert the fall-through behavior (raw sql preserved when boundSQL throws). Verified: SQLite testsuite 122 tests pass (SQLiteTest 100 + UtilityTest 22). Standalone SQLParser make test passes all three checks. flex_lexer.{cpp,h} were regenerated but the byte-identical output to HEAD means they are not in this commit.
The previous commit included SQLParser.h directly in Utility.h so the templated boundSQL body could reach hsql types. That worked for the make build (the SQLite testsuite Makefile adds the SQLParser include paths via target_includes), but the cmake build keeps SQLParser as PRIVATE on the Data target, so any consumer of Utility.h (such as DataSQLite-testrunner) failed with: Utility.h:31:10: fatal error: SQLParser.h: No such file or directory Split boundSQL into a thin template that only renders the binding values plus a non-template back-end boundSQLImpl that lives in Utility.cpp and owns the parser-based substitution. The public header now needs no SQLParser headers at all; the parser dependency is purely internal to PocoData. Verified: cmake build with parser-enabled and parser-disabled matrices both link DataSQLite-testrunner; the 22 UtilityTest cases pass under the cmake build.
There was a problem hiding this comment.
Pull request overview
This PR syncs the vendored Data/SQLParser with upstream changes (plus Poco-specific extensions) and adds a new Poco::Data::Utility helper for rendering/binding SQL for diagnostics, structured execution reporting, and JSON serialization, along with new SQLite tests and CI coverage for the standalone SQLParser suite.
Changes:
- Updated SQL parser grammar/AST to support new syntax (e.g., FOREIGN KEY constraints, JOIN ... USING columns list, ORDER BY NULLS FIRST/LAST, schema-qualified functions, COPY ... WITH (...) options,
$Nand:nameplaceholders). - Added
Poco::Data::Utility(renderValue/boundSQL/executeSQL/toJSON) plus SQLite tests covering the new behavior. - Extended GitHub Actions to run upstream sql-parser build+tests (including valgrind + grammar conflict checks) when SQLParser files change.
Reviewed changes
Copilot reviewed 37 out of 56 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| Data/src/Utility.cpp | Implements SQL literal rendering + ExecResult JSON serialization helpers. |
| Data/include/Poco/Data/Utility.h | Adds Poco::Data::Utility API (renderValue/boundSQL/executeSQL/toJSON). |
| Data/Makefile | Adds Utility object to Data library build. |
| Data/SQLParser/test/tpc_h_tests.cpp | Formatting adjustments for TPCH grammar tests. |
| Data/SQLParser/test/thirdparty/microtest/microtest.h | Macro refactor/formatting for the test framework. |
| Data/SQLParser/test/test.sh | Refactors test script result handling and summary logic. |
| Data/SQLParser/test/sql_tests.cpp | Updates/extends parser unit tests (FKs, COPY options, intervals, etc.). |
| Data/SQLParser/test/sql_parser.cpp | Formatting-only changes in tokenization tests. |
| Data/SQLParser/test/sql_asserts.h | Updates test helper macros / includes. |
| Data/SQLParser/test/select_tests.cpp | Extends SELECT tests (NULL ordering, JOIN USING, schema-qualified functions). |
| Data/SQLParser/test/queries/queries-good.sql | Adds new “good” grammar cases for new features. |
| Data/SQLParser/test/queries/queries-bad.sql | Adds new “bad” grammar cases (negative tests) for new features. |
| Data/SQLParser/test/prepare_tests.cpp | Adds $N and :name placeholder tests + column position tracking. |
| Data/SQLParser/test/auto_query_file_test.cpp | Formatting-only changes to the query-file-driven grammar test. |
| Data/SQLParser/src/util/sqlhelper.cpp | Extends AST printer to handle new Expr/placeholder types. |
| Data/SQLParser/src/SQLParserResult.h | Minor formatting cleanup. |
| Data/SQLParser/src/SQLParserResult.cpp | Formatting cleanup in reset/releaseStatements. |
| Data/SQLParser/src/SQLParser.cpp | Formatting cleanup in parse/tokenize. |
| Data/SQLParser/src/sql/Table.h | Adds JoinDefinition::namedColumns to represent JOIN ... USING columns. |
| Data/SQLParser/src/sql/statements.cpp | Updates statement implementations for new fields/options and cleanup. |
| Data/SQLParser/src/sql/SQLStatement.h | Minor formatting cleanup. |
| Data/SQLParser/src/sql/SQLStatement.cpp | Formatting cleanup in hints destruction. |
| Data/SQLParser/src/sql/ShowStatement.h | Adds ShowStatement header (previously implicit). |
| Data/SQLParser/src/sql/SelectStatement.h | Adds NULL ordering support to OrderDescription. |
| Data/SQLParser/src/sql/InsertStatement.h | Adds InsertStatement header (previously implicit). |
| Data/SQLParser/src/sql/ImportStatement.h | Refactors import/export option types into shared options + adds fields. |
| Data/SQLParser/src/sql/ImportExportOptions.h | Introduces shared Import/Export options + CSV options. |
| Data/SQLParser/src/sql/Expr.h | Adds new placeholder expr types + schema-qualified function support. |
| Data/SQLParser/src/sql/Expr.cpp | Implements new Expr constructors + schema field + placeholder handling. |
| Data/SQLParser/src/sql/ExportStatement.h | Refactors to use shared import/export options + adds fields. |
| Data/SQLParser/src/sql/DropStatement.h | Minor formatting cleanup. |
| Data/SQLParser/src/sql/DeleteStatement.h | Adds DeleteStatement header (previously implicit). |
| Data/SQLParser/src/sql/CreateStatement.h | Adds FK constraint types + references specs + column constraints wrapper. |
| Data/SQLParser/src/sql/CreateStatement.cpp | Implements FK/reference plumbing + moves Create-related impls here. |
| Data/SQLParser/src/sql/ColumnType.h | Formatting + include fixes. |
| Data/SQLParser/src/sql/AlterStatement.h | Adds AlterStatement header (previously implicit). |
| Data/SQLParser/src/parser/parser_typedef.h | Minor formatting cleanup. |
| Data/SQLParser/src/parser/flex_lexer.l | Adds lexer tokens for ENCODING/FOREIGN/REFERENCES and $N/:name. |
| Data/SQLParser/src/parser/flex_lexer.h | Regenerated header formatting/line directives. |
| Data/SQLParser/src/parser/bison_parser.y | Major grammar update for new syntax + conflict enforcement (%expect 0). |
| Data/SQLParser/src/parser/bison_parser.h | Regenerated parser header updates for new tokens/types. |
| Data/SQLParser/README.md | Updates upstream branch references (master → main). |
| Data/SQLParser/Makefile | Fixes flags var + adds clang newline warning + parser-specific warning suppressions. |
| Data/SQLParser/benchmark/queries.cpp | Formatting-only changes. |
| Data/SQLParser/benchmark/parser_benchmark.cpp | Formatting-only changes. |
| Data/SQLParser/benchmark/benchmark.cpp | Formatting-only changes. |
| Data/SQLParser/benchmark/benchmark_utils.h | Formatting-only changes. |
| Data/SQLParser/benchmark/benchmark_utils.cpp | Formatting-only changes. |
| Data/SQLite/testsuite/src/UtilityTest.h | Adds SQLite testsuite coverage for Poco::Data::Utility. |
| Data/SQLite/testsuite/src/UtilityTest.cpp | Implements Utility tests (rendering, binding, execution, JSON). |
| Data/SQLite/testsuite/src/SQLiteTestSuite.cpp | Registers UtilityTest in SQLite testsuite. |
| Data/SQLite/testsuite/Makefile | Adds UtilityTest to build objects. |
| .github/workflows/ci.yml | Adds dedicated job to build/test the standalone SQLParser upstream suite on Linux. |
| .github/path-filters.yml | Adds sqlparser path filter for targeted CI execution. |
Comments suppressed due to low confidence (1)
Data/SQLParser/src/sql/SelectStatement.h:15
NullOrderingis declared as an unscoped enum, but the new code usesNullOrdering::Undefined/First/Last(e.g., in the parser and tests). This will not compile unlessNullOrderingis a scoped enum (e.g.,enum class) or all call sites are updated to use the unscoped enumerators (Undefined,First,Last).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…stics #5371 RenderingBinder is an AbstractBinder subclass that captures bind() traffic as SQL-literal strings instead of forwarding values to a database driver. It overrides every scalar and container bind() overload AbstractBinder exposes. Utility::boundSQL now drives its bindings through a RenderingBinder and reads back captures per (position, row) to produce the diagnostic SQL. Three improvements over the previous renderValue-only path: - STL containers (std::vector/deque/list of any scalar) now render with actual element values instead of '?'. The binder captures up to maxRows entries per binding; the default boundSQL renders only the first row and appends '-- (+N more rows)' if the container has more. boundSQLBulk(sql, maxRows, ...) lets callers ask for K rows joined by ';\n'. - User-defined types with a custom TypeHandler<T> specialization render correctly out of the box: their bind() method calls back into pBinder->bind(pos+i, member, dir) for each column, and the RenderingBinder's scalar overrides capture each member's value. No user opt-in needed. - The renderer's contract for OUT and IN_OUT directions is explicit: IN and IN_OUT capture the value being sent; OUT captures the literal '?' so the rendered SQL stays valid-shaped. Zero overhead on the regular Statement.execute() path - the RenderingBinder is constructed only when Utility::boundSQL or Utility::executeSQL is called; it never replaces the database driver's binder. Utility's formatter primitives (quoteString, formatDate, formatTime, formatDateTime, formatLocalDateTime, formatVar, formatBLOB, formatCLOB) are now public statics so RenderingBinder reuses them directly. Three new UtilityTest cases: - testBoundSQLVectorScalar - vector<int>{1,2,3} renders first row plus '(+2 more rows)' suffix. - testBoundSQLVectorMaxRows3 - same vector at maxRows=3, renders three statements joined. - testBoundSQLCustomTypeHandler - locally-defined TypeHandler<UPair> renders both columns with actual values. The existing 22 cases keep passing under both the make build and both cmake matrices (parser-enabled and POCO_DATA_NO_SQL_PARSER). testExecuteSQLMalformed's expected captured.sql is now conditional on POCO_DATA_NO_SQL_PARSER (parser path validates; scanner path substitutes regardless of SQL validity).
|
Shall SQLParser be moved to dependencies as well? I was not aware that this is not original Poco work. |
I am not sure yet. Given the upstream scarce activity and non-responsiveness (it seems that all you can get is likes), we may take over our fork and break with upstream (we have lots of adaptive code already anyway). I gave it one last attempt to keep the sync and if there's no response, we'll go our own merry way. |
…:renderValue #5371 Eliminates the parallel fmt() overload ladder that lived in RenderingBinder.cpp. RenderingBinder::recordScalar and recordContainer now call Utility::renderValue<T> directly, making renderValue the single source of truth for value -> SQL-literal conversion. renderValue's if-constexpr ladder gains the binder-ABI scalars that were previously missing: - NullData -> NULL (matches AbstractBinder's NullData overload). - char -> single-character quoted string literal (was previously catching as integral and rendering as a number; SQL semantics treat char as TEXT(1), not a numeric scalar). - Poco::UUID -> quoted toString() output. - Poco::UTF16String -> quoted UTF-8 text after UnicodeConverter::convert. The char branch is placed before std::is_integral_v<T> so it short-circuits; Int8/UInt8 remain numeric (they are signed/unsigned char, distinct from char in the C++ type system). Container capture uses an explicit value-type copy when iterating so std::vector<bool>'s proxy iterators decay to bool before reaching renderValue's overload set. Verified across make + cmake (parser-enabled) + cmake (POCO_DATA_NO_SQL_PARSER=ON): all 25 UtilityTest cases pass in every matrix, including the three new container/TypeHandler cases.
…ndering #5371 DataException carries the raw template SQL (placeholders, no values). When a Statement throws, operators see "INSERT INTO t VALUES(?, ?, ?)" rather than the actual bound payload. Adding rendering inside DataException would change default log shape and silently leak PII into every existing logger. Instead, expose an opt-in helper callers invoke from a catch block: try { stmt.execute(); } catch (Poco::Data::DataException& e) { log("failed: " + Utility::boundSQL(stmt) + " - " + e.displayText()); } The helper walks the Statement's bindings, swaps each one's binder to a RenderingBinder, drives bind() row by row up to maxRows (default 1), then restores the original binder. Two reset() calls bracket the bind loop so the helper works whether the Statement was previously executed or not, and leaves bindings ready for a subsequent execute() / boundSQL() call. To make the cascade safe, RenderingBinder::reset() is now a no-op: scalar Binding<T>::reset() calls pBinder->reset(), which would otherwise wipe captures from earlier walked bindings. The binder is scoped to one render call and discarded, so there is nothing to clean up. recordScalar() now appends to the per-position slot instead of overwriting, so container bindings driven row-by-row accumulate per-row captures. For scalar bindings (one bind() per pos) the slot still ends up with exactly one entry, so the existing boundSQL<Args...> template path is unaffected. Adds Statement::bindings() public accessors (impl() is protected) and RenderingBinder::setTotalRows() so the helper can communicate the binding-derived row count to the suffix logic in boundSQLImpl. Five new UtilityTest cases cover scalar / vector / bulk / custom TypeHandler / preserves-execute paths; all 30 cases pass in make, cmake parser-enabled, and cmake POCO_DATA_NO_SQL_PARSER=ON matrices.
ColumnConstraints owns two heap-allocated members (constraints and references, the latter containing heap-allocated ReferencesSpecification pointers). On the happy path those pointers transfer into a ColumnDefinition which deletes them in its destructor; on parse error bison reaches the wildcard %destructor that only frees the wrapper, leaking the inner members. Add a dedicated %destructor for <column_constraints_t> that releases the inner state before deleting the wrapper. valgrind on queries-good.sql + queries-bad.sql now reports 15,209 allocs / 15,209 frees, no leaks. Also correct a typo in microtest.h (inherited from upstream hyrise/sql-parser): ASSERT_STRNEQ contained '!= = 0' which would fail to compile if expanded. No callers use the macro so it was dormant, but it is still broken on its face; change the condition to '== 0' (the inverse of ASSERT_STREQ). Backported to ~/sql-parser fork as fba2a75 / e2f2d23 on the 263-placeholders branch.
…5371 CodeQL's cpp/unused-static-variable and cpp/unused-local-variable rules flag the parameter pack 'bindings' on Utility::boundSQL and Utility::executeSQL as unused. The pack IS used via expansion (bindings... forwarded to boundSQLBulk in one case, into renderings/stmt-binding in the other), but CodeQL's data-flow doesn't track pack expansion through call arguments. Annotate the parameter packs with [[maybe_unused]] to suppress the false positive without changing functional behaviour.
bison_parser.cpp #includes flex_lexer.h, which is generated alongside flex_lexer.cpp from flex_lexer.l. Under parallel make (-j$(nproc)), the compilation of bison_parser.o could start before flex regenerated flex_lexer.h, producing 'hsql_lex was not declared' errors in CI. Add flex_lexer.cpp as an explicit prerequisite of bison_parser.o so make serializes correctly. The reverse dep (flex_lexer.o depends on bison_parser.cpp) was already wired up; this completes the pair. Backported to ~/sql-parser fork as 0e5e4c6 on the 263-placeholders branch.
Follow-up to ec57eb4. CodeQL also flags 'bindings' and 'pos' inside boundSQLBulk: both are used in the fold-expression body (TypeHandler<Args>::bind(pos, bindings, ...), pos += ...), but CodeQL's data-flow doesn't trace through pack expansion or fold expressions. Annotate both with [[maybe_unused]] for consistency with the other variadic wrappers and to suppress the false positives.
Utility::UpdateCallbackType declared the rowid parameter as Poco::Int64, which is long on LP64 platforms (Linux x86_64, aarch64). sqlite3_update_hook expects a callback whose rowid is sqlite3_int64 = long long. The previous reinterpret_cast in eventHookRegister bridged the type mismatch, but calling a function through a pointer of a different type than its actual signature is undefined behavior in C++. UBSan (-fsanitize=function) flags it on every update-hook invocation. Align UpdateCallbackType and Notifier::sqliteUpdateCallbackFn to use long long directly, matching sqlite3_int64 exactly. Drop the reinterpret_cast in Utility::eventHookRegister. Update the test's matching callback signature. Source-compat: existing callbacks declared with Poco::Int64 will need a one-line signature swap to long long on Linux (Windows users are unaffected because Poco::Int64 is already long long there). The change is value- preserving on every supported platform.
matejk
left a comment
There was a problem hiding this comment.
Design / code-quality review for the Poco-authored surface (Utility, RenderingBinder, Statement::bindings()) and the vendored design choices the Poco fork now owns. Auto-generated bison_parser.cpp / flex_lexer.cpp not reviewed line by line.
Leaving 8 inline comments split P1 (worth before merge) and P2 (design polish):
- P1: RAII guard for the binder-swap loop; thread-safety docstring.
- P2: API ergonomics on
executeSQL(context),NullOrdering->enum class, RAII forColumnConstraints, doc fixes ontsMicrosandformatBLOBdialect, possible missingvector<unsigned long>override.
P3 items (TSAN CI job, scanner-fallback hardening, fork-extension tags in Expr.h) intentionally not commented inline -- happy to file as follow-up issues if useful.
…ence #5373 MemoryDB is an in-memory SQLite database with sharded persistence to disk. Behaves like a Poco::Data::Session (operator<<, into, use, now); persists incrementally to a directory of shard files via a background flush timer, at destruction, and on demand via flush(). Live data stays in RAM in the active shard; sealed shards stay on disk and are attached read-only on demand via attachArchived() / historyView(). Features: - sharded persistence with size and age seal triggers - retention policies: Options::retentionMaxAge and retentionMaxBytes - always-on SQLITE_LIMIT_ATTACHED backstop on total shard count (oldest sealed shards auto-dropped at flush time when the cap would be exceeded; a warning is logged via the Poco.Data.SQLite.MemoryDB logger) - history queries spanning the active shard plus sealed shards on disk, via attachArchived/detachArchived or full / time-range historyView() - WITHOUT ROWID tables rejected up front (sharding partitions by rowid) - thread-safe: attach/detach/historyView serialize internally via sqlite3_db_mutex so user statements on the shared Session never see a half-attached connection; attach bookkeeping is reference-counted so concurrent attachers of the same shard pair correctly - crash-safe: shard files written via temp + atomic rename, manifest commits in a single transaction; worst-case loss bounded by the idle/max-flush window Includes: - Data/SQLite/{include,src}/MemoryDB.{h,cpp} - Data/SQLite/testsuite/src/MemoryDBTest.{h,cpp}: ~30 tests covering persistence, retention, history views, deleteShard semantics, the always-on shard-ceiling auto-drop, and a 5-second concurrent stress test (1 writer + N readers/historians/admin) that completes clean under release and ThreadSanitizer - Data/samples/MemoryDB: usage sample - Wiring: SQLiteTestSuite registers MemoryDBTest (and the previously landed SQLiteThreadSafetyTest); testsuite Makefile/CMakeLists pull in MemoryDBTest under the same SQL parser guard as MemoryDB itself Supporting Data layer changes folded in (originally #5371): - Utility::executeSQL takes std::string context instead of std::unique_ptr<std::string>; empty string means none. Removes one heap allocation per call and reads better at call sites. - Utility::boundSQL(stmt) wraps the per-binding binder swap in a RAII guard so an exception in a user TypeHandler<T>::bind cannot leave a binding pointing at a destroyed RenderingBinder. - Utility.h doc updates: explicit not-thread-safe-with-respect-to-the- Statement contract on boundSQL(stmt) / boundSQLBulk(stmt); dialect caveat on the formatBLOB X'AABB' hex literal form (SQLite/MySQL/ standard-SQL, not portable to PostgreSQL bytea or MS SQL 0xAABB).
…oryDB #5373 Data/SQLite/CMakeLists.txt now adds Data/SQLParser{/src,} to DataSQLite's PRIVATE include path when POCO_DATA_NO_SQL_PARSER is not defined, mirroring Data/CMakeLists.txt's existing handling for the Data library. Without it, MemoryDB.cpp's #include SQLParser.h failed to resolve under cmake (the gnu-make build had this covered already via Data/SQLite/Makefile's target_includes). Also adds Data/SQLite/src/ to DataSQLite-testrunner's PRIVATE include path so MemoryDBTest.cpp's #include <sqlite3.h> (used by testShardCeilingAutoDrop to lower SQLITE_LIMIT_ATTACHED at runtime) resolves via the symlink the bundled SQLite build drops there. The testsuite Makefile already does this; this just brings cmake to parity.
…5370 Two leaks in SQLParser::tokenize(): 1. SQL_NAMED_PARAM (added in #263) strdup()s the parameter name into yylval.sval at flex_lexer.l:247, but the cleanup branch only checked SQL_IDENTIFIER and SQL_STRING -- every :name placeholder leaked. 2. The loop lexed a token, pushed it, lexed the next, then checked the next token's type and freed yylval.sval. Because the second lex overwrote yylval, the first sval-bearing token's pointer was lost. What got freed was always the next token's sval -- correct in steady state but a guaranteed leak on the first sval-bearing token of every tokenize() call. Restructure: lex once per iteration, free immediately if the token type allocates an sval, push, loop. The token set mirrors bison's own at bison_parser.y:194, so the direct-lex (tokenize) path and the parse path now clean up the same set.
…5371 Utility::renderValue<T> falls through to the std::is_convertible_v<const T&, std::string_view> branch for T = const char* or char*, then calls std::string_view(v). When v is null this hits the string_view(const char*) constructor's contract violation (traits::length on nullptr is UB). The nullptr-literal branch at the top of renderValue only matches T = std::nullptr_t and so doesn't catch a pointer variable that happens to be null. Fold an is_pointer_v compile-time guard into the string_view branch so character-pointer types null-check at runtime and return NULL, matching what the bind layer would emit for a null pointer. std::string, std::string_view, and other non-pointer types implicitly convertible to string_view stay on the existing fast path (no runtime branch added). Test: const char* and char* null variants assert NULL in UtilityTest::testRenderValue, alongside the existing nullptr-literal case.
… dir #5373 Follow-up to b0efe72. That commit added /Data/SQLite/src/ (gnu-make) and /Data/SQLite/src/ (cmake) to the testrunner include path so MemoryDBTest.cpp's #include <sqlite3.h> would resolve. The Linux gnu-make build worked because Data/SQLite's own Makefile runs an prebuild that drops a sqlite3.h symlink there. The Windows cmake build doesn't run that prebuild - sqlite3.h was never present at that path, and ninja failed with: fatal error C1083: Cannot open include file: 'sqlite3.h': No such file or directory Point at the canonical bundled SQLite header location instead: cmake: PRIVATE $<TARGET_PROPERTY:SQLite::SQLite3, INTERFACE_INCLUDE_DIRECTORIES> (resolves to dependencies/sqlite3/src for bundled, or the find_package(SQLite3) result for POCO_SQLITE_UNBUNDLED; takes only the include dirs from the SQLite::SQLite3 target, no link, so PocoDataSQLite's already-linked SQLite object library isn't pulled in a second time) gnu-make: /dependencies/sqlite3/src (always present in the source checkout; the system header on the default include path takes over for unbundled) Both work on Linux and Windows, bundled or unbundled, regardless of whether Data/SQLite's prebuild has ever run.
…tatic #5373 Follow-up to the previous testsuite include-path fix. MemoryDBTest's testShardCeilingAutoDrop and testHistoryViewTimeRangeOverflow call sqlite3_limit() directly to constrain the runtime SQLITE_LIMIT_ATTACHED cap. The include resolution worked once we routed the test's <sqlite3.h> through dependencies/sqlite3/src, but Windows clang-cmake then failed at link: lld-link: error: undefined symbol: sqlite3_limit >>> referenced by ... MemoryDBTest.cpp.obj The bundled SQLite is compiled into PocoDataSQLite.lib via an OBJECT library (BUILD_LOCAL_INTERFACE PRIVATE). On Linux .so the symbols are exported by default; on Windows static (.lib with POCO_STATIC) nothing without __declspec(dllexport) is visible to downstream linkers, and SQLITE_API doesn't emit one. Linking SQLite::SQLite3 directly into the testrunner would re-pull the OBJECT library's .o files and clash with PocoDataSQLite's already-linked copies. Wrap it in POCO: Utility::setAttachedLimit(Session&, int newVal). The wrapper carries the SQLite_API export through PocoDataSQLite's normal public surface, so the testrunner (and any external caller) gets a portable way to set SQLite's per-connection attached-DB cap without needing the raw handle, <sqlite3.h>, or a direct link against the bundled SQLite object library. Tests now call Utility::setAttachedLimit instead of sqlite3_limit. MemoryDBTest.cpp drops its #include <sqlite3.h>, and the include-path fixes from the previous commit (testsuite CMakeLists pointing at SQLite::SQLite3's INTERFACE_INCLUDE_DIRECTORIES, testsuite Makefile pointing at dependencies/sqlite3/src) are reverted - no longer needed.
Audit-driven fixes to MemoryDB: - _columnCopyCache populate-after-DDL race: a writer racing onDDL could reinstall the pre-DDL ColumnCopy and silently drop newly-added columns on the next flush. Snapshot _schemaVersion before releasing the lock and only install if unchanged. - doFlush could miss a CREATE TABLE that landed in the sample-to-lock window: the new table was absent from the slice plan, its row stayed in main but never reached disk, and _dirty got cleared. Re-sample userTables after the IO loop and set _dirty if any appeared. The dtor now unconditionally flushes with allowSeal=false to catch this case without producing a fresh shard on shutdown. - onDDL incremented _schemaVersion before _schemaLog.push_back; bad_alloc on vector growth desynchronized the two and the next catalog write committed version N+1 with N log entries. Swap the order so the only throwing op runs first. - _sealRequested arriving after maybeSeal but before the post-IO _dirty recompute would be lost until the next user write. Fold it into the recompute. - traceCallback rejected only on sql[0]==-; this matched the SQLite trigger-subprogram marker but also any user statement starting with a -- comment, letting a WITHOUT ROWID CREATE through via the session() bypass. Skip -- and slash-star comment prefixes before classification. - WITHOUT ROWID rejection could be lost if the _rejected string assignment threw bad_alloc. Set a noexcept atomic _poisoned first; throwIfRejected uses an atomic fast path and a static fallback reason. - detachArchived held _attachMutex across its 1ms retry sleep, blocking deleteShard (which holds _flushMutex) and transitively stalling flush. Release and re-acquire each iteration; the refcount is re-checked so concurrent attaches stay correct. - deleteShard is now idempotent on a vanished id (returns silently instead of throwing) so retention loops over a stale shards() snapshot stay correct under concurrency. Header docs updated to capture the three-mutex layering rationale and the new contracts. Redundant inner _stateMutex acquisitions on _attached removed since _attachMutex now solely covers it. New lastFlushTime() accessor for time-bounded durability waits. testDeleteNonexistentShardRefused updated for the new no-throw contract.
Binder.h and Extractor.h were the last two public headers including <sqlite3.h>. Each did it only so a small inline template body (bindLOB<T>, extractLOB<T>) could call sqlite3 free functions; the other public SQLite headers had already moved to opaque forward declarations. Consumers of Poco::Data::SQLite therefore needed an sqlite3.h on their include path, and any mismatch against the sqlite3.h POCO was built with (system vs bundled, different SQLITE_VERSION, custom SQLITE_API decoration) produced declaration/ODR errors at compile or link time. On Windows static builds this was the same family of conflicts as #5373. Fix: forward-declare sqlite3_stmt as an opaque struct in both headers and route the three sqlite3 calls through non-template static helpers (Binder::bindBlobStatic, Extractor::columnBytes, Extractor::columnBlobPtr) defined in the .cpp files. Template bodies stay inline and the public API is unchanged; the include graph just loses one level of unintended exposure.
Adds tests that lock down the parser-layer behaviour POCO's production
code relies on, plus reproductions of recent fix motivations so any
upstream sql-parser sync that regresses them fails loudly in the test
layer rather than silently in MemoryDB / Utility.
Data/testsuite/src/SQLParserTest:
- testResetClearsParameters: reparse loop with reset() between
iterations - the lifecycle the upstream sval-leak fix hardened.
- testNamedParameter: :name placeholders parse to kExprParameterNamed
with names {user, age}.
- testAlterDropColumnIfExists: down-casts to AlterStatement +
DropColumnAction and asserts columnName, ifExists, table name.
- testDropDiscrimination: DROP TABLE vs DROP INDEX dispatch.
- testDeleteShape: DeleteStatement::expr nullptr distinguishes truncate
from filtered delete - what MemoryDB branches on for the truncate
optimization.
- testComments: -- line comments survive parsing across statements.
- testTokenize: backstops the SQLParser::tokenize loop the upstream
sval-leak fix rewrote.
- Inline fix to the pre-existing testSQLParser failure formatter: L13
was rendering with no separator between line and column; replaced
with L<line>:C<col> and switched bare fail() to failmsg() so the
formatted message actually surfaces on the CppUnit report.
Data/SQLite/testsuite/src/UtilityTest:
- testBoundSQLNamed: pins the current pass-through behaviour of :name
placeholders in Utility::boundSQL (Utility.cpp kExprParameterNamed
branch sets paramIndex=npos so the span is copied verbatim and the
supplied args are silently ignored). If this branch is ever taught
to substitute values, the test must be rewritten.
Data/SQLite/testsuite/src/MemoryDBTest:
- testCommentPrefixedWithoutRowidRejected: covers both operator<< and
session() bypass paths with a leading -- comment. Pins the motivation
for the leading-comment-skip rewrite in onStatement; the previous
single-char trigger-marker check let -- prefixed CREATE bypass the
trace-hook backstop.
- testDropTableClassified: DROP TABLE survives close-reopen via the
schema log, proving the kStmtDrop + kDropTable dispatch.
- testAlterTableClassified: ALTER TABLE ADD COLUMN survives close-
reopen. ADD COLUMN is not modelled as an AlterAction by hsql so this
exercises the first-keyword DDL fallback in onStatement.
The bison/flex regen in 01f7ec2 overwrote a POCO-local hand-patch in the generated flex_lexer.h that wrapped the unistd.h include in a Windows guard. MSVC has no unistd.h, so clang-cl Windows builds fail with: flex_lexer.h(470): fatal error: 'unistd.h' file not found Re-apply the same conditional, with a POCO-LOCAL comment so the next person who regens the lexer knows it has to be re-applied. flex's %option set in flex_lexer.l has no equivalent for this - nounistd would drop the include on every platform but flex's emitted runtime calls (read/write/close/isatty) need either unistd.h or io.h, so the conditional has to live in the generated header.
No description provided.