Skip to content

Commit bb74a19

Browse files
Merge pull request #479 from LASTRADA-Software/fix/postgres-windows-unicode-roundtrip
Fix PostgreSQL Unicode round-trip on Windows by switching ODBC to W variants
2 parents 08d4e7c + d44b7d5 commit bb74a19

24 files changed

Lines changed: 1398 additions & 323 deletions

.agent/README.md

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
# `.agent/` — Agent deep-dives
2+
3+
Supporting material for AI agents (Claude Code, Codex, etc.) working on Lightweight. The primary entry point is `../AGENT.md`; this directory keeps the longer deep-dives out of the top-level guide so it stays scannable.
4+
5+
| File | Topic |
6+
|------|-------|
7+
| [`architecture.md`](architecture.md) | Component map deep-dive, canonical examples for formatter dispatch, `SqlDataBinder<T>` specialization, `DataMapper` field/relationship mechanics, connection pool, migration model |
8+
| [`databases.md`](databases.md) | Per-DBMS specifics: features, connection-string formats, Docker harness, known footguns (Unicode round-trips, `NULL` semantics, identifier quoting) |
9+
| [`testing.md`](testing.md) | Catch2 conventions, `.test-env.yml` schema, `--test-env=<name>` flag, the `dbtool` Python integration suite, sanitizer/valgrind presets |
10+
| [`cpp-guidelines.md`](cpp-guidelines.md) | Self-contained C++23 ruleset (general + Lightweight-specific). No external `cpp.md` required |
11+
12+
These files are loaded into agent context via the top-level `CLAUDE.md` (`@.agent/`).

.agent/architecture.md

Lines changed: 94 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,94 @@
1+
# Architecture deep-dive
2+
3+
## Layers
4+
5+
```
6+
┌─────────────────────────────────────────────┐
7+
user code ──▶ │ DataMapper (Field, BelongsTo, HasMany,…) │ high-level
8+
├─────────────────────────────────────────────┤
9+
│ SqlQuery DSL (Select/Insert/Update/Delete) │
10+
│ SqlMigration / SqlMigrationLock │
11+
├─────────────────────────────────────────────┤
12+
│ SqlStatement / SqlConnection │ low-level
13+
│ SqlDataBinder<T> (per-type bind/fetch) │
14+
├─────────────────────────────────────────────┤
15+
│ SqlQueryFormatter — per-DBMS dispatch │
16+
│ ├── SqlServerFormatter │
17+
│ ├── PostgreSqlFormatter │
18+
│ └── SQLiteFormatter │
19+
├─────────────────────────────────────────────┤
20+
│ ODBC (unixODBC / Windows ODBC) │ driver
21+
└─────────────────────────────────────────────┘
22+
```
23+
24+
Anything that varies between DBMSes belongs in `SqlQueryFormatter`. Business logic above must be database-agnostic.
25+
26+
## Canonical examples
27+
28+
### Per-DBMS formatter override
29+
File: `src/Lightweight/QueryFormatter/PostgreSqlFormatter.hpp`
30+
31+
```cpp
32+
class PostgreSqlFormatter final: public SQLiteQueryFormatter
33+
{
34+
public:
35+
[[nodiscard]] std::string QueryLastInsertId(std::string_view) const override
36+
{
37+
return "SELECT lastval();";
38+
}
39+
40+
[[nodiscard]] std::string_view DateFunction() const noexcept override
41+
{
42+
return "CURRENT_DATE";
43+
}
44+
// … BinaryLiteral, BuildColumnDefinition, DropTable (CASCADE) overrides …
45+
};
46+
```
47+
48+
To add a new dialect-sensitive primitive:
49+
1. Add a `[[nodiscard]] virtual` method in `SqlQueryFormatter.hpp` with the most reasonable default.
50+
2. Override it in any formatter where the default is wrong.
51+
3. Have the caller request the SQL fragment from the formatter — never inline a `switch` on `SqlServerType`.
52+
53+
### `SqlDataBinder<T>` specialization shape
54+
Files: `src/Lightweight/SqlDataBinder.hpp` (primary template), `src/Lightweight/DataBinder/*.hpp` (specializations).
55+
56+
Each specialization typically provides:
57+
- `static SQLRETURN InputParameter(SQLHSTMT, SQLUSMALLINT, T const&, SqlDataBinderCallback&)`
58+
- `static SQLRETURN OutputColumn(SQLHSTMT, SQLUSMALLINT, T*, SQLLEN*, SqlDataBinderCallback&)`
59+
- `static SQLRETURN GetColumn(SQLHSTMT, SQLUSMALLINT, T*, SQLLEN*, SqlDataBinderCallback const&)`
60+
- `static std::string Inspect(T const&)` (optional — used by tracing and `DataMapper::Inspect`)
61+
62+
For Unicode-bearing strings, reuse `BasicStringBinder.hpp` helpers (`GetRawColumnArrayData`, `GetColumnUtf16`, `BindOutputColumnNonUtf16Unicode`) and `UnicodeConverter.hpp` rather than re-implementing buffer growth, NULL handling, or transcoding.
63+
64+
### `DataMapper` field & relationship primitives
65+
- `Field<T, …Modifiers>` — one column with optional `PrimaryKey`, `SqlRealName`, default value.
66+
- `BelongsTo<&Other::id, SqlRealName{"fk_col"}>` — many-to-one (lazy-loads via dereference).
67+
- `HasMany<&Child::parent_id>` — one-to-many.
68+
- `HasManyThrough<&Through::a_id, &Through::b_id>` — many-to-many via join table.
69+
- `HasOneThrough<…>` — one-to-one via intermediate.
70+
71+
Records are plain aggregates of these field/relation types — there is no required base class. Reflection (C++20 member pointers, or C++26 reflection when `LIGHTWEIGHT_CXX26_REFLECTION`) drives column enumeration.
72+
73+
### Connection pool
74+
File: `src/Lightweight/DataMapper/Pool.{hpp,cpp}`. The pool is the canonical way to share connections across threads — never store a `SqlConnection` as a long-lived bare member.
75+
76+
### Migrations
77+
- DSL: `src/Lightweight/SqlQuery/Migrate.{hpp,cpp}`, `MigrationPlan.{hpp,cpp}`.
78+
- Runner: `src/Lightweight/SqlMigration.{hpp,cpp}`.
79+
- Cross-process lock: `src/Lightweight/SqlMigrationLock.{hpp,cpp}`.
80+
- The `dbtool` subcommand and the `lup2dbtool` adapter live in `src/tools/`.
81+
82+
## Where reflection lives
83+
84+
`Lightweight.cppm` exports the public surface as a C++20 module when `LIGHTWEIGHT_BUILD_MODULES=ON`. The `Member(x)` macro in `src/tests/Utils.hpp` (and similar in production headers) selects between C++20 member pointers (`&x`) and C++26 reflection (`^^x`) depending on `LIGHTWEIGHT_CXX26_REFLECTION`.
85+
86+
## Key types to recognise
87+
88+
| Type | Role |
89+
|------|------|
90+
| `SqlServerType` | Enum: `MICROSOFT_SQL`, `POSTGRESQL`, `SQLITE`, `MYSQL`, `UNKNOWN`. Derived from the connected driver |
91+
| `SqlConnectInfo` | Parsed connection-string descriptor |
92+
| `SqlError` / `SqlErrorDetection` | Error reporting; pair with `std::expected<T, SqlError>` in new APIs |
93+
| `SqlScopedTraceLogger` / `SqlLogger` | Tracing hooks (driven by `--trace-sql --trace-odbc`) |
94+
| `ThreadSafeQueue` | Internal MPMC queue used by the pool |

.agent/cpp-guidelines.md

Lines changed: 105 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,105 @@
1+
# C++ guidelines (self-contained)
2+
3+
This file is the canonical, project-internal C++ ruleset. Contributors and agents do **not** need any external file (no `~/.claude/rules/cpp.md`, no global notes) — every rule that applies to Lightweight is here.
4+
5+
## 1. General C++23 baseline
6+
7+
### Design
8+
- **Data-driven design** — avoid hard-coded magic values. Prefer descriptor tables, configuration, or polymorphic dispatch over scattered conditionals.
9+
- **Dependency injection** — pass collaborators through constructors / parameters so tests can substitute fakes. Anything touching I/O, time, randomness, or the database must be injectable.
10+
11+
### Documentation
12+
- **Doxygen** on every new public function, parameter, return, class, struct, and member:
13+
```cpp
14+
/// Short description.
15+
/// @param name Description.
16+
/// @return Description.
17+
```
18+
- Be factual — no marketing prose.
19+
20+
### Type & const correctness
21+
- **`const`-correctness** throughout: refs, pointers, member functions, parameters that don't mutate.
22+
- **`auto` type deduction** for readability; **structured bindings** for tuple-like returns.
23+
- Mark return values **`[[nodiscard]]`** where ignoring the result would be a bug — this includes builders, query objects, and anything returning `std::expected`.
24+
25+
### Modern C++ surface
26+
- Prefer **C++23**: `constexpr`, `std::ranges`, `std::format`, `std::expected`, `std::span`.
27+
- **`std::expected<T, E>`** with monadic chaining (`and_then`, `or_else`, `transform`, `transform_error`) for fallible operations. Avoid nested `if`s when a chain reads better.
28+
- **`std::span`** for arrays / contiguous sequences in API surfaces.
29+
- **Range views** for generation/transformation: `std::views::iota`, `std::views::filter`, `std::views::transform`, etc.
30+
31+
### Forbidden constructs
32+
- **C-style loops are forbidden.** Use range-based `for` and the views above.
33+
- **No raw owning pointers.** `std::unique_ptr` / `std::shared_ptr` for ownership, RAII for resources.
34+
- **No `NOLINT` suppressions** — fix the underlying issue clang-tidy reports. The `linux-clang-debug` preset enables `clang-tidy` automatically.
35+
- **No new third-party dependencies** without strong justification — vcpkg manifest at `vcpkg.json` lists what we already accept.
36+
37+
### Tooling
38+
- Run **`clang-format`** on every changed file (project `.clang-format` is authoritative).
39+
- Build with the matching preset (`linux-clang-debug` / `windows-clangcl-debug`); resolve every warning — `PEDANTIC_COMPILER_WERROR=ON`.
40+
- All changes must be covered by unit tests; aim to **increase** code coverage with every PR.
41+
42+
## 2. Lightweight-specific patterns
43+
44+
### Namespace & symbol visibility
45+
- All public symbols live in the `Lightweight` namespace; the alias `Light` is provided for brevity in user code.
46+
- Public headers must be **self-contained** — they must compile with no PCH and only their own includes.
47+
- Keep ABI-affecting changes deliberate; the library is consumed as a vcpkg port.
48+
49+
### Error handling on the public API
50+
Prefer `std::expected<T, SqlError>` for fallible public APIs. Reserve exceptions for programmer errors (precondition violation, contract misuse). Internally, ODBC `SQLRETURN` codes are wrapped via the helpers in `src/Lightweight/SqlError.{hpp,cpp}` and `SqlErrorDetection.hpp`.
51+
52+
```cpp
53+
return statement.Prepare(sql)
54+
.and_then([&] { return statement.Execute(args...); })
55+
.transform([&](auto&&) { return …; })
56+
.transform_error([](SqlError const& e) { return wrapWithContext(e); });
57+
```
58+
59+
### ODBC `SQLHANDLE` lifetime
60+
ODBC handles (env, dbc, stmt) are owned by `SqlConnection` / `SqlStatement` (and the small RAII wrappers near them). They are released in the destructor. Never:
61+
- allocate a raw `SQLHANDLE` at call sites,
62+
- store one beyond the lifetime of its wrapper,
63+
- pass `SQLHANDLE` across abstraction boundaries (pass the wrapper instead).
64+
65+
### Per-DBMS dispatch
66+
Per-DBMS branching belongs only inside `SqlQueryFormatter` and its subclasses. Business logic must call a virtual method on the formatter and let the override decide. **Do not** write `if (server == SqlServerType::POSTGRESQL)` outside `QueryFormatter/`.
67+
68+
If a new feature behaves differently on each DBMS:
69+
1. Add a `[[nodiscard]] virtual` method to `SqlQueryFormatter` with a sensible default.
70+
2. Override per DBMS in `PostgreSqlFormatter`, `SQLiteFormatter`, `SqlServerFormatter`.
71+
3. Cover all three in tests.
72+
73+
### `SqlDataBinder<T>` contract for Unicode strings
74+
Unicode-bearing string types must round-trip identically across MSSQL, PostgreSQL, and SQLite. This is non-trivial because each driver has different opinions about UTF-16 vs UTF-8.
75+
76+
The repo's hard-won rules (codified in commits `894c67c7`, `89885982`, `f311cb9f`):
77+
78+
- Bind Unicode payloads via **`SQL_C_WCHAR`** (UTF-16) uniformly across the `BasicStringBinder` surface — `BasicStringBinder.hpp` does this.
79+
- For non-`u16string` Unicode types, transcode through a temporary `std::u16string` using `UnicodeConverter.hpp`, bind that, and copy back in the post-process callback (`SqlDataBinderCallback::PlanPostProcessOutputColumn`).
80+
- Use `SQL_C_WCHAR` even where the driver "would also accept" `SQL_C_CHAR` — the latter is unreliable for non-ASCII on `psqlODBC`.
81+
- For PostgreSQL connection strings, **disable LF↔CRLF translation** (`LFConversion=0`); without this, embedded newlines in character data silently mutate.
82+
83+
### `[[nodiscard]]` policy on builders
84+
Query builders (`SqlQuery::Select`, `Insert`, `Update`, `Delete`, `Migrate`) and `DataMapper::Query<…>` return objects that *must* be terminated (`.All()`, `.First()`, `.Execute()`, etc.). Mark every step `[[nodiscard]]` so accidental discard is a build error.
85+
86+
### Catch2 + DI in tests
87+
Tests must obtain a connection through the test fixture wired to `--test-env=<name>`, never by constructing one with hard-coded strings. New connection strings go in `.test-env.yml`. See `.agent/testing.md`.
88+
89+
### Reflection
90+
The library uses C++20 member pointers by default and supports a C++26 reflection mode (`LIGHTWEIGHT_CXX26_REFLECTION`). New code that enumerates record fields should use the abstraction (e.g., the `Member(x)` macro pattern in `src/tests/Utils.hpp`) rather than committing to one mode.
91+
92+
### Modules
93+
`Lightweight.cppm` is the C++20 module aggregator (`LIGHTWEIGHT_BUILD_MODULES=ON`, requires CMake ≥ 3.28). New public headers must be addable to the module export list without breaking the build.
94+
95+
## 3. Quick checklist before pushing
96+
97+
- `clang-format` clean
98+
- `clang-tidy` clean (no `NOLINT`s added)
99+
- `linux-clang-debug` builds with no warnings
100+
- Tests run green against `sqlite3`, `mssql2022`, `postgres`
101+
- New public APIs have doxygen + `[[nodiscard]]` where relevant
102+
- No new `if (server == …)` outside `QueryFormatter/`
103+
- New SQL primitives go through `SqlQueryFormatter` virtuals
104+
- Unicode-bearing types round-trip via `SQL_C_WCHAR` / `BasicStringBinder.hpp`
105+
- `/simplify` was run; out-of-scope findings were either addressed or surfaced to the user

.agent/databases.md

Lines changed: 70 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,70 @@
1+
# Per-DBMS notes
2+
3+
Lightweight is exercised against three databases in CI (`.github/workflows/build.yml``dbms_test_matrix`). Every change must be verified against all three before being marked done — driver semantics differ in non-obvious ways.
4+
5+
## SQLite3
6+
7+
- **`SqlServerType::SQLITE`**, test-env name `sqlite3`.
8+
- **Driver**: `sqliteodbc` (`libsqliteodbc` on Linux, `sqliteodbc` MSI on Windows).
9+
- **Connection string**: `DRIVER=SQLite3;Database=test.db` (Linux) or `DRIVER={SQLite3 ODBC Driver};Database=file::memory:` (Windows).
10+
- **Strengths**: fast local tests, no daemon needed.
11+
- **Footguns**:
12+
- Type affinity instead of strict types — column types are advisory.
13+
- No real `BOOLEAN` (stored as INTEGER).
14+
- Limited migration support (no native `ALTER COLUMN` semantics; the migration layer emulates).
15+
- Valgrind is run *only* against SQLite3 in CI to catch driver-leaked allocations (`build.yml` adds `valgrind --leak-check=full ... --error-exitcode=1` for the sqlite3 matrix entry).
16+
17+
## Microsoft SQL Server
18+
19+
- **`SqlServerType::MICROSOFT_SQL`**, test-env names `mssql2017`, `mssql2019`, `mssql2022`, `mssql` (Windows LocalDB).
20+
- **Driver**: `ODBC Driver 17/18 for SQL Server` (`mssql-tools18` on Linux installs ODBC Driver 18; the Windows CI uses LocalDB + Driver 17).
21+
- **Connection strings**:
22+
- Linux Docker: `Driver={ODBC Driver 18 for SQL Server};SERVER=localhost;PORT=1433;UID=SA;PWD=Lightweight!Test42;TrustServerCertificate=yes;DATABASE=LightweightTest`
23+
- Windows LocalDB: `Driver={ODBC Driver 17 for SQL Server};Server=(LocalDB)\MSSQLLocalDB;Database=TestDB;Trusted_Connection=Yes;`
24+
- **Footguns**:
25+
- `wchar_t` is 16-bit on Windows; on Linux the driver still expects UTF-16 even though `wchar_t` is 32-bit. Use the `WideChar` typedef from `src/tests/Utils.hpp` (or `char16_t` directly) for portable UTF-16 in tests.
26+
- `IDENTITY` columns are queried via `SELECT @@IDENTITY` (note the formatter override).
27+
- Schema-qualified identifiers use `[schema].[name]` quoting.
28+
- 2017 `mssql-tools` (no `-C` flag); 2018+ `mssql-tools18` (requires `-C` for self-signed cert).
29+
- 2025 image is currently disabled in CI (image broken).
30+
31+
## PostgreSQL
32+
33+
- **`SqlServerType::POSTGRESQL`**, test-env name `postgres`.
34+
- **Driver**: `psqlODBC` — pick `PostgreSQL Unicode` (NOT `PostgreSQL ANSI`) for any Unicode-bearing workload.
35+
- **Connection string**: `Driver={PostgreSQL Unicode};Server=localhost;Port=5432;Uid=postgres;Pwd=Lightweight!Test42;Database=test`
36+
- **Footguns**:
37+
- **Newline translation**: `psqlODBC` defaults to LF↔CRLF translation. The project's test connection strings disable it explicitly (commit `f311cb9f`); preserve that in any new test-env entry, or character-data round-trips with embedded newlines will silently mutate.
38+
- **Unicode round-trip**: `SQL_C_WCHAR` (UTF-16) is the binding type that round-trips reliably across all DataBinder paths (commit `894c67c7`); avoid `SQL_C_CHAR` for non-ASCII strings unless you've validated the driver actually transcodes.
39+
- `SERIAL` columns: detect via `nextval(` in the captured default value (see `PostgreSqlFormatter::BuildColumnDefinition`); restored backups carry the sequence default rather than `AUTO_INCREMENT`.
40+
- Identifier quoting uses `"`. Folding to lowercase happens for unquoted identifiers — the formatter quotes everything to keep behaviour consistent.
41+
- `SELECT lastval();` retrieves the last inserted id (formatter override). Race-prone if multiple clients insert into different sequences simultaneously — call it inside the same session right after the insert.
42+
43+
## Docker harness
44+
45+
`scripts/tests/docker-databases.py` is the supported way to spin up MSSQL/Postgres locally:
46+
47+
```sh
48+
python3 scripts/tests/docker-databases.py --start --wait # all
49+
python3 scripts/tests/docker-databases.py --start --wait mssql2022 postgres
50+
python3 scripts/tests/docker-databases.py --status
51+
python3 scripts/tests/docker-databases.py --stop
52+
python3 scripts/tests/docker-databases.py --remove
53+
python3 scripts/tests/docker-databases.py --load-sql Chinook.sql mssql2022
54+
```
55+
56+
The script is **idempotent** and waits through four phases (internal health → external port → stability → post-ready stabilisation) before reporting ready. Container ports: MSSQL 2017 → 1432, 2019 → 1434, 2022 → 1433, 2025 → 1435, Postgres → 5432. The shared password is `Lightweight!Test42`.
57+
58+
## When can you legitimately skip a DB?
59+
60+
Only when the test exercises a feature that the database *intrinsically* does not support — e.g., a SQL Server-only construct on SQLite. Use the `UNSUPPORTED_DATABASE(stmt, dbType)` macro from `src/tests/Utils.hpp`:
61+
62+
```cpp
63+
TEST_CASE("merge upsert", "[migration]") {
64+
auto stmt = ConnectTestEnv();
65+
UNSUPPORTED_DATABASE(stmt, SqlServerType::SQLITE);
66+
// … MSSQL/Postgres-only path …
67+
}
68+
```
69+
70+
Do **not** use it to dodge a flaky failure or a Unicode bug — that's a real bug, fix it.

0 commit comments

Comments
 (0)