
Commit 258cb64

feat(oracledb): native JSON, VECTOR ergonomics, smart LOB coercion (#430)
## Summary

Overhauls Oracle's type coercion path so the common cases just work and the uncommon cases have an explicit escape hatch.

### Native JSON

- 21c+ binds Python `dict` / `list` directly via `DB_TYPE_JSON` (binary OSON); 19c-20c falls back to `BLOB CHECK (... IS JSON)`; pre-19c uses `CLOB CHECK (... IS JSON)`. The path is chosen from the server's major version, which is cached on the connection; if the version is unknown, the 21c+ path is used.
- The default JSON serializer strategy is now `"driver"`, so values reach the binary path instead of being serialized to strings upstream.
- Output side: `DB_TYPE_JSON` columns return Python objects as-is, and BLOB / CLOB columns whose `type_name` includes `JSON` are parsed automatically.

### VECTOR ergonomics (Oracle 23ai)

- `list[float]`, `list[int]`, `tuple[...]`, `array.array`, and `np.ndarray` all bind to `DB_TYPE_VECTOR` with no flag toggle. Integer sequences whose values fit in the int8 range pack as int8; everything else falls back to float32.
- New `vector_return_format` driver feature (`"numpy"` / `"list"` / `"array"`) controls how VECTOR reads materialize. It defaults to `"numpy"` when NumPy is installed and `"list"` otherwise, and raises an error if `"numpy"` is requested without NumPy installed.
- Module renamed from `_numpy_handlers` to `_vector_handlers` to reflect the broader payload coverage. The public functions (`numpy_converter_in`, etc.) are unchanged; the package-level `numpy_handlers` module alias is replaced by `vector_handlers`.

### Smart LOB coercion

- New typed wrappers (`OracleClob`, `OracleBlob`, `OracleJson`) let users bypass the size heuristics when they want explicit control. `OracleClob(bytes)` decodes UTF-8 before binding; `OracleBlob(str)` encodes UTF-8; `OracleJson(...)` defers to the JSON handler chain, so the value is never coerced through a CLOB intermediary.
- Wrappers work for both named (`{"col": OracleClob(...)}`) and positional (`(1, OracleClob(...))`) bind shapes.
- The 4000-byte (VARCHAR2) and 2000-byte (RAW) thresholds are now `driver_features` settings, `oracle_varchar2_byte_limit` and `oracle_raw_byte_limit`, so users on databases with `MAX_STRING_SIZE=EXTENDED` can opt into 32767-byte VARCHAR2 without auto-coercion to CLOB.
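Here is a minimal sketch of how the configurable size thresholds and the explicit wrappers could interact. `ClobValue`, `BlobValue`, `choose_bind_kind`, and `coerce_binds` are hypothetical names. They mimic the rules listed above and are not sqlspec's `OracleClob` / `OracleBlob` implementation. The strict `>` comparison against each limit is also an assumption. Raising `varchar2_byte_limit` to 32767 models the `MAX_STRING_SIZE=EXTENDED` opt-in.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class ClobValue:
    """Hypothetical explicit-CLOB wrapper (stand-in for the described OracleClob)."""

    value: Union[str, bytes]

    def payload(self) -> str:
        # Explicit CLOB: bytes are decoded as UTF-8 before binding.
        return self.value.decode("utf-8") if isinstance(self.value, bytes) else self.value


@dataclass(frozen=True)
class BlobValue:
    """Hypothetical explicit-BLOB wrapper (stand-in for the described OracleBlob)."""

    value: Union[str, bytes]

    def payload(self) -> bytes:
        # Explicit BLOB: str is encoded as UTF-8 before binding.
        return self.value.encode("utf-8") if isinstance(self.value, str) else self.value


def choose_bind_kind(value: Any, varchar2_byte_limit: int = 4000, raw_byte_limit: int = 2000) -> tuple:
    """Return (oracle_type, payload) for one bind value."""
    # Wrappers bypass the size heuristics entirely.
    if isinstance(value, ClobValue):
        return "CLOB", value.payload()
    if isinstance(value, BlobValue):
        return "BLOB", value.payload()
    if isinstance(value, str):
        # The limits are byte counts, so measure the UTF-8 encoding, not len(str).
        too_big = len(value.encode("utf-8")) > varchar2_byte_limit
        return ("CLOB" if too_big else "VARCHAR2"), value
    if isinstance(value, (bytes, bytearray)):
        return ("BLOB" if len(value) > raw_byte_limit else "RAW"), bytes(value)
    return "PASSTHROUGH", value


def coerce_binds(params: Any, **limits: int) -> Any:
    """Apply choose_bind_kind to named (dict) or positional (tuple/list) bind shapes."""
    if isinstance(params, dict):
        return {name: choose_bind_kind(v, **limits) for name, v in params.items()}
    return [choose_bind_kind(v, **limits) for v in params]


print(choose_bind_kind("x" * 5000)[0])                            # CLOB
print(choose_bind_kind("x" * 5000, varchar2_byte_limit=32767)[0])  # VARCHAR2
print(coerce_binds((1, ClobValue(b"hi"))))                         # [('PASSTHROUGH', 1), ('CLOB', 'hi')]
```

Measuring encoded bytes matters for non-ASCII text. For example, 2001 copies of `"é"` are only 2001 characters long, but they take 4002 bytes and already exceed the default VARCHAR2 limit.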
1 parent 5a05d38 commit 258cb64

24 files changed

Lines changed: 2716 additions & 461 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -73,3 +73,4 @@ tools/scripts/profiles/*.prof
 # Beads / Dolt files (added by bd init)
 .dolt/
 .beads-credential-key
+.codex

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ repos:
 - id: mixed-line-ending
 - id: trailing-whitespace
 - repo: https://github.com/charliermarsh/ruff-pre-commit
-rev: "v0.15.11"
+rev: "v0.15.12"
 hooks:
 - id: ruff
 args: ["--fix"]

Makefile

Lines changed: 1 addition & 1 deletion
@@ -221,7 +221,7 @@ pre-commit: ## Run pre-commit hooks
 .PHONY: slotscheck
 slotscheck: ## Run slotscheck
 @echo "${INFO} Running slotscheck... 🔍"
-@uv run slotscheck sqlspec/
+@PYTHONWARNINGS="ignore:::google.adk.features._feature_decorator" uv run slotscheck sqlspec/
 @echo "${OK} Slotscheck complete ✨"
 
 .PHONY: fix
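The new `PYTHONWARNINGS` value is an `action:message:category:module` entry. `ignore:::google.adk.features._feature_decorator` ignores warnings of any category that are attributed to that exact module. As a rough illustration, the sketch below builds the equivalent filter in-process with the standard `warnings` module. `ignore_warnings_from` is a hypothetical helper, and `warn_explicit` is used only to simulate warnings coming from that module.

```python
import re
import warnings


def ignore_warnings_from(module: str) -> None:
    """In-process equivalent of PYTHONWARNINGS="ignore:::<module>"."""
    # -W / PYTHONWARNINGS escape the module field and anchor it at the end,
    # so it must equal the full dotted module name (no regex semantics).
    warnings.filterwarnings("ignore", category=Warning, module=re.escape(module) + r"\Z")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ignore_warnings_from("google.adk.features._feature_decorator")
    # Simulate warnings attributed to two different modules.
    warnings.warn_explicit(
        "noisy", UserWarning, "feature_decorator.py", 1, module="google.adk.features._feature_decorator"
    )
    warnings.warn_explicit("kept", UserWarning, "other.py", 1, module="sqlspec.other")

print([str(w.message) for w in caught])  # ['kept']
```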

pyproject.toml

Lines changed: 18 additions & 1 deletion
@@ -51,7 +51,10 @@ fsspec = ["fsspec"]
 litestar = ["litestar"]
 msgspec = ["msgspec"]
 mypyc = ["sqlglot[c]>=30.0.0"]
-mysql-connector = ["mysql-connector-python"]
+mysql-connector = [
+"mysql-connector-python; python_version < '3.12'",
+"mysql-connector-python<9.7.0; python_version >= '3.12'",
+]
 nanoid = ["fastnanoid>=0.4.1"]
 obstore = ["obstore"]
 opentelemetry = ["opentelemetry-instrumentation"]
@@ -157,6 +160,16 @@ sqlspec = "sqlspec.__main__:run_cli"
 sqlspec-dark = "tools.sphinx_ext.pygments_styles:SQLSpecDarkStyle"
 sqlspec-light = "tools.sphinx_ext.pygments_styles:SQLSpecLightStyle"
 
+[tool.uv]
+# mysql-connector-python 9.7.0 dropped cp312/cp313/cp314 wheels (regression vs 9.6.0).
+# Override dependency metadata for every source that requests mysql-connector-python.
+# Keep the uncapped dependency on Python <3.12, and force the cap on Python >=3.12
+# so transitive pulls (e.g., pytest-databases[mysql]) resolve to an existing wheel.
+override-dependencies = [
+"mysql-connector-python; python_version < '3.12'",
+"mysql-connector-python<9.7.0; python_version >= '3.12'",
+]
+
 [build-system]
 build-backend = "hatchling.build"
 requires = ["hatchling", "hatch-mypyc"]
@@ -209,6 +222,10 @@ include = [
 "sqlspec/data_dictionary/**/*.py", # Data dictionary mixin (required for adapter inheritance)
 "sqlspec/adapters/**/core.py", # Adapter compiled helpers
 "sqlspec/adapters/**/type_converter.py", # All adapters type converters
+"sqlspec/adapters/oracledb/_param_types.py", # Slot-based LOB/JSON parameter wrappers
+"sqlspec/adapters/oracledb/_json_handlers.py", # Native JSON inputtypehandler / outputtypehandler chain
+"sqlspec/adapters/oracledb/_uuid_handlers.py", # UUID ↔ RAW(16) inputtypehandler / outputtypehandler chain
+"sqlspec/adapters/oracledb/_vector_handlers.py", # DB_TYPE_VECTOR inputtypehandler / outputtypehandler dispatch
 "sqlspec/utils/text.py", # Text utilities
 "sqlspec/utils/sync_tools.py", # Synchronous utility functions
 "sqlspec/utils/type_guards.py", # Type guard utilities

sqlspec/adapters/oracledb/__init__.py

Lines changed: 31 additions & 9 deletions
@@ -1,12 +1,15 @@
-import sqlspec.adapters.oracledb._numpy_handlers as numpy_handlers
-from sqlspec.adapters.oracledb._numpy_handlers import (
-    DTYPE_TO_ARRAY_CODE,
-    numpy_converter_in,
-    numpy_converter_out,
-    numpy_input_type_handler,
-    numpy_output_type_handler,
-    register_numpy_handlers,
+import sqlspec.adapters.oracledb._json_handlers as json_handlers
+import sqlspec.adapters.oracledb._vector_handlers as vector_handlers
+from sqlspec.adapters.oracledb._json_handlers import (
+    json_converter_in_blob,
+    json_converter_in_clob,
+    json_converter_out_blob,
+    json_converter_out_clob,
+    json_input_type_handler,
+    json_output_type_handler,
+    register_json_handlers,
 )
+from sqlspec.adapters.oracledb._param_types import OracleBlob, OracleClob, OracleJson
 from sqlspec.adapters.oracledb._typing import (
     OracleAsyncConnection,
     OracleAsyncCursor,
@@ -20,6 +23,14 @@
     uuid_input_type_handler,
     uuid_output_type_handler,
 )
+from sqlspec.adapters.oracledb._vector_handlers import (
+    DTYPE_TO_ARRAY_CODE,
+    numpy_converter_in,
+    numpy_converter_out,
+    numpy_input_type_handler,
+    numpy_output_type_handler,
+    register_numpy_handlers,
+)
 from sqlspec.adapters.oracledb.config import (
     OracleAsyncConfig,
     OracleConnectionParams,
@@ -42,24 +53,35 @@
     "OracleAsyncCursor",
     "OracleAsyncDriver",
     "OracleAsyncExceptionHandler",
+    "OracleBlob",
+    "OracleClob",
     "OracleConnectionParams",
     "OracleDriverFeatures",
+    "OracleJson",
     "OraclePoolParams",
     "OracleSyncConfig",
     "OracleSyncConnection",
     "OracleSyncCursor",
     "OracleSyncDriver",
     "OracleSyncExceptionHandler",
     "default_statement_config",
+    "json_converter_in_blob",
+    "json_converter_in_clob",
+    "json_converter_out_blob",
+    "json_converter_out_clob",
+    "json_handlers",
+    "json_input_type_handler",
+    "json_output_type_handler",
     "numpy_converter_in",
     "numpy_converter_out",
-    "numpy_handlers",
     "numpy_input_type_handler",
     "numpy_output_type_handler",
+    "register_json_handlers",
     "register_numpy_handlers",
     "register_uuid_handlers",
     "uuid_converter_in",
     "uuid_converter_out",
     "uuid_input_type_handler",
     "uuid_output_type_handler",
+    "vector_handlers",
 )
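The `_vector_handlers` exports above keep their NumPy-era names. The sketch below illustrates the int8/float32 packing rule and the `vector_return_format` options from the summary using `array.array` typecodes `'b'` (int8) and `'f'` (float32), which mirror the array typecodes python-oracledb accepts for VECTOR binds. `pack_vector`, `materialize`, and `default_return_format` are hypothetical names. The sketch only inspects plain Python sequences (not `np.ndarray` dtypes), and it assumes fetched vectors arrive as `array.array`.

```python
import importlib.util
from array import array
from collections.abc import Sequence
from typing import Any

INT8_MIN, INT8_MAX = -128, 127


def pack_vector(values: Sequence[Any]) -> array:
    """Pack as int8 ('b') if every element is an int in [-128, 127], else float32 ('f')."""
    items = list(values)
    if items and all(
        isinstance(v, int) and not isinstance(v, bool) and INT8_MIN <= v <= INT8_MAX for v in items
    ):
        return array("b", items)
    # Floats, out-of-range ints, and empty input all fall back to float32.
    return array("f", items)


def default_return_format() -> str:
    # "numpy" when NumPy is importable, otherwise "list".
    return "numpy" if importlib.util.find_spec("numpy") is not None else "list"


def materialize(vec: array, fmt: str) -> Any:
    """Convert a fetched vector into the requested vector_return_format."""
    if fmt == "array":
        return vec
    if fmt == "list":
        return vec.tolist()
    if fmt == "numpy":
        if importlib.util.find_spec("numpy") is None:
            # Fail loudly instead of silently returning another type.
            raise RuntimeError("vector_return_format='numpy' requires NumPy to be installed")
        import numpy as np

        return np.asarray(vec)
    raise ValueError(f"unknown vector_return_format: {fmt!r}")


print(pack_vector([3, -7, 12]).typecode)             # b
print(pack_vector([3, 300]).typecode)                # f
print(materialize(array("f", [0.5, 1.5]), "list"))  # [0.5, 1.5]
```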

sqlspec/adapters/oracledb/_json_handlers.py

Lines changed: 196 additions & 0 deletions
@@ -0,0 +1,196 @@
+"""Oracle native JSON type handlers.
+
+Provides automatic conversion between Python ``dict`` / ``list`` / ``tuple`` values
+and Oracle's JSON storage types via connection type handlers.
+
+Routing matrix (input):
+
+* Oracle 21c+ native ``JSON``: bind via ``DB_TYPE_JSON`` (binary OSON).
+* Oracle 19c-20c with ``BLOB CHECK (... IS JSON)``: bind via ``DB_TYPE_BLOB`` with
+  UTF-8 JSON bytes.
+* Oracle 12c-18c with ``CLOB CHECK (... IS JSON)``: bind via ``DB_TYPE_CLOB`` with
+  a serialized JSON string.
+* Server major version is read from ``connection._sqlspec_oracle_major`` (set in
+  ``OracleSyncConfig._init_connection`` / ``OracleAsyncConfig._init_connection``).
+  When unknown, default to 21c+ behavior.
+
+Routing matrix (output):
+
+* ``DB_TYPE_JSON``: passthrough (python-oracledb already returns ``dict`` / ``list``).
+* ``DB_TYPE_BLOB`` with ``JSON`` in column ``type_name``: parse via
+  ``json_converter_out_blob``.
+* ``DB_TYPE_CLOB`` with ``JSON`` in column ``type_name``: parse via
+  ``json_converter_out_clob``.
+
+Handlers chain to any pre-existing ``inputtypehandler`` / ``outputtypehandler``
+registered on the connection (e.g. NumPy vector, UUID). Each handler returns
+``None`` for values it does not own, so registering JSON after the NumPy handler
+and before the UUID handler is safe; order only matters where ownership overlaps.
+"""
+
+from typing import TYPE_CHECKING, Any
+
+from sqlspec.adapters.oracledb._typing import DB_TYPE_BLOB, DB_TYPE_CLOB, DB_TYPE_JSON
+from sqlspec.utils.serializers import from_json, to_json
+
+if TYPE_CHECKING:
+    from oracledb import AsyncConnection, AsyncCursor, Connection, Cursor
+
+__all__ = (
+    "json_converter_in_blob",
+    "json_converter_in_clob",
+    "json_converter_out_blob",
+    "json_converter_out_clob",
+    "json_input_type_handler",
+    "json_output_type_handler",
+    "register_json_handlers",
+)
+
+
+_JSON_TYPE_NAME_MARKER = "JSON"
+
+# Server-version thresholds for JSON binding strategy selection.
+# 21c+ supports DB_TYPE_JSON (binary OSON); 19c-20c uses BLOB CHECK (... IS JSON);
+# pre-19c uses CLOB CHECK (... IS JSON).
+_NATIVE_JSON_MIN_MAJOR = 21
+_BLOB_IS_JSON_MIN_MAJOR = 19
+
+
+def json_converter_in_clob(value: Any) -> str:
+    """Serialize a Python value to a JSON string for CLOB binding."""
+    return to_json(value)
+
+
+def json_converter_in_blob(value: Any) -> bytes:
+    """Serialize a Python value to UTF-8 JSON bytes for BLOB binding."""
+    return to_json(value, as_bytes=True)
+
+
+def json_converter_out_clob(value: "str | None") -> Any:
+    """Parse a JSON string from a CLOB read back into a Python value."""
+    if value is None:
+        return None
+    return from_json(value)
+
+
+def json_converter_out_blob(value: "bytes | None") -> Any:
+    """Parse JSON bytes from a BLOB read back into a Python value."""
+    if value is None:
+        return None
+    return from_json(value)
+
+
+def _is_json_payload(value: Any) -> bool:
+    """Return True if the value should be claimed by the JSON input handler.
+
+    ``dict`` values are always claimed. Non-empty ``list``/``tuple`` values are
+    claimed unless the first element is an ``int``/``float`` (``bool`` does not
+    count); such sequences are vector embeddings and belong to the vector handler.
+    """
+    if isinstance(value, dict):
+        return True
+    if isinstance(value, (list, tuple)):
+        if not value:
+            # Empty sequence: ambiguous (could be empty vector or empty list).
+            # Defer to the next handler in the chain.
+            return False
+        first = value[0]
+        # Reject sequences of numbers (vector embeddings).
+        return not (isinstance(first, (int, float)) and not isinstance(first, bool))
+    return False
+
+
+def _input_type_handler(cursor: "Cursor | AsyncCursor", value: Any, arraysize: int) -> Any:
+    """Oracle input type handler for JSON-shaped Python values."""
+    if not _is_json_payload(value):
+        return None
+
+    server_major = getattr(cursor.connection, "_sqlspec_oracle_major", None)
+
+    if server_major is None or server_major >= _NATIVE_JSON_MIN_MAJOR:
+        return cursor.var(DB_TYPE_JSON, arraysize=arraysize)
+    if server_major >= _BLOB_IS_JSON_MIN_MAJOR:
+        return cursor.var(DB_TYPE_BLOB, arraysize=arraysize, inconverter=json_converter_in_blob)
+    return cursor.var(DB_TYPE_CLOB, arraysize=arraysize, inconverter=json_converter_in_clob)
+
+
+def _output_type_handler(cursor: "Cursor | AsyncCursor", metadata: Any) -> Any:
+    """Oracle output type handler for JSON-bearing column reads."""
+    type_code = getattr(metadata, "type_code", None)
+
+    if type_code is DB_TYPE_JSON:
+        # Native JSON: python-oracledb returns dict/list directly. No conversion.
+        return None
+
+    type_name = (getattr(metadata, "type_name", "") or "").upper()
+    if _JSON_TYPE_NAME_MARKER not in type_name:
+        return None
+
+    if type_code is DB_TYPE_BLOB:
+        return cursor.var(DB_TYPE_BLOB, arraysize=cursor.arraysize, outconverter=json_converter_out_blob)
+    if type_code is DB_TYPE_CLOB:
+        return cursor.var(DB_TYPE_CLOB, arraysize=cursor.arraysize, outconverter=json_converter_out_clob)
+    return None
+
+
+def json_input_type_handler(cursor: "Cursor | AsyncCursor", value: Any, arraysize: int) -> Any:
+    """Public input type handler entry point."""
+    return _input_type_handler(cursor, value, arraysize)
+
+
+def json_output_type_handler(cursor: "Cursor | AsyncCursor", metadata: Any) -> Any:
+    """Public output type handler entry point."""
+    return _output_type_handler(cursor, metadata)
+
+
+def register_json_handlers(connection: "Connection | AsyncConnection") -> None:
+    """Register JSON type handlers on an Oracle connection.
+
+    Chains to any existing handlers via ``_JsonInputHandler`` / ``_JsonOutputHandler``
+    wrapper classes so vector / UUID handlers continue to fire for non-JSON values.
+    """
+    try:
+        existing_input = connection.inputtypehandler
+    except AttributeError:
+        existing_input = None
+    try:
+        existing_output = connection.outputtypehandler
+    except AttributeError:
+        existing_output = None
+
+    connection.inputtypehandler = _JsonInputHandler(existing_input)
+    connection.outputtypehandler = _JsonOutputHandler(existing_output)
+
+
+class _JsonInputHandler:
+    """Chaining wrapper that claims dict/list/tuple values, falling back otherwise."""
+
+    __slots__ = ("_fallback",)
+
+    def __init__(self, fallback: "Any | None") -> None:
+        self._fallback = fallback
+
+    def __call__(self, cursor: "Cursor | AsyncCursor", value: Any, arraysize: int) -> Any:
+        result = _input_type_handler(cursor, value, arraysize)
+        if result is not None:
+            return result
+        if self._fallback is not None:
+            return self._fallback(cursor, value, arraysize)
+        return None
+
+
+class _JsonOutputHandler:
+    """Chaining wrapper that claims JSON-bearing columns, falling back otherwise."""
+
+    __slots__ = ("_fallback",)
+
+    def __init__(self, fallback: "Any | None") -> None:
+        self._fallback = fallback
+
+    def __call__(self, cursor: "Cursor | AsyncCursor", metadata: Any) -> Any:
+        result = _output_type_handler(cursor, metadata)
+        if result is not None:
+            return result
+        if self._fallback is not None:
+            return self._fallback(cursor, metadata)
+        return None
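To see this module's routing decisions in isolation, the sketch below reimplements them over hypothetical `FakeConnection` / `FakeCursor` stand-ins that return string tags instead of `cursor.var(...)` objects, so no database or python-oracledb install is needed. It mirrors four pieces of the module: the ownership rule in `_is_json_payload`, the 21/19 major-version thresholds, the `JSON` `type_name` check on reads, and the wrap-and-fall-back chaining of `_JsonInputHandler`. `vector_handler` is a hypothetical, previously registered handler.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

NATIVE_JSON_MIN_MAJOR = 21
BLOB_IS_JSON_MIN_MAJOR = 19


@dataclass
class FakeConnection:
    server_major: Optional[int] = None  # plays the role of _sqlspec_oracle_major


@dataclass
class FakeCursor:
    connection: FakeConnection = field(default_factory=FakeConnection)


def is_json_payload(value: Any) -> bool:
    # Same ownership rule as _is_json_payload.
    if isinstance(value, dict):
        return True
    if isinstance(value, (list, tuple)) and value:
        first = value[0]
        return not (isinstance(first, (int, float)) and not isinstance(first, bool))
    return False


def pick_input_type(server_major: Optional[int]) -> str:
    # Unknown versions take the native 21c+ path.
    if server_major is None or server_major >= NATIVE_JSON_MIN_MAJOR:
        return "DB_TYPE_JSON"
    if server_major >= BLOB_IS_JSON_MIN_MAJOR:
        return "DB_TYPE_BLOB"
    return "DB_TYPE_CLOB"


def pick_output_action(type_code: str, type_name: Optional[str]) -> Optional[str]:
    # Native JSON needs no converter; LOBs are parsed only if type_name mentions JSON.
    if type_code == "DB_TYPE_JSON" or "JSON" not in (type_name or "").upper():
        return None
    return "parse" if type_code in ("DB_TYPE_BLOB", "DB_TYPE_CLOB") else None


InputHandler = Callable[[FakeCursor, Any, int], Optional[str]]


def chain_json(fallback: Optional[InputHandler]) -> InputHandler:
    """Claim JSON-shaped values first; delegate everything else to the previous handler."""

    def handler(cursor: FakeCursor, value: Any, arraysize: int) -> Optional[str]:
        if is_json_payload(value):
            return pick_input_type(cursor.connection.server_major)
        return fallback(cursor, value, arraysize) if fallback is not None else None

    return handler


def vector_handler(cursor: FakeCursor, value: Any, arraysize: int) -> Optional[str]:
    # Hypothetical earlier handler that owns numeric sequences.
    if isinstance(value, (list, tuple)) and value and isinstance(value[0], (int, float)):
        return "DB_TYPE_VECTOR"
    return None


handler = chain_json(vector_handler)
on_19c = FakeCursor(FakeConnection(server_major=19))
print(handler(on_19c, {"a": 1}, 1))     # DB_TYPE_BLOB
print(handler(on_19c, [0.1, 0.2], 1))   # DB_TYPE_VECTOR
print(handler(FakeCursor(), ["x"], 1))  # DB_TYPE_JSON
print(handler(on_19c, [], 1))           # None (neither handler claims it)
```

The empty-list case shows why the JSON handler defers instead of guessing. Whichever handler wraps it next gets to decide, and if none claims the value, the driver's default binding applies.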
