Commit da3e487

Fix the failing build due to API changes in DuckDB v1.5.2 (#21)
1 parent 12024e6 commit da3e487

8 files changed

Lines changed: 188 additions & 15 deletions

.editorconfig

Lines changed: 2 additions & 9 deletions
@@ -1,9 +1,7 @@
-# EditorConfig is awesome: https://EditorConfig.org
+# EditorConfig: https://EditorConfig.org

-# Top-most EditorConfig file
 root = true

-# Global settings (applicable to all files unless overridden)
 [*]
 charset = utf-8
 end_of_line = lf
@@ -12,24 +10,19 @@ indent_size = 4
 insert_final_newline = true
 trim_trailing_whitespace = true

-# Rust files
 [*.rs]
 max_line_length = 100

-# Markdown files
 [*.md]
-max_line_length = 120
+max_line_length = 150
 trim_trailing_whitespace = false

-# Bash scripts
 [*.sh]
 indent_size = 2

-# YAML files
 [*.{yaml,yml}]
 indent_size = 2

-# C & C++ files
 [*.{c,cpp,h,hpp}]
 indent_size = 2
 max_line_length = 100

.github/workflows/dist_pipeline.yml

Lines changed: 5 additions & 0 deletions
@@ -1,13 +1,18 @@
 name: Build Extension Binaries
+
 on:
   workflow_dispatch:
+  schedule: # Run every day at midnight (UTC)
+    - cron: '0 0 * * *'
   pull_request:
     branches:
       - main
     paths-ignore:
       - '**.md'
       - 'docs/**'
   push:
+    branches:
+      - develop
     tags:
       - 'v*'

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -95,3 +95,5 @@ _rust.h
 uv.lock
 tests/temp_models/
 *.cast
+.claude/
+.codex

AGENTS.md

Lines changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
+# AGENTS.md
+
+This file provides guidance to coding agents collaborating on this repository.
+
+## Mission
+
+Infera is a DuckDB extension for running machine learning inference (on ONNX models) directly from SQL.
+It has two tightly coupled layers:
+
+1. A Rust core that loads and caches ONNX models, runs inference through Tract, manages engine state, and exposes a C ABI.
+2. A C++ DuckDB extension layer that registers SQL functions and table functions on top of that Rust ABI.
+
+Priorities, in order:
+
+1. Correctness and safety of the SQL-facing inference behavior.
+2. Compatibility with supported DuckDB versions and extension CI.
+3. Reliable model loading, cache behavior, and error propagation.
+4. Small, well-tested changes that preserve the existing Rust/C++ boundary.
+
+## Core Rules
+
+- Use English for code, comments, docs, tests, and commit messages.
+- Prefer focused fixes over broad refactoring.
+- Preserve the existing Rust/C ABI unless the task explicitly requires changing it.
+- Treat `infera/bindings/include/rust.h` as generated code. If Rust FFI signatures change, regenerate it with `make create-bindings`.
+- Do not edit vendored code under `external/` unless the task is explicitly about updating or patching a vendored dependency.
+- Do not add new dependencies, network behavior, or background processes unless the requirement clearly calls for them.
+- Keep docs and examples aligned with user-visible SQL behavior.
+
+## Writing Style
+
+- Use Oxford commas in inline lists: "a, b, and c" not "a, b, c".
+- Do not use em dashes. Restructure the sentence, or use a colon or semicolon instead.
+- Avoid colorful adjectives and adverbs. Write "TCP proxy" not "lightweight TCP proxy", "scoring components" not "transparent scoring components".
+- Use noun phrases for checklist items, not imperative verbs. Write "redundant index detection" not "detect redundant indexes".
+- Headings in Markdown files must be in the title case: "Build from Source" not "Build from source". Minor words (a, an, the, and, but, or, for, in,
+on, at, to, by, of, is, are, was, were, be) stay lowercase unless they are the first word.
+
+## Repository Layout
+
+- `infera/src/lib.rs`: Rust crate entry point and public exports for the C ABI surface.
+- `infera/src/engine.rs`: Inference engine state, model registry, and execution paths.
+- `infera/src/model.rs`: ONNX model loading, metadata, and representation.
+- `infera/src/config.rs`: Runtime configuration and environment-driven settings.
+- `infera/src/http.rs`: Remote model fetching.
+- `infera/src/error.rs`: Error types and last-error plumbing shared across the FFI boundary.
+- `infera/src/ffi_utils.rs`: Shared helpers for the FFI boundary.
+- `infera/bindings/infera_extension.cpp`: DuckDB extension implementation that maps SQL calls to the Rust ABI.
+- `infera/bindings/include/infera_extension.hpp`: C++ extension declarations.
+- `infera/bindings/include/rust.h`: Generated C header for the Rust ABI.
+- `CMakeLists.txt`: Top-level CMake integration, platform detection, and Corrosion setup.
+- `extension_config.cmake`: DuckDB extension wiring and linkage to the prebuilt Rust static library.
+- `test/sql/`: Sqllogictest files for SQL-level extension behavior.
+- `test/models/`: Sample ONNX models used by SQL tests.
+- `test/concurrency/`: Concurrency and stress tests.
+- `docs/examples/`: SQL examples that should remain runnable against a local build.
+- `.github/workflows/tests.yml`: Rust tests and SQL tests in CI.
+- `.github/workflows/lints.yml`: Rust formatting and clippy checks in CI.
+- `.github/workflows/dist_pipeline.yml`: cross-platform extension packaging against DuckDB `main` and `v1.5.2`.
+
+## Architecture Notes
+
+### Rust Core
+
+The Rust crate owns inference-facing behavior: model loading from local paths or URLs, ONNX parsing through Tract, engine state, model caching, tensor
+shaping, JSON output formatting, and error handling.
+All SQL-visible behavior should ultimately reduce to deterministic Rust operations exposed through the FFI layer.
+
+### FFI Boundary
+
+The boundary between Rust and C++ is intentionally narrow:
+
+- Rust returns primitive values, heap-allocated C strings, or result structs that own their own buffers.
+- C++ is responsible for converting those values into DuckDB vectors and freeing Rust-allocated memory with the matching `infera_free_*` functions.
+- Errors should cross the boundary through the existing last-error mechanism instead of ad hoc conventions.
+
+When changing anything on one side of the boundary, inspect the matching code on the other side in the same change.
+
+### DuckDB Layer
+
+`infera/bindings/infera_extension.cpp` registers scalar functions such as `infera_predict` and `infera_predict_from_blob`, plus table-style behavior
+for
+listing and inspecting loaded models.
+DuckDB API compatibility matters here. If a change touches vector access, function registration, or scans, verify against the vendored DuckDB headers
+in `external/duckdb`.
+
+### Build Integration
+
+`make release` and `make debug` build the Rust crate first, then build DuckDB plus the extension.
+`extension_config.cmake` expects a prebuilt Rust static library and links it into the DuckDB extension targets.
+`CMakeLists.txt` also contains platform and Rust-target selection logic used by local builds and CI distribution builds.
+
+## Generated and Derived Files
+
+- `infera/bindings/include/rust.h` is generated from the Rust crate via `cbindgen`.
+- `infera/target/`, `build/`, and coverage outputs such as `infera/cobertura.xml` are build artifacts, not source.
+- Do not hand-edit generated artifacts unless the task explicitly requires it, and you explain why.
+
+## Rust Conventions
+
+- Edition: Rust 2021.
+- Format with `cargo fmt` through `make rust-format`.
+- Lint with `cargo clippy` through `make rust-lint`.
+- Follow the existing error style: return typed errors internally, then translate them once at the FFI boundary.
+- Avoid `unwrap()` and `expect()` in production code. CI denies them via clippy.
+- Prefer existing crates and helpers already in use before introducing new abstractions.
+
+## C++ and DuckDB Conventions
+
+- Keep C++ changes narrowly scoped to DuckDB integration concerns.
+- Match current DuckDB APIs used by the vendored headers in `external/duckdb/src/include`.
+- Be careful with vector mutability and ownership. Many DuckDB helpers have separate const and mutable accessors.
+- Keep user-facing SQL function names, signatures, and error messages stable unless the task explicitly changes them.
+
+## Required Validation
+
+Run the narrowest relevant checks, then expand if the change crosses layers.
+
+| Area                 | Command            | Use When                                                            |
+|----------------------|--------------------|---------------------------------------------------------------------|
+| Rust formatting      | `make rust-format` | Any Rust code changed                                               |
+| Rust lint            | `make rust-lint`   | Any Rust code changed                                               |
+| Rust tests           | `make rust-test`   | Rust logic, FFI, model loading, inference, or HTTP behavior changed |
+| Extension build      | `make release`     | C++, CMake, linkage, or SQL-facing behavior changed                 |
+| SQL tests            | `make test`        | SQL functions, table functions, or DuckDB integration changed       |
+| Examples             | `make examples`    | User-visible SQL behavior or docs/examples changed                  |
+| Combined local check | `make check`       | Small Rust-only changes                                             |
+
+Minimum expectations:
+
+- Rust-only logic changes: `make rust-test` and `make rust-lint`.
+- FFI changes: `make create-bindings`, `make rust-test`, and `make release`.
+- C++ or SQL-surface changes: `make release` and `make test`.
+- Docs/examples updates that affect runnable SQL: `make examples`.
+
+## Testing Expectations
+
+- Rust tests in `infera/src/` cover crate behavior, engine state, model loading, and error paths.
+- SQL tests in `test/sql/*.test` validate the extension from DuckDB's side and are the right place for user-visible SQL regressions.
+- Concurrency tests in `test/concurrency/` cover thread-safety of the inference engine.
+- Prefer local or offline-friendly tests for model-facing logic. Do not make CI depend on remote model downloads unless the repository already does so
+for that path.
+- If you change model loading, cache semantics, inference output shape, or SQL function output shape, add or update tests in the layer where the
+regression would be caught first.
+
+## Change Design Checklist
+
+Before coding:
+
+1. Identify whether the task belongs to Rust core, FFI boundary, C++ DuckDB layer, build wiring, or tests.
+2. Check whether the change affects one DuckDB version or both `main` and `v1.5.2`.
+3. Decide whether `rust.h`, docs examples, or SQL tests need to move with the code.
+4. Confirm whether the change is safe under offline or model-free test conditions.
+
+Before submitting:
+
+1. Relevant build and test commands pass locally, or any gaps are explicitly called out.
+2. Generated headers are refreshed if the Rust ABI is changed.
+3. User-visible SQL changes are covered by `test/sql` and reflected in docs/examples where appropriate.
+4. Changes to DuckDB integration are reviewed against the vendored headers, not memory of older APIs.
+
+## Commit and PR Hygiene
+
+- Keep commits scoped to one logical change.
+- Mention both layers when relevant: for example, "update Rust FFI and DuckDB binding for X".
+- PR descriptions should include:
+1. Behavioral change summary.
+2. Validation runs locally.
+3. Whether the change affects Rust only, SQL surface, or cross-version DuckDB compatibility.
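
Reviewer note: the "FFI Boundary" section of AGENTS.md asks the C++ side to free Rust-allocated memory with the matching `infera_free_*` function and to read failures through the last-error mechanism. The sketch below shows one way that pairing could look on the C++ side. Everything in the `fake_rust` namespace is a hypothetical stand-in written for this illustration; the real function names and signatures are whatever `rust.h` declares, and are not shown in this commit.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the Rust ABI (NOT the real Infera functions).
namespace fake_rust {
int live_allocations = 0;        // lets the caller check that nothing leaked
std::string last_error_storage;  // plays the role of the last-error slot

// Returns a heap-allocated C string, or nullptr after recording an error.
char *describe(bool fail) {
  if (fail) {
    last_error_storage = "model not found";
    return nullptr;
  }
  const char text[] = "ok";
  char *out = static_cast<char *>(std::malloc(sizeof(text)));
  if (out == nullptr) {
    last_error_storage = "out of memory";
    return nullptr;
  }
  std::memcpy(out, text, sizeof(text));
  ++live_allocations;
  return out;
}

// The matching free function: memory must go back to the allocator that made it.
void free_string(char *ptr) {
  if (ptr != nullptr) {
    std::free(ptr);
    --live_allocations;
  }
}

// Returns nullptr when no error has been recorded.
const char *last_error() {
  return last_error_storage.empty() ? nullptr : last_error_storage.c_str();
}
}  // namespace fake_rust

// RAII wrapper so every successful call is paired with the matching free.
struct RustStringDeleter {
  void operator()(char *ptr) const { fake_rust::free_string(ptr); }
};
using RustString = std::unique_ptr<char, RustStringDeleter>;

// Mirrors the documented fallback: "unknown error" if no message is set.
std::string LastErrorOrUnknown() {
  const char *msg = fake_rust::last_error();
  return msg != nullptr ? std::string(msg) : std::string("unknown error");
}

// Copies the Rust-owned string into a C++ string, then frees the original.
std::string CallAndCopy(bool fail) {
  RustString raw(fake_rust::describe(fail));
  if (!raw) {
    throw std::runtime_error(LastErrorOrUnknown());
  }
  return std::string(raw.get());
}
```

In this sketch, `CallAndCopy(false)` returns an owned `std::string` and leaves `fake_rust::live_allocations` at zero, while `CallAndCopy(true)` throws with the recorded message. Whether the extension uses a wrapper like this or frees manually is not visible in the diff.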

README.md

Lines changed: 1 addition & 2 deletions
@@ -7,9 +7,8 @@
 <h2>Infera</h2>

 [![Tests](https://img.shields.io/github/actions/workflow/status/CogitatorTech/infera/tests.yml?label=tests&style=flat&labelColor=282c34&logo=github)](https://github.com/CogitatorTech/infera/actions/workflows/tests.yml)
-[![Code Quality](https://img.shields.io/codefactor/grade/github/CogitatorTech/infera?label=quality&style=flat&labelColor=282c34&logo=codefactor)](https://www.codefactor.io/repository/github/CogitatorTech/infera)
 [![Examples](https://img.shields.io/badge/examples-view-green?style=flat&labelColor=282c34&logo=github)](https://github.com/CogitatorTech/infera/tree/main/docs/examples)
-[![Docs](https://img.shields.io/badge/docs-view-blue?style=flat&labelColor=282c34&logo=read-the-docs)](https://github.com/CogitatorTech/infera/tree/main/docs)
+[![Docs](https://img.shields.io/badge/docs-read-blue?style=flat&labelColor=282c34&logo=read-the-docs)](https://github.com/CogitatorTech/infera/tree/main/docs)
 [![License](https://img.shields.io/badge/license-MIT%2FApache--2.0-007ec6?style=flat&labelColor=282c34&logo=open-source-initiative)](https://github.com/CogitatorTech/infera)

 In-Database Machine Learning for DuckDB

external/duckdb

Submodule duckdb updated 5163 files

external/extension-ci-tools

infera/bindings/infera_extension.cpp

Lines changed: 7 additions & 2 deletions
@@ -23,6 +23,11 @@

 namespace duckdb {

+template <typename T>
+static T *GetFlatVectorDataWritable(Vector &vector) {
+  return const_cast<T *>(FlatVector::GetData<T>(vector));
+}
+
 /**
  * @brief Retrieves the last error message from the Infera Rust core.
  * @return A string containing the error message, or "unknown error" if not set.
@@ -236,7 +241,7 @@ static void Predict(DataChunk &args, ExpressionState &state, Vector &result) {
     throw InvalidInputException(err_msg);
   }
   result.SetVectorType(VectorType::FLAT_VECTOR);
-  auto result_data = FlatVector::GetData<float>(result);
+  auto result_data = GetFlatVectorDataWritable<float>(result);
   for (idx_t i = 0; i < batch_size; i++) {
     result_data[i] = res.data[i];
   }
@@ -355,7 +360,7 @@ static void PredictMulti(DataChunk &args, ExpressionState &state, Vector &result
     throw InvalidInputException(err_msg);
   }
   result.SetVectorType(VectorType::FLAT_VECTOR);
-  auto result_data = FlatVector::GetData<string_t>(result);
+  auto result_data = GetFlatVectorDataWritable<string_t>(result);
   const size_t output_cols = res.cols;
   for (idx_t row_idx = 0; row_idx < batch_size; row_idx++) {
     std::ostringstream oss;
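
Reviewer note: the commit does not say why the `const_cast` helper is needed. The diff suggests that, in the updated DuckDB headers, `FlatVector::GetData<T>` called this way yields a pointer the extension can no longer write through. The sketch below reproduces that pattern with mock types so the mechanics can be compiled without DuckDB. `MockVector`, `MockFlatVector`, `GetWritableData`, and `FillResult` are hypothetical names made up for this illustration, and the const-returning accessor is an assumption, not a statement about DuckDB's actual signature.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a vector type that owns a mutable float buffer.
class MockVector {
public:
  explicit MockVector(std::size_t count) : values_(count, 0.0f) {}
  const float *Data() const { return values_.data(); }
  std::size_t Size() const { return values_.size(); }

private:
  std::vector<float> values_;
};

// Assumed accessor shape: only a read-only pointer is handed out.
struct MockFlatVector {
  template <typename T>
  static const T *GetData(MockVector &vector) {
    static_assert(std::is_same<T, float>::value, "this mock only stores float");
    return vector.Data();
  }
};

// Same pattern as the helper in the diff: strip const to obtain a writable pointer.
// Writing through the result is well-defined here only because the underlying
// buffer is not itself a const object.
template <typename T>
static T *GetWritableData(MockVector &vector) {
  return const_cast<T *>(MockFlatVector::GetData<T>(vector));
}

// Copies `count` values into the result vector, like the loop in Predict.
void FillResult(MockVector &result, const float *source, std::size_t count) {
  float *out = GetWritableData<float>(result);
  for (std::size_t i = 0; i < count && i < result.Size(); i++) {
    out[i] = source[i];
  }
}
```

If the vendored headers provide a dedicated mutable accessor, using it would avoid the cast; AGENTS.md notes that many DuckDB helpers have separate const and mutable accessors, so checking `external/duckdb/src/include` is worthwhile before relying on `const_cast`.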
