You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Three small follow-ups that landed alongside the larger v26.06.1
features and bug fixes:
- arc_query_client_disconnects_total Prometheus counter (PR #466,
closes#426) — observability follow-up to PR #424's streaming-on-
client-disconnect fix. Placed under the existing #424 entry.
- Arrow streaming tests now leak-detect with CheckedAllocator (PR
#465, closes#427) — test-hygiene follow-up to the same #424 work;
buildArrowBatch helper now releases builders + cols defensively so
CheckedAllocator + AssertSize(t,0) catches future Arrow leaks.
- Line Protocol parser sub-slice indexing in splitOnDelimiter (PR
#467, closes#354) — ingest hot-path perf: ParseLine -41% ns /
-52% allocs, ParseBatch -31% ns / -50% allocs. Includes the bench
table and notes the declined dynamic-cap-8 suggestion from Gemini
review (caused a +9% regression on the full path).
Doc-only; +35 lines in RELEASE_NOTES_2026.06.1.md.
Co-authored-by: Ignacio Van Droogenbroeck <ignacio@vandroogenbroeck.net>
Copy file name to clipboardExpand all lines: RELEASE_NOTES_2026.06.1.md
+35Lines changed: 35 additions & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -740,6 +740,22 @@ The fix is mechanical: capture the error from `bufio.Writer.Flush()` and break t
740
740
741
741
This is **complementary to but distinct from** the #420 retention/delete leak: that fix targeted DuckDB native heap residue after S3 reads; this one targets in-flight Arrow record batches held on the goroutine stack when the client abandons a query mid-stream.
Follow-up observability for the disconnect fix above. Operators investigating "why does RSS spike when this dashboard refreshes" used to grep logs for the streaming-truncation `Warn` line to count Grafana panel-close events. The new counter exposes the same signal as a Prometheus time series:
Three labels distinguish (a) the `/api/v1/query/arrow` IPC endpoint, (b) the `duckdb_arrow`-build path that `/api/v1/query` takes by default, and (c) the pure `database/sql` streaming JSON path. Wired at all 5 existing client-disconnect Warn-log sites in [internal/api/query_arrow.go](internal/api/query_arrow.go), [internal/api/query_arrow_json.go](internal/api/query_arrow_json.go), and [internal/api/query.go](internal/api/query.go). Guarded by the existing `isClientError()` predicate, so server-side stream failures stay in `IncQueryErrors` only — the new counter is strictly client-side noise. Typed `DisconnectPathArrowIPC` / `DisconnectPathArrowJSON` / `DisconnectPathSQLJSON` constants prevent typos at compile time; an unknown label is silently dropped by the increment switch.
752
+
753
+
### Arrow streaming tests now leak-detect with `CheckedAllocator` (PR #465, closes #427)
754
+
755
+
Test-hygiene follow-up to the disconnect fix. The `buildArrowBatch` helper in [internal/api/query_arrow_json_test.go](internal/api/query_arrow_json_test.go) was leaking Arrow allocator buffers — invisible under `memory.NewGoAllocator` (Go's GC absorbs the residue), but fatal under `memory.NewCheckedAllocator` because the builders and column arrays kept their own reference counts after `array.NewRecord` retained them.
756
+
757
+
The helper now releases builders + cols inside two `defer` blocks (registered before their init loops, with nil-check guards for mid-loop panics). `TestStreamArrowJSON_FlushErrorBreaksLoop` switched from `NewGoAllocator` → `NewCheckedAllocator` + `defer alloc.AssertSize(t, 0)`, so future regressions where the disconnect-break path forgets to release a record will fail the test loudly instead of silently growing the heap.
758
+
743
759
### S3 Endpoint Scheme Normalization for DuckDB (PR #422)
744
760
745
761
The AWS SDK Go v2 accepts `s3_endpoint` with or without an `http(s)://` prefix; DuckDB's `httpfs` extension expects a bare `host:port` and prepends scheme based on `s3_use_ssl`. With `s3_endpoint = "http://host:port"` in `arc.toml` (matching the AWS SDK convention), DuckDB built malformed URLs of the form `http://http://host:port/...` and every `read_parquet()` against S3 failed with `Could not resolve hostname`.
@@ -752,6 +768,25 @@ The DELETE API rewrites parquet files to remove rows matching a WHERE clause and
752
768
753
769
Replaced the TeeReader with a two-step "hash, then seek-and-upload": `io.Copy` into the SHA256 hasher, `Seek(0, io.SeekStart)`, pass the seekable `*os.File` directly to `storage.WriteReader`. The second read hits OS page cache so disk I/O is unchanged. Validated against MinIO over plain HTTP: 199/199 files rewritten (pre-fix: 0/200).
754
770
771
+
### Line Protocol parser: sub-slice indexing in `splitOnDelimiter` (PR #467, closes #354)
772
+
773
+
The ingest hot path's tokenizer in [internal/ingest/lineprotocol.go](internal/ingest/lineprotocol.go) appended bytes one-at-a-time into a growing `[]byte` per part, allocating a fresh backing array (with several `append`-driven grows) for every space and every comma split in every line-protocol line. Replaced with sub-slice indexing: track a `start` index, emit `data[start:i]` sub-slices on delimiter hits, and let the returned `[][]byte` alias the input buffer.
774
+
775
+
Behavior is unchanged at every edge — consecutive delimiters skip empty parts, leading/trailing delimiters handled identically, escape branch preserves the `\x` pair, quote tracking unchanged, returned-empty-on-empty-input preserved. Aliasing is safe because every downstream caller copies via `string(...)` or `unescape()` before storing anything beyond the parse call (`models.Record` holds `map[string]string` and `map[string]interface{}` only — no `[]byte` references escape).
Two follow-up micro-optimizations from Gemini review landed in the same PR: a pre-sized parts slice (`make([][]byte, 0, 4)`) so the first append is allocation-free on typical telegraf lines, and a lazy-allocation path that returns a 1-element literal slice for no-delimiter inputs (measurements with no tags, single-field writes) — additional -720 bytes per ParseBatch (-6.6%). A third Gemini suggestion (dynamic cap=8 on comma) was declined with bench evidence (caused a +9% regression on the full path because most real comma-splits have ≤4 parts; oversizing them costs more in aggregate than the rare 5+-part case saves).
787
+
788
+
`BenchmarkSplitOnDelimiter` added to `lineprotocol_test.go` covering both space + comma paths so future regressions on the ingest hot path don't go unnoticed.
789
+
755
790
---
756
791
757
792
_Maintainer notes: keep this file at the repo root (per [memory/project_release_strategy.md](memory/project_release_strategy.md)); do not write to `docs/RELEASE_NOTES_*` (that path is stale)._
0 commit comments