Commit 8ae1a2a
authored
fix: address 11 review findings (7 critical + 4 high-impact) (#38)
* fix(client): route ProviderTypeVertex to VertexClient (Anthropic-on-Vertex)
The registry's ProviderTypeVertex case instantiated NewGeminiClient
against config.VertexGeminiBaseURL -- a Gemini-on-Vertex URL. The
VertexClient defined in vertex.go (Anthropic-on-Vertex, Name() ==
"anthropic-vertex") was never instantiated, so any user with
Provider("vertex") hit Gemini's wire format at a Gemini-on-Vertex
endpoint, producing 4xx.
The gemini-vertex deployment kind in setup/deployment.go is unchanged;
that path is the explicit "Gemini on Vertex" route and is still served
by NewGeminiClient + VertexGeminiBaseURL.
Adds client/provider_registry_test.go covering: (1) Vertex branch
returns *VertexClient with the correct projectID/region/token/baseURL;
(2) region defaults to us-central1; (3) missing VERTEX_PROJECT_ID
returns a clear error.
Closes: C2 in docs/plans/fix-critical-and-high-review.md
* fix(credentials): bound keyring wait time + deduplicate Get path
The keyringDo helper spawned a goroutine to wrap the non-ctx-aware
go-keyring library. On cancellation, the outer select returned ctx.Err()
but the inner goroutine continued running fn() to completion. If the
keyring daemon was stuck (e.g. user locked the keyring indefinitely),
the goroutine leaked forever and the caller waited forever.
Refactor:
- Add keyringDoWithTimeout(ctx, fn, timeout) with an early return for
already-cancelled contexts (no goroutine spawned) and a hard upper
bound (keyringWait = 30s) on the wait. The inner goroutine still runs
fn to completion (the keyring library does not accept ctx), but the
caller is now released within the bounded window.
- Move keyringStore.Get onto keyringDo, eliminating the duplicate
inline select+goroutine pattern and the same leak in the Get path.
Cannot fully eliminate the inner goroutine without switching keyring
libraries; the bound is the best practical mitigation.
Tests: keyring_platform_test.go covers normal return, already-
cancelled (fn not invoked), prompt release on cancel, ctx timeout,
hard timeout, and the Get path.
Closes: C6 in docs/plans/fix-critical-and-high-review.md
* fix(client): opt-in flag for auto-registering dynamic providers
A poisoned OPENAI_API_BASE / OPENAI_BASE_URL in a leaked .envrc or
shared shell history would silently auto-register a ghost OpenAI-
compatible provider and route requests (with the user's OPENAI_API_KEY
header) to an attacker-controlled server.
The auto-register path in getOrCreateProvider is now gated on
EYRIE_ALLOW_DYNAMIC_PROVIDERS (accepts 1/true/yes, any case). Default
is deny: unknown provider names return ErrUnknownProvider. When the
opt-in is set and registration fires, a slog.Warn line records the
provider name, base URL, and the opt-in env var so the behavior is
auditable.
Explicit RegisterDynamicProvider calls from user code are unaffected.
Migration: users who relied on auto-registration (e.g. local LiteLLM,
Ollama on a custom port) should set EYRIE_ALLOW_DYNAMIC_PROVIDERS=1.
Tests: provider_registry_test.go covers default-deny, opt-in path,
opt-in without base URL, the WARN log, and the env var parser.
Closes: C7 in docs/plans/fix-critical-and-high-review.md
* feat(client): unify Gemini SSE parsing with shared parser
Gemini had a bespoke 4KB streamLoop parser while every other
provider uses the shared parseSSEStream (64KB initial buffer,
2MB max) + a per-provider processXxxStream function. Bug fixes
to SSE parsing didn't reach Gemini, and a single 5KB event could
split mid-codepoint or mid-JSON in the 4KB buffer.
This commit:
- Adds processGeminiStream, modeled on processAnthropicStream
and processOpenAIStream
- Routes GeminiClient.StreamChat through the new path
- Preserves the original Gemini "done with usage" contract
(usage and StopReason combined into one done event when both
are present in the final chunk)
- Adds EYRIE_GEMINI_SHARED_PARSER=0 opt-out flag to revert to
the old streamLoop for one release. The old path will be
removed once the new path is validated in production.
The old streamLoop and processStreamChunk are kept as the
opt-out path, with no behavior change when the opt-out is set.
Tests: gemini_stream_test.go covers text, tool call, done with
usage, empty stream, context cancel, 200KB large event, the
feature flag (default + 0 + false + 1), and direct unit tests
of processGeminiStream. Plus a sanity check that the new path
doesn't clobber the client's HTTP client.
Closes: H1 in docs/plans/fix-critical-and-high-review.md
* refactor(client): extract buildAnthropicRequest and buildOpenAIRequest
Chat and StreamChat in anthropic.go had ~120 lines of duplicate
setup: every field (System, Temperature, TopP, TopK, StopSequences,
EnableCaching, Tools, Thinking, Metadata, ServiceTier, OutputConfig)
was re-applied in both methods. A drift in either (e.g., adding a
new field to Chat but forgetting StreamChat) was silent and only
caught in production. openai.go had a smaller ~15-line duplicate
of the request construction (the body was already extracted via
buildRequestBase).
Extracted two helpers:
- AnthropicClient.buildAnthropicRequest(ctx, messages, opts, stream)
- OpenAIClient.buildOpenAIRequest(ctx, messages, opts, stream)
Both take a `stream` bool that controls:
- the `stream: true` field in the body
- the `Accept: text/event-stream` header
Net diff: -28 lines in production (60 insertions, 88 deletions)
plus 352 lines of new tests. The key test is
TestAnthropic_ChatVsStream_SameBody, which captures the request
bodies from both methods via a mock server and asserts they're
byte-identical (modulo the `stream` field). This catches drift
automatically: any future change that diverges between the two
methods will fail this test.
Other tests cover the error paths (model required, size limit),
the GetBody re-readability needed for the MiMo 401 retry, the
system-prompt merging behavior, and the same byte-equality +
Accept-header assertions for the OpenAI Chat Completions path.
Closes: H2 in docs/plans/fix-critical-and-high-review.md
* refactor(client): unify Anthropic response parsing across providers
The Anthropic response-parsing logic (text / thinking /
redacted_thinking / tool_use extraction, plus usage fields
including cache tokens and thinking tokens) was duplicated
three times: inlined in anthropic.go Chat, inlined in vertex.go
Chat, and as a private responseFromAnthropic helper in bedrock.go.
A wire-format change (e.g., a new content block type) required
three edits; missing one meant a provider silently dropped blocks
(e.g., Bedrock might lose thinking content while Anthropic kept it).
Extracted parseAnthropicResponse(ar, requestID, orgID) into
anthropic.go (next to the anthropicResponse struct definition).
All three call sites now use it:
- Anthropic: parseAnthropicResponse(ar, requestID, orgID)
- Bedrock: parseAnthropicResponse(ar, reqID, "")
- Vertex: parseAnthropicResponse(ar, reqID, "")
The orgID parameter is explicit: only Anthropic's wire protocol
carries the Anthropic-Organization-Id response header; Bedrock
and Vertex pass empty.
Net production diff: -40 lines. The redacted_thinking safety
invariant (never echo the encrypted blob back to the caller) is
now enforced in one place and is enforced by
TestParseAnthropicResponse_RedactedThinking.
Tests: 11 total (3 renamed from TestResponseFromAnthropic_*,
now with +orgID assertion, plus 8 new in
anthropic_response_test.go covering thinking, redacted_thinking,
mixed content, multiple text/tool blocks, bad JSON in tool_use,
and total-tokens calculation).
Closes: H3 in docs/plans/fix-critical-and-high-review.md
* feat(client): wire *EyrieError into provider error paths
The structured *EyrieError type was defined in errors.go with
IsRetriable/IsAuthError/IsRateLimited helpers, but no provider
returned it — every error path used fmt.Errorf with the
"eyrie: <provider> API error (status=N, request_id=X)" format.
The structured type was dead code.
fallback.isRetriableError worked around this by regex-parsing
the error message via types.IsTransient and types.ExtractHTTPStatus
— fragile (a log-format change would silently misclassify).
This commit:
- formatAPIError now returns *EyrieError with a new `op` parameter
(chat, stream, count_tokens, embedding)
- All 12 call sites updated to pass the op
- fallback.isRetriableError now uses errors.As(err, &eyrieErr) +
eyrieErr.IsRetriable() as the structured path, with the existing
regex path as a fallback for legacy errors (transport errors,
before the EyrieError migration)
- The IsRetriable/IsAuthError/IsRateLimited helpers are no longer
dead code
Tests: 3 new (ReturnsEyrieError / AuthError / RetriableCodes) plus
4 updated test assertions to match the new "<provider> <op> failed"
format. The IsRateLimited / IsRetriable / IsAuthError predicates
are now exercised end-to-end.
Closes: H4 in docs/plans/fix-critical-and-high-review.md
* docs(eyrie): mark fix-critical-and-high-review plan as complete
All 8 items (4 critical + 4 high-impact) are committed and ready
for review on PR #38. The plan now has a completion summary table
at the top with status, commits, and a net diff summary.
Closes: docs tracking for the 30/60/90 plan
* fix(client): wire inner error into *EyrieError
formatAPIError built an *EyrieError but never set the Err field, so
EyrieError.Unwrap() always returned nil. That broke the contract
needed for errors.Is / errors.As to traverse the underlying cause
(e.g. context.Canceled or an io read error from the provider body).
Make parseProviderError return the read error alongside the
structured detail, plumb it through formatAPIError as a new inner
parameter, and assign it to EyrieError.Err. All seven provider
call sites (anthropic, openai, azure, vertex, bedrock chat,
bedrock stream, embedding) now propagate the read error so
downstream code can inspect it with errors.Is.
Also gofmt'd bedrock.go: the Chat() return + closing brace at
lines 80-81 were orphaned with extra indentation. The gofmt -w
change is naturally part of the bedrock.go call-site update and
is folded into this commit.
Cover the new contract with TestFormatAPIError_InnerErrorUnwrap,
which uses io.ErrUnexpectedEOF as the inner cause and asserts both
errors.Is traversal and that the inner message is appended to the
rendered error string.
Refs: PR #38 review issues #1, #3.
* docs(client): clarify doWithRetry's transport-layer scope
The plan doc claims doWithRetry was migrated to errors.As, but it
operates at the transport layer — only seeing transport errors and
raw HTTP status codes from httpClient.Do. It cannot receive an
*EyrieError because those are constructed by formatAPIError AFTER
the retry loop returns.
Add a doc comment pointing readers at fallback.go:240-244, where
the structured *EyrieError.IsRetriable()/IsAuthError() checks
actually live and drive provider rotation. This accurately
describes the architecture and prevents future readers from
chasing the same red herring.
Refs: PR #38 review issue #2.
* refactor(client): rename TestFormatAPIError_OmitsRequestIDWhenEmpty
The test name "NoHintFallsBackToDetail" no longer matches what the
test actually asserts after the EyrieError migration. The test
verifies the error string omits "request_id=" when the caller
passes an empty requestID, so rename it to match that contract.
Refs: PR #38 review issue #4.
* fix(credentials): use NewTimer in keyringDoWithTimeout
keyringDoWithTimeout used time.After(timeout) inside the select,
which allocates a timer that lives in the runtime until the
timeout elapses — even when we return early via the result channel
or ctx.Done. Under load (many fast keyring calls with short
timeouts) this leaks timer entries until each one expires.
Switch to time.NewTimer + defer timer.Stop. Stop is safe whether
the timer has already fired or not; if it has fired and the
runtime hasn't drained the channel yet, Stop drains it
transparently. The keyringWait cap stays the same; only the
timer lifecycle changes.
Refs: PR #38 review issue #5.
* chore(ci): apply gofumpt and wrap bare URLs for markdownlint
* chore(lint): fix ineffassign and SA9003 in gemini stream and anthropic test1 parent ca64bb8 commit 8ae1a2a
22 files changed
Lines changed: 2343 additions & 257 deletions
File tree
- client
- credentials
- docs/plans
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
217 | 217 | | |
218 | 218 | | |
219 | 219 | | |
| 220 | + | |
| 221 | + | |
| 222 | + | |
| 223 | + | |
| 224 | + | |
| 225 | + | |
| 226 | + | |
| 227 | + | |
| 228 | + | |
| 229 | + | |
| 230 | + | |
| 231 | + | |
| 232 | + | |
| 233 | + | |
| 234 | + | |
| 235 | + | |
| 236 | + | |
| 237 | + | |
| 238 | + | |
| 239 | + | |
| 240 | + | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
| 249 | + | |
| 250 | + | |
| 251 | + | |
| 252 | + | |
| 253 | + | |
| 254 | + | |
| 255 | + | |
| 256 | + | |
| 257 | + | |
| 258 | + | |
| 259 | + | |
| 260 | + | |
| 261 | + | |
| 262 | + | |
| 263 | + | |
220 | 264 | | |
221 | 265 | | |
222 | 266 | | |
| |||
367 | 411 | | |
368 | 412 | | |
369 | 413 | | |
370 | | - | |
371 | | - | |
372 | | - | |
373 | | - | |
374 | | - | |
375 | | - | |
| 414 | + | |
| 415 | + | |
| 416 | + | |
| 417 | + | |
| 418 | + | |
| 419 | + | |
| 420 | + | |
| 421 | + | |
| 422 | + | |
| 423 | + | |
| 424 | + | |
| 425 | + | |
| 426 | + | |
376 | 427 | | |
377 | 428 | | |
378 | | - | |
| 429 | + | |
379 | 430 | | |
380 | 431 | | |
381 | 432 | | |
382 | 433 | | |
383 | 434 | | |
384 | | - | |
385 | 435 | | |
386 | 436 | | |
387 | 437 | | |
| |||
391 | 441 | | |
392 | 442 | | |
393 | 443 | | |
394 | | - | |
| 444 | + | |
395 | 445 | | |
396 | 446 | | |
397 | 447 | | |
| |||
412 | 462 | | |
413 | 463 | | |
414 | 464 | | |
| 465 | + | |
415 | 466 | | |
416 | 467 | | |
417 | 468 | | |
| |||
424 | 475 | | |
425 | 476 | | |
426 | 477 | | |
427 | | - | |
| 478 | + | |
428 | 479 | | |
429 | 480 | | |
430 | 481 | | |
431 | 482 | | |
432 | | - | |
| 483 | + | |
433 | 484 | | |
434 | 485 | | |
| 486 | + | |
| 487 | + | |
| 488 | + | |
435 | 489 | | |
436 | 490 | | |
| 491 | + | |
| 492 | + | |
| 493 | + | |
| 494 | + | |
| 495 | + | |
| 496 | + | |
| 497 | + | |
| 498 | + | |
| 499 | + | |
| 500 | + | |
| 501 | + | |
| 502 | + | |
| 503 | + | |
437 | 504 | | |
438 | 505 | | |
439 | 506 | | |
| |||
446 | 513 | | |
447 | 514 | | |
448 | 515 | | |
449 | | - | |
| 516 | + | |
| 517 | + | |
450 | 518 | | |
451 | 519 | | |
452 | 520 | | |
453 | 521 | | |
454 | 522 | | |
455 | 523 | | |
456 | 524 | | |
457 | | - | |
458 | | - | |
459 | | - | |
460 | | - | |
461 | | - | |
462 | | - | |
463 | | - | |
464 | | - | |
465 | | - | |
466 | | - | |
467 | | - | |
468 | | - | |
469 | | - | |
470 | | - | |
471 | | - | |
472 | | - | |
473 | | - | |
474 | | - | |
475 | | - | |
476 | | - | |
477 | | - | |
478 | | - | |
479 | | - | |
480 | | - | |
481 | | - | |
482 | | - | |
483 | | - | |
484 | | - | |
485 | | - | |
486 | | - | |
| 525 | + | |
487 | 526 | | |
488 | 527 | | |
489 | 528 | | |
| |||
494 | 533 | | |
495 | 534 | | |
496 | 535 | | |
497 | | - | |
498 | | - | |
499 | | - | |
500 | | - | |
501 | | - | |
502 | | - | |
503 | | - | |
504 | | - | |
505 | | - | |
506 | | - | |
507 | | - | |
508 | | - | |
509 | | - | |
510 | | - | |
511 | | - | |
512 | | - | |
513 | | - | |
514 | | - | |
515 | | - | |
516 | | - | |
517 | | - | |
518 | | - | |
519 | | - | |
520 | | - | |
521 | | - | |
522 | | - | |
523 | | - | |
524 | | - | |
525 | | - | |
526 | | - | |
527 | | - | |
528 | | - | |
529 | | - | |
530 | | - | |
531 | | - | |
532 | | - | |
533 | | - | |
534 | | - | |
535 | | - | |
536 | | - | |
537 | | - | |
538 | | - | |
539 | | - | |
540 | | - | |
541 | | - | |
542 | | - | |
543 | | - | |
544 | | - | |
545 | | - | |
546 | | - | |
547 | | - | |
548 | | - | |
549 | | - | |
550 | | - | |
551 | | - | |
552 | | - | |
| 536 | + | |
553 | 537 | | |
554 | | - | |
| 538 | + | |
555 | 539 | | |
556 | | - | |
557 | | - | |
558 | | - | |
559 | 540 | | |
560 | 541 | | |
561 | 542 | | |
| |||
565 | 546 | | |
566 | 547 | | |
567 | 548 | | |
568 | | - | |
| 549 | + | |
569 | 550 | | |
570 | | - | |
| 551 | + | |
571 | 552 | | |
572 | 553 | | |
573 | 554 | | |
| |||
690 | 671 | | |
691 | 672 | | |
692 | 673 | | |
693 | | - | |
| 674 | + | |
| 675 | + | |
694 | 676 | | |
695 | 677 | | |
696 | 678 | | |
| |||
0 commit comments