Add Vertex AI provider support for Claude models #152
Open
fcorrea wants to merge 1 commit into
Route Claude requests through Google Cloud Vertex AI when ANTHROPIC_VERTEX_PROJECT_ID and CLOUD_ML_REGION (or ANTHROPIC_VERTEX_REGION) are set.

Authentication via Google Application Default Credentials:
- GCE/Cloud Run metadata server (automatic on GCP)
- authorized_user credentials (gcloud auth application-default login)
- service_account credentials (JSON key file)

Service account JWT signing uses ring (already a transitive dep via rustls): pure Rust, no subprocess or openssl dependency required.

Vertex-specific API differences handled:
- model stripped from request body (encoded in the URL instead)
- anthropic_version sent in body as 'vertex-2023-10-16' (not a header)
- anthropic-beta header omitted; prompt caching works natively via cache_control blocks in the request body
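The body changes listed in the commit message can be sketched as follows. This is a minimal illustration, not the PR's code: `Body` and `adapt_body_for_vertex` are hypothetical names, and the JSON object is modeled as a map from field name to already-serialized JSON text so the sketch needs only the standard library (real code would more likely work on a JSON value type).

```rust
use std::collections::BTreeMap;

/// Version string that the commit message says is sent in the body on Vertex.
const VERTEX_ANTHROPIC_VERSION: &str = "vertex-2023-10-16";

/// Hypothetical stand-in for a JSON object: field name -> serialized JSON value.
type Body = BTreeMap<String, String>;

/// Adapts a request body for Vertex:
/// - removes `model` (on Vertex it is encoded in the URL instead),
/// - adds `anthropic_version` as a body field rather than a header.
/// Returns the removed model name, unquoted, so the caller can build the URL.
fn adapt_body_for_vertex(body: &mut Body) -> Option<String> {
    let model = body.remove("model");
    body.insert(
        "anthropic_version".to_string(),
        format!("\"{}\"", VERTEX_ANTHROPIC_VERSION),
    );
    // The stored value is JSON text such as "\"some-model\""; strip the quotes.
    model.map(|m| m.trim_matches('"').to_string())
}
```

A caller would pass the returned model name to whatever builds the request URL, and would skip adding the anthropic-beta header for this path; fields such as cache_control blocks are left untouched by the function.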
Summary
- Add support for routing Claude requests through Google Cloud Vertex AI when ANTHROPIC_VERTEX_PROJECT_ID and CLOUD_ML_REGION (or ANTHROPIC_VERTEX_REGION) are set
- Introduce AuthMode enum (ApiKey | OAuth | Vertex { url }) in anthropic.rs to cleanly separate the three authentication paths without branching on booleans
- Implement Google Application Default Credentials (ADC) token fetching with in-memory caching, supporting:
  - authorized_user credentials (from gcloud auth application-default login)
  - service_account credentials (from a downloaded service account JSON key)
- Service account JWT signing uses ring (already a transitive dependency via rustls): pure Rust, no subprocess or openssl dependency required
- Build the correct Vertex AI streaming endpoint: {region}-aiplatform.googleapis.com/v1/projects/{project}/locations/{region}/publishers/anthropic/models/{model}:streamRawPredict
- Handle Vertex-specific API differences:
  - model is stripped from the request body (it's in the URL)
  - anthropic_version: "vertex-2023-10-16" is sent in the body instead of as a header
  - anthropic-beta header is not sent (Vertex rejects unknown beta headers; prompt caching works natively via cache_control blocks in the request body)
- Add build_anthropic_vertex_route() so the model picker shows Vertex AI as a distinct provider
- Initialize AnthropicProvider when Vertex env vars are present, even without Claude OAuth credentials (startup.rs)
- Update the model picker UI to show "Vertex AI" as the provider when Vertex env vars are set (inline_interactive.rs)

Edge cases and tradeoffs
- Token refresh: Google ADC tokens are cached in-memory with a 60-second safety margin before expiry. On GCE this hits the metadata server; elsewhere it refreshes via the OAuth2 token endpoint or a signed JWT for service accounts.
- global region: When CLOUD_ML_REGION=global, the endpoint uses https://aiplatform.googleapis.com (no subdomain), matching Anthropic's SDK behavior. Multi-region identifiers us and eu are also accepted by Vertex.
- Prompt caching: Fully supported on Vertex; cache_control blocks flow through in the request body unchanged. The anthropic-beta: prompt-caching-2024-07-31 header is intentionally omitted because Vertex rejects unrecognized beta headers.
- Service account key format: ring accepts PKCS#8 DER (the format Google generates for service account JSON keys) with a PKCS#1 DER fallback. The PEM headers are stripped and the body base64-decoded before passing to ring.
- Code size: anthropic.rs grew significantly. The ADC/JWT machinery could be extracted to a separate module in a follow-up.

Validation
Tested locally with ANTHROPIC_VERTEX_PROJECT_ID, CLOUD_ML_REGION, and authorized_user ADC credentials. Full multi-turn conversations with tool use and prompt caching worked correctly through Vertex; this PR was written while jcode was actively running through Vertex AI.

- cargo fmt --all -- --check: passes
- cargo check -p jcode: passes
- cargo clippy on changed files: no new warnings
- Panic budget: no new unwrap/expect in our changes
- Pre-existing budget failures (jcode-import-core, jcode-tui-mermaid, unrelated files) are present on master before this change

Environment variables
- ANTHROPIC_VERTEX_PROJECT_ID
- CLOUD_ML_REGION (or ANTHROPIC_VERTEX_REGION): us-east5, global, us, eu
- GOOGLE_APPLICATION_CREDENTIALS: ~/.config/gcloud/application_default_credentials.json
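How these variables feed the streaming endpoint can be sketched roughly as below. The helper names `vertex_config` and `vertex_stream_url` are hypothetical and not taken from the PR. Two further assumptions are made for illustration: CLOUD_ML_REGION is checked before ANTHROPIC_VERTEX_REGION (the description does not state the precedence), and every region other than global uses the {region}- host prefix (the description does not say whether us and eu need a different host).

```rust
/// Resolves the Vertex project and region from an environment lookup.
/// `get` is injected so the logic can be exercised without touching the
/// real process environment. Returns None when Vertex is not configured.
fn vertex_config(get: impl Fn(&str) -> Option<String>) -> Option<(String, String)> {
    let project = get("ANTHROPIC_VERTEX_PROJECT_ID").filter(|v| !v.is_empty())?;
    // Assumed precedence: CLOUD_ML_REGION first, then ANTHROPIC_VERTEX_REGION.
    let region = get("CLOUD_ML_REGION")
        .filter(|v| !v.is_empty())
        .or_else(|| get("ANTHROPIC_VERTEX_REGION").filter(|v| !v.is_empty()))?;
    Some((project, region))
}

/// Builds the streamRawPredict URL in the format given in the summary.
/// The `global` region uses the bare host with no region prefix.
fn vertex_stream_url(project: &str, region: &str, model: &str) -> String {
    let host = if region == "global" {
        "aiplatform.googleapis.com".to_string()
    } else {
        // Assumption: all other regions follow the {region}- prefix pattern.
        format!("{region}-aiplatform.googleapis.com")
    };
    format!(
        "https://{host}/v1/projects/{project}/locations/{region}/publishers/anthropic/models/{model}:streamRawPredict"
    )
}
```

At startup this could be called as `vertex_config(|k| std::env::var(k).ok())`, with the resulting project and region passed to `vertex_stream_url` together with the model name taken out of the request body.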