chore(rivetkit): remove legacy metrics & prometheus auth#4724

Closed
NathanFlurry wants to merge 1 commit into chore_rivekit_actions from 04-23-chore_rivetkit_remove_legacy_metrics_prometheus_auth
Conversation

@NathanFlurry
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Member Author

NathanFlurry commented Apr 24, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude

claude Bot commented Apr 24, 2026

PR Review: chore(rivetkit): remove legacy metrics & prometheus auth (#4724)

Summary

This PR does four loosely related things:

  1. Renames RIVET_INSPECTOR_TOKEN to _RIVET_TEST_INSPECTOR_TOKEN to clarify it is a test-only override, with production inspector auth relying on the per-actor KV token at key [3].
  2. Renames the inspector_token field / RIVET_INSPECTOR_TOKEN env var that gated the Prometheus /metrics endpoint to metrics_token / _RIVET_METRICS_TOKEN, properly separating inspector auth from metrics auth.
  3. Adds three new unauthenticated framework HTTP routes (/, /health, /metadata) on every actor, with lightweight perf-timing tracing::debug! instrumentation added to start_actor.
  4. Removes the /inspector/metrics tests (because the metrics endpoint is now behind _RIVET_METRICS_TOKEN and the old tests used a hardcoded secret that no longer matches).

Issues Found

1. Wrong domain in the root response (CLAUDE.md violation)

File: rivetkit-rust/packages/rivetkit-core/src/registry/http.rs

The response body reads "Learn more at https://rivetkit.org". The CLAUDE.md explicitly states: "ALWAYS use rivet.dev" and "ALWAYS use github.com/rivet-dev/rivet - NEVER use rivet-dev/rivetkit". The URL rivetkit.org is not the canonical docs URL — that is https://rivet.dev/docs. This should be corrected.

2. Sleep timer cancelled for unauthenticated health/metadata/root requests

The handle_fetch flow calls instance.ctx.cancel_sleep_timer() unconditionally before routing to framework_http_route. The new /health, /metadata, and / routes match framework_http_route and re-arm the timer afterward. These are probe/healthcheck endpoints that logically should not count as "activity" — an external load balancer pinging /health every 30 seconds will perpetually prevent the actor from sleeping. Consider routing these through an early-exit path before cancel_sleep_timer is called.
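The early-exit idea can be illustrated with a minimal sketch. Everything here is hypothetical: `SleepTimer`, `is_probe_route`, and this `handle_fetch` signature are stand-ins invented for illustration, not the types or signatures used in rivetkit-core. The point is only the ordering: probe routes are answered before the timer is touched.

```rust
/// Hypothetical stand-in for the actor's sleep timer. It only counts calls
/// so the ordering can be observed; it is not the rivetkit-core type.
#[derive(Default)]
pub struct SleepTimer {
    pub cancels: u32,
    pub rearms: u32,
}

impl SleepTimer {
    fn cancel(&mut self) {
        self.cancels += 1;
    }

    fn rearm(&mut self) {
        self.rearms += 1;
    }
}

/// The three framework routes named in the review. Treating them as probes
/// is the assumption this sketch illustrates.
pub fn is_probe_route(path: &str) -> bool {
    matches!(path, "/" | "/health" | "/metadata")
}

/// Answer probe routes before touching the sleep timer, so that periodic
/// health checks do not count as activity. All other requests cancel the
/// timer and re-arm it once handling is done.
pub fn handle_fetch(timer: &mut SleepTimer, path: &str) -> &'static str {
    if is_probe_route(path) {
        return "probe";
    }
    timer.cancel();
    let response = "actor"; // placeholder for real request handling
    timer.rearm();
    response
}
```

With this ordering, a load balancer polling /health never resets the timer, while ordinary requests still cancel and re-arm it as before.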

3. NODE_ENV environment variable used in Rust code

File: rivetkit-rust/packages/rivetkit-core/src/registry/http.rs

The metadata endpoint checks std::env::var("NODE_ENV").as_deref() == Ok("production") to determine runtime_type. NODE_ENV is a Node.js convention and is unreliable in a pure Rust actor runtime. In production deployments not using Node, this variable may not be set, causing the metadata endpoint to always report "type": "local". A Rivet-native environment variable would be more appropriate here.
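One possible shape for this, as a sketch only: prefer an explicit Rivet-specific variable and fall back to NODE_ENV for compatibility. The variable name `RIVET_RUNTIME_TYPE`, the `RuntimeType` enum, and its variant names are assumptions made up for this example; the PR does not define them. The lookup is passed in as a closure so the logic can be exercised without mutating the process environment.

```rust
/// Hypothetical runtime classification; the variant names are assumptions.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum RuntimeType {
    Local,
    Production,
}

/// Decide the runtime type from an explicit Rivet-specific variable, falling
/// back to NODE_ENV only when that variable is unset.
/// `RIVET_RUNTIME_TYPE` is a hypothetical name used for illustration.
pub fn runtime_type<F>(lookup: F) -> RuntimeType
where
    F: Fn(&str) -> Option<String>,
{
    if let Some(value) = lookup("RIVET_RUNTIME_TYPE") {
        // The explicit variable wins, whatever NODE_ENV says.
        return if value == "production" {
            RuntimeType::Production
        } else {
            RuntimeType::Local
        };
    }
    match lookup("NODE_ENV").as_deref() {
        Some("production") => RuntimeType::Production,
        _ => RuntimeType::Local,
    }
}

/// Convenience wrapper that reads the real process environment.
pub fn runtime_type_from_env() -> RuntimeType {
    runtime_type(|key| std::env::var(key).ok())
}
```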

4. Grammar error in updated docs

File: website/src/content/docs/actors/debugging.mdx

The text reads "Bearer the actor's inspector token in the Authorization header." — "Bearer" is misused as a verb. This should read something like: "Pass the actor's inspector token as a Bearer token in the Authorization header."
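For reference, the header shape that the corrected sentence describes can be sketched as follows. The helper names `bearer_header` and `parse_bearer` are hypothetical and do not come from the PR; the sketch also matches the `Bearer` scheme name case-sensitively for simplicity, which a real server may not want.

```rust
/// Build the `Authorization` header value a client would send.
pub fn bearer_header(token: &str) -> String {
    format!("Bearer {token}")
}

/// Extract the token from an `Authorization` header value if it uses the
/// Bearer scheme. Returns `None` for other schemes or an empty token.
pub fn parse_bearer(header_value: &str) -> Option<&str> {
    let token = header_value.strip_prefix("Bearer ")?.trim();
    if token.is_empty() {
        None
    } else {
        Some(token)
    }
}
```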

5. Tests deleted without replacement

File: rivetkit-typescript/packages/rivetkit/tests/driver/actor-inspector.test.ts

84 lines of tests covering /inspector/metrics (startup metrics, SQLite commit phase metrics) are deleted. The tests used a hardcoded Authorization: Bearer token header accepted by the old RIVET_INSPECTOR_TOKEN=token env var. These tests could be salvaged by setting _RIVET_METRICS_TOKEN=token in the test environment rather than deleted entirely. Deleting without replacement leaves the metrics endpoint without integration test coverage.

6. Perf log message format inconsistency

File: rivetkit-rust/packages/rivetkit-core/src/actor/task.rs

New tracing::debug! calls embed metric names in the message string (e.g., "perf internal: loadStateMs"). Per CLAUDE.md, log messages should be lowercase unless mentioning specific code symbols. If these metric names are meant to be machine-parsed, they should be structured fields, not embedded in the message string.

7. New HTTP routes undocumented

The PR adds three new public unauthenticated routes (/, /health, /metadata) on every actor but does not document them in the debugging docs or any other reference. Operators and developers have no documentation reference for these endpoints.


Positive Aspects

  • Separating inspector_token from metrics_token is architecturally correct -- the old code conflated two different auth mechanisms.
  • The _RIVET_TEST_INSPECTOR_TOKEN leading underscore convention clearly signals it is not a production mechanism.
  • The duration_ms_f64 helper is clean and avoids repetition.
  • timing_safe_equal correctly uses constant-time comparison.
  • The new method_not_allowed_response is consistent with the existing error response pattern.
  • Removing the getRivetkitInspectorToken TS export is correct cleanup.
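
For readers unfamiliar with the two helpers praised above, here is a minimal sketch of what functions with those names typically look like. These bodies are assumptions written for illustration and are not copied from the PR. The comparison accumulates differences across every byte instead of returning at the first mismatch; production code would normally rely on a vetted constant-time crate, since a hand-written loop carries no guarantee against compiler optimizations.

```rust
use std::time::Duration;

/// Convert a duration to fractional milliseconds.
pub fn duration_ms_f64(duration: Duration) -> f64 {
    duration.as_secs_f64() * 1000.0
}

/// Compare two byte strings without short-circuiting on the first mismatch.
/// Only the length check returns early, so the length of the expected value
/// is not hidden by this sketch.
pub fn timing_safe_equal(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        // OR together the XOR of each byte pair; any difference sets a bit.
        diff |= x ^ y;
    }
    diff == 0
}
```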

Summary Table

| Issue | Severity | Location |
| --- | --- | --- |
| rivetkit.org URL (wrong domain per CLAUDE.md) | Medium | registry/http.rs |
| Health/metadata routes cancel sleep timer unnecessarily | Medium | registry/http.rs |
| NODE_ENV used in Rust as runtime type signal | Low | registry/http.rs |
| Grammar: "Bearer the actor's inspector token" | Low | debugging.mdx |
| 84 lines of metrics tests deleted without replacement | Medium | actor-inspector.test.ts |
| Perf log messages embed metric keys in message string | Low | task.rs |
| New routes undocumented in endpoint reference | Low | debugging.mdx |

@github-actions
Contributor

Preview packages published to npm

Install with:

npm install rivetkit@pr-4724

All packages published as 0.0.0-pr.4724.ecbb955 with tag pr-4724.

Engine binary is shipped via @rivetkit/engine-cli on linux-x64-musl, linux-arm64-musl, darwin-x64, and darwin-arm64. Windows users should use the release installer or set RIVET_ENGINE_BINARY.

Docker images:

docker pull rivetdev/engine:slim-ecbb955
docker pull rivetdev/engine:full-ecbb955

Individual packages:

npm install rivetkit@pr-4724
npm install @rivetkit/react@pr-4724
npm install @rivetkit/rivetkit-napi@pr-4724
npm install @rivetkit/workflow-engine@pr-4724

@NathanFlurry NathanFlurry force-pushed the chore_rivekit_actions branch from df59472 to 8f5aeb9 Compare April 24, 2026 09:52
@NathanFlurry NathanFlurry force-pushed the 04-23-chore_rivetkit_remove_legacy_metrics_prometheus_auth branch from 5366cce to 3a2efdf Compare April 24, 2026 09:52
@NathanFlurry NathanFlurry mentioned this pull request Apr 24, 2026
11 tasks
@NathanFlurry NathanFlurry force-pushed the chore_rivekit_actions branch from 8f5aeb9 to f8e2ea7 Compare April 24, 2026 10:19
@NathanFlurry NathanFlurry force-pushed the 04-23-chore_rivetkit_remove_legacy_metrics_prometheus_auth branch from 3a2efdf to d3104cb Compare April 24, 2026 10:19
@NathanFlurry NathanFlurry force-pushed the 04-23-chore_rivetkit_remove_legacy_metrics_prometheus_auth branch from d3104cb to c0106f8 Compare April 24, 2026 10:32
@NathanFlurry NathanFlurry force-pushed the chore_rivekit_actions branch from f8e2ea7 to bb871d1 Compare April 24, 2026 10:32
@NathanFlurry NathanFlurry force-pushed the 04-23-chore_rivetkit_remove_legacy_metrics_prometheus_auth branch from c0106f8 to 441610e Compare April 24, 2026 11:48
@NathanFlurry NathanFlurry force-pushed the chore_rivekit_actions branch 2 times, most recently from 338f37e to a1adf67 Compare April 24, 2026 12:14
@NathanFlurry NathanFlurry force-pushed the 04-23-chore_rivetkit_remove_legacy_metrics_prometheus_auth branch from 441610e to 30cba06 Compare April 24, 2026 12:14
@NathanFlurry NathanFlurry force-pushed the 04-23-chore_rivetkit_remove_legacy_metrics_prometheus_auth branch from 30cba06 to 605e002 Compare April 24, 2026 12:32
@NathanFlurry
Member Author

Landed in main via stack-merge fast-forward push. Commits are in main; closing to match.
