telemetry: fail closed when remote config can't be fetched or parsed#72
Merged
telemetry: fail closed when remote config can't be fetched or parsed#72
Conversation
download_config previously defaulted to {"enabled": True} when the
remote config endpoint was unreachable, returned a non-200, or returned
an unparseable body. That meant a client behind a restrictive network
(firewall, air-gapped environment, transient outage) would silently
fall back to enabling telemetry, even when the operator had no way to
reach the public configuration to opt out.
Flip the default to {"enabled": False} for all three failure modes:
- requests.get raises (timeout, DNS, ConnectionError, etc.)
- response status != 200
- response body is not valid JSON
Behaviour when the endpoint is reachable and the JSON parses is
unchanged: telemetry follows the server-side configuration as before.
Adds tests covering each failure mode plus the env-var URL override
and the happy path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
Contributor
There was a problem hiding this comment.
Code Review
This pull request transitions the telemetry configuration to a fail-closed approach, ensuring telemetry is disabled by default if the remote configuration cannot be retrieved or parsed. It also adds a new test suite to verify this behavior across various failure conditions. Review feedback recommends more robust validation of the JSON response to prevent potential crashes and suggests using immutable types for cached configurations to avoid unintended side effects.
The @ttl_cache wrapper exposes __wrapped__ at runtime via functools.wraps, but the type stubs for FunctionType don't include it. Add narrow type-ignore comments on the two test-fixture lines that need it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
resp.json() can return any JSON value (list, string, number, null) and even a dict response may not contain the "enabled" key. With the fail-closed pattern in place, a malformed-shape response should fall back to disabled rather than letting the downstream ProductTelemetry.telemetry_enabled call raise TypeError or KeyError on the host process. Validate that the parsed JSON is a dict containing "enabled" before returning it; otherwise return the fail-closed default. Adds tests covering list, string, int, null, dict-without-enabled, and empty-dict shapes. Addresses the high-severity review comment on PR #72 from gemini-code-assist. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
{\"enabled\": False}(instead of{\"enabled\": True}) whendownload_config()cannot fetch or parse the remote telemetry configuration.Why
Previously, a client that couldn't reach
https://storage.googleapis.com/prior-labs-tabpfn-public/config/telemetry.json(firewall, air-gapped environment, transient GCS outage, DNS failure) would silently fall back to telemetry-enabled. The operator had no way to even reach the public config to opt out, but events would still be queued.Fail-closed is the standard pattern for SDK telemetry that depends on a remote enable/disable signal. It respects users on restrictive networks and ensures events are only emitted on an explicit, server-confirmed enable.
Behaviour
enabled: Trueenabled: Falseenabled: Trueenabled: FalseValueErrorenabled: FalseNo change to the cache TTL or the env-var override.
Test plan
tests/telemetry/core/test_config.pycovers happy path, network exception, timeout, four non-200 status codes, malformed JSON, and theTABPFN_TELEMETRY_CONFIG_URLoverride.pytestrun passes (163 tests).🤖 Generated with Claude Code