Skip to content

telemetry: fail closed when remote config can't be fetched or parsed#72

Merged
noahho merged 3 commits intomainfrom
telemetry/fail-closed-default
May 3, 2026
Merged

telemetry: fail closed when remote config can't be fetched or parsed#72
noahho merged 3 commits intomainfrom
telemetry/fail-closed-default

Conversation

@noahho
Copy link
Copy Markdown
Collaborator

@noahho noahho commented May 3, 2026

Summary

  • Default to {\"enabled\": False} (instead of {\"enabled\": True}) when download_config() cannot fetch or parse the remote telemetry configuration.
  • Adds JSON-parse failure as a third explicit fail-closed path alongside the existing network-exception and non-200 paths.
  • Adds tests for all four paths (200 happy path, network exception, non-200, malformed JSON) plus the env-var URL override.

Why

Previously, a client that couldn't reach https://storage.googleapis.com/prior-labs-tabpfn-public/config/telemetry.json (firewall, air-gapped environment, transient GCS outage, DNS failure) would silently fall back to telemetry-enabled. The operator had no way to even reach the public config to opt out, but events would still be queued.

Fail-closed is the standard pattern for SDK telemetry that depends on a remote enable/disable signal. It respects users on restrictive networks and ensures events are only emitted on an explicit, server-confirmed enable.

Behaviour

Scenario Before After
Config endpoint reachable, valid JSON Follow server config Follow server config (unchanged)
Config endpoint unreachable (timeout, DNS, firewall) Fall back to enabled: True Fall back to enabled: False
Config endpoint returns non-200 Fall back to enabled: True Fall back to enabled: False
Config endpoint returns unparseable body Raises ValueError Fall back to enabled: False

No change to the cache TTL or the env-var override.

Test plan

  • New tests/telemetry/core/test_config.py covers happy path, network exception, timeout, four non-200 status codes, malformed JSON, and the TABPFN_TELEMETRY_CONFIG_URL override.
  • Full pytest run passes (163 tests).

🤖 Generated with Claude Code

download_config previously defaulted to {"enabled": True} when the
remote config endpoint was unreachable, returned a non-200, or returned
an unparseable body. That meant a client behind a restrictive network
(firewall, air-gapped environment, transient outage) would silently
fall back to enabling telemetry, even when the operator had no way to
reach the public configuration to opt out.

Flip the default to {"enabled": False} for all three failure modes:

  - requests.get raises (timeout, DNS, ConnectionError, etc.)
  - response status != 200
  - response body is not valid JSON

Behaviour when the endpoint is reachable and the JSON parses is
unchanged: telemetry follows the server-side configuration as before.

Adds tests covering each failure mode plus the env-var URL override
and the happy path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request transitions the telemetry configuration to a fail-closed approach, ensuring telemetry is disabled by default if the remote configuration cannot be retrieved or parsed. It also adds a new test suite to verify this behavior across various failure conditions. Review feedback recommends more robust validation of the JSON response to prevent potential crashes and suggests using immutable types for cached configurations to avoid unintended side effects.

Comment thread src/tabpfn_common_utils/telemetry/core/config.py
Comment thread src/tabpfn_common_utils/telemetry/core/config.py
noahho and others added 2 commits May 3, 2026 20:17
The @ttl_cache wrapper exposes __wrapped__ at runtime via functools.wraps,
but the type stubs for FunctionType don't include it. Add narrow
type-ignore comments on the two test-fixture lines that need it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
resp.json() can return any JSON value (list, string, number, null) and
even a dict response may not contain the "enabled" key. With the
fail-closed pattern in place, a malformed-shape response should fall
back to disabled rather than letting the downstream
ProductTelemetry.telemetry_enabled call raise TypeError or KeyError on
the host process.

Validate that the parsed JSON is a dict containing "enabled" before
returning it; otherwise return the fail-closed default.

Adds tests covering list, string, int, null, dict-without-enabled, and
empty-dict shapes.

Addresses the high-severity review comment on PR #72 from
gemini-code-assist.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@noahho noahho merged commit 59c0afe into main May 3, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant