Credential-resolution failures are logged as errors (and reported to Sentry) when most aren't faults #4814

@stuartc

Summary

Credential-resolution failures are logged with Logger.error from two sites —
Lightning.Credentials.Resolver and LightningWeb.Channels.RunChannel
(handle_credential_error/5). Because Sentry's LoggerHandler runs with
capture_log_messages: true and level: :error
(lib/lightning/application.ex:21), each one becomes a Sentry error event.
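For orientation, a handler registered with those two options would look roughly like the fragment below. This is a sketch, not a copy of lib/lightning/application.ex: the handler id :sentry_handler is an assumed name, and the real call may pass further options.

```elixir
# Sketch only; the handler id :sentry_handler is an assumption.
:logger.add_handler(:sentry_handler, Sentry.LoggerHandler, %{
  config: %{
    # Report plain Logger messages, not only crash reports.
    capture_log_messages: true,
    # Minimum level forwarded to Sentry; :info and :warning lines are skipped.
    level: :error
  }
})
```

With this shape, any Logger.error call anywhere in the app becomes a Sentry event, which is why the level chosen at the call site matters.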

Most of these aren't faults. They're user-actionable (re-authorise a credential,
fix a project environment) or transient (OAuth provider 429/503). The driving
example is OAuth refresh token expiry (run_channel.ex:543), which accounts
for a decent share of our Sentry events. When it fires there's nothing for us to
fix in code: the user's token has expired and they need to re-authorise. The
detail is already captured in the audit log, so reporting it to Sentry as an error
mostly just adds noise.

This fits the wider effort to cut Sentry noise, so it's worth reining in this
logging as a group rather than just the one OAuth line.

Two things going on

1. Level: Logger.error is being used for expected/user-actionable
conditions. Sentry's handler only captures :error, so dropping to :info/
:warning stops the Sentry report while keeping the line visible for logs-only
operators (Lightning is self-hosted in environments without Sentry; the plain-text
formatter at config/config.exs:158 already includes :run_id).

2. Some conditions are logged twice — three of the five are logged once in the
resolver and again in the channel after the error propagates up (the resolver's
bare-atom errors are wrapped into 2-tuples at resolver.ex:85,88 to match the
channel clauses). So fixing only the channel would leave the resolver firing.
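The "log once" idea can be sketched as follows, assuming the channel is chosen as the single log site. The module and function names here are hypothetical and do not mirror the real Resolver or RunChannel code; the error is also kept as a bare atom for brevity, whereas the real resolver wraps it into a 2-tuple.

```elixir
defmodule LogOnceSketch do
  require Logger

  # Resolver side (hypothetical): return a tagged error and do NOT log.
  def resolve(%{environment: env}, env), do: {:ok, :credential_body}
  def resolve(%{environment: _other}, _run_env), do: {:error, :environment_mismatch}

  # Channel side (hypothetical): the only place the condition is logged.
  def handle(credential, run_env) do
    case resolve(credential, run_env) do
      {:ok, body} ->
        {:ok, body}

      {:error, reason} ->
        # :warning is below the Sentry handler's :error threshold.
        Logger.warning("Credential resolution failed: #{reason}")
        {:error, reason}
    end
  end
end
```

Picking the resolver as the single site instead would work equally well; the point is that the error value propagates silently through one of the two layers.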

| Condition | Resolver | RunChannel | Net | Suggested level |
| --- | --- | --- | --- | --- |
| environment_not_configured | resolver.ex:150 | run_channel.ex:489 | 2× error | downgrade or keep; log once |
| project_not_found | resolver.ex:154 | run_channel.ex:506 | 2× error | keep error; log once |
| environment_mismatch | resolver.ex:108 | run_channel.ex:522 | 2× error | downgrade; log once |
| reauthorization_required | - | run_channel.ex:543 | 1× error | info (or drop) |
| temporary_failure | - | run_channel.ex:560 | 1× error | downgrade (transient) |
Suggested direction

For each condition, settle on a single log site (resolver or channel) and a
level along the lines of the table. A bit of reasoning behind the suggestions:

  • reauthorization_required → info rather than warning. The token's
    standing with the IdP is out of our hands — it's user/credential state, not a
    system condition, and there's nothing an operator acts on here. It's already
    captured in the audit log (Audit.oauth_token_refresh_failed_event,
    credentials.ex:1340, written before the error returns), so not logging it at
    all is a reasonable option too; info just keeps it cheap and visible for
    logs-only operators.
  • temporary_failure → transient (429/503), retry-able.
  • environment_mismatch → user-actionable config issue.
  • project_not_found → probably worth keeping at :error — a run pointing at
    a missing project looks like a genuine invariant violation rather than something
    a user can fix — but still de-duplicate it to one site.

Might also be worth a short contributor note (CLAUDE.md or .claude/guidelines/):
info/warning for user-actionable/transient events, error for genuine faults,
and log a given condition at one site only.
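One way to make that guideline concrete is a single lookup from condition to level, used by whichever site ends up owning the log line. The sketch below is illustrative only: the module name is hypothetical, and the :warning choices for temporary_failure, environment_mismatch and environment_not_configured are assumptions, since the table only says "downgrade".

```elixir
defmodule CredentialLogLevels do
  require Logger

  # Assumed mapping; only :info for reauthorization_required and :error for
  # project_not_found are stated in the issue, the rest are placeholders.
  @levels %{
    reauthorization_required: :info,
    temporary_failure: :warning,
    environment_mismatch: :warning,
    environment_not_configured: :warning,
    project_not_found: :error
  }

  # Unknown conditions stay at :error so genuine faults still reach Sentry.
  def level_for(reason), do: Map.get(@levels, reason, :error)

  # Log the condition once, at the mapped level, with caller-supplied metadata.
  def log(reason, metadata \\ []) do
    Logger.log(level_for(reason), "Credential resolution failed: #{reason}", metadata)
  end
end
```

A call such as `CredentialLogLevels.log(:reauthorization_required, run_id: run_id)` would then stay out of Sentry while remaining visible to logs-only operators.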

Related (separate)

Sentry events for these conditions also carry inconsistent metadata (run_id
reaches Sentry via a duplicate Sentry.Context.set_extra_context call alongside
Logger.metadata). That's a separate follow-up — unifying metadata propagation
and adding :tags_from_metadata — tracked apart from this work.

Note

A related condition exists worker-side, tracked separately — RE: OpenFn/kit#1429.

Labels: Monitoring, Sentry, credentials, elixir, maintenance

Status: In review