Skip to content

fix(ruby): keep OTel auto-instrumentation working on Rails 7.0#643

Draft
ccschmitz-launchdarkly wants to merge 6 commits into
feat/ruby-rails-boot-instrumentationfrom
feat/ruby-old-rails-compat
Draft

fix(ruby): keep OTel auto-instrumentation working on Rails 7.0#643
ccschmitz-launchdarkly wants to merge 6 commits into
feat/ruby-rails-boot-instrumentationfrom
feat/ruby-old-rails-compat

Conversation

@ccschmitz-launchdarkly

Copy link
Copy Markdown
Contributor

Summary

The observability plugin depended on opentelemetry-instrumentation-all, whose Rails-family members raised their minimum to Rails 7.1. On Rails 7.0 every Rails-family instrumentation fails its runtime compatible? check, logs a flurry of "... failed to install" warnings, and never attaches — so the app loses all Rails auto-instrumentation (no HTTP server spans, DB spans, etc.) and gets only manual instrumentation. The meta-gem couples every instrumentation to a single version, so the fix is to stop using it.

Changes

  • Drop opentelemetry-instrumentation-all for individual gems (instrumentations.rb). The Rails family is capped below each member's Rails-7.1-enforcing release (rails <0.42, action_pack <0.18, action_view/active_record <0.13, active_support <0.12, active_job <0.12, action_mailer <0.8, active_storage <0.5); everything else tracks the latest. These capped releases still work on Rails 7.1+, so modern apps are unaffected and keep the latest non-Rails instrumentations. use_all still activates any extra instrumentation gem a consumer adds.
  • InstrumentationLogFilter replaces the per-instrumentation "failed to install" flurry with one actionable summary. It wraps the whole SDK.configure call, since the SDK installs instrumentations after the configure block returns.
  • Repro app e2e/ruby/rails/demo-rails70 (Rails 7.0) with a pure-Ruby in-process OTLP protobuf sink that asserts traces (a server span), a log, and a captured exception are exported over the wire — no Docker/Node, runs under bundle exec rake. (The JS mock-otel-server is JSON-only and gated on highlight.session_id, so it can't verify Ruby's protobuf export.)
  • CI: e2e-rails-legacy job in ruby-plugin.yml — a regression guard, since the other e2e apps run Rails 7.2 (above the floor) and can't catch this class of break.
  • README compatibility section + CHANGELOG.

⚠️ Stacked on #582

Built on top of #582 (boot auto-instrumentation) — base branch is feat/ruby-rails-boot-instrumentation, so the diff here is just the 6 old-Rails commits. It references install_instrumentation_only from #582 and should merge after it; GitHub will retarget to main once #582 lands.

Context

CardFlight (the original report) has since upgraded Rails, so this no longer blocks them — but it remains a valuable general regression guard so a future instrumentation-gem bump can't silently re-break Rails 7.0.

Testing

Red→green with the CI job's exact command (bundle exec rake in demo-rails70):

  • pre-fix (instrumentation-all → instrumentation-rails 0.42): 5 runs, 9 assertions, 4 failures (ActionPack not installed, no server span)
  • fixed: 5 runs, 16 assertions, 0 failures, 0 "failed to install"

Modern apps verified: demo (7.2) and api-only (7.2) pass and still resolve the latest non-Rails instrumentations (rack 0.31, redis 0.29, pg 0.36, …). Gem unit tests + rubocop green.

🤖 Generated with Claude Code

…lure (red)

Reproduces the CardFlight failure: on Rails 7.0 with the current gem,
opentelemetry-instrumentation-all resolves to the latest (0.94 ->
instrumentation-rails 0.42), whose Rails-family members raised their floor to
Rails 7.1. They fail their runtime compatible? check, log a flurry of
'failed to install' warnings, and never attach, so no autoinstrumented HTTP
server span is produced.

- e2e/ruby/rails/demo-rails70: copy of demo pinned to rails ~> 7.0 (Ruby 3.3.4
  kept so bundler still resolves the latest instrumentation, faithfully matching
  a current-Ruby Rails 7.0 customer).
- test/support/otlp_sink.rb: in-process OTLP/protobuf sink (pure Ruby, decodes
  via the exporter gems' proto classes) so the e2e test asserts telemetry over
  the real export path under bundle exec rake (no Docker/Node).
- test/integration/otlp_export_e2e_test.rb: asserts traces (server span), a log
  record, and a captured exception all reach the sink. RED here.

Boot logs show 8 Rails-family 'failed to install' warnings; the new test and the
existing observability_instrumentation_test both fail (missing server span).
The observability plugin depended on opentelemetry-instrumentation-all, whose
Rails-family members raised their floor to Rails 7.1. On Rails 7.0 every
Rails-family instrumentation failed its compatible? check, logged a flurry of
'failed to install' warnings, and never attached — so apps lost all Rails
auto-instrumentation (no HTTP server spans, DB spans, etc.) and got only manual
instrumentation. The meta gem couples versions, so the only fix is to stop using
it.

- Replace opentelemetry-instrumentation-all with individual instrumentation gems
  (lib/launchdarkly_observability/instrumentations.rb). The Rails family is
  capped below each member's Rails-7.1-enforcing release (rails <0.42,
  action_pack <0.18, active_record/action_view <0.13, active_support <0.12,
  active_job <0.12, action_mailer <0.8, active_storage <0.5); everything else
  tracks the latest. These capped releases still work on Rails 7.1+, so modern
  apps are unaffected, and use_all still activates any extra instrumentation gem
  a consumer adds.
- Replace the per-instrumentation 'failed to install' log flurry with a single
  actionable summary via a logger filter (InstrumentationLogFilter), naming the
  instrumentations that could not attach and how to resolve it.

Verified on the demo-rails70 repro: 0 'failed to install', all 13
instrumentations install, and traces (server span) + a log + a captured
exception are exported to the OTLP sink. 5 runs, 16 assertions, 0 failures.
…lures correctly

- Move the log-suppression filter into its own file
  (instrumentation_log_filter.rb) with class-level capture_failures/
  failure_warning helpers (keeps OpenTelemetryConfig under the class-length
  limit and is unit-testable).
- Wrap the whole OpenTelemetry::SDK.configure call rather than just use_all:
  the SDK installs instrumentations AFTER the configure block returns, so the
  earlier placement around use_all never saw the install logging. Now the
  per-instrumentation 'failed to install' / 'successfully installed' chatter is
  actually suppressed and failures are collected for the single summary.
- rails.rb: Railtie#otel_logger_provider_available? now returns an explicit
  boolean instead of nil when the logs constant is absent. Fixes a pre-existing
  order-dependent failure in rails_railtie_test (verified failing identically on
  the unmodified gem at the same seed).

Updated e2e Gemfile.locks for the new individual-gem resolution. demo (7.2) and
api-only (7.2) suites pass; demo-rails70 (7.0) passes with 0 'failed to install'.
Adds e2e-rails-legacy to ruby-plugin.yml, running e2e/ruby/rails/demo-rails70
(Rails 7.0) under bundle exec rake. The existing e2e jobs run Rails 7.2 — above
the floor the OTel Rails-family instrumentations enforce — so they cannot catch
an old-Rails compatibility break. This job fails if Rails-family
auto-instrumentation stops attaching on Rails 7.0.

Demonstrated red-then-green with the job's exact command:
- pre-fix gemspec (instrumentation-all -> instrumentation-rails 0.42.0):
  5 runs, 9 assertions, 4 failures (ActionPack not installed, no server span)
- fixed gemspec (instrumentation-rails 0.41.0): 5 runs, 16 assertions, 0 failures
- README: replace the opentelemetry-instrumentation-all dependency note with a
  'Ruby & Rails compatibility' section (Ruby >= 3.0, Rails >= 7.0 matrix, the
  individual-gem rationale, the single-warning behavior, and the per-gem pin
  escape hatch). Update the auto-instrumentation list to the curated set.
- CHANGELOG: add an Unreleased Bug Fixes entry for the Rails 7.0 fix.
- demo-rails70/README: explain what the repro proves and how to run it.
…ncies)

rubocop's Gemspec/OrderedDependencies (run by the build CI job) requires
alphabetical order within each section. No functional change.

@abelonogov-ld abelonogov-ld left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! Test app features

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants