fix(ruby): keep OTel auto-instrumentation working on Rails 7.0#643
fix(ruby): keep OTel auto-instrumentation working on Rails 7.0#643ccschmitz-launchdarkly wants to merge 12 commits into
Conversation
abelonogov-ld
left a comment
There was a problem hiding this comment.
Cool! Test app features
The Railtie's `attach_otel_log_bridge` delegated to a module method (`LaunchDarklyObservability.otel_logger_provider_available?`) defined in the `class << self` block of launchdarkly_observability.rb. That block runs *after* the `require_relative '.../rails'` near the top of the file. When the gem is required lazily after Rails has booted, the Railtie's `config.after_initialize` hook fires synchronously during the require — before the module method exists — so the bridge attach failed with: Could not attach log bridge to Rails.logger: undefined method `otel_logger_provider_available?' for module LaunchDarklyObservability Inline the availability check in the Railtie so it no longer depends on load order, and move the `rails.rb` require to the bottom of the main file (after the module body is fully defined) so future Railtie code can't reintroduce the same class of bug. Adds a regression test that removes the module method to simulate the load-order state. Co-Authored-By: Claude <noreply@anthropic.com>
…rsions The default instrumentation config passed `enable_recognize_route: true` to the Rails instrumentation and `db_statement: :include` to ActiveRecord. Neither option exists on those instrumentations (verified against the pinned gem versions: rails 0.39.1, active_record 0.11.1, rack 0.29.0), so they were no-ops that emitted a warning on every boot: Instrumentation ... ignored the following unknown configuration options [...] Remove them. Route-based span naming (http.route) is handled automatically by the ActionPack instrumentation, and SQL capture comes from the database adapter instrumentations (Mysql2, PG, ...) which already obfuscate statements by default. Co-Authored-By: Claude <noreply@anthropic.com>
The Rails e2e demo app only verified that a controller responds; nothing checked
that the OTel Rails-family instrumentations actually attached. That gap is why a
recent customer report ("Instrumentation: ... failed to install") went unnoticed
— the app initializes the plugin during Rails boot (the happy path) and never
asserted the result.
Add an integration test that asserts Rack, ActionPack, ActiveRecord,
ActiveSupport, and Rails report `installed? == true` after boot (the exact
inverse of the "failed to install" warning), and that an HTTP request produces a
server span. The plugin only configures OpenTelemetry when the LD client
registers it, which needs a non-empty SDK key, so test_helper sets a dummy key
before boot (invalid, so the client never connects).
Co-Authored-By: Claude <noreply@anthropic.com>
The OTel Rails-family instrumentations (ActionPack, ActiveRecord, ...) patch via ActiveSupport.on_load hooks that fire while Rails boots. The plugin configured OpenTelemetry from Plugin#register (at LDClient.new), so when an app creates the client lazily — e.g. from a model on first request, after Rails has booted — those hooks had already fired and every Rails instrumentation logged "Instrumentation: ... failed to install". Fix it in two parts: 1. Ship a hyphenated entry point (lib/launchdarkly-observability.rb) matching the gem name so Bundler.require auto-loads the gem — and its Railtie — during boot. Previously the gem was only loadable as 'launchdarkly_observability' (underscore), so Bundler couldn't auto-require it and users loaded it manually in an initializer, too late for the Railtie. 2. Add a Railtie initializer that installs auto-instrumentation during boot, decoupled from the LD client. project_id for the resource is resolved from LAUNCHDARKLY_SDK_KEY, which is present in the environment at boot in the common case even when the client object is built lazily. At register time the plugin then only attaches exporters to the existing provider instead of reconfiguring it (which would drop the boot-time instrumentation). It runs `after: :load_config_initializers` and no-ops if a provider is already configured, so boot-time-init apps keep their own configuration untouched. When the client is registered after boot and boot-time install did not run (e.g. no SDK key in the environment at boot), the plugin now logs a single actionable warning instead of the upstream flood of "failed to install" lines. Co-Authored-By: Claude <noreply@anthropic.com>
Add a lazy client-initialization mode to the Rails demo app (LD_LAZY_INIT=1): the initializer skips creating the client at boot, and a LazyLdClient model creates it on first use — mirroring apps that build the client after Rails has booted. A new integration test boots a separate Rails process in this mode and asserts the Rails-family instrumentations are still installed, which only holds because the Railtie installs them during boot. Verified that removing the boot-time install makes the test fail with "failed to install". Co-Authored-By: Claude <noreply@anthropic.com>
…lure (red) Reproduces the CardFlight failure: on Rails 7.0 with the current gem, opentelemetry-instrumentation-all resolves to the latest (0.94 -> instrumentation-rails 0.42), whose Rails-family members raised their floor to Rails 7.1. They fail their runtime compatible? check, log a flurry of 'failed to install' warnings, and never attach, so no autoinstrumented HTTP server span is produced. - e2e/ruby/rails/demo-rails70: copy of demo pinned to rails ~> 7.0 (Ruby 3.3.4 kept so bundler still resolves the latest instrumentation, faithfully matching a current-Ruby Rails 7.0 customer). - test/support/otlp_sink.rb: in-process OTLP/protobuf sink (pure Ruby, decodes via the exporter gems' proto classes) so the e2e test asserts telemetry over the real export path under bundle exec rake (no Docker/Node). - test/integration/otlp_export_e2e_test.rb: asserts traces (server span), a log record, and a captured exception all reach the sink. RED here. Boot logs show 8 Rails-family 'failed to install' warnings; the new test and the existing observability_instrumentation_test both fail (missing server span).
The observability plugin depended on opentelemetry-instrumentation-all, whose Rails-family members raised their floor to Rails 7.1. On Rails 7.0 every Rails-family instrumentation failed its compatible? check, logged a flurry of 'failed to install' warnings, and never attached — so apps lost all Rails auto-instrumentation (no HTTP server spans, DB spans, etc.) and got only manual instrumentation. The meta gem couples versions, so the only fix is to stop using it. - Replace opentelemetry-instrumentation-all with individual instrumentation gems (lib/launchdarkly_observability/instrumentations.rb). The Rails family is capped below each member's Rails-7.1-enforcing release (rails <0.42, action_pack <0.18, active_record/action_view <0.13, active_support <0.12, active_job <0.12, action_mailer <0.8, active_storage <0.5); everything else tracks the latest. These capped releases still work on Rails 7.1+, so modern apps are unaffected, and use_all still activates any extra instrumentation gem a consumer adds. - Replace the per-instrumentation 'failed to install' log flurry with a single actionable summary via a logger filter (InstrumentationLogFilter), naming the instrumentations that could not attach and how to resolve it. Verified on the demo-rails70 repro: 0 'failed to install', all 13 instrumentations install, and traces (server span) + a log + a captured exception are exported to the OTLP sink. 5 runs, 16 assertions, 0 failures.
…lures correctly - Move the log-suppression filter into its own file (instrumentation_log_filter.rb) with class-level capture_failures/ failure_warning helpers (keeps OpenTelemetryConfig under the class-length limit and is unit-testable). - Wrap the whole OpenTelemetry::SDK.configure call rather than just use_all: the SDK installs instrumentations AFTER the configure block returns, so the earlier placement around use_all never saw the install logging. Now the per-instrumentation 'failed to install' / 'successfully installed' chatter is actually suppressed and failures are collected for the single summary. - rails.rb: Railtie#otel_logger_provider_available? now returns an explicit boolean instead of nil when the logs constant is absent. Fixes a pre-existing order-dependent failure in rails_railtie_test (verified failing identically on the unmodified gem at the same seed). Updated e2e Gemfile.locks for the new individual-gem resolution. demo (7.2) and api-only (7.2) suites pass; demo-rails70 (7.0) passes with 0 'failed to install'.
Adds e2e-rails-legacy to ruby-plugin.yml, running e2e/ruby/rails/demo-rails70 (Rails 7.0) under bundle exec rake. The existing e2e jobs run Rails 7.2 — above the floor the OTel Rails-family instrumentations enforce — so they cannot catch an old-Rails compatibility break. This job fails if Rails-family auto-instrumentation stops attaching on Rails 7.0. Demonstrated red-then-green with the job's exact command: - pre-fix gemspec (instrumentation-all -> instrumentation-rails 0.42.0): 5 runs, 9 assertions, 4 failures (ActionPack not installed, no server span) - fixed gemspec (instrumentation-rails 0.41.0): 5 runs, 16 assertions, 0 failures
- README: replace the opentelemetry-instrumentation-all dependency note with a 'Ruby & Rails compatibility' section (Ruby >= 3.0, Rails >= 7.0 matrix, the individual-gem rationale, the single-warning behavior, and the per-gem pin escape hatch). Update the auto-instrumentation list to the curated set. - CHANGELOG: add an Unreleased Bug Fixes entry for the Rails 7.0 fix. - demo-rails70/README: explain what the repro proves and how to run it.
…ncies) rubocop's Gemspec/OrderedDependencies (run by the build CI job) requires alphabetical order within each section. No functional change.
d039e6b to
aaf9f8d
Compare
☂️ Python Coverage
Overall Coverage
New FilesNo new covered files... Modified FilesNo covered modified files...
|
| root: <%= Rails.root.join("tmp/storage") %> | ||
|
|
||
| local: | ||
| service: Disk |
There was a problem hiding this comment.
High severity vulnerability may affect your project—review required:
Line 6 lists a dependency (activestorage) with a known High severity vulnerability.
ℹ️ Why this matters
Affected versions of activestorage are vulnerable to Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal'). Active Storage DiskService builds filesystem paths from blob keys without containing them to the storage root; a key containing traversal sequences can read, write, or delete arbitrary files if that key reaches disk storage (for example when custom keys are derived from untrusted input)
To resolve this comment:
Check if you are using ActiveStorage::Service::DiskService in your Ruby application.
- If you're affected, upgrade this dependency to at least version 7.2.3.1 at e2e/ruby/rails/demo-rails70/Gemfile.lock.
- If you're not affected, comment
/fp we don't use this [condition]
💬 Ignore this finding
To ignore this, reply with:
/fp <comment>for false positive/ar <comment>for acceptable risk/other <comment>for all other reasons
You can view more details on this finding in the Semgrep AppSec Platform here.
| @@ -0,0 +1,33 @@ | |||
| test: | |||
| service: Disk | |||
There was a problem hiding this comment.
High severity vulnerability may affect your project—review required:
Line 2 lists a dependency (activestorage) with a known High severity vulnerability.
ℹ️ Why this matters
Affected versions of activestorage are vulnerable to Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal'). Active Storage DiskService builds filesystem paths from blob keys without containing them to the storage root; a key containing traversal sequences can read, write, or delete arbitrary files if that key reaches disk storage (for example when custom keys are derived from untrusted input)
To resolve this comment:
Check if you are using ActiveStorage::Service::DiskService in your Ruby application.
- If you're affected, upgrade this dependency to at least version 7.2.3.1 at e2e/ruby/rails/demo-rails70/Gemfile.lock.
- If you're not affected, comment
/fp we don't use this [condition]
💬 Ignore this finding
To ignore this, reply with:
/fp <comment>for false positive/ar <comment>for acceptable risk/other <comment>for all other reasons
You can view more details on this finding in the Semgrep AppSec Platform here.
|
Semgrep found 1 Risk: Affected versions of vite and vite-plus are vulnerable to Exposure of Sensitive Information to an Unauthorized Actor / Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal'). Vite's server.fs.deny blocklist—which protects sensitive files such as .env and certificate files from being served—can be bypassed on Windows using alternate path representations (NTFS Alternate Data Stream syntax like Manual Review Advice: A vulnerability from this advisory is reachable if you expose the Vite dev server or vite-plus to the network by configuring a non-loopback address using the --host CLI flag on Windows Fix: Upgrade this library to at least version 6.4.3 at observability-sdk/yarn.lock:47256. Reference(s): GHSA-fx2h-pf6j-xcff |
Summary
The observability plugin depended on
opentelemetry-instrumentation-all, whose Rails-family members raised their minimum to Rails 7.1. On Rails 7.0 every Rails-family instrumentation fails its runtimecompatible?check, logs a flurry of"... failed to install"warnings, and never attaches — so the app loses all Rails auto-instrumentation (no HTTP server spans, DB spans, etc.) and gets only manual instrumentation. The meta-gem couples every instrumentation to a single version, so the fix is to stop using it.Self-contained against
main: this PR includes the boot-instrumentation foundation (previously split out as #582) plus the old-Rails compatibility fix. Supersedes #582.Changes
Boot auto-instrumentation (formerly #582)
Old-Rails compatibility
opentelemetry-instrumentation-allfor individual gems. The Rails family is capped below each member's Rails-7.1-enforcing release (rails <0.42,action_pack <0.18,action_view/active_record<0.13,active_support <0.12,active_job <0.12,action_mailer <0.8,active_storage <0.5); everything else tracks the latest. Capped releases still work on Rails 7.1+, so modern apps keep the latest non-Rails instrumentations.use_allstill activates any extra instrumentation gem a consumer adds.InstrumentationLogFilterreplaces the per-instrumentation "failed to install" flurry with one actionable summary (wraps the wholeSDK.configurecall, since the SDK installs instrumentations after the configure block returns).e2e/ruby/rails/demo-rails70(Rails 7.0) with a pure-Ruby in-process OTLP protobuf sink asserting traces (a server span), a log, and a captured exception are exported over the wire — runs underbundle exec rake, no Docker/Node.e2e-rails-legacyjob inruby-plugin.ymlas a regression guard (the other e2e apps run Rails 7.2, above the floor).Context
CardFlight (the original report) has since upgraded Rails, so this no longer blocks them — but it remains a valuable general regression guard so a future instrumentation-gem bump can't silently re-break Rails 7.0.
Testing
Red→green with the CI job's command (
bundle exec rakeindemo-rails70):5 runs, 9 assertions, 4 failures5 runs, 16 assertions, 0 failures, 0 "failed to install"demo(7.2) andapi-only(7.2) pass and still resolve the latest non-Rails instrumentations (rack 0.31, redis 0.29, pg 0.36, …). Gem unit tests + rubocop green. Rebased ontomain(0.2.2).Notes
rails_railtie_teststrict-boolean de-flake here also appears in fix(ruby): silence observability plugin warnings (log bridge, nil context id, instrumentation options) #581 — coordinate so only one lands.🤖 Generated with Claude Code