PAYMENTS-11567 Resque latency metrics #31
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,65 @@ | ||||||
| # frozen_string_literal: true | ||||||
|
|
||||||
| # Copyright (c) 2019-present, BigCommerce Pty. Ltd. All rights reserved | ||||||
| # | ||||||
| # Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated | ||||||
| # documentation files (the "Software"), to deal in the Software without restriction, including without limitation the | ||||||
| # rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit | ||||||
| # persons to whom the Software is furnished to do so, subject to the following conditions: | ||||||
| # | ||||||
| # The above copyright notice and this permission notice shall be included in all copies or substantial portions of the | ||||||
| # Software. | ||||||
| # | ||||||
| # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE | ||||||
| # WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR | ||||||
| # COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR | ||||||
| # OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. | ||||||
| # | ||||||
| require 'time' | ||||||
|
|
||||||
| module Bigcommerce | ||||||
| module Prometheus | ||||||
| module Integrations | ||||||
| class Resque | ||||||
| ## | ||||||
| # Payload fields for an ActiveJob-shaped Resque job, read from the | ||||||
| # inner hash at `args[0]`. ActiveJob's JobWrapper stamps the three | ||||||
| # fields the per-job metrics consume: | ||||||
| # | ||||||
| # * job_class — the user's actual job class name; used as the | ||||||
| # metric label. | ||||||
| # * enqueued_at — ISO 8601 string; queue-latency anchor when | ||||||
| # scheduled_at is absent. | ||||||
| # * scheduled_at — ISO 8601 string; preferred over enqueued_at | ||||||
| # when present (e.g. retries-with-backoff, so the | ||||||
| # intentional wait isn't counted as latency). | ||||||
| class ActiveJobPayload | ||||||
| # @return [String] the user's actual job class name | ||||||
| attr_reader :job_class | ||||||
|
|
||||||
| # @return [Time, nil] the queue-latency anchor; nil when both | ||||||
| # timestamps are absent or unparseable | ||||||
| attr_reader :anchor_time | ||||||
|
|
||||||
| # @param [Hash] inner the ActiveJob-shaped hash at `args[0]`; | ||||||
| # JobPayload.for guarantees a truthy 'job_class' | ||||||
| def initialize(inner) | ||||||
| @job_class = inner['job_class'] | ||||||
| @anchor_time = parse_time(inner['scheduled_at']) || parse_time(inner['enqueued_at']) | ||||||
| end | ||||||
|
|
||||||
| private | ||||||
|
|
||||||
| def parse_time(value) | ||||||
| return value if value.is_a?(Time) | ||||||
| return nil if value.nil? || value.to_s.empty? | ||||||
|
Contributor
Suggested change
🍺 not strictly necessary as |
||||||
|
|
||||||
| Time.iso8601(value.to_s) | ||||||
| rescue ArgumentError | ||||||
| nil | ||||||
| end | ||||||
| end | ||||||
| end | ||||||
| end | ||||||
| end | ||||||
| end | ||||||
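The anchor selection in `ActiveJobPayload#initialize` can be exercised outside the gem. The following is a minimal sketch, assuming free-standing helpers: `anchor_time_for` is a hypothetical name that is not part of the PR, and `parse_time` is copied here as a top-level method only for illustration.

```ruby
require 'time'

# Stand-alone mirror of ActiveJobPayload's private parse_time (illustration only).
def parse_time(value)
  return value if value.is_a?(Time)
  return nil if value.nil? || value.to_s.empty?

  Time.iso8601(value.to_s)
rescue ArgumentError
  nil
end

# Hypothetical helper: prefer scheduled_at, fall back to enqueued_at,
# and return nil when neither value parses.
def anchor_time_for(inner)
  parse_time(inner['scheduled_at']) || parse_time(inner['enqueued_at'])
end

retry_job = { 'job_class' => 'MyJob',
              'enqueued_at' => '2024-01-01T00:00:00Z',
              'scheduled_at' => '2024-01-01T00:05:00Z' }
fresh_job = { 'job_class' => 'MyJob', 'enqueued_at' => '2024-01-01T00:00:00Z' }
bad_job   = { 'job_class' => 'MyJob', 'enqueued_at' => 'not-a-time' }

p anchor_time_for(retry_job)
p anchor_time_for(fresh_job)
p anchor_time_for(bad_job)
```

Because the two parses are chained with `||`, an unparseable `scheduled_at` also falls back to `enqueued_at` rather than discarding the observation.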
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,157 @@ | ||
| # frozen_string_literal: true | ||
|
|
||
| # Copyright (c) 2019-present, BigCommerce Pty. Ltd. All rights reserved | ||
| # | ||
| # Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated | ||
| # documentation files (the "Software"), to deal in the Software without restriction, including without limitation the | ||
| # rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit | ||
| # persons to whom the Software is furnished to do so, subject to the following conditions: | ||
| # | ||
| # The above copyright notice and this permission notice shall be included in all copies or substantial portions of the | ||
| # Software. | ||
| # | ||
| # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE | ||
| # WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR | ||
| # COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR | ||
| # OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. | ||
| # | ||
| module Bigcommerce | ||
| module Prometheus | ||
| module Integrations | ||
| class Resque | ||
| ## | ||
| # Per-Resque-job histogram metrics, recorded from the parent worker process. | ||
| # Hooked via a prepend around Resque::Worker#perform_with_fork. | ||
| # Queue latency is captured before super, perform duration after. | ||
| # | ||
| # Off unless PROMETHEUS_RESQUE_PER_JOB_METRICS_ENABLED=1. | ||
| # Emits one histogram observation per job, labelled by job_class, so volume grows with job throughput | ||
| # and series cardinality grows with the number of distinct job classes. | ||
| # | ||
| # NOTE: queue_latency is only supported for jobs whose payload is ActiveJob-shaped. | ||
| # The gem reads three fields from | ||
| # `payload['args'][0]` (which must be a Hash): | ||
| # | ||
| # * job_class — the user's actual job class name; used as the | ||
| # metric label. | ||
| # * enqueued_at — ISO 8601 string; used as the queue-latency | ||
| # anchor when scheduled_at is absent. | ||
| # * scheduled_at — ISO 8601 string; preferred over enqueued_at | ||
| # when present (e.g. retries-with-backoff, so | ||
| # the intentional wait isn't counted as latency). | ||
| # | ||
| # ActiveJob produces this shape natively — the payload is wrapped by | ||
| # ActiveJob::QueueAdapters::ResqueAdapter::JobWrapper, which stamps | ||
| # the three fields above into `args[0]`. | ||
| # | ||
| # Vanilla Resque jobs enqueued via Resque.enqueue carry no enqueue timestamps: | ||
| #   class MyJob | ||
| #     @queue = :foo | ||
| #     def self.perform; end | ||
| #   end | ||
| # Their args are raw primitive values, not a wrapping hash. | ||
| # For these jobs, queue_latency silently no-ops. | ||
| # perform_duration works for both styles regardless. | ||
| # | ||
| # Payloads that replicate the three fields above are read the same way. | ||
| # Detection is by shape, not by wrapper class name. | ||
| # This means a vanilla job can opt in to queue_latency by either: | ||
| # - converting to ActiveJob, or | ||
| # - enqueueing through a small wrapper class that stamps these fields into args[0]. | ||
| # | ||
| module JobMetrics | ||
| class << self | ||
| ## | ||
| # Install the parent-side hooks if the per-job metrics feature is enabled. | ||
| # Idempotent: safe to call multiple times. | ||
| # | ||
| # @param [PrometheusExporter::Client] client | ||
| # | ||
| def start(client:) | ||
| return unless ::Bigcommerce::Prometheus.resque_per_job_metrics_enabled | ||
|
|
||
| @client = client | ||
| install_hooks | ||
| end | ||
|
|
||
| ## | ||
| # Push the queue-latency observation for a job that's about to be picked up by a worker. | ||
| # Anchors on scheduled_at if present so retries-with-backoff don't show the intentional wait as latency. | ||
| # Falls back to enqueued_at if scheduled_at isn't present. | ||
| # | ||
| # @param [ActiveJobPayload, VanillaResquePayload] payload | ||
| # | ||
| def record_queue_latency(payload) | ||
| anchor = payload.anchor_time | ||
| return unless anchor | ||
|
|
||
| # Clock skew between the enqueuer/scheduler and the worker can put the anchor in the future. | ||
| # Clamp to zero so the histogram never records a negative latency. | ||
| latency = (Time.now - anchor).to_f.clamp(0.0..) | ||
|
|
||
| @client.send_json( | ||
| type: 'resque_job', | ||
| metric: 'queue_latency', | ||
| value: latency, | ||
| custom_labels: { job_class: payload.job_class } | ||
| ) | ||
| rescue StandardError => e | ||
| ::Bigcommerce::Prometheus.logger&.warn( | ||
| "[bigcommerce-prometheus] resque_job queue_latency push failed: #{e.message}" | ||
| ) | ||
| end | ||
|
|
||
| ## | ||
| # Push the perform-duration observation for a completed job. | ||
| # Called from the `Resque::Worker#perform_with_fork` prepend, so it measures the full child lifetime: | ||
| # fork + reconnect + perform + exit | ||
| # | ||
| # The duration is computed here, not at the call site: the caller invokes this from an | ||
| # ensure block, which must never raise over an exception already propagating. Keeping | ||
| # the arithmetic inside this rescue absorbs every recording failure — including a nil | ||
| # started_at when a catastrophic error fired before timing began. | ||
| # | ||
| # @param [ActiveJobPayload, VanillaResquePayload] payload | ||
| # @param [Float, nil] started_at monotonic timestamp taken just before the fork; nil if timing never began | ||
| # | ||
| def record_perform_duration(payload, started_at) | ||
| @client.send_json( | ||
| type: 'resque_job', | ||
| metric: 'perform_duration', | ||
| value: Process.clock_gettime(Process::CLOCK_MONOTONIC) - started_at, | ||
| custom_labels: { job_class: payload.job_class } | ||
| ) | ||
| rescue StandardError => e | ||
| ::Bigcommerce::Prometheus.logger&.warn( | ||
| "[bigcommerce-prometheus] resque_job perform_duration push failed: #{e.message}" | ||
| ) | ||
| end | ||
|
|
||
| private | ||
|
|
||
| def install_hooks | ||
| return if @hooks_installed | ||
|
|
||
| ::Resque::Worker.prepend(WorkerInstrumentation) | ||
| @hooks_installed = true | ||
| end | ||
| end | ||
|
|
||
| ## | ||
| # Prepended onto Resque::Worker. For every job that goes through perform_with_fork it captures: | ||
| # - queue latency: before super | ||
| # - perform duration: after super | ||
| module WorkerInstrumentation | ||
| def perform_with_fork(job, &block) | ||
| payload = JobPayload.for(job) | ||
| JobMetrics.record_queue_latency(payload) | ||
| started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC) | ||
| super | ||
| ensure | ||
| JobMetrics.record_perform_duration(payload, started_at) | ||
| end | ||
|
cursor[bot] marked this conversation as resolved.
|
||
| end | ||
| end | ||
| end | ||
| end | ||
| end | ||
| end | ||
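The hook pattern in `WorkerInstrumentation` (prepend, time around `super`, record in `ensure`) can be tried without Resque. The following is a minimal sketch, assuming a stand-in `FakeWorker` class and an in-memory `RECORDED` array in place of the Prometheus client. `FakeWorker`, `RECORDED`, `TimingInstrumentation`, `record_duration`, and `queue_latency` are all hypothetical names. `queue_latency` uses `max` as an equivalent spelling of the PR's `clamp(0.0..)`.

```ruby
# Stand-in for Resque::Worker; the real class is not loaded here.
class FakeWorker
  def perform_with_fork(job)
    raise job[:error] if job[:error]

    :done
  end
end

# In-memory sink standing in for the Prometheus client.
RECORDED = []

module TimingInstrumentation
  def perform_with_fork(job, &block)
    started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    super
  ensure
    # Runs whether super returned or raised; must never raise itself.
    record_duration(job, started_at)
  end

  private

  def record_duration(job, started_at)
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started_at
    RECORDED << [job[:name], elapsed]
  rescue StandardError => e
    warn("duration recording failed: #{e.message}")
  end
end

# Queue latency clamped at zero, so clock skew never yields a negative value.
def queue_latency(anchor, now: Time.now)
  [(now - anchor).to_f, 0.0].max
end

# Guarded prepend, so repeated installation is harmless.
FakeWorker.prepend(TimingInstrumentation) unless FakeWorker.ancestors.include?(TimingInstrumentation)

worker = FakeWorker.new
p worker.perform_with_fork({ name: 'MyJob' })
p RECORDED.last.first
p queue_latency(Time.now + 60)
```

The duration is recorded even when the job raises, and the original exception still propagates because the recording helper absorbs its own failures.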
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,74 @@ | ||
| # frozen_string_literal: true | ||
|
|
||
| # Copyright (c) 2019-present, BigCommerce Pty. Ltd. All rights reserved | ||
| # | ||
| # Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated | ||
| # documentation files (the "Software"), to deal in the Software without restriction, including without limitation the | ||
| # rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit | ||
| # persons to whom the Software is furnished to do so, subject to the following conditions: | ||
| # | ||
| # The above copyright notice and this permission notice shall be included in all copies or substantial portions of the | ||
| # Software. | ||
| # | ||
| # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE | ||
| # WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR | ||
| # COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR | ||
| # OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. | ||
| # | ||
| module Bigcommerce | ||
| module Prometheus | ||
| module Integrations | ||
| class Resque | ||
| ## | ||
| # Classifies a Resque::Job's payload and builds the matching | ||
| # shape-specific payload object for per-job metrics. | ||
| # | ||
| # A payload is ActiveJob-shaped when `args[0]` is a Hash carrying a | ||
| # truthy 'job_class' — the shape | ||
| # ActiveJob::QueueAdapters::ResqueAdapter::JobWrapper produces | ||
| # natively. Detection is by shape rather than by wrapper class name: | ||
| # the fields are ActiveJob's stable serialization format (persisted | ||
| # payloads must survive Rails upgrades), while the wrapper's class | ||
| # name is a private Rails constant — matching on it would silently | ||
| # kill the metric if Rails ever moved it. Payloads that replicate | ||
| # these fields are read the same way, whatever class produced them. Everything | ||
| # else — vanilla Resque jobs with primitive args, nil or non-Hash | ||
| # payloads, `args` not being an Array — is treated as vanilla. | ||
| # | ||
| # Both payload classes expose the same interface: #job_class | ||
| # (String) and #anchor_time (Time or nil). | ||
| # | ||
| module JobPayload | ||
| class << self | ||
| ## | ||
| # Never raises: instrumentation must not break job execution, so a payload object is always returned. | ||
| # Unexpected failures degrade to a vanilla payload labelled 'unknown'. | ||
| # | ||
| # @param [Resque::Job] resque_job | ||
| # @return [ActiveJobPayload, VanillaResquePayload] | ||
| # | ||
| def for(resque_job) | ||
| payload = resque_job.payload | ||
| payload = {} unless payload.is_a?(Hash) | ||
|
|
||
| inner = activejob_inner(payload) | ||
| inner ? ActiveJobPayload.new(inner) : VanillaResquePayload.new(payload) | ||
| rescue StandardError | ||
| VanillaResquePayload.new({}) | ||
|
Contributor
Should be logged here or we're hiding it.
Payload parse errors not logged (Medium Severity)
Reviewed by Cursor Bugbot for commit 1d3dbc7.
||
| end | ||
|
|
||
| private | ||
|
|
||
| def activejob_inner(payload) | ||
| args = payload['args'] | ||
| first = args.is_a?(Array) ? args.first : nil | ||
| return nil unless first.is_a?(Hash) && first['job_class'] | ||
|
|
||
| first | ||
| end | ||
| end | ||
| end | ||
| end | ||
| end | ||
| end | ||
| end | ||
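The shape check in `JobPayload.for`, together with the logging that the review comments ask for, can be sketched in isolation. The block below is an illustration under stated assumptions, not the PR's code: `classify`, `ExplodingHash`, the `[kind, label]` return value, and the log message are hypothetical. Reading the vanilla label from the payload's `'class'` key is an assumption about `VanillaResquePayload`, which is not shown in this diff.

```ruby
require 'logger'
require 'stringio'

# Same shape check as the PR's private activejob_inner.
def activejob_inner(payload)
  args = payload['args']
  first = args.is_a?(Array) ? args.first : nil
  return nil unless first.is_a?(Hash) && first['job_class']

  first
end

# Hypothetical variant of JobPayload.for that logs before degrading to 'unknown'.
# Returns [kind, label] instead of payload objects to stay self-contained.
def classify(raw_payload, logger:)
  payload = raw_payload.is_a?(Hash) ? raw_payload : {}
  inner = activejob_inner(payload)
  inner ? [:active_job, inner['job_class']] : [:vanilla, payload['class'] || 'unknown']
rescue StandardError => e
  logger.warn("payload classification failed: #{e.class}: #{e.message}")
  [:vanilla, 'unknown']
end

# Hypothetical payload that raises on access, to exercise the rescue path.
class ExplodingHash < Hash
  def [](_key)
    raise TypeError, 'boom'
  end
end

log = StringIO.new
logger = Logger.new(log)

# A vanilla job that opts in by stamping job_class and enqueued_at into args[0].
stamped = { 'class' => 'StampingWrapper',
            'args' => [{ 'job_class' => 'MyJob', 'enqueued_at' => '2024-01-01T00:00:00Z' }] }
# A vanilla job with primitive args.
plain = { 'class' => 'MyJob', 'args' => [42, 'abc'] }

p classify(stamped, logger: logger)
p classify(plain, logger: logger)
p classify(ExplodingHash.new, logger: logger)
puts log.string
```

The stamped payload is classified as ActiveJob-shaped even though no ActiveJob wrapper produced it, which is the opt-in route the comments describe. The failing payload still yields a usable label, but the failure now leaves a log line instead of disappearing.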

