feat: Add OTel SDK setup #46

Open
christinaexyou wants to merge 1 commit into trustyai-explainability:develop from christinaexyou:add-otel-sdk-setup

@christinaexyou commented Apr 28, 2026

Description

Adds OpenTelemetry SDK configuration to capture traces, metrics, and logs from the NeMo Guardrails server. This PR makes the following changes:

  • scripts/otel/entrypoint.py - checks that the OTel SDK is enabled before starting the NeMo server
  • scripts/otel/otel.py - handles configuration of the OTel SDK

Usage

Ensure that the OTel collector and the TempoStack query frontend services are running.

Enable tracing and metrics in config.yml:

models:
...
tracing:
  enabled: true
  adapters:
    - name: OpenTelemetry
  span_format: opentelemetry
  enable_content_capture: false
metrics:
  enabled: true
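
The check that the entrypoint performs on these flags could look roughly like the sketch below. It assumes the config has already been parsed into a dict (for example by a YAML loader); `enabled_signals` is a hypothetical name and is not taken from scripts/otel/entrypoint.py.

```python
from typing import Any, Mapping


def enabled_signals(config: Mapping[str, Any]) -> dict[str, bool]:
    """Report which telemetry signals the parsed config switches on."""
    # A missing or empty section is treated as "disabled".
    tracing = config.get("tracing") or {}
    metrics = config.get("metrics") or {}
    return {
        "tracing": bool(tracing.get("enabled", False)),
        "metrics": bool(metrics.get("enabled", False)),
    }


# Mirrors the config.yml fragment above, already parsed into a dict.
example_config = {
    "tracing": {
        "enabled": True,
        "adapters": [{"name": "OpenTelemetry"}],
        "span_format": "opentelemetry",
        "enable_content_capture": False,
    },
    "metrics": {"enabled": True},
}

print(enabled_signals(example_config))
```

Treating an absent section as disabled keeps the server startable with configs that predate the tracing and metrics keys.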

Start the server via scripts/otel/entrypoint.py:

python scripts/otel/entrypoint.py --config openshift --port 9000 --verbose

Send a request to the NeMo server:

curl -s --max-time 30 ${NEMO_URL} \
  -H "Content-Type: application/json" \
  -d '{"model":"phi3","messages":[{"role":"user","content":"My credit card is 4111-1111-1111-1111 and SSN is 123-45-6789"}]}'
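
The same request can be issued from Python using only the standard library. This is a sketch that assumes NEMO_URL holds the full chat endpoint, as in the curl command; `build_chat_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(url: str, content: str, model: str = "phi3") -> urllib.request.Request:
    """Build the POST request that the curl command above sends."""
    payload = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the raw body (timeout mirrors --max-time 30)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage, with NEMO_URL taken from the environment:
#   import os
#   print(send(build_chat_request(os.environ["NEMO_URL"], "My SSN is 123-45-6789")))
```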

To view metrics, open a new browser window and go to localhost:9000/metrics or $NEMO_URL/metrics:

# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 14085.0
python_gc_objects_collected_total{generation="1"} 3726.0
python_gc_objects_collected_total{generation="2"} 150.0
# HELP python_gc_objects_uncollectable_total Uncollectable objects found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 484.0
python_gc_collections_total{generation="1"} 44.0
python_gc_collections_total{generation="2"} 3.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="13",patchlevel="2",version="3.13.2"} 1.0
# HELP target_info Target metadata
# TYPE target_info gauge
target_info{service_name="nemo-guardrails",telemetry_sdk_language="python",telemetry_sdk_name="opentelemetry",telemetry_sdk_version="1.40.0"} 1.0
# HELP guardrails_requests_total Total guardrails requests handled
# TYPE guardrails_requests_total counter
guardrails_requests_total 1.0
# HELP guardrails_request_duration_seconds End-to-end guardrails request duration
# TYPE guardrails_request_duration_seconds histogram
guardrails_request_duration_seconds_bucket{le="0.005"} 0.0
guardrails_request_duration_seconds_bucket{le="0.01"} 0.0
guardrails_request_duration_seconds_bucket{le="0.025"} 0.0
guardrails_request_duration_seconds_bucket{le="0.05"} 0.0
guardrails_request_duration_seconds_bucket{le="0.075"} 0.0
guardrails_request_duration_seconds_bucket{le="0.1"} 0.0
guardrails_request_duration_seconds_bucket{le="0.25"} 0.0
guardrails_request_duration_seconds_bucket{le="0.5"} 0.0
guardrails_request_duration_seconds_bucket{le="0.75"} 0.0
guardrails_request_duration_seconds_bucket{le="1.0"} 0.0
guardrails_request_duration_seconds_bucket{le="2.5"} 0.0
guardrails_request_duration_seconds_bucket{le="5.0"} 0.0
guardrails_request_duration_seconds_bucket{le="7.5"} 0.0
guardrails_request_duration_seconds_bucket{le="10.0"} 0.0
guardrails_request_duration_seconds_bucket{le="+Inf"} 1.0
guardrails_request_duration_seconds_count 1.0
guardrails_request_duration_seconds_sum 12.145453459001146
# HELP otel_sdk_span_started_total The number of created spans.
# TYPE otel_sdk_span_started_total counter
otel_sdk_span_started_total{otel_span_parent_origin="none",otel_span_sampling_result="RECORD_AND_SAMPLE"} 1.0
otel_sdk_span_started_total{otel_span_parent_origin="local",otel_span_sampling_result="RECORD_AND_SAMPLE"} 4.0
# HELP otel_sdk_span_live The number of created spans with `recording=true` for which the end operation has not been called yet.
# TYPE otel_sdk_span_live gauge
otel_sdk_span_live{otel_span_sampling_result="RECORD_AND_SAMPLE"} 0.0
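
To read values out of this output programmatically, a small parser is enough. The sketch below assumes every sample line has the form `name{labels} value` with no trailing timestamp, which holds for the output above; `parse_samples` is a hypothetical helper.

```python
def parse_samples(text: str) -> dict[str, float]:
    """Map each sample line (name plus labels) to its float value."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token on the line.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


# A few lines copied from the /metrics output above.
sample_text = """\
# HELP guardrails_requests_total Total guardrails requests handled
# TYPE guardrails_requests_total counter
guardrails_requests_total 1.0
guardrails_request_duration_seconds_count 1.0
guardrails_request_duration_seconds_sum 12.145453459001146
"""

samples = parse_samples(sample_text)
count = samples["guardrails_request_duration_seconds_count"]
total = samples["guardrails_request_duration_seconds_sum"]
mean_seconds = total / count if count else 0.0
print(f"mean request duration: {mean_seconds:.3f}s")
```

Here the single request took about 12.1 s, which is why it is counted only in the `+Inf` bucket: the largest finite bucket boundary is 10.0 s.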

To view traces, open a new browser window and go to http://localhost:16686:
[Screenshot, 2026-04-28 1:43:50 PM]

@christinaexyou changed the title from "feat: Add OTel SDK setup" to "[WIP] feat: Add OTel SDK setup" on Apr 28, 2026
app.mount("/metrics/", metrics_app)

log_level = "debug" if args.verbose else "info"
uvicorn.run(app, host="0.0.0.0", port=args.port, log_level=log_level)
@christinaexyou changed the title from "[WIP] feat: Add OTel SDK setup" to "feat: Add OTel SDK setup" on May 4, 2026
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@03cd8ba). Learn more about missing BASE report.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop      #46   +/-   ##
==========================================
  Coverage           ?   76.91%           
==========================================
  Files              ?      200           
  Lines              ?    20473           
  Branches           ?        0           
==========================================
  Hits               ?    15747           
  Misses             ?     4726           
  Partials           ?        0           
Flag Coverage Δ
python 76.91% <ø> (?)

@m-misiura (Collaborator) left a comment

Before merging, it would be useful to consider the following:

  1. Traces can already be obtained via auto-instrumentation (e.g. see here). What is the advantage of this approach over opentelemetry-instrument auto-instrumentation?
  2. There seems to be a regression: entrypoint.py has no input validation.
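
For the second point, one possible shape for the validation is sketched below with argparse. The flag names come from the usage example in the PR description; whether `--config` should be required, the default port, and the names `port_number` and `build_parser` are assumptions for illustration, not what entrypoint.py does today.

```python
import argparse


def port_number(value: str) -> int:
    """argparse type: accept only integers in the valid TCP port range."""
    try:
        port = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"port must be an integer, got {value!r}") from None
    if not 1 <= port <= 65535:
        raise argparse.ArgumentTypeError(f"port must be in 1..65535, got {port}")
    return port


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Start the server with OTel enabled")
    # Assumption: a config name or path must always be supplied.
    parser.add_argument("--config", required=True, help="guardrails config to load")
    # Assumption: 9000 as the default, matching the usage example.
    parser.add_argument("--port", type=port_number, default=9000)
    parser.add_argument("--verbose", action="store_true")
    return parser


args = build_parser().parse_args(["--config", "openshift", "--port", "9000", "--verbose"])
print(args.config, args.port, args.verbose)
```

With this shape, an out-of-range or non-numeric `--port` makes argparse print a usage error and exit before the server starts.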
