
feat: Add OTel SDK setup#46

Open
christinaexyou wants to merge 1 commit into
trustyai-explainability:developfrom
christinaexyou:add-otel-sdk-setup

Conversation

@christinaexyou

@christinaexyou christinaexyou commented Apr 28, 2026


Description

Adds OpenTelemetry SDK configuration to capture traces, metrics, and logs from the NeMo Guardrails server. This PR makes the following changes:

  • scripts/otel/entrypoint.py - checks that the OTel SDK is enabled before starting the NeMo server
  • scripts/otel/otel.py - handles configuration of the OTel SDK
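The description does not show scripts/otel/otel.py itself. As a hedged, stdlib-only sketch of one piece such a module typically handles, the snippet below resolves settings from the standard OpenTelemetry environment variables OTEL_SDK_DISABLED, OTEL_SERVICE_NAME and OTEL_EXPORTER_OTLP_ENDPOINT. The OtelSettings dataclass and resolve_otel_settings helper are hypothetical names, and the defaults (the nemo-guardrails service name seen in target_info in the metrics output below, and the OTLP/gRPC port 4317) are assumptions, not the PR's code.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class OtelSettings:
    enabled: bool
    service_name: str
    otlp_endpoint: str


def resolve_otel_settings(env: Mapping[str, str] = os.environ) -> OtelSettings:
    """Hypothetical helper: read the standard OTEL_* environment variables."""
    # OTEL_SDK_DISABLED is a standard OTel variable; only "true" (any case) disables the SDK.
    disabled = env.get("OTEL_SDK_DISABLED", "false").strip().lower() == "true"
    # Assumed default: the service name reported in target_info by this PR's server.
    service_name = env.get("OTEL_SERVICE_NAME", "nemo-guardrails")
    # 4317 is the OTLP/gRPC default port; an http/protobuf exporter would use 4318.
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317")
    return OtelSettings(enabled=not disabled, service_name=service_name, otlp_endpoint=endpoint)
```

The resolved settings would then feed the SDK's TracerProvider and MeterProvider setup, which is left out here so the sketch runs without the opentelemetry packages installed.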

Usage

Ensure that the OTel collector and TempoStack query frontend services are running.

Enable tracing and metrics in config.yml:

models:
...
tracing:
  enabled: true
  adapters:
    - name: OpenTelemetry
  span_format: opentelemetry
  enable_content_capture: false
metrics:
  enabled: true
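As a sketch of the "check before starting" step, assuming config.yml has already been loaded into a dict (for example with a YAML parser), the hypothetical telemetry_flags helper below reads the tracing and metrics keys shown above. Treating tracing as active only when an OpenTelemetry adapter is listed, and the meaning given to enable_content_capture, are assumptions.

```python
from typing import Any, Mapping


def telemetry_flags(config: Mapping[str, Any]) -> dict:
    """Hypothetical helper: which telemetry signals a parsed config.yml asks for."""
    tracing = config.get("tracing") or {}
    metrics = config.get("metrics") or {}
    adapters = tracing.get("adapters") or []
    return {
        # Assumption: tracing counts as on only if enabled AND an OpenTelemetry adapter is listed.
        "traces": bool(tracing.get("enabled"))
        and any(a.get("name") == "OpenTelemetry" for a in adapters),
        "metrics": bool(metrics.get("enabled")),
        # Assumed meaning: whether prompt/response content is recorded on spans.
        "capture_content": bool(tracing.get("enable_content_capture", False)),
    }
```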

Start the server via scripts/otel/entrypoint.py:

python scripts/otel/entrypoint.py --config openshift --port 9000 --verbose
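The flags in this command (--config, --port, --verbose), plus the --host option with a HOST environment fallback that appears in the review thread below, could be parsed with argparse roughly as follows. This is a hedged sketch: the PORT environment variable, the 8000 default port, making --config required and the port-range check are assumptions, not taken from the PR.

```python
import argparse
import os


def _port(value: str) -> int:
    """argparse type: accept only valid TCP port numbers."""
    port = int(value)
    if not 1 <= port <= 65535:
        raise argparse.ArgumentTypeError(f"port out of range: {port}")
    return port


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Start the NeMo Guardrails server")
    # Flag names come from the PR; required=True and the defaults below are assumptions.
    parser.add_argument("--config", required=True, help="config name or directory")
    # A string default is run through type=_port, so it is validated like a CLI value.
    parser.add_argument("--port", type=_port, default=os.environ.get("PORT", "8000"))
    parser.add_argument("--host", default=os.environ.get("HOST", "0.0.0.0"))
    parser.add_argument("--verbose", action="store_true")
    return parser
```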

Send a request to the NeMo server:

curl -s --max-time 30 ${NEMO_URL} \
  -H "Content-Type: application/json" \
  -d '{"model":"phi3","messages":[{"role":"user","content":"My credit card is 4111-1111-1111-1111 and SSN is 123-45-6789"}]}'
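For scripted checks, the same request can be built with the standard library. This sketch mirrors the curl call (same JSON body and Content-Type header, with NEMO_URL taken from the environment); build_chat_request is a hypothetical helper, and the request is only sent when the file is run directly with NEMO_URL set.

```python
import json
import os
import urllib.request


def build_chat_request(url: str, model: str, content: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    payload = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__" and "NEMO_URL" in os.environ:
    req = build_chat_request(
        os.environ["NEMO_URL"],
        "phi3",
        "My credit card is 4111-1111-1111-1111 and SSN is 123-45-6789",
    )
    # Roughly comparable to curl --max-time 30, although urlopen's timeout
    # applies to each socket operation rather than to the whole transfer.
    with urllib.request.urlopen(req, timeout=30) as resp:
        print(json.loads(resp.read()))
```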

To view metrics, open a new browser window and go to localhost:9000/metrics or $NEMO_URL/metrics:

# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 14085.0
python_gc_objects_collected_total{generation="1"} 3726.0
python_gc_objects_collected_total{generation="2"} 150.0
# HELP python_gc_objects_uncollectable_total Uncollectable objects found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 484.0
python_gc_collections_total{generation="1"} 44.0
python_gc_collections_total{generation="2"} 3.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="13",patchlevel="2",version="3.13.2"} 1.0
# HELP target_info Target metadata
# TYPE target_info gauge
target_info{service_name="nemo-guardrails",telemetry_sdk_language="python",telemetry_sdk_name="opentelemetry",telemetry_sdk_version="1.40.0"} 1.0
# HELP guardrails_requests_total Total guardrails requests handled
# TYPE guardrails_requests_total counter
guardrails_requests_total 1.0
# HELP guardrails_request_duration_seconds End-to-end guardrails request duration
# TYPE guardrails_request_duration_seconds histogram
guardrails_request_duration_seconds_bucket{le="0.005"} 0.0
guardrails_request_duration_seconds_bucket{le="0.01"} 0.0
guardrails_request_duration_seconds_bucket{le="0.025"} 0.0
guardrails_request_duration_seconds_bucket{le="0.05"} 0.0
guardrails_request_duration_seconds_bucket{le="0.075"} 0.0
guardrails_request_duration_seconds_bucket{le="0.1"} 0.0
guardrails_request_duration_seconds_bucket{le="0.25"} 0.0
guardrails_request_duration_seconds_bucket{le="0.5"} 0.0
guardrails_request_duration_seconds_bucket{le="0.75"} 0.0
guardrails_request_duration_seconds_bucket{le="1.0"} 0.0
guardrails_request_duration_seconds_bucket{le="2.5"} 0.0
guardrails_request_duration_seconds_bucket{le="5.0"} 0.0
guardrails_request_duration_seconds_bucket{le="7.5"} 0.0
guardrails_request_duration_seconds_bucket{le="10.0"} 0.0
guardrails_request_duration_seconds_bucket{le="+Inf"} 1.0
guardrails_request_duration_seconds_count 1.0
guardrails_request_duration_seconds_sum 12.145453459001146
# HELP otel_sdk_span_started_total The number of created spans.
# TYPE otel_sdk_span_started_total counter
otel_sdk_span_started_total{otel_span_parent_origin="none",otel_span_sampling_result="RECORD_AND_SAMPLE"} 1.0
otel_sdk_span_started_total{otel_span_parent_origin="local",otel_span_sampling_result="RECORD_AND_SAMPLE"} 4.0
# HELP otel_sdk_span_live The number of created spans with `recording=true` for which the end operation has not been called yet.
# TYPE otel_sdk_span_live gauge
otel_sdk_span_live{otel_span_sampling_result="RECORD_AND_SAMPLE"} 0.0
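The histogram can be read straight from this output: the mean request duration is _sum / _count, and the cumulative _bucket series shows which bound a request fell under. The simplified parser below (hypothetical helpers; it ignores escaped quotes in label values and optional timestamps, so it is not a full Prometheus text-format parser) computes both.

```python
import math
import re

# One sample per line: metric_name{label="value",...} value  (comment lines start with '#').
_SAMPLE = re.compile(r"([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)")
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text: str) -> list:
    """Parse Prometheus text-format samples into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.fullmatch(line)
        if match:
            name, raw_labels, value = match.groups()
            labels = dict(_LABEL.findall(raw_labels or ""))
            samples.append((name, labels, float(value)))  # float("+Inf") -> inf
    return samples


def mean_duration(samples: list, metric: str) -> float:
    """Average observation from a histogram's unlabeled _sum and _count series."""
    values = {name: value for name, labels, value in samples if not labels}
    count = values.get(f"{metric}_count", 0.0)
    return values[f"{metric}_sum"] / count if count else math.nan


def first_bucket_covering(samples: list, metric: str, n: float) -> float:
    """Smallest `le` bound whose cumulative bucket count reaches n."""
    bounds = sorted(
        float(labels["le"])
        for name, labels, value in samples
        if name == f"{metric}_bucket" and value >= n
    )
    return bounds[0] if bounds else math.nan
```

For the output above this gives a mean of about 12.15 s for the single request, and +Inf as the first bucket reaching a count of 1, since 12.15 s exceeds the largest finite bound of 10.0.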

To view traces, open a new browser window and go to http://localhost:16686:
[Screenshot: trace view, captured 2026-04-28 at 1:43:50 PM]

@christinaexyou christinaexyou changed the title feat: Add OTel SDK setup [WIP] feat: Add OTel SDK setup Apr 28, 2026
Comment thread scripts/otel/entrypoint.py Fixed
@christinaexyou christinaexyou changed the title [WIP] feat: Add OTel SDK setup feat: Add OTel SDK setup May 4, 2026
@codecov-commenter

codecov-commenter commented May 4, 2026


⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.91%. Comparing base (e3dfba6) to head (84231d5).
⚠️ Report is 1 commit behind head on develop.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop      #46   +/-   ##
========================================
  Coverage    76.91%   76.91%           
========================================
  Files          200      200           
  Lines        20473    20473           
========================================
  Hits         15747    15747           
  Misses        4726     4726           
Flag Coverage Δ
python 76.91% <ø> (ø)
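The headline figure follows from the table: 15747 hits out of 20473 lines (4726 misses) is 76.9159...%. A sketch of the arithmetic, assuming Codecov's default of two decimal places rounded down (codecov_percent is a hypothetical helper):

```python
import math


def codecov_percent(hits: int, lines: int, precision: int = 2) -> float:
    """Coverage as hits/lines in percent, rounded *down* to `precision` decimals."""
    scale = 10 ** precision
    return math.floor(hits / lines * 100 * scale) / scale
```

Ordinary rounding would report 76.92%, so the 76.91% shown above is consistent with rounding down.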

Flags with carried forward coverage won't be shown.


@m-misiura m-misiura left a comment

Collaborator

Before merging, it would be useful to consider the following:

  1. Currently it is possible to get traces via auto-instrumentation (e.g. see here). What is the advantage of this approach over using opentelemetry-instrument auto-instrumentation?
  2. There seems to be a regression, as there is no input validation inside entrypoint.py.

@christinaexyou

Author

@m-misiura thanks for the feedback!

  1. We can certainly get traces with auto-instrumentation, but the main issue is that auto-instrumentation pulls in every FastAPI span. We need finer control over which traces we collect, since iorails already provides spans specific to the NeMo Guardrails request/response path. If we kept auto-instrumentation, we'd record the same span twice (once via iorails and once via auto-instrumentation) and would then need to correlate the two.
  2. Ack, added input validation for the default_config_id
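The PR's actual validation of default_config_id is not shown here. As a hypothetical sketch of what such a check can look like, validate_config_id below accepts only short filesystem-safe names (the allowed character set and 64-character limit are assumptions), confirms the resolved path stays inside a configs root, and requires the directory to exist.

```python
import re
from pathlib import Path

# Hypothetical rule: config ids are short, filesystem-safe names.
_CONFIG_ID = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,63}")


def validate_config_id(config_id: str, configs_root: Path) -> Path:
    """Return the config directory for config_id, or raise ValueError."""
    if not _CONFIG_ID.fullmatch(config_id):
        raise ValueError(f"invalid default_config_id: {config_id!r}")
    config_dir = (configs_root / config_id).resolve()
    # Defence in depth: after resolving symlinks, the path must stay inside configs_root.
    if configs_root.resolve() not in config_dir.parents:
        raise ValueError(f"default_config_id escapes the configs root: {config_id!r}")
    if not config_dir.is_dir():
        raise ValueError(f"no config directory for default_config_id {config_id!r}")
    return config_dir
```

Rejecting the id before it is joined onto a path keeps a value such as ../etc from ever reaching the filesystem; the parents check is a second guard against symlinks inside the configs root.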

Comment thread scripts/otel/entrypoint.py Outdated


Should our main entrypoint be in the otel/ dir? This seems a little confusing for a non-OTel use case, where you'd still use otel/entrypoint.

Author

Done, moved the file to scripts/observability/.

Comment thread Dockerfile.server Outdated
COPY examples/bots/ ./examples/bots/

RUN chmod +x ./scripts/entrypoint.sh
RUN chmod +x ./scripts/otel/entrypoint.py


See comment re. moving our entrypoint to the otel/ dir

@christinaexyou christinaexyou force-pushed the add-otel-sdk-setup branch 2 times, most recently from d65ea80 to afb9c26 on June 3, 2026 19:48
Comment thread scripts/entrypoint.py
)
parser.add_argument(
"--host",
default=os.environ.get("HOST", "0.0.0.0"),
Comment thread poetry.lock
@@ -1,4 +1,4 @@
# This file is automatically @generated by Poetry 1.8.3 and should not be changed by hand.
# This file is automatically @generated by Poetry 1.8.2 and should not be changed by hand.
5 participants