Commit b05567b

Merge pull request lightspeed-core#1582 from major/feat/sentry-support
RSPEED-2928: Add optional Sentry error tracking integration
2 parents c229e51 + 2bf81c8 commit b05567b

8 files changed

Lines changed: 534 additions & 16 deletions

docs/README.md

Lines changed: 2 additions & 0 deletions
@@ -47,6 +47,8 @@ See the full documentation at [`../README.md`](../README.md) or browse sub-pages
 
 [Providers](https://lightspeed-core.github.io/lightspeed-stack/providers.html)
 
+[Sentry error tracking](https://lightspeed-core.github.io/lightspeed-stack/sentry.html)
+
 [User data collection](https://lightspeed-core.github.io/lightspeed-stack/user_data_collection.html)
 
 [Database structure](https://lightspeed-core.github.io/lightspeed-stack/DB/index.html)

docs/sentry.md

Lines changed: 162 additions & 0 deletions
@@ -0,0 +1,162 @@
1+
# Sentry Error Tracking Integration
2+
3+
Sentry integration is **completely optional** and **disabled by default**. Most deployments will not use it. The service starts and runs normally with no Sentry configuration present.
4+
5+
When enabled, Sentry captures unhandled exceptions and performance traces from the FastAPI application and sends them to your Sentry project for monitoring and alerting.
6+
7+
## Overview
8+
9+
Enabling Sentry gives you:
10+
11+
- **Error tracking** - unhandled exceptions are captured and reported with full stack traces
12+
- **Performance tracing** - a sample of POST requests is traced end-to-end
13+
- **Release tagging** - errors are tagged with the running service version for easier triage
14+
15+
The integration is initialized at service startup and flushes any pending events on shutdown (2-second timeout).
16+
17+
## Configuration
18+
19+
Sentry is configured entirely through environment variables. No changes to `lightspeed-stack.yaml` are required.
20+
21+
### Environment Variables
22+
23+
| Variable | Required | Default | Description |
24+
|----------|----------|---------|-------------|
25+
| `SENTRY_DSN` | Yes (to enable) | - | Data Source Name from your Sentry project. Setting this variable enables Sentry. The DSN value is never written to logs to prevent credential exposure. |
26+
| `SENTRY_ENVIRONMENT` | No | `development` | Environment tag attached to all events. Set this explicitly in production deployments. Use values like `production`, `stage`, or `dev` to distinguish clusters or deployment stages. |
27+
| `SENTRY_CA_CERTS` | No | - | Path to a CA certificate bundle file. Only needed when your Sentry instance uses a private or internal CA. If the file is missing at startup, the SDK proceeds without custom certificates and logs a warning. |
28+
29+
### Enabling Sentry
30+
31+
Set `SENTRY_DSN` to the DSN string from your Sentry project settings:
32+
33+
```bash
34+
export SENTRY_DSN="https://examplePublicKey@o0.ingest.sentry.io/0"
35+
```
36+
37+
The service logs `Sentry initialized` on startup when the DSN is present, or `Sentry DSN not configured, skipping initialization` when it is not.
38+
39+
## Behavior Details
40+
41+
### FastAPI Integration
42+
43+
The Sentry FastAPI integration is configured to capture only `POST` requests. `GET` requests (health checks, model listings, etc.) are not traced.
44+
45+
### Trace Sampling
46+
47+
Of the captured POST requests, those whose path ends with one of the following routes (or, for the root route, is exactly `/`) are always excluded from tracing regardless of the sample rate:
48+
49+
- `/readiness`
50+
- `/liveness`
51+
- `/metrics`
52+
- `/` (root)
53+
54+
The remaining eligible requests are sampled at 25% for performance tracing. This keeps trace volume low and avoids noise from health check and metrics scrape traffic.
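
The sampling rules above can be sketched as a standalone function. This is a minimal sketch that mirrors the logic of `sentry_traces_sampler` in `src/sentry.py`, with the constants inlined so it runs without the service or the Sentry SDK; the names `EXCLUDED_ROUTES`, `SAMPLE_RATE`, and `traces_sampler` are illustrative stand-ins, not the project's identifiers.

```python
from typing import Any

# Illustrative stand-ins for SENTRY_EXCLUDED_ROUTES and
# SENTRY_DEFAULT_TRACES_SAMPLE_RATE from src/constants.py.
EXCLUDED_ROUTES: tuple[str, ...] = ("/readiness", "/liveness", "/metrics", "/")
SAMPLE_RATE: float = 0.25


def traces_sampler(tracing_context: dict[str, Any]) -> float:
    """Return 0.0 for excluded routes, otherwise the default sample rate."""
    asgi_scope = tracing_context.get("asgi_scope", {})
    path = asgi_scope.get("path") if isinstance(asgi_scope, dict) else None

    if path is not None:
        # The root route is matched exactly; a suffix match on "/" would
        # also exclude every path that has a trailing slash.
        if path == "/":
            return 0.0
        # All other excluded routes are matched by suffix.
        if any(route != "/" and path.endswith(route) for route in EXCLUDED_ROUTES):
            return 0.0

    return SAMPLE_RATE
```

Because the non-root routes are matched by suffix, a hypothetical prefixed path such as `/v1/metrics` would also be excluded, while a path like `/v1/query` falls through to the 25% rate.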
55+
56+
### Privacy
57+
58+
`send_default_pii` is set to `False`. Sentry will not attach user IP addresses, HTTP headers, or other personally identifiable information to events.
59+
60+
### Release Tagging
61+
62+
Every event is tagged with the running service version in the format `lightspeed-stack@{version}` (for example, `lightspeed-stack@0.5.0`). This makes it straightforward to correlate errors with specific releases in the Sentry UI.
63+
64+
### Shutdown Behavior
65+
66+
When the service shuts down, it flushes any buffered Sentry events before exiting. The flush has a 2-second timeout to avoid delaying shutdown.
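
The ordering matters here: in `src/app/main.py` the flush sits in a `finally` block, so it runs even when a cleanup step raises. The following is a minimal sketch of that pattern only. `fake_flush` and `failing_cleanup` are hypothetical stand-ins (for `sentry_sdk.flush` and the service's cleanup coroutines) so the example runs without the SDK.

```python
import asyncio

# Records every timeout passed to the stand-in flush function.
flushed: list[float] = []


def fake_flush(timeout: float) -> None:
    """Hypothetical stand-in for sentry_sdk.flush; records the timeout."""
    flushed.append(timeout)


async def failing_cleanup() -> None:
    """Hypothetical cleanup step that fails during shutdown."""
    raise RuntimeError("cleanup failed")


async def shutdown() -> None:
    """Run cleanup, then flush pending events even if cleanup raised."""
    try:
        await failing_cleanup()
    finally:
        fake_flush(timeout=2)


def run_shutdown() -> bool:
    """Run shutdown and report whether the cleanup error propagated."""
    try:
        asyncio.run(shutdown())
    except RuntimeError:
        return True
    return False
```

In this sketch the cleanup error still propagates to the caller; the `finally` block only guarantees that the flush is attempted first.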
67+
68+
## OpenShift Deployment
69+
70+
### Setting the DSN via a Secret
71+
72+
Store the Sentry DSN in an OpenShift Secret rather than hardcoding it in a Deployment manifest:
73+
74+
```bash
oc create secret generic sentry-credentials \
  --from-literal=dsn="https://examplePublicKey@o0.ingest.sentry.io/0"
```
78+
79+
Reference the Secret in your Deployment or Pod spec:
80+
81+
```yaml
env:
  - name: SENTRY_DSN
    valueFrom:
      secretKeyRef:
        name: sentry-credentials
        key: dsn
```
89+
90+
### Setting the Environment Tag
91+
92+
Use `SENTRY_ENVIRONMENT` to label events by cluster or deployment stage. This makes it easy to filter events in the Sentry UI:
93+
94+
```yaml
env:
  - name: SENTRY_DSN
    valueFrom:
      secretKeyRef:
        name: sentry-credentials
        key: dsn
  - name: SENTRY_ENVIRONMENT
    value: "production"
```
104+
105+
Set this to `stage`, `dev`, or any label that matches your deployment topology.
106+
107+
### Private CA Certificates (Enterprise Sentry Instances)
108+
109+
Most deployers will not need this. If your Sentry instance is hosted internally and uses a certificate signed by a private CA, the SDK will fail to connect without the CA bundle.
110+
111+
Mount the CA bundle into the pod using a ConfigMap or Secret, then point `SENTRY_CA_CERTS` at the mount path.
112+
113+
**Using a ConfigMap:**
114+
115+
```bash
116+
oc create configmap sentry-ca --from-file=ca-bundle.crt=/path/to/your/ca-bundle.crt
117+
```
118+
119+
```yaml
volumes:
  - name: sentry-ca
    configMap:
      name: sentry-ca

containers:
  - name: lightspeed-stack
    volumeMounts:
      - name: sentry-ca
        mountPath: /etc/sentry-ca
        readOnly: true
    env:
      - name: SENTRY_DSN
        valueFrom:
          secretKeyRef:
            name: sentry-credentials
            key: dsn
      - name: SENTRY_CA_CERTS
        value: "/etc/sentry-ca/ca-bundle.crt"
```
140+
141+
If the file is not present at the path specified by `SENTRY_CA_CERTS`, the service logs a warning and continues without custom CA certificates. It will not fail to start.
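
The fallback described above can be sketched as a small helper. This is a minimal sketch based on the check in `initialize_sentry`; the function name `resolve_ca_certs` is hypothetical, and the standard `logging` module stands in for the project's `get_logger`.

```python
import logging
import os
from typing import Optional

logger = logging.getLogger(__name__)


def resolve_ca_certs(path: Optional[str]) -> Optional[str]:
    """Return the CA bundle path if it exists, otherwise None."""
    if not path:
        # Variable unset or empty: no custom CA bundle is requested.
        return None
    if os.path.exists(path):
        return path
    # Missing path: warn and fall back to the default certificates.
    logger.warning(
        "CA cert file specified by %s not found at %s; "
        "proceeding without custom CA certs",
        "SENTRY_CA_CERTS",
        path,
    )
    return None
```

Note that `os.path.exists` is also true for directories, so a check like this only guards against a missing path, not against a path that points at something other than a certificate bundle.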
142+
143+
## Troubleshooting
144+
145+
### Events Not Appearing in Sentry
146+
147+
1. Confirm `SENTRY_DSN` is set and the value is correct. Check the service logs for `Sentry initialized` at startup.
148+
2. Verify network connectivity from the pod to the Sentry ingest endpoint. DNS resolution failures and firewall rules are common causes.
149+
3. If using a private Sentry instance, confirm `SENTRY_CA_CERTS` points to a valid CA bundle and the file is readable by the service process.
150+
4. Check that the DSN belongs to the correct Sentry project and organization.
151+
152+
### Warning: CA Cert File Not Found
153+
154+
```text
155+
CA cert file specified by SENTRY_CA_CERTS not found at /etc/sentry-ca/ca-bundle.crt; proceeding without custom CA certs
156+
```
157+
158+
The path set in `SENTRY_CA_CERTS` does not exist. Verify the ConfigMap or Secret is mounted correctly and the mount path matches the environment variable value.
159+
160+
### Events Appear in Wrong Environment
161+
162+
Check the value of `SENTRY_ENVIRONMENT`. If it is not set, events are tagged as `development` by default. Set the variable explicitly to match your deployment stage.
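
The default can be illustrated with the same lookup that `initialize_sentry` performs. This is a minimal sketch; `resolve_environment` and `DEFAULT_ENVIRONMENT` are illustrative names, and the function takes a plain mapping instead of `os.environ` so it can be exercised directly.

```python
from collections.abc import Mapping

# Illustrative stand-in for SENTRY_DEFAULT_ENVIRONMENT in src/constants.py.
DEFAULT_ENVIRONMENT = "development"


def resolve_environment(env: Mapping[str, str]) -> str:
    """Return SENTRY_ENVIRONMENT if the key is present, else the default."""
    return env.get("SENTRY_ENVIRONMENT", DEFAULT_ENVIRONMENT)
```

With this lookup the default applies only when the variable is absent. A variable that is set but empty is passed through as an empty string, so it is worth checking that a manifest does not define `SENTRY_ENVIRONMENT` with an empty value.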

pyproject.toml

Lines changed: 2 additions & 0 deletions
@@ -73,6 +73,8 @@ dependencies = [
     # To be able to fix multiple CVEs, also LCORE-1117
     "requests>=2.33.0",
     "datasets>=4.7.0",
+    # Used for error tracking and monitoring
+    "sentry-sdk[fastapi]>=2.58.0",
 ]
 
 
src/app/main.py

Lines changed: 11 additions & 2 deletions
@@ -4,6 +4,7 @@
 from collections.abc import AsyncIterator
 from contextlib import asynccontextmanager
 
+import sentry_sdk  # pyright: ignore[reportMissingImports]
 from fastapi import FastAPI, HTTPException
 from fastapi.middleware.cors import CORSMiddleware
 from fastapi.responses import JSONResponse
@@ -22,6 +23,7 @@
 from configuration import configuration
 from log import get_logger
 from models.responses import InternalServerErrorResponse
+from sentry import initialize_sentry
 from utils.common import register_mcp_servers_async
 from utils.llama_stack_version import check_llama_stack_version
 
@@ -44,6 +46,8 @@ async def lifespan(_app: FastAPI) -> AsyncIterator[None]:
     """
     configuration.load_configuration(os.environ["LIGHTSPEED_STACK_CONFIG_PATH"])
 
+    initialize_sentry()
+
     azure_config = configuration.configuration.azure_entra_id
     if azure_config is not None:
         AzureEntraIDManager().set_config(azure_config)
@@ -81,8 +85,13 @@ async def lifespan(_app: FastAPI) -> AsyncIterator[None]:
     yield
 
     # Cleanup resources on shutdown
-    await shutdown_background_topic_summary_tasks()
-    await A2AStorageFactory.cleanup()
+    try:
+        await shutdown_background_topic_summary_tasks()
+        await A2AStorageFactory.cleanup()
+    finally:
+        # Flush pending Sentry events after cleanup so any errors during
+        # shutdown are captured before the process exits.
+        sentry_sdk.flush(timeout=2)
     logger.info("App shutdown complete")
 
 
src/constants.py

Lines changed: 24 additions & 0 deletions
@@ -249,3 +249,27 @@
 RLSAPI_V1_QUESTION_MAX_LENGTH: Final[int] = 32_768
 # Maximum character length for the serialized /v1/responses request body (64 KiB)
 RESPONSES_REQUEST_MAX_SIZE: Final[int] = 65_536
+
+# Sentry configuration constants
+# Environment variable name for the Sentry DSN (Data Source Name)
+SENTRY_DSN_ENV_VAR: Final[str] = "SENTRY_DSN"
+# Environment variable name for the Sentry environment tag
+SENTRY_ENVIRONMENT_ENV_VAR: Final[str] = "SENTRY_ENVIRONMENT"
+# Default Sentry environment when SENTRY_ENVIRONMENT is not set
+SENTRY_DEFAULT_ENVIRONMENT: Final[str] = "development"
+# Default trace sample rate (fraction of transactions to capture)
+SENTRY_DEFAULT_TRACES_SAMPLE_RATE: Final[float] = 0.25
+# Routes excluded from Sentry trace sampling (health checks, metrics, root).
+# Note: health and metrics routers are mounted WITHOUT a /v1 prefix
+# (see the setup_routers function in src/app/routers.py), so ASGI paths are
+# /readiness, /liveness, /metrics.
+SENTRY_EXCLUDED_ROUTES: Final[tuple[str, ...]] = (
+    "/readiness",
+    "/liveness",
+    "/metrics",
+    "/",
+)
+# Environment variable name for the Sentry CA certificate bundle path.
+# Set this to a file path (e.g. /etc/pki/tls/certs/ca-bundle.crt) when
+# connecting to a Sentry instance that uses a private or internal CA.
+SENTRY_CA_CERTS_ENV_VAR: Final[str] = "SENTRY_CA_CERTS"

src/sentry.py

Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
+"""Sentry error tracking initialization and configuration."""
+
+import os
+
+import sentry_sdk  # pyright: ignore[reportMissingImports]
+from sentry_sdk.integrations.fastapi import (  # pyright: ignore[reportMissingImports]
+    FastApiIntegration,
+)
+
+import version
+from constants import (
+    SENTRY_CA_CERTS_ENV_VAR,
+    SENTRY_DEFAULT_ENVIRONMENT,
+    SENTRY_DEFAULT_TRACES_SAMPLE_RATE,
+    SENTRY_DSN_ENV_VAR,
+    SENTRY_ENVIRONMENT_ENV_VAR,
+    SENTRY_EXCLUDED_ROUTES,
+)
+from log import get_logger
+
+logger = get_logger(__name__)
+
+
+def sentry_traces_sampler(tracing_context: dict) -> float:
+    """
+    Determine the trace sample rate for a given request.
+
+    Excludes health check, metrics, and root routes from trace sampling to
+    reduce noise. All other routes use the default sample rate.
+
+    Parameters:
+    ----------
+        tracing_context (dict): The Sentry tracing context containing ASGI
+            scope information, including the request path.
+
+    Returns:
+    -------
+        float: 0.0 for excluded routes (no sampling), or
+            SENTRY_DEFAULT_TRACES_SAMPLE_RATE for all other routes.
+    """
+    asgi_scope = tracing_context.get("asgi_scope", {})
+    path = asgi_scope.get("path") if isinstance(asgi_scope, dict) else None
+
+    if path is not None:
+        if path == "/":
+            return 0.0
+        if any(
+            route != "/" and path.endswith(route) for route in SENTRY_EXCLUDED_ROUTES
+        ):
+            return 0.0
+
+    return SENTRY_DEFAULT_TRACES_SAMPLE_RATE
+
+
+def initialize_sentry() -> None:
+    """
+    Initialize Sentry error tracking if a DSN is configured.
+
+    Reads the SENTRY_DSN environment variable. If not set or empty, logs an
+    informational message and returns without initializing Sentry. When a DSN
+    is present, initializes the Sentry SDK with custom trace sampling, FastAPI
+    integration, and optional CA certificate configuration.
+
+    When SENTRY_CA_CERTS is set to a file path, that certificate bundle is
+    passed to the SDK for Sentry instances using private or internal CAs.
+
+    The DSN value is never logged to prevent accidental credential exposure.
+
+    Parameters:
+    ----------
+        None
+
+    Returns:
+    -------
+        None
+    """
+    dsn = os.environ.get(SENTRY_DSN_ENV_VAR)
+
+    if not dsn:
+        logger.info("Sentry DSN not configured, skipping initialization")
+        return
+
+    ca_certs = None
+    ca_certs_path = os.environ.get(SENTRY_CA_CERTS_ENV_VAR)
+    if ca_certs_path:
+        if os.path.exists(ca_certs_path):
+            ca_certs = ca_certs_path
+        else:
+            logger.warning(
+                "CA cert file specified by %s not found at %s; "
+                "proceeding without custom CA certs",
+                SENTRY_CA_CERTS_ENV_VAR,
+                ca_certs_path,
+            )
+
+    environment = os.environ.get(SENTRY_ENVIRONMENT_ENV_VAR, SENTRY_DEFAULT_ENVIRONMENT)
+
+    try:
+        sentry_sdk.init(
+            dsn=dsn,
+            environment=environment,
+            traces_sampler=sentry_traces_sampler,
+            send_default_pii=False,
+            ca_certs=ca_certs,
+            integrations=[FastApiIntegration(http_methods_to_capture=("POST",))],
+            release=f"lightspeed-stack@{version.__version__}",
+        )
+        logger.info("Sentry initialized")
+    except Exception:  # pylint: disable=broad-exception-caught
+        logger.exception(
+            "Failed to initialize Sentry, continuing without error tracking"
+        )
