Commit 40b19fa

RSPEED-2326: feat(rlsapi): integrate Splunk telemetry into v1 /infer endpoint
- Add _get_rh_identity_context() to extract org_id/system_id from request.state
- Add _queue_splunk_event() to build and queue telemetry events via BackgroundTasks
- Add timing measurement around inference calls
- Queue infer_with_llm events on success, infer_error on failure
- Add unit tests for RH Identity context extraction and Splunk integration
- Update integration tests for new endpoint signature
- Add user-facing docs (docs/splunk.md) and developer docs (src/observability/README.md)

Signed-off-by: Major Hayden <major@redhat.com>
1 parent 53db9c1 commit 40b19fa

5 files changed

Lines changed: 648 additions & 22 deletions

docs/splunk.md

Lines changed: 166 additions & 0 deletions
@@ -0,0 +1,166 @@
# Splunk HEC Integration

Lightspeed Core Stack can send inference telemetry events to Splunk via the HTTP Event Collector (HEC) protocol for monitoring and analytics.

## Overview

When enabled, the service sends telemetry events for:

- **Successful inference requests** (`infer_with_llm` sourcetype)
- **Failed inference requests** (`infer_error` sourcetype)

Events are sent asynchronously in the background and never block or affect the main request flow.

## Configuration

Add the `splunk` section to your `lightspeed-stack.yaml`:

```yaml
splunk:
  enabled: true
  url: "https://splunk.corp.example.com:8088/services/collector"
  token_path: "/var/secrets/splunk-hec-token"
  index: "rhel_lightspeed"
  source: "lightspeed-stack"
  timeout: 5
  verify_ssl: true

deployment_environment: "production"
```

### Configuration Options

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `enabled` | bool | No | `false` | Enable/disable Splunk integration |
| `url` | string | Yes* | - | Splunk HEC endpoint URL |
| `token_path` | string | Yes* | - | Path to file containing HEC token |
| `index` | string | Yes* | - | Target Splunk index |
| `source` | string | No | `lightspeed-stack` | Event source identifier |
| `timeout` | int | No | `5` | HTTP timeout in seconds |
| `verify_ssl` | bool | No | `true` | Verify SSL certificates |

*Required when `enabled: true`
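
The "required when enabled" rule in the table can be sketched as a small check. This is only an illustration: `SplunkSettings` and `missing_required()` are hypothetical names, not the project's actual configuration model, and the defaults simply mirror the table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SplunkSettings:
    """Hypothetical stand-in for the `splunk` config section (not the real model)."""

    enabled: bool = False
    url: Optional[str] = None
    token_path: Optional[str] = None
    index: Optional[str] = None
    source: str = "lightspeed-stack"
    timeout: int = 5
    verify_ssl: bool = True

    def missing_required(self) -> List[str]:
        """Return the required fields that are unset while the integration is enabled."""
        if not self.enabled:
            # Nothing is required while the integration is disabled.
            return []
        required = {"url": self.url, "token_path": self.token_path, "index": self.index}
        return [name for name, value in required.items() if not value]
```

A caller could skip sending whenever `missing_required()` returns a non-empty list.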

### Token File

Store your HEC token in a file (not directly in the config):

```bash
echo "your-hec-token-here" > /var/secrets/splunk-hec-token
chmod 600 /var/secrets/splunk-hec-token
```

The token is read from the file on each request, so it can be rotated without restarting the service.
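
As a rough sketch of why per-request reads allow rotation, the helper below re-reads the file every time it is called. `read_hec_token` is a hypothetical name used for illustration and is not taken from the project's code.

```python
from pathlib import Path
from typing import Optional


def read_hec_token(token_path: str) -> Optional[str]:
    """Read the HEC token fresh from disk; return None if unreadable or empty.

    Hypothetical helper: because nothing is cached, a token file replaced
    on disk is picked up by the next call.
    """
    try:
        token = Path(token_path).read_text(encoding="utf-8").strip()
    except OSError:
        return None
    return token or None
```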

## Event Format

Events follow the rlsapi telemetry format for consistency with existing analytics.

### HEC Envelope

```json
{
  "time": 1737470400,
  "host": "pod-lcs-abc123",
  "source": "lightspeed-stack (v1.0.0)",
  "sourcetype": "infer_with_llm",
  "index": "rhel_lightspeed",
  "event": { ... }
}
```
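
A minimal sketch of assembling such an envelope is given here. The function names are hypothetical and do not reflect the project's `observability` module. The `Authorization: Splunk <token>` header is the standard HEC authentication scheme.

```python
import json
import socket
import time
from typing import Any, Dict


def build_hec_envelope(
    event: Dict[str, Any], sourcetype: str, index: str, source: str
) -> Dict[str, Any]:
    """Wrap an event payload in the HEC envelope fields shown above (hypothetical helper)."""
    return {
        "time": int(time.time()),  # epoch seconds
        "host": socket.gethostname(),
        "source": source,
        "sourcetype": sourcetype,
        "index": index,
        "event": event,
    }


def hec_headers(token: str) -> Dict[str, str]:
    """Build the HTTP headers for a HEC request."""
    return {"Authorization": f"Splunk {token}", "Content-Type": "application/json"}


def encode_envelope(envelope: Dict[str, Any]) -> bytes:
    """Serialize the envelope as the JSON request body."""
    return json.dumps(envelope).encode("utf-8")
```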
72+
73+
### Event Payload
74+
75+
```json
76+
{
77+
"question": "How do I configure SSH?",
78+
"refined_questions": [],
79+
"context": "",
80+
"response": "To configure SSH, edit /etc/ssh/sshd_config...",
81+
"inference_time": 2.34,
82+
"model": "granite-3-8b-instruct",
83+
"deployment": "production",
84+
"org_id": "12345678",
85+
"system_id": "abc-def-123",
86+
"total_llm_tokens": 0,
87+
"request_id": "req_xyz789",
88+
"cla_version": "CLA/0.4.0",
89+
"system_os": "RHEL",
90+
"system_version": "9.3",
91+
"system_arch": "x86_64"
92+
}
93+
```
94+
95+
### Field Descriptions
96+
97+
| Field | Description |
98+
|-------|-------------|
99+
| `question` | User's original question |
100+
| `refined_questions` | Reserved for RAG (empty array) |
101+
| `context` | Reserved for RAG (empty string) |
102+
| `response` | LLM-generated response text |
103+
| `inference_time` | Time in seconds for LLM inference |
104+
| `model` | Model identifier from configuration |
105+
| `deployment` | Value of `deployment_environment` config |
106+
| `org_id` | Organization ID from RH Identity, or `auth_disabled` |
107+
| `system_id` | System CN from RH Identity, or `auth_disabled` |
108+
| `total_llm_tokens` | Reserved for token counting (currently `0`) |
109+
| `request_id` | Unique request identifier |
110+
| `cla_version` | Client User-Agent header |
111+
| `system_os` | Client operating system |
112+
| `system_version` | Client OS version |
113+
| `system_arch` | Client CPU architecture |
114+
115+
## Endpoints
116+
117+
Currently, Splunk telemetry is enabled for:
118+
119+
| Endpoint | Sourcetype (Success) | Sourcetype (Error) |
120+
|----------|---------------------|-------------------|
121+
| `/rlsapi/v1/infer` | `infer_with_llm` | `infer_error` |

## Graceful Degradation

The Splunk client is designed for resilience:

- **Disabled by default**: No impact when not configured
- **Non-blocking**: Events are sent via FastAPI BackgroundTasks
- **Fail-safe**: HTTP errors are logged as warnings and never raise exceptions
- **Missing config**: Sending is silently skipped when required fields are missing
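
The fail-safe behaviour listed above can be sketched roughly as follows. This is an assumption-laden illustration, not the project's client: `send_event_safely` and the injected `post` callable are hypothetical, and the real client may differ in how it transports and logs events.

```python
import logging
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger("splunk_sketch")


def send_event_safely(
    envelope: Dict[str, Any],
    enabled: bool,
    url: Optional[str],
    token: Optional[str],
    post: Callable[[str, Dict[str, Any], str], int],
) -> bool:
    """Try to deliver an event; never raise. Returns True only on HTTP 200.

    `post` is a hypothetical transport callable returning an HTTP status code.
    """
    if not enabled or not url or not token:
        # Disabled or incomplete config: skip silently.
        return False
    try:
        status = post(url, envelope, token)
    except Exception as exc:  # broad on purpose: telemetry must not break requests
        logger.warning("Splunk HEC request failed: %s", exc)
        return False
    if status != 200:
        logger.warning("Splunk HEC request failed with status %d", status)
        return False
    return True
```

Injecting the transport keeps the sketch testable without a network connection.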

## Troubleshooting

### Events Not Appearing in Splunk

1. Verify `splunk.enabled: true` in config
2. Check token file exists and is readable
3. Verify HEC endpoint URL is correct
4. Check service logs for warning messages:
   ```
   Splunk HEC request failed with status 403: Invalid token
   ```

### Connection Timeouts

Increase the timeout value:

```yaml
splunk:
  timeout: 10
```

### SSL Certificate Errors

For development/testing with self-signed certs:

```yaml
splunk:
  verify_ssl: false
```

**Warning**: Do not disable SSL verification in production.

## Extending to Other Endpoints

See [src/observability/README.md](../src/observability/README.md) for developer documentation on adding Splunk telemetry to additional endpoints.

src/app/endpoints/rlsapi_v1.py

Lines changed: 118 additions & 2 deletions
@@ -5,16 +5,18 @@
55
"""
66

77
import logging
8+
import time
89
from typing import Annotated, Any, cast
910

10-
from fastapi import APIRouter, Depends, HTTPException
11+
from fastapi import APIRouter, BackgroundTasks, Depends, HTTPException, Request
1112
from llama_stack_api.openai_responses import OpenAIResponseObject
1213
from llama_stack_client import APIConnectionError, APIStatusError, RateLimitError
1314

1415
import constants
1516
import metrics
1617
from authentication import get_auth_dependency
1718
from authentication.interface import AuthTuple
19+
from authentication.rh_identity import RHIdentityData
1820
from authorization.middleware import authorize
1921
from client import AsyncLlamaStackClientHolder
2022
from configuration import configuration
@@ -29,12 +31,41 @@
2931
)
3032
from models.rlsapi.requests import RlsapiV1InferRequest, RlsapiV1SystemInfo
3133
from models.rlsapi.responses import RlsapiV1InferData, RlsapiV1InferResponse
34+
from observability import InferenceEventData, build_inference_event, send_splunk_event
3235
from utils.responses import extract_text_from_response_output_item
3336
from utils.suid import get_suid
3437

3538
logger = logging.getLogger(__name__)
3639
router = APIRouter(tags=["rlsapi-v1"])
3740

41+
# Default values when RH Identity auth is not configured
42+
AUTH_DISABLED = "auth_disabled"
43+
44+
45+
def _get_rh_identity_context(request: Request) -> tuple[str, str]:
46+
"""Extract org_id and system_id from RH Identity request state.
47+
48+
When RH Identity authentication is configured, the auth dependency stores
49+
the RHIdentityData object in request.state.rh_identity_data. This function
50+
extracts the org_id and system_id for telemetry purposes.
51+
52+
Args:
53+
request: The FastAPI request object.
54+
55+
Returns:
56+
Tuple of (org_id, system_id). Returns ("auth_disabled", "auth_disabled")
57+
when RH Identity auth is not configured or data is unavailable.
58+
"""
59+
rh_identity: RHIdentityData | None = getattr(
60+
request.state, "rh_identity_data", None
61+
)
62+
if rh_identity is None:
63+
return AUTH_DISABLED, AUTH_DISABLED
64+
65+
org_id = rh_identity.get_org_id() or AUTH_DISABLED
66+
system_id = rh_identity.get_user_id() or AUTH_DISABLED
67+
return org_id, system_id
68+
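
The fallback logic in `_get_rh_identity_context()` can be exercised without FastAPI. The sketch below is illustrative only: `FakeIdentity` is a hypothetical stand-in for `RHIdentityData`, and `SimpleNamespace` stands in for `request.state`.

```python
from types import SimpleNamespace
from typing import Optional, Tuple

AUTH_DISABLED = "auth_disabled"


class FakeIdentity:
    """Hypothetical stand-in exposing the two accessors used by the endpoint."""

    def __init__(self, org_id: Optional[str], user_id: Optional[str]) -> None:
        self._org_id = org_id
        self._user_id = user_id

    def get_org_id(self) -> Optional[str]:
        return self._org_id

    def get_user_id(self) -> Optional[str]:
        return self._user_id


def identity_context(state: SimpleNamespace) -> Tuple[str, str]:
    """Mirror the fallback: missing identity or empty values become AUTH_DISABLED."""
    identity = getattr(state, "rh_identity_data", None)
    if identity is None:
        return AUTH_DISABLED, AUTH_DISABLED
    return (
        identity.get_org_id() or AUTH_DISABLED,
        identity.get_user_id() or AUTH_DISABLED,
    )
```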
3869

3970
infer_responses: dict[int | str, dict[str, Any]] = {
4071
200: RlsapiV1InferResponse.openapi_response(),
@@ -148,10 +179,52 @@ async def retrieve_simple_response(question: str, instructions: str) -> str:
148179
)
149180

150181

182+
def _get_cla_version(request: Request) -> str:
183+
"""Extract CLA version from User-Agent header."""
184+
return request.headers.get("User-Agent", "")
185+
186+
187+
def _queue_splunk_event( # pylint: disable=too-many-arguments,too-many-positional-arguments
188+
background_tasks: BackgroundTasks,
189+
infer_request: RlsapiV1InferRequest,
190+
request: Request,
191+
request_id: str,
192+
response_text: str,
193+
inference_time: float,
194+
sourcetype: str,
195+
) -> None:
196+
"""Build and queue a Splunk telemetry event for background sending."""
197+
org_id, system_id = _get_rh_identity_context(request)
198+
systeminfo = infer_request.context.systeminfo
199+
200+
event_data = InferenceEventData(
201+
question=infer_request.question,
202+
response=response_text,
203+
inference_time=inference_time,
204+
model=(
205+
(configuration.inference.default_model or "")
206+
if configuration.inference
207+
else ""
208+
),
209+
org_id=org_id,
210+
system_id=system_id,
211+
request_id=request_id,
212+
cla_version=_get_cla_version(request),
213+
system_os=systeminfo.os,
214+
system_version=systeminfo.version,
215+
system_arch=systeminfo.arch,
216+
)
217+
218+
event = build_inference_event(event_data)
219+
background_tasks.add_task(send_splunk_event, event, sourcetype)
220+
221+
151222
@router.post("/infer", responses=infer_responses)
152223
@authorize(Action.RLSAPI_V1_INFER)
153224
async def infer_endpoint(
154225
infer_request: RlsapiV1InferRequest,
226+
request: Request,
227+
background_tasks: BackgroundTasks,
155228
auth: Annotated[AuthTuple, Depends(get_auth_dependency())],
156229
) -> RlsapiV1InferResponse:
157230
"""Handle rlsapi v1 /infer requests for stateless inference.
@@ -163,6 +236,8 @@ async def infer_endpoint(
163236
164237
Args:
165238
infer_request: The inference request containing question and context.
239+
request: The FastAPI request object for accessing headers and state.
240+
background_tasks: FastAPI background tasks for async Splunk event sending.
166241
auth: Authentication tuple from the configured auth provider.
167242
168243
Returns:
@@ -174,7 +249,6 @@ async def infer_endpoint(
174249
# Authentication enforced by get_auth_dependency(), authorization by @authorize decorator.
175250
_ = auth
176251

177-
# Generate unique request ID
178252
request_id = get_suid()
179253

180254
logger.info("Processing rlsapi v1 /infer request %s", request_id)
@@ -185,35 +259,77 @@ async def infer_endpoint(
185259
"Request %s: Combined input source length: %d", request_id, len(input_source)
186260
)
187261

262+
start_time = time.monotonic()
188263
try:
189264
response_text = await retrieve_simple_response(input_source, instructions)
265+
inference_time = time.monotonic() - start_time
190266
except APIConnectionError as e:
267+
inference_time = time.monotonic() - start_time
191268
metrics.llm_calls_failures_total.inc()
192269
logger.error(
193270
"Unable to connect to Llama Stack for request %s: %s", request_id, e
194271
)
272+
_queue_splunk_event(
273+
background_tasks,
274+
infer_request,
275+
request,
276+
request_id,
277+
str(e),
278+
inference_time,
279+
"infer_error",
280+
)
195281
response = ServiceUnavailableResponse(
196282
backend_name="Llama Stack",
197283
cause=str(e),
198284
)
199285
raise HTTPException(**response.model_dump()) from e
200286
except RateLimitError as e:
287+
inference_time = time.monotonic() - start_time
201288
metrics.llm_calls_failures_total.inc()
202289
logger.error("Rate limit exceeded for request %s: %s", request_id, e)
290+
_queue_splunk_event(
291+
background_tasks,
292+
infer_request,
293+
request,
294+
request_id,
295+
str(e),
296+
inference_time,
297+
"infer_error",
298+
)
203299
response = QuotaExceededResponse(
204300
response="The quota has been exceeded", cause=str(e)
205301
)
206302
raise HTTPException(**response.model_dump()) from e
207303
except APIStatusError as e:
304+
inference_time = time.monotonic() - start_time
208305
metrics.llm_calls_failures_total.inc()
209306
logger.exception("API error for request %s: %s", request_id, e)
307+
_queue_splunk_event(
308+
background_tasks,
309+
infer_request,
310+
request,
311+
request_id,
312+
str(e),
313+
inference_time,
314+
"infer_error",
315+
)
210316
response = InternalServerErrorResponse.generic()
211317
raise HTTPException(**response.model_dump()) from e
212318

213319
if not response_text:
214320
logger.warning("Empty response from LLM for request %s", request_id)
215321
response_text = constants.UNABLE_TO_PROCESS_RESPONSE
216322

323+
_queue_splunk_event(
324+
background_tasks,
325+
infer_request,
326+
request,
327+
request_id,
328+
response_text,
329+
inference_time,
330+
"infer_with_llm",
331+
)
332+
217333
logger.info("Completed rlsapi v1 /infer request %s", request_id)
218334

219335
return RlsapiV1InferResponse(
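
The handler records elapsed time with `time.monotonic()` on both the success path and each failure path. A compact sketch of that pattern is given here. `timed_call` is a hypothetical helper, not part of this commit, and it returns the exception instead of re-raising only to keep the example short.

```python
import time
from typing import Any, Callable, Optional, Tuple


def timed_call(func: Callable[[], Any]) -> Tuple[Any, Optional[Exception], float]:
    """Run func and return (result, error, elapsed_seconds).

    time.monotonic() is not affected by system clock adjustments, which
    makes it suitable for measuring durations.
    """
    start = time.monotonic()
    try:
        result = func()
    except Exception as exc:  # capture elapsed time on failure as well
        return None, exc, time.monotonic() - start
    return result, None, time.monotonic() - start
```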
