Commit 74a9f5c

Merge branch 'main' into JslYoon-safety-shield-config
Signed-off-by: Lucas <lyoon@redhat.com>
1 parent 08153d9 commit 74a9f5c

25 files changed

Lines changed: 729 additions & 151 deletions

CONTRIBUTING.md

Lines changed: 33 additions & 0 deletions
@@ -22,6 +22,12 @@
2222
* [Pylint](#pylint)
2323
* [Security checks](#security-checks)
2424
* [Code style](#code-style)
25+
* [Function Standards](#function-standards)
26+
* [Documentation](#documentation)
27+
* [Type annotations](#type-annotations)
28+
* [Naming conventions](#naming-conventions)
29+
* [Async functions](#async-functions)
30+
* [Error handling](#error-handling)
2531
* [Formatting rules](#formatting-rules)
2632
* [Docstrings style](#docstrings-style)
2733

@@ -227,6 +233,33 @@ make security-check
227233

228234
## Code style
229235

236+
### Function Standards
237+
238+
#### Documentation
239+
240+
All functions require docstrings with brief descriptions
241+
242+
#### Type annotations
243+
244+
Use complete type annotations for parameters and return types
245+
246+
- Use `typing_extensions.Self` for model validators
247+
- Union types: `str | int` (modern syntax)
248+
- Optional: `Optional[Type]`
249+
250+
#### Naming conventions
251+
252+
Use snake_case with descriptive, action-oriented names (get_, validate_, check_)
253+
254+
#### Async functions
255+
256+
Use `async def` for I/O operations and external API calls
257+
258+
#### Error handling
259+
260+
- Use FastAPI `HTTPException` with appropriate status codes for API endpoints
261+
- Handle `APIConnectionError` from Llama Stack where appropriate
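A minimal sketch of a function pair that follows the standards added above (docstrings, complete annotations, `str | int` unions, `Optional[...]`, action-oriented snake_case names, `async def` for I/O, explicit error handling). Everything in it is hypothetical: `ApiError` is a stand-in for FastAPI's `HTTPException`, the built-in `ConnectionError` stands in for Llama Stack's `APIConnectionError`, and `fetch_greeting` / `get_greeting` are illustrative names that do not exist in the repository.

```python
# Keeps the `str | int` annotations valid on interpreters older than 3.10.
from __future__ import annotations

import asyncio
from typing import Optional


class ApiError(Exception):
    """Hypothetical stand-in for an HTTP-style error such as HTTPException."""

    def __init__(self, status_code: int, detail: str) -> None:
        """Store the status code and a human-readable detail message."""
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


async def fetch_greeting(user_id: str | int) -> Optional[str]:
    """Simulate an external call; return None when the user is unknown."""
    await asyncio.sleep(0)  # placeholder for real I/O
    known_users = {"1": "hello, alice"}  # made-up data for the sketch
    return known_users.get(str(user_id))


async def get_greeting(user_id: str | int) -> str:
    """Return the greeting for a user or raise an ApiError with a status code."""
    try:
        greeting = await fetch_greeting(user_id)
    except ConnectionError as exc:
        # Backend unreachable: surface it as a 503-style error.
        raise ApiError(503, "backend unreachable") from exc
    if greeting is None:
        raise ApiError(404, f"user {user_id} not found")
    return greeting
```

In a real endpoint the stand-in exception would presumably be replaced by `HTTPException(status_code=..., detail=...)`; the shape of the function stays the same.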
262+
230263
### Formatting rules
231264

232265
Code formatting rules are checked by __Black__. More info can be found on [https://black.readthedocs.io/en/stable/](https://black.readthedocs.io/en/stable/).

README.md

Lines changed: 44 additions & 8 deletions
@@ -340,8 +340,9 @@ Each MCP server requires two fields:
340340
- `name`: Unique identifier for the MCP server
341341
- `url`: The endpoint where the MCP server is running
342342

343-
And one optional field:
343+
And optional fields:
344344
- `provider_id`: MCP provider identification (defaults to `"model-context-protocol"`)
345+
- `headers`: List of HTTP header names to automatically forward from the incoming request to this MCP server (see [Automatic Header Propagation](#5-automatic-header-propagation-for-gateway-injected-headers))
345346

346347
**Minimal Example:**
347348

@@ -436,6 +437,41 @@ mcp_servers:
436437

437438
When no token is provided for an OAuth-configured server, the service may respond with **401 Unauthorized** and a **`WWW-Authenticate`** header (probed from the MCP server). Clients can use this to drive an OAuth flow and then retry with the token in `MCP-HEADERS`.
438439

440+
##### 5. Automatic Header Propagation (For Gateway-Injected Headers)
441+
442+
Use the `headers` field to automatically forward specific headers from the incoming HTTP request to an MCP server. This is designed for environments where infrastructure components (e.g. API gateways) inject headers that MCP servers need but clients cannot provide.
443+
444+
**HCC Use Case:** In Hybrid Cloud Console (HCC), the gateway strips the client's `Authorization` header and replaces it with `x-rh-identity` (a base64-encoded user identity). Backend services use `x-rh-identity` to identify users. Since clients never see this header, the existing `MCP-HEADERS` mechanism cannot be used. Instead, configure `headers` to automatically forward it:
445+
446+
```yaml
447+
mcp_servers:
448+
- name: "rbac"
449+
url: "http://rbac-service:8080"
450+
headers:
451+
- x-rh-identity
452+
- x-rh-insights-request-id
453+
```
454+
455+
When a request arrives at Lightspeed with these headers, they are automatically extracted and forwarded to the `rbac` MCP server. No client-side configuration is needed.
456+
457+
**Key behaviors:**
458+
459+
- **Case-insensitive matching**: Header names in the allowlist are matched case-insensitively against the incoming request.
460+
- **Missing headers are skipped**: If a header in the allowlist is not present on the incoming request, it is silently skipped. The MCP server is **not** skipped (unlike `authorization_headers` behavior).
461+
- **Additive with other methods**: Propagated headers can be combined with `authorization_headers` and `MCP-HEADERS`. If the same header name appears in both `authorization_headers` and `headers`, the `authorization_headers` value takes precedence.
462+
463+
**Combined example:**
464+
465+
```yaml
466+
mcp_servers:
467+
- name: "notifications"
468+
url: "http://notifications-service:8080"
469+
headers:
470+
- x-rh-identity # From incoming request
471+
authorization_headers:
472+
X-API-Key: "/var/secrets/notifications-key" # Static service credential
473+
```
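The key behaviors listed in this section (case-insensitive matching, silently skipping absent headers, and `authorization_headers` taking precedence on a name clash) can be sketched roughly as follows. This is not the implementation from this commit: `collect_propagated_headers` is a hypothetical helper, and it assumes `authorization_headers` has already been resolved from file paths or keywords to actual header values.

```python
from collections.abc import Mapping


def collect_propagated_headers(
    allowlist: list[str],
    request_headers: Mapping[str, str],
    authorization_headers: Mapping[str, str],
) -> dict[str, str]:
    """Merge allowlisted incoming headers with resolved authorization headers."""
    # Index incoming headers by lowercase name for case-insensitive lookup.
    incoming = {name.lower(): value for name, value in request_headers.items()}
    # Names already set through authorization_headers take precedence.
    claimed = {name.lower() for name in authorization_headers}

    merged: dict[str, str] = dict(authorization_headers)
    for name in allowlist:
        key = name.lower()
        if key in claimed:
            continue  # authorization_headers value wins
        if key in incoming:
            merged[name] = incoming[key]
        # A header absent from the request is skipped; the server is still used.
    return merged
```

For example, with an allowlist of `x-rh-identity` and `X-API-Key`, an incoming `X-Rh-Identity` header is forwarded under the allowlisted name, while an incoming `x-api-key` is ignored in favor of the value configured in `authorization_headers`.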
474+
439475
##### Client-Authenticated MCP Servers Discovery
440476
441477
To help clients determine which MCP servers require client-provided tokens, use the **MCP Client Auth Options** endpoint:
@@ -492,13 +528,13 @@ mcp_servers:
492528
493529
##### Authentication Method Comparison
494530
495-
| Method | Use Case | Configuration | Token Scope | Example |
496-
|-----------------|-----------------------------|----------------------------------|-------------------------------|------------------------|
497-
| **Static File** | Service tokens, API keys | File path in config | Global (all users) | `"/var/secrets/token"` |
498-
| **Kubernetes** | K8s service accounts | `"kubernetes"` keyword | Per-user (from auth) | `"kubernetes"` |
499-
| **Client** | User-specific tokens | `"client"` keyword + HTTP header | Per-request | `"client"` |
500-
| **OAuth** | OAuth-protected MCP servers | `"oauth"` keyword + HTTP header | Per-request (from OAuth flow) | `"oauth"` |
501-
531+
| Method | Use Case | Configuration | Token Scope | Example |
532+
|------------------------|----------------------------------|----------------------------------|------------------------------------|-------------------------------|
533+
| **Static File** | Service tokens, API keys | File path in config | Global (all users) | `"/var/secrets/token"` |
534+
| **Kubernetes** | K8s service accounts | `"kubernetes"` keyword | Per-user (from auth) | `"kubernetes"` |
535+
| **Client** | User-specific tokens | `"client"` keyword + HTTP header | Per-request | `"client"` |
536+
| **OAuth** | OAuth-protected MCP servers | `"oauth"` keyword + HTTP header | Per-request (from OAuth flow) | `"oauth"` |
537+
| **Header Propagation** | Gateway-injected headers (HCC) | `headers` list | Per-request (from incoming request)| `headers: [x-rh-identity]` |
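As a rough illustration of how the `authorization_headers` values in the first four rows differ, the sketch below maps a configured value to its method. `TokenSource` and `classify_token_source` are hypothetical names, and exact, case-sensitive keyword matching is an assumption; the project's real resolution logic is not shown in this diff.

```python
from enum import Enum


class TokenSource(Enum):
    """Authentication methods selectable through an authorization_headers value."""

    STATIC_FILE = "static file"
    KUBERNETES = "kubernetes"
    CLIENT = "client"
    OAUTH = "oauth"


def classify_token_source(config_value: str) -> TokenSource:
    """Return the method implied by a configured authorization_headers value."""
    keywords = {
        "kubernetes": TokenSource.KUBERNETES,
        "client": TokenSource.CLIENT,
        "oauth": TokenSource.OAUTH,
    }
    # Anything that is not a reserved keyword is treated as a file path.
    return keywords.get(config_value, TokenSource.STATIC_FILE)
```

Header propagation does not appear in this mapping because it is configured through the separate `headers` list rather than through an `authorization_headers` value.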
502538
503539
##### Important: Automatic Server Skipping
504540

docs/config.md

Lines changed: 1 addition & 0 deletions
@@ -373,6 +373,7 @@ Useful resources:
373373
| provider_id | string | MCP provider identification |
374374
| url | string | URL of the MCP server |
375375
| authorization_headers | object | Headers to send to the MCP server. The map contains the header name and the path to a file containing the header value (secret). There are 3 special cases: 1. Usage of the kubernetes token in the header — use the string 'kubernetes' instead of the file path. 2. Usage of the client-provided token in the header — use the string 'client' instead of the file path. 3. Usage of OAuth token (resolved at request time or 401 with WWW-Authenticate) — use the string 'oauth' instead of the file path. |
376+
| headers | array | List of HTTP header names to automatically forward from the incoming request to this MCP server. Headers listed here are extracted from the original client request and included when calling the MCP server. This is useful when infrastructure components (e.g. API gateways) inject headers that MCP servers need, such as x-rh-identity in HCC. Header matching is case-insensitive. These headers are additive with authorization_headers and MCP-HEADERS. |
376377
| timeout | integer | Timeout in seconds for requests to the MCP server. If not specified, the default timeout from Llama Stack will be used. Note: This field is reserved for future use when Llama Stack adds timeout support. |
377378

378379

docs/contributing_guide.md

Lines changed: 33 additions & 0 deletions
@@ -22,6 +22,12 @@
2222
* [Pylint](#pylint)
2323
* [Security checks](#security-checks)
2424
* [Code style](#code-style)
25+
* [Function Standards](#function-standards)
26+
* [Documentation](#documentation)
27+
* [Type annotations](#type-annotations)
28+
* [Naming conventions](#naming-conventions)
29+
* [Async functions](#async-functions)
30+
* [Error handling](#error-handling)
2531
* [Formatting rules](#formatting-rules)
2632
* [Docstrings style](#docstrings-style)
2733

@@ -227,6 +233,33 @@ make security-check
227233

228234
## Code style
229235

236+
### Function Standards
237+
238+
#### Documentation
239+
240+
All functions require docstrings with brief descriptions
241+
242+
#### Type annotations
243+
244+
Use complete type annotations for parameters and return types
245+
246+
- Use `typing_extensions.Self` for model validators
247+
- Union types: `str | int` (modern syntax)
248+
- Optional: `Optional[Type]`
249+
250+
#### Naming conventions
251+
252+
Use snake_case with descriptive, action-oriented names (get_, validate_, check_)
253+
254+
#### Async functions
255+
256+
Use `async def` for I/O operations and external API calls
257+
258+
#### Error handling
259+
260+
- Use FastAPI `HTTPException` with appropriate status codes for API endpoints
261+
- Handle `APIConnectionError` from Llama Stack where appropriate
262+
230263
### Formatting rules
231264

232265
Code formatting rules are checked by __Black__. More info can be found on [https://black.readthedocs.io/en/stable/](https://black.readthedocs.io/en/stable/).

examples/lightspeed-stack-mcp-servers.yaml

Lines changed: 17 additions & 1 deletion
@@ -46,4 +46,20 @@ mcp_servers:
4646
url: "http://url.com:6"
4747
authorization_headers:
4848
Authorization: "client" # Special value to forward the client's token
49-
timeout: 30 # Optional: timeout in seconds (future Llama Stack feature)
49+
timeout: 30 # Optional: timeout in seconds (future Llama Stack feature)
50+
# Example with automatic header propagation from incoming request (HCC use case)
51+
# Headers listed here are automatically extracted from the incoming HTTP request
52+
# and forwarded to this MCP server. Useful when infrastructure components (e.g.
53+
# HCC Gateway) inject headers that MCP servers need for user identification.
54+
- name: "rbac"
55+
url: "http://rbac-service:8080"
56+
headers:
57+
- x-rh-identity
58+
- x-rh-insights-request-id
59+
# Headers can be combined with authorization_headers (additive)
60+
- name: "notifications"
61+
url: "http://notifications-service:8080"
62+
headers:
63+
- x-rh-identity
64+
authorization_headers:
65+
X-API-Key: "/var/secrets/notifications-api-key"

src/app/endpoints/a2a.py

Lines changed: 24 additions & 6 deletions
@@ -3,6 +3,7 @@
33
import asyncio
44
import json
55
import uuid
6+
from collections.abc import Mapping
67
from datetime import datetime, timezone
78
from typing import Annotated, Any, AsyncIterator, MutableMapping, Optional
89

@@ -46,7 +47,7 @@
4647
from models.requests import QueryRequest
4748
from utils.mcp_headers import mcp_headers_dependency, McpHeaders
4849
from utils.responses import (
49-
extract_text_from_output_item,
50+
extract_text_from_response_item,
5051
prepare_responses_params,
5152
)
5253
from utils.suid import normalize_conversation_id
@@ -107,7 +108,7 @@ def _convert_responses_content_to_a2a_parts(output: list[Any]) -> list[Part]:
107108
parts: list[Part] = []
108109

109110
for output_item in output:
110-
text = extract_text_from_output_item(output_item)
111+
text = extract_text_from_response_item(output_item)
111112
if text:
112113
parts.append(Part(root=TextPart(text=text)))
113114

@@ -184,15 +185,22 @@ class A2AAgentExecutor(AgentExecutor):
184185
routing queries to the LLM backend using the Responses API.
185186
"""
186187

187-
def __init__(self, auth_token: str, mcp_headers: Optional[McpHeaders] = None):
188+
def __init__(
189+
self,
190+
auth_token: str,
191+
mcp_headers: Optional[McpHeaders] = None,
192+
request_headers: Optional[Mapping[str, str]] = None,
193+
):
188194
"""Initialize the A2A agent executor.
189195
190196
Args:
191197
auth_token: Authentication token for the request
192198
mcp_headers: MCP headers for context propagation
199+
request_headers: Incoming HTTP request headers for allowlist propagation
193200
"""
194201
self.auth_token: str = auth_token
195202
self.mcp_headers: McpHeaders = mcp_headers or {}
203+
self.request_headers: Optional[Mapping[str, str]] = request_headers
196204

197205
async def execute(
198206
self,
@@ -326,6 +334,7 @@ async def _process_task_streaming( # pylint: disable=too-many-locals
326334
self.mcp_headers,
327335
stream=True,
328336
store=True,
337+
request_headers=self.request_headers,
329338
)
330339
# Stream response from LLM using the Responses API
331340
stream = await client.responses.create(**responses_params.model_dump())
@@ -649,17 +658,26 @@ async def get_agent_card( # pylint: disable=unused-argument
649658
raise
650659

651660

652-
async def _create_a2a_app(auth_token: str, mcp_headers: McpHeaders) -> Any:
661+
async def _create_a2a_app(
662+
auth_token: str,
663+
mcp_headers: McpHeaders,
664+
request_headers: Optional[Mapping[str, str]] = None,
665+
) -> Any:
653666
"""Create an A2A Starlette application instance with auth context.
654667
655668
Args:
656669
auth_token: Authentication token for the request
657670
mcp_headers: MCP headers for context propagation
671+
request_headers: Incoming HTTP request headers for allowlist propagation
658672
659673
Returns:
660674
A2A Starlette ASGI application
661675
"""
662-
agent_executor = A2AAgentExecutor(auth_token=auth_token, mcp_headers=mcp_headers)
676+
agent_executor = A2AAgentExecutor(
677+
auth_token=auth_token,
678+
mcp_headers=mcp_headers,
679+
request_headers=request_headers,
680+
)
663681
task_store = await _get_task_store()
664682

665683
request_handler = DefaultRequestHandler(
@@ -713,7 +731,7 @@ async def handle_a2a_jsonrpc( # pylint: disable=too-many-locals,too-many-statem
713731
auth_token = ""
714732

715733
# Create A2A app with auth context
716-
a2a_app = await _create_a2a_app(auth_token, mcp_headers)
734+
a2a_app = await _create_a2a_app(auth_token, mcp_headers, request.headers)
717735

718736
# Detect if this is a streaming request by checking the JSON-RPC method
719737
is_streaming_request = False

src/app/endpoints/query.py

Lines changed: 6 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -179,6 +179,7 @@ async def query_endpoint_handler(
179179
mcp_headers,
180180
stream=False,
181181
store=True,
182+
request_headers=request.headers,
182183
)
183184

184185
# Handle Azure token refresh if needed
@@ -245,7 +246,8 @@ async def query_endpoint_handler(
245246
started_at=started_at,
246247
completed_at=completed_at,
247248
summary=turn_summary,
248-
query_request=query_request,
249+
query=query_request.query,
250+
attachments=query_request.attachments,
249251
skip_userid_check=_skip_userid_check,
250252
topic_summary=topic_summary,
251253
)
@@ -288,13 +290,14 @@ async def retrieve_response( # pylint: disable=too-many-locals
288290
Returns:
289291
TurnSummary: Summary of the LLM response content
290292
"""
293+
response: Optional[OpenAIResponseObject] = None
291294
try:
292295
moderation_result = await run_shield_moderation(
293296
client, responses_params.input, shield_ids
294297
)
295-
if moderation_result.blocked:
298+
if moderation_result.decision == "blocked":
296299
# Handle shield moderation blocking
297-
violation_message = moderation_result.message or ""
300+
violation_message = moderation_result.message
298301
await append_turn_to_conversation(
299302
client,
300303
responses_params.conversation,

src/app/endpoints/rlsapi_v1.py

Lines changed: 3 additions & 3 deletions
@@ -35,7 +35,7 @@
3535
from observability import InferenceEventData, build_inference_event, send_splunk_event
3636
from utils.query import handle_known_apistatus_errors
3737
from utils.responses import (
38-
extract_text_from_output_items,
38+
extract_text_from_response_items,
3939
get_mcp_tools,
4040
)
4141
from utils.suid import get_suid
@@ -192,7 +192,7 @@ async def retrieve_simple_response(
192192
)
193193
response = cast(OpenAIResponseObject, response)
194194

195-
return extract_text_from_output_items(response.output)
195+
return extract_text_from_response_items(response.output)
196196

197197

198198
def _get_cla_version(request: Request) -> str:
@@ -307,7 +307,7 @@ async def infer_endpoint(
307307
input_source = infer_request.get_input_source()
308308
instructions = _build_instructions(infer_request.context.systeminfo)
309309
model_id = _get_default_model_id()
310-
mcp_tools = await get_mcp_tools()
310+
mcp_tools = await get_mcp_tools(request_headers=request.headers)
311311
logger.debug(
312312
"Request %s: Combined input source length: %d", request_id, len(input_source)
313313
)
