Commit daf6dd3

Skip model authentication against Azure
1 parent 30b82a6 commit daf6dd3

22 files changed

Lines changed: 325 additions & 368 deletions

Makefile

Lines changed: 3 additions & 0 deletions
@@ -101,6 +101,9 @@ clean-llama-stack: remove-llama-stack-container ## Remove container and image
 	echo "Removing llama-stack image..."; \
 	$(CONTAINER_RUNTIME) rmi $(LLAMA_STACK_IMAGE); \
 	fi
+run-llama-stack: ## Start Llama Stack with enriched config (for local service mode)
+	uv run src/llama_stack_configuration.py -c $(CONFIG) -i $(LLAMA_STACK_CONFIG) -o $(LLAMA_STACK_CONFIG) && \
+	uv run llama stack run $(LLAMA_STACK_CONFIG)

 test-unit: ## Run the unit tests
 	@echo "Running unit tests..."

docs/providers.md

Lines changed: 17 additions & 24 deletions
@@ -92,51 +92,44 @@ azure_entra_id:

 #### Llama Stack Configuration Requirements

-Because Lightspeed builds on top of Llama Stack, certain configuration fields are required to satisfy the base Llama Stack schema. The config block for the Azure inference provider **must** include `api_key`, `api_base`, and `api_version` — Llama Stack will fail to start if any of these are missing.
+Because Lightspeed builds on top of Llama Stack, certain configuration fields are required to satisfy the base Llama Stack schema. The config block for the Azure inference provider **must** include `base_url` and `api_version`. When using Entra ID authentication, `api_key` is not required to be configured, since the API key is acquired and passed automatically at runtime.

-**Important:** The `api_key` field must be set to `${env.AZURE_API_KEY}` exactly as shown below. This is not optional — Lightspeed uses this specific environment variable name as a placeholder for injection of the Entra ID access token. Using a different variable name will break the authentication flow.
+When `azure_entra_id` is configured in Lightspeed, config enrichment automatically sets `model_validation: false` on the `remote::azure` provider so Llama Stack can start without validating models against Azure at startup.

 ```yaml
 inference:
   - provider_id: azure
     provider_type: remote::azure
     config:
-      api_key: ${env.AZURE_API_KEY}  # Must be exactly this - placeholder for Entra ID token
-      api_base: ${env.AZURE_API_BASE}
+      # api_key: ${env.AZURE_API_KEY}  # Can be omitted when Entra ID configured in LCORE
+      base_url: ${env.AZURE_API_BASE}
       api_version: 2025-01-01-preview
+      model_validation: false  # added automatically by Lightspeed enrichment
 ```

-**How it works:** At startup, Lightspeed acquires an Entra ID access token and stores it in the `AZURE_API_KEY` environment variable. When Llama Stack initializes, it reads the config, substitutes `${env.AZURE_API_KEY}` with the token value, and uses it to authenticate with Azure OpenAI. Llama Stack also calls `models.list()` during initialization to validate provider connectivity, which is why the token must be available before client initialization.
+**How it works:** Llama Stack defers Azure authentication to inference time. Lightspeed acquires Entra ID tokens at runtime and passes them via the `X-LlamaStack-Provider-Data` header (`azure_api_key`, `azure_api_base`).

 #### Access Token Lifecycle and Management

-**Library mode startup:**
+**Lightspeed startup (library and service mode):**
 1. Lightspeed reads your Entra ID configuration
-2. Acquires an initial access token from Microsoft Entra ID
-3. Stores the token in the `AZURE_API_KEY` environment variable
-4. **Then** initializes the Llama Stack library client
+2. Does not acquire or cache access tokens at startup—authentication is deferred until request time
+3. Initializes the Llama Stack client without Azure credentials; credentials are supplied later via `X-LlamaStack-Provider-Data` when an Azure model is used

-This ordering is critical because Llama Stack calls `models.list()` during initialization to validate provider connectivity. If the token is not set before client initialization, Azure requests will fail with authentication errors.
-
-**Service mode startup:**
-
-When running Llama Stack as a separate service, Lightspeed runs a pre-startup script that:
-1. Reads the Entra ID configuration
-2. Acquires an initial access token
-3. Writes the token to the `AZURE_API_KEY` environment variable
-4. **Then** Llama Stack service starts
-
-This initial token is used solely for the `models.list()` validation call during Llama Stack startup. After startup, Lightspeed manages token refresh independently and passes fresh tokens via request headers.
+**Llama Stack service startup (container mode):**
+1. Config enrichment sets `model_validation: false` on the Azure provider
+2. Llama Stack starts without authenticating models against Azure
+3. Lightspeed connects to this service at startup without Azure credentials; tokens are added only for Azure inference requests

 **During inference requests:**
 1. Before each request, Lightspeed checks if the token has expired
-2. If expired, a new token is automatically acquired and the environment variable is updated
-3. For library mode: the Llama Stack client is reloaded to pick up the new token
-4. For service mode: the token is passed via `X-LlamaStack-Provider-Data` request headers
+2. If expired, a new token is automatically acquired and cached in memory
+3. The token is passed via `X-LlamaStack-Provider-Data` (library and service mode)

 **Token security:**
 - Access tokens are wrapped in `SecretStr` to prevent accidental logging
-- Tokens are stored only in the `AZURE_API_KEY` environment variable (single source of truth)
+- Tokens are cached in `AzureEntraIDManager` singleton class
+- Inference uses `X-LlamaStack-Provider-Data` headers
 - Each Uvicorn worker maintains its own token lifecycle independently

 **Token validity:**
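
As a rough illustration of the header-based flow described in the updated docs, the sketch below packs Azure credentials into an `X-LlamaStack-Provider-Data` header value. Only the header name and the keys `azure_api_key` and `azure_api_base` come from this commit. The helper name `build_provider_data_header`, the sample values, and the JSON encoding of the header value are assumptions made for illustration, not the project's actual implementation.

```python
import json
from typing import Optional

PROVIDER_DATA_HEADER = "X-LlamaStack-Provider-Data"


def build_provider_data_header(token: str, api_base: Optional[str]) -> dict[str, str]:
    """Return extra request headers carrying Azure credentials.

    Hypothetical helper: returns an empty dict when either the token or the
    base URL is missing, so the caller sends no provider data at all.
    """
    if not token or api_base is None:
        return {}
    payload = {"azure_api_key": token, "azure_api_base": api_base}
    # Assumption: the header value is a JSON-encoded object.
    return {PROVIDER_DATA_HEADER: json.dumps(payload)}


# Placeholder values, not real credentials or endpoints.
headers = build_provider_data_header(
    "example-token", "https://example.openai.azure.com/openai/v1"
)
```

Returning an empty dict for incomplete credentials keeps non-Azure requests free of provider data, which mirrors the "tokens are added only for Azure inference requests" behaviour described in the docs.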

docs/rag_guide.md

Lines changed: 0 additions & 1 deletion
@@ -83,7 +83,6 @@ The script reads your `lightspeed-stack.yaml` configuration and enriches a base
 - `-c, --config`: Lightspeed config file (default: `lightspeed-stack.yaml`)
 - `-i, --input`: Input Llama Stack config (default: `run.yaml`)
 - `-o, --output`: Output enriched config (default: `run_.yaml`)
-- `-e, --env-file`: Path to .env file for AZURE_API_KEY (default: `.env`)

 > [!TIP]
 > Use this script to generate your initial `run.yaml` configuration, then manually customize as needed for your specific setup.

examples/azure-run.yaml

Lines changed: 1 addition & 1 deletion
@@ -22,9 +22,9 @@ providers:
   - provider_id: azure
     provider_type: remote::azure
     config:
-      api_key: ${env.AZURE_API_KEY}
       base_url: https://ols-test.openai.azure.com/openai/v1
       api_version: 2024-02-15-preview
+      model_validation: false
   - provider_id: openai
     provider_type: remote::openai
     config:
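
The enrichment step that adds `model_validation: false` could look roughly like the sketch below, which works on an already-parsed config dictionary. The function name `disable_azure_model_validation` and the assumed layout (`providers` → `inference` → list of provider entries) are illustrative guesses based on the YAML in this commit; the real logic lives in `llama_stack_configuration.py` and is not shown in this diff.

```python
from typing import Any


def disable_azure_model_validation(run_config: dict[str, Any]) -> int:
    """Set model_validation to False on every remote::azure inference provider.

    Hypothetical helper: mutates run_config in place and returns the number
    of providers that were updated.
    """
    updated = 0
    providers = (run_config.get("providers") or {}).get("inference") or []
    for provider in providers:
        if provider.get("provider_type") != "remote::azure":
            continue
        # Create the config mapping if the entry has none (or an empty one).
        config = provider.get("config") or {}
        config["model_validation"] = False
        provider["config"] = config
        updated += 1
    return updated


# Example input shaped like examples/azure-run.yaml after YAML parsing.
example = {
    "providers": {
        "inference": [
            {"provider_id": "azure", "provider_type": "remote::azure",
             "config": {"api_version": "2024-02-15-preview"}},
            {"provider_id": "openai", "provider_type": "remote::openai",
             "config": {}},
        ]
    }
}
count = disable_azure_model_validation(example)
```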

scripts/llama-stack-entrypoint.sh

Lines changed: 1 addition & 9 deletions
@@ -7,7 +7,6 @@ set -e
 INPUT_CONFIG="${LLAMA_STACK_CONFIG:-/opt/app-root/run.yaml}"
 ENRICHED_CONFIG="/opt/app-root/run.yaml"
 LIGHTSPEED_CONFIG="${LIGHTSPEED_CONFIG:-/opt/app-root/lightspeed-stack.yaml}"
-ENV_FILE="/opt/app-root/.env"

 # Enrich config if lightspeed config exists
 if [ -f "$LIGHTSPEED_CONFIG" ]; then
@@ -16,14 +15,7 @@ if [ -f "$LIGHTSPEED_CONFIG" ]; then
     python3 /opt/app-root/llama_stack_configuration.py \
         -c "$LIGHTSPEED_CONFIG" \
         -i "$INPUT_CONFIG" \
-        -o "$ENRICHED_CONFIG" \
-        -e "$ENV_FILE" 2>&1 || ENRICHMENT_FAILED=1
-
-    # Source .env if generated (contains AZURE_API_KEY)
-    if [ -f "$ENV_FILE" ]; then
-        # shellcheck source=/dev/null
-        set -a && . "$ENV_FILE" && set +a
-    fi
+        -o "$ENRICHED_CONFIG" 2>&1 || ENRICHMENT_FAILED=1

     if [ -f "$ENRICHED_CONFIG" ] && [ "$ENRICHMENT_FAILED" -eq 0 ]; then
         echo "Using enriched config: $ENRICHED_CONFIG"

src/app/endpoints/query.py

Lines changed: 1 addition & 2 deletions
@@ -54,7 +54,6 @@
     is_context_length_error,
     prepare_input,
     store_query_results,
-    update_azure_token,
     validate_attachments_metadata,
     validate_model_provider_override,
 )
@@ -204,7 +203,7 @@ async def query_endpoint_handler(
         and AzureEntraIDManager().is_token_expired
         and AzureEntraIDManager().refresh_token()
     ):
-        client = await update_azure_token(client)
+        client = await AsyncLlamaStackClientHolder().update_azure_token()

     # Retrieve response using Responses API
     turn_summary = await retrieve_response(
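
The refresh guard shared by the three endpoint handlers relies on `and` short-circuiting: `refresh_token()` is only called when the token is already expired, and the client is only updated when that refresh succeeds. The sketch below isolates that pattern with a stand-in manager. `FakeTokenManager` and `should_update_client` are hypothetical names, and the first condition of the real guard is not visible in the hunk, so it is left out here.

```python
class FakeTokenManager:
    """Stand-in for AzureEntraIDManager exposing only what the guard touches."""

    def __init__(self, expired: bool, refresh_ok: bool) -> None:
        self.is_token_expired = expired
        self._refresh_ok = refresh_ok
        self.refresh_calls = 0

    def refresh_token(self) -> bool:
        # Count calls so the short-circuit behaviour can be observed.
        self.refresh_calls += 1
        return self._refresh_ok


def should_update_client(manager: FakeTokenManager) -> bool:
    """Mirror the visible part of the guard: expired AND refresh succeeded."""
    return manager.is_token_expired and manager.refresh_token()


fresh = FakeTokenManager(expired=False, refresh_ok=True)
stale = FakeTokenManager(expired=True, refresh_ok=True)
broken = FakeTokenManager(expired=True, refresh_ok=False)
```

With a valid token no refresh is attempted at all; with a failed refresh the client is left untouched and the next request retries.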

src/app/endpoints/responses.py

Lines changed: 1 addition & 2 deletions
@@ -78,7 +78,6 @@
     handle_known_apistatus_errors,
     is_context_length_error,
     store_query_results,
-    update_azure_token,
     validate_model_provider_override,
 )
 from utils.quota import check_tokens_available, get_available_quotas
@@ -405,7 +404,7 @@ async def responses_endpoint_handler(
         and AzureEntraIDManager().is_token_expired
         and AzureEntraIDManager().refresh_token()
     ):
-        client = await update_azure_token(client)
+        client = await AsyncLlamaStackClientHolder().update_azure_token()

     input_text = (
         original_request.input

src/app/endpoints/streaming_query.py

Lines changed: 1 addition & 2 deletions
@@ -92,7 +92,6 @@
     is_context_length_error,
     prepare_input,
     store_query_results,
-    update_azure_token,
     update_conversation_topic_summary,
     validate_attachments_metadata,
     validate_model_provider_override,
@@ -262,7 +261,7 @@ async def streaming_query_endpoint_handler(  # pylint: disable=too-many-locals
         and AzureEntraIDManager().is_token_expired
         and AzureEntraIDManager().refresh_token()
     ):
-        client = await update_azure_token(client)
+        client = await AsyncLlamaStackClientHolder().update_azure_token()

     request_id = get_suid()


src/app/main.py

Lines changed: 5 additions & 9 deletions
@@ -77,15 +77,6 @@ async def lifespan(_app: FastAPI) -> AsyncIterator[None]:

     initialize_sentry()

-    azure_config = configuration.configuration.azure_entra_id
-    if azure_config is not None:
-        AzureEntraIDManager().set_config(azure_config)
-        if not AzureEntraIDManager().refresh_token():
-            logger.warning(
-                "Failed to refresh Azure token at startup. "
-                "Token refresh will be retried on next Azure request."
-            )
-
     llama_stack_config = configuration.configuration.llama_stack
     await AsyncLlamaStackClientHolder().load(llama_stack_config)
     client = AsyncLlamaStackClientHolder().get_client()
@@ -104,6 +95,11 @@ async def lifespan(_app: FastAPI) -> AsyncIterator[None]:
         )
         raise

+    azure_entra_id_config = configuration.configuration.azure_entra_id
+    if azure_entra_id_config is not None:
+        AzureEntraIDManager().set_config(azure_entra_id_config)
+        azure_base_url = await AsyncLlamaStackClientHolder().get_azure_base_url()
+        AzureEntraIDManager().set_base_url(azure_base_url)
     logger.info("Registering MCP servers")
     await register_mcp_servers_async(logger, configuration.configuration)
     logger.info("App startup complete")

src/authorization/azure_token_manager.py

Lines changed: 26 additions & 5 deletions
@@ -1,6 +1,5 @@
 """Azure Entra ID token manager for Azure OpenAI authentication."""

-import os
 import time
 from typing import Optional

@@ -34,7 +33,13 @@ class AzureEntraIDManager(metaclass=Singleton):
     def __init__(self) -> None:
         """Initialize the token manager with empty state."""
         self._expires_on: int = 0
+        self._access_token: SecretStr = SecretStr("")
         self._entra_id_config: Optional[AzureEntraIdConfiguration] = None
+        self._azure_base_url: Optional[str] = None
+
+    def set_base_url(self, base_url: Optional[str]) -> None:
+        """Set the Azure API base."""
+        self._azure_base_url = base_url

     def set_config(self, azure_config: AzureEntraIdConfiguration) -> None:
         """Set the Azure Entra ID configuration."""
@@ -53,8 +58,24 @@ def is_token_expired(self) -> bool:

     @property
     def access_token(self) -> SecretStr:
-        """Return the access token from environment variable as SecretStr."""
-        return SecretStr(os.environ.get("AZURE_API_KEY", ""))
+        """Return the cached access token."""
+        return self._access_token
+
+    @property
+    def azure_base_url(self) -> Optional[str]:
+        """Return the cached Azure API base."""
+        return self._azure_base_url
+
+    def build_azure_provider_data(self) -> Optional[dict[str, str]]:
+        """Build azure_api_key and azure_base_url entries for provider data.
+
+        Returns:
+            Provider data dict when a token and base_url are available.
+        """
+        token = self.access_token.get_secret_value()
+        if not token or self.azure_base_url is None:
+            return None
+        return {"azure_api_key": token, "azure_api_base": self.azure_base_url}

     def refresh_token(self) -> bool:
         """Refresh the cached Azure access token.
@@ -76,9 +97,9 @@ def refresh_token(self) -> bool:
             return False

     def _update_access_token(self, token: str, expires_on: int) -> None:
-        """Update the token in env var and track expiration time."""
+        """Update the cached token and track expiration time."""
+        self._access_token = SecretStr(token)
         self._expires_on = expires_on - TOKEN_EXPIRATION_LEEWAY
-        os.environ["AZURE_API_KEY"] = token
         expiry_time = time.strftime(
             "%Y-%m-%d %H:%M:%S", time.localtime(self._expires_on)
         )
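
To make the move from an environment variable to an in-memory cache more concrete, here is a minimal standalone sketch of a token cache with an expiration leeway. It is not the project's `AzureEntraIDManager`: the class name `InMemoryTokenCache`, the `LEEWAY_SECONDS` value, the injectable clock, and the exact expiry comparison are all assumptions. It also stores the token as a plain `str` to stay standard-library only, whereas the commit wraps it in `SecretStr`.

```python
import time
from typing import Callable

LEEWAY_SECONDS = 30  # assumed value; the real TOKEN_EXPIRATION_LEEWAY is not shown


class InMemoryTokenCache:
    """Hypothetical cache that treats a token as expired slightly early."""

    def __init__(self, now: Callable[[], float] = time.time) -> None:
        self._now = now          # injectable clock, handy for tests
        self._token = ""         # empty string means "no token yet"
        self._expires_on = 0     # epoch seconds, already reduced by the leeway

    def update(self, token: str, expires_on: int) -> None:
        """Cache a new token and record its expiry minus the leeway."""
        self._token = token
        self._expires_on = expires_on - LEEWAY_SECONDS

    @property
    def is_expired(self) -> bool:
        """True once the clock reaches the leeway-adjusted expiry time."""
        return self._now() >= self._expires_on

    @property
    def token(self) -> str:
        """Return the cached token (may be empty before the first update)."""
        return self._token
```

Subtracting the leeway at update time means a token is refreshed shortly before Azure would actually reject it, and keeping the value on the instance rather than in `os.environ` avoids leaking it to child processes.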
