Commit ff20a06

Authenticate Relay with Managed Identity token (#103)
In production, replace the locally stored SAS token in the WebSocket URL with an Azure-issued Managed Identity JWT sent in an HTTP header. In non-production environments, use a SAS token if one is set in the environment; otherwise fall back to DefaultAzureCredential.
1 parent 20c50df commit ff20a06

7 files changed

Lines changed: 236 additions & 53 deletions

.env.development

Lines changed: 1 addition & 0 deletions
@@ -1,6 +1,7 @@
 # Azure Relay Configuration
 AZURE_RELAY_NAMESPACE=manbrs-gateway-dev.servicebus.windows.net
 AZURE_RELAY_HYBRID_CONNECTION=name-of-your-choice-relay-test-hc
+# Optional: set these to use SAS token auth locally instead of managed identity / az login
 AZURE_RELAY_KEY_NAME=RootManageSharedAccessKey
 AZURE_RELAY_SHARED_ACCESS_KEY=YOUR_SHARED_ACCESS_KEY_HERE

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+# ADR-005: Use Managed Identity for Azure Relay authentication
+
+Date: 2026-04-28
+
+Status: Accepted
+
+## Context
+
+The gateway connects to Azure Relay Hybrid Connections to receive worklist actions from Manage Breast Screening. A connection must be authenticated to Azure Relay.
+
+The initial implementation used Shared Access Signature (SAS) tokens. These are HMAC-SHA256 signatures computed from a shared secret key, embedded in the WebSocket connection URL as a query parameter (`sb-hc-token`). This required:
+
+- A shared access key to be provisioned and stored as an environment variable (`AZURE_RELAY_SHARED_ACCESS_KEY`)
+- The key name to be configured separately (`AZURE_RELAY_KEY_NAME`)
+- Manual key rotation when keys needed to change
+
+As the gateway runs inside the hospital network but is provisioned via Azure Arc, it can be assigned a managed identity through Arc-enabled infrastructure. Storing a long-lived shared secret in the environment is therefore unnecessary operational overhead and a potential security risk.
+
+That said, setting up a working Relay connection locally is already complex. Mandating managed identity for all environments would add further friction for developers, who would need Azure CLI credentials with a Relay Listener role assignment before they could run the service.
+
+## Decision
+
+In **production** (`ENVIRONMENT=prod`), the gateway uses `ManagedIdentityCredential` exclusively. The SAS token path is unavailable regardless of what environment variables are set. The gateway's managed identity must be assigned the **Azure Relay Listener** role on the hybrid connection resource in Azure.
+
+In **non-production** environments, the auth method is determined by whether `AZURE_RELAY_SHARED_ACCESS_KEY` is set:
+
+- If set, a SAS token is generated and embedded in the WebSocket URL (`sb-hc-token`), preserving the simpler local development setup.
+- If absent, `DefaultAzureCredential` is used, which works with Azure CLI credentials (`az login`) for developers who have the Listener role assigned to their identity.
+
+The token is passed as an `Authorization: Bearer` HTTP header on the WebSocket upgrade request for managed identity paths. Azure Relay validates it against Azure AD.
+
+A startup credential check (`verify_credentials()`) runs before the listen loop. In production this will raise `ClientAuthenticationError` immediately if the managed identity is not correctly configured. In non-production with a SAS key it logs the auth method and continues.
+
+`ManagedIdentityCredential` is preferred over `DefaultAzureCredential` in production because it is predictable — it only attempts the IMDS endpoint and fails clearly, rather than traversing a credential chain that could succeed unexpectedly via another mechanism.
+
+## Consequences
+
+### Positive Consequences
+
+- **No secrets in production:** No shared key to store, rotate, or accidentally leak in deployed environments
+- **Fail-fast on misconfiguration:** Startup validation raises `ClientAuthenticationError` immediately rather than failing silently in the reconnect loop
+- **Preserved local developer experience:** SAS tokens continue to work locally when `AZURE_RELAY_SHARED_ACCESS_KEY` is set
+- **Consistent with platform direction:** Aligns with how the gateway already authenticates to the DICOM API
+
+### Negative Consequences
+
+- **Azure infrastructure dependency in production:** The managed identity and its role assignment must exist before the service can start
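
The decision recorded in the ADR above amounts to a small decision table. The following is a minimal sketch of that table only, not the code from this commit: `AuthMethod` and `choose_auth_method` are hypothetical names introduced for illustration, and the `"prod"` value is taken from the ADR's `ENVIRONMENT=prod` wording.

```python
from enum import Enum


class AuthMethod(Enum):
    """Hypothetical labels for the three auth paths described in the ADR."""

    MANAGED_IDENTITY = "ManagedIdentityCredential"
    SAS_TOKEN = "SAS token"
    DEFAULT_CREDENTIAL = "DefaultAzureCredential"


def choose_auth_method(environment: str, shared_access_key: str = "") -> AuthMethod:
    """Hypothetical helper that mirrors the ADR's decision rules."""
    if environment == "prod":
        # Production ignores any SAS key that happens to be set.
        return AuthMethod.MANAGED_IDENTITY
    if shared_access_key:
        # Non-production with a key: keep the simpler local SAS setup.
        return AuthMethod.SAS_TOKEN
    # Non-production without a key: rely on the developer's own login.
    return AuthMethod.DEFAULT_CREDENTIAL


if __name__ == "__main__":
    for env, key in [("prod", "secret"), ("dev", "secret"), ("dev", "")]:
        print(env, bool(key), choose_auth_method(env, key).value)
```

Keeping the rule as a pure function of the environment name and the key makes the "production never uses SAS" guarantee easy to check in isolation.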

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ dependencies = [
     "requests>=2.33.1",
     "python-dotenv>=1.2.1",
     "azure-monitor-opentelemetry>=1.8.7",
+    "azure-identity>=1.23.0,<2.0.0",
 ]
 
 [dependency-groups]

src/relay_listener.py

Lines changed: 67 additions & 20 deletions
@@ -14,11 +14,12 @@
 import time
 import urllib.parse
 
+from azure.identity import DefaultAzureCredential, ManagedIdentityCredential
 from dotenv import load_dotenv
 from websockets.asyncio.client import connect
 from websockets.exceptions import ConnectionClosedError
-from websockets.frames import CloseCode
 
+from environment import Environment
 from services.mwl.create_worklist_item import CreateWorklistItem
 from services.storage import MWLStorage
 from telemetry import configure_telemetry
@@ -28,10 +29,14 @@
 logger = logging.getLogger(__name__)
 
 DB_PATH = os.getenv("MWL_DB_PATH", "/var/lib/pacs/worklist.db")
-EXPIRED_TOKEN = "ExpiredToken"
+AZURE_RELAY_SCOPE = "https://relay.azure.com/.default"
 SAS_TOKEN_EXPIRY_SECONDS = 3600
 
 
+class CredentialNotAvailableError(RuntimeError):
+    pass
+
+
 class RelayListener:
     """
     Socket Listener for Azure Relay.
@@ -40,9 +45,11 @@ class RelayListener:
     Environment variables:
         AZURE_RELAY_NAMESPACE: Azure Relay namespace (default: relay-test.servicebus.windows.net)
         AZURE_RELAY_HYBRID_CONNECTION: Azure Relay hybrid connection name (default: relay-test-hc)
-        AZURE_RELAY_KEY_NAME: Azure Relay shared access key name (default: RootManageSharedAccessKey)
-        AZURE_RELAY_SHARED_ACCESS_KEY: Azure Relay shared access key (default: none)
         MWL_DB_PATH: Path to the MWL SQLite database file (default: /var/lib/pacs/worklist.db)
+
+    Non-production only (SAS token fallback):
+        AZURE_RELAY_KEY_NAME: Shared access policy name (default: RootManageSharedAccessKey)
+        AZURE_RELAY_SHARED_ACCESS_KEY: Shared access key value
     """
 
     def __init__(self, storage: MWLStorage):
@@ -91,7 +98,11 @@ def process_action(self, payload: dict):
 
     def _connect(self):
         """Connect to Azure Relay."""
-        return connect(self.relay_uri.connection_url(), compression=None)
+        return connect(
+            self.relay_uri.connection_url(),
+            compression=None,
+            additional_headers=self.relay_uri.auth_headers(),
+        )
 
 
 class RelayURI:
@@ -100,9 +111,35 @@ def __init__(self):
         self.hybrid_connection_name = os.getenv("AZURE_RELAY_HYBRID_CONNECTION", "relay-test-hc")
         self.key_name = os.getenv("AZURE_RELAY_KEY_NAME", "RootManageSharedAccessKey")
         self.shared_access_key = os.getenv("AZURE_RELAY_SHARED_ACCESS_KEY", "")
+        self._env = Environment()
+        self._credential = None if self._use_sas() else self._build_credential()
+
+    def _use_sas(self) -> bool:
+        return not self._env.production and bool(self.shared_access_key)
 
-    def create_sas_token(self, expiry_seconds: int = SAS_TOKEN_EXPIRY_SECONDS) -> str:
-        """Create SAS token for Azure Relay authentication."""
+    def _build_credential(self):
+        if self._env.production:
+            return ManagedIdentityCredential()
+        return DefaultAzureCredential()
+
+    def connection_url(self) -> str:
+        base = f"wss://{self.relay_namespace}/$hc/{self.hybrid_connection_name}?sb-hc-action=listen"
+        if self._use_sas():
+            token = self._create_sas_token()
+            return f"{base}&sb-hc-token={urllib.parse.quote_plus(token)}"
+        return base
+
+    def auth_headers(self) -> dict:
+        if self._use_sas():
+            return {}
+        if self._credential is None:
+            raise CredentialNotAvailableError(
+                "No credential available — _credential should never be None when not using SAS"
+            )
+        token = self._credential.get_token(AZURE_RELAY_SCOPE).token
+        return {"Authorization": f"Bearer {token}"}
+
+    def _create_sas_token(self, expiry_seconds: int = SAS_TOKEN_EXPIRY_SECONDS) -> str:
         uri = f"http://{self.relay_namespace}/{self.hybrid_connection_name}"
         encoded_uri = urllib.parse.quote_plus(uri)
         expiry = str(int(time.time() + expiry_seconds))
@@ -115,12 +152,25 @@ def create_sas_token(self, expiry_seconds: int = SAS_TOKEN_EXPIRY_SECONDS) -> st
             f"&se={expiry}&skn={self.key_name}"
         )
 
-    def connection_url(self) -> str:
-        token = self.create_sas_token()
-        return (
-            f"wss://{self.relay_namespace}/$hc/{self.hybrid_connection_name}"
-            f"?sb-hc-action=listen&sb-hc-token={urllib.parse.quote_plus(token)}"
-        )
+
+def verify_credentials():
+    """
+    Verify relay credentials are available at startup.
+
+    In production, raises ClientAuthenticationError if managed identity is not configured.
+    In non-production with a SAS key present, logs the auth method and returns immediately.
+    """
+    uri = RelayURI()
+    if uri._use_sas():
+        logger.info("Using SAS token authentication for Azure Relay.")
+    else:
+        if uri._credential is None:
+            raise CredentialNotAvailableError(
+                "No credential available — _credential should never be None when not using SAS"
+            )
+        uri._credential.get_token(AZURE_RELAY_SCOPE)
+        credential_type = "ManagedIdentityCredential" if uri._env.production else "DefaultAzureCredential"
+        logger.info(f"Azure Relay credentials verified ({credential_type}).")
 
 
 async def main():
@@ -131,6 +181,7 @@ async def main():
     configure_telemetry(service_name="relay-listener")
 
     logger.info("Socket Listener Starting...")
+    verify_credentials()
     storage = MWLStorage(db_path=DB_PATH)
 
     while True:
@@ -142,13 +193,9 @@ async def main():
         except ConnectionClosedError as e:
             code = e.rcvd.code if e.rcvd else "N/A"
             reason = e.rcvd.reason if e.rcvd else "N/A"
-
-            if code == CloseCode.INTERNAL_ERROR.value and EXPIRED_TOKEN in reason:
-                logger.info("SAS token expired, refreshing...")
-            else:
-                logger.warning(f"Connection closed with code {code}: {reason}")
-                logger.warning("Retrying in 5 seconds...")
-                await asyncio.sleep(5)
+            logger.warning(f"Connection closed with code {code}: {reason}")
+            logger.warning("Retrying in 5 seconds...")
+            await asyncio.sleep(5)
         except Exception as e:
             logger.warning(f"Connection error: {e}")
             logger.warning("Retrying in 5 seconds...")

tests/conftest.py

Lines changed: 12 additions & 1 deletion
@@ -2,7 +2,7 @@
 import sys
 from contextlib import contextmanager
 from pathlib import Path
-from unittest.mock import AsyncMock, patch
+from unittest.mock import AsyncMock, MagicMock, patch
 
 import numpy as np
 import pytest
@@ -15,6 +15,17 @@
 sys.path.append(f"{Path(__file__).parent.parent}/src")
 
 
+@pytest.fixture(autouse=True)
+def mock_azure_credential():
+    mock = MagicMock()
+    mock.get_token.return_value.token = "test-token"
+    with (
+        patch("relay_listener.DefaultAzureCredential", return_value=mock),
+        patch("relay_listener.ManagedIdentityCredential", return_value=mock),
+    ):
+        yield mock
+
+
 @pytest.fixture
 def tmp_dir():
     return f"{Path(__file__).parent}/tmp"
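
To see why the autouse fixture above is enough to exercise the bearer-token path without touching Azure, the following standalone sketch may help. It does not import `relay_listener`; `bearer_headers`, `verify`, and `FakeAuthError` are hypothetical stand-ins written for this example, with `FakeAuthError` playing the role that `ClientAuthenticationError` has in the ADR.

```python
from unittest.mock import MagicMock

AZURE_RELAY_SCOPE = "https://relay.azure.com/.default"


class FakeAuthError(Exception):
    """Hypothetical stand-in for an authentication failure raised by a credential."""


def bearer_headers(credential, scope: str = AZURE_RELAY_SCOPE) -> dict:
    """Hypothetical free-function version of the header construction in the diff."""
    token = credential.get_token(scope).token
    return {"Authorization": f"Bearer {token}"}


def verify(credential, scope: str = AZURE_RELAY_SCOPE) -> None:
    """Hypothetical startup check: requesting a token once surfaces bad config early."""
    credential.get_token(scope)


# A mock configured the same way as the fixture: any get_token() call
# returns an object whose .token attribute is "test-token".
working = MagicMock()
working.get_token.return_value.token = "test-token"

# A mock that simulates a misconfigured identity.
broken = MagicMock()
broken.get_token.side_effect = FakeAuthError("managed identity not configured")

if __name__ == "__main__":
    print(bearer_headers(working))
    try:
        verify(broken)
    except FakeAuthError as exc:
        print(f"startup would abort: {exc}")
```

Setting `side_effect` on `get_token` is one way a test could cover the fail-fast branch; whether the repository's own tests do this is not shown in the diff.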
