Skip to content

Commit 3667f12

Browse files
sap-yuanYuan Huang
andauthored
DIARCHERS-1396: MCP-dedicated API layer with rate limiting, auth, and audit (#610)
* DIARCHERS-1396: add MCP-dedicated API layer with rate limiting, auth, and audit - DB migration 00046: mcp_token and mcp_access_log tables - api/handlers/mcp/auth.py: ib_mcp_* bearer token validation, project/trigger access checks - api/handlers/mcp/rate_limit.py: Redis sliding-window per-user per-endpoint rate limiter (fail-open) - api/handlers/mcp/audit.py: fire-and-forget audit logging to mcp_access_log - api/handlers/mcp/token_routes.py: token CRUD at /api/v1/mcp/tokens/* - api/handlers/mcp/routes/: /api/v1/mcp/* endpoints for projects, builds, jobs, artifacts, trigger - infrabox/test/api/mcp_test.py: 21 unit tests covering hash, access checks, rate limiter * fix: add mcp_token and mcp_access_log to test teardown TRUNCATE * fix: address code review findings on MCP API layer - ibflask.py: recognize ib_mcp_ prefix in get_token() to skip JWT decode; normalize_token() passes mcp type through unchanged - policies/mcp.rego: add OPA rules allowing mcp tokens on /api/v1/mcp/* data paths and session users on tokens management path - auth.py: fix per-project expiry check to handle naive vs aware datetime; treat malformed expiry as denied instead of granted - routes/builds.py: add LOCK TABLE before MAX(build_number)+1 INSERT to prevent duplicate build numbers under concurrency; move uuid import to top; drop premature audit 'attempt' before lock - routes/projects.py: add collaborator JOIN on MCP token path so revoked collaborators cannot enumerate project metadata - rate_limit.py: reset _redis_client to None on pipeline error so reconnect is attempted on the next request - token_routes.py: replace inline hashlib.sha256 with _hash_token(); move json/hashlib imports to top; fix MCPTokenTrigger.post docstring (permanent grant, not time-limited); remove dead body variable; add try/except around int(expires_days) conversion - audit.py: remove unused threading import; fix misleading docstring; move json import to top - 00046.sql: add retention comment for mcp_access_log pruning * fix: return dict/list from flask-restx Resource methods instead of jsonify() flask-restx's make_response wrapper cannot accept a Flask Response object as data — it tries to JSON-serialize it and raises TypeError. All mcp handler methods must return plain dict/list (+ optional status code), not jsonify() Response objects. * fix: use abort() instead of jsonify() in mcp_auth_required _reject() flask-restx make_response cannot accept Flask Response objects; _reject() must raise via abort() so the error handler formats the response, not the Resource wrapper. * fix: rename migration 00046.sql to 00047.sql to avoid version collision The test environment DB already has schema_version=46 from a prior development iteration (global_token columns) that was later consolidated into 00045.sql. Keeping mcp_token/mcp_access_log as 00046 causes the migration runner to skip it. Renaming to 00047 ensures it runs on any environment where 00046 is already marked as applied. mcp.rego is automatically included in the OPA image via the existing Dockerfile COPY instruction (src/openpolicyagent/policies → /policies), so no separate deployment step is needed. * feat: extend global token OPA policies to cover builds, jobs, and trigger Global tokens were only authorized for project list/detail endpoints. All project-scoped read endpoints (builds, jobs, console, testresults, archive, stats, etc.) returned 401 because no OPA allow rules existed for token.type = "global". This commit adds the missing allow rules across all affected policy files, following the same collaborator-check pattern already used for user tokens: - Read operations (builds, jobs and all sub-resources): allowed when global_token.user_id is a collaborator on the target project. - Write operations (restart, abort, rerun, cache-clear, trigger): additionally require global_token.scope_push = true. Also fixes a latent bug in projects_commits.rego and trigger.rego where the helper functions referenced an undefined `project` variable instead of the correct `project_id` parameter. New OPA test cases in global_viewer_test.rego cover: - builds list and single build allowed for collaborator - job console and stats allowed for collaborator - read denied for non-collaborator project - restart/trigger denied without scope_push - restart/trigger allowed with scope_push * fix: add missing source column to build table and cover trigger route with tests - Add ALTER TABLE build ADD COLUMN source VARCHAR(64) to 00047.sql; the MCPTrigger.post INSERT referenced this column but it was never created in any migration, causing a runtime DB error on every trigger call - Add TestMCPBuildsGet and TestMCPTriggerPost unit test classes to mcp_test.py covering happy path, access control, project type guard, and DB failure; test_trigger_inserts_source_column is a regression guard for the missing column * fix: fix list builds SQL — join commit/job tables for branch/status/dates * fix: remove dropped cpu/memory columns from job list SELECT * chore: trigger rebuild for build_3167 * 2606.18.0 --------- Co-authored-by: Yuan Huang <huangyuan01@sap.com>
1 parent 96d456a commit 3667f12

22 files changed

Lines changed: 1633 additions & 4 deletions

cfg/VERSION

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
2606.10.0
1+
2606.18.0

infrabox/test/api/mcp_test.py

Lines changed: 378 additions & 0 deletions
Large diffs are not rendered by default.

infrabox/test/api/test_template.py

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,8 @@ class ApiTestTemplate(unittest.TestCase):
88

99
def setUp(self):
1010
TestClient.execute(
11-
'TRUNCATE global_token_access_log, global_token, '
11+
'TRUNCATE mcp_access_log, mcp_token, '
12+
'global_token_access_log, global_token, '
1213
'collaborator, auth_token, secret, '
1314
'console, job_markup, job_badge, job, '
1415
'build, commit, repository, '

src/api/handlers/__init__.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,3 +7,4 @@
77
import api.handlers.build
88
import api.handlers.job_api
99
import api.handlers.admin
10+
import api.handlers.mcp

src/api/handlers/mcp/__init__.py

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
1+
import api.handlers.mcp.token_routes
2+
import api.handlers.mcp.routes.projects
3+
import api.handlers.mcp.routes.builds
4+
import api.handlers.mcp.routes.jobs

src/api/handlers/mcp/audit.py

Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,53 @@
1+
"""
2+
Audit logging for MCP API calls.
3+
4+
Writes to the mcp_access_log table (best-effort, synchronous).
5+
Never raises — a logging failure must not break the request.
6+
"""
7+
import json
8+
import logging
9+
10+
from flask import g, request
11+
12+
logger = logging.getLogger('mcp_audit')
13+
14+
15+
def audit_mcp(action: str, outcome: str = 'attempt', details: dict = None, error: str = ''):
16+
"""Record one MCP audit entry synchronously on the request DB connection."""
17+
token_id = getattr(g, 'mcp_token_id', None)
18+
user_id = getattr(g, 'mcp_token_user_id', None)
19+
if not user_id:
20+
token = getattr(g, 'token', None)
21+
if token and 'user' in token:
22+
user_id = str(token['user'].get('id', ''))
23+
ip = request.remote_addr
24+
25+
try:
26+
db = getattr(g, 'db', None)
27+
if db is None:
28+
return
29+
30+
db.execute('''
31+
INSERT INTO mcp_access_log (token_id, user_id, action, outcome, details, error, ip)
32+
VALUES (%s, %s, %s, %s, %s, %s, %s)
33+
''', [
34+
token_id,
35+
user_id,
36+
action,
37+
outcome,
38+
_to_json(details),
39+
error or None,
40+
ip,
41+
])
42+
db.commit()
43+
except Exception as exc:
44+
logger.warning('MCP audit log failed: %s', exc)
45+
46+
47+
def _to_json(d):
48+
if d is None:
49+
return None
50+
try:
51+
return json.dumps(d)
52+
except Exception:
53+
return str(d)

src/api/handlers/mcp/auth.py

Lines changed: 152 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,152 @@
1+
"""
2+
MCP token authentication middleware for InfraBox.
3+
4+
Token format: ib_mcp_<48 hex chars>
5+
Lookup key: first 16 chars of the 48-char hex suffix
6+
Hash: SHA-256 of the full raw token string (UTF-8)
7+
8+
MCP tokens are ONLY accepted on /api/v1/mcp/* paths.
9+
All other paths fall back to the normal OPA-based auth.
10+
"""
11+
import hashlib
12+
import logging
13+
from datetime import datetime, timezone
14+
from functools import wraps
15+
16+
from flask import g, request, abort
17+
18+
logger = logging.getLogger('mcp_auth')
19+
20+
_MCP_TOKEN_PREFIX = 'ib_mcp_'
21+
_MCP_PATH_PREFIX = '/api/v1/mcp/'
22+
23+
24+
def _utcnow():
25+
return datetime.now(timezone.utc)
26+
27+
28+
def _utcnow_naive():
29+
"""Naive UTC datetime for comparing against psycopg2 TIMESTAMP values."""
30+
return datetime.now(timezone.utc).replace(tzinfo=None)
31+
32+
33+
def _hash_token(raw_token: str) -> str:
34+
return hashlib.sha256(raw_token.encode('utf-8')).hexdigest()
35+
36+
37+
def _reject(status, message):
38+
abort(status, message)
39+
40+
41+
def mcp_auth_required(f):
42+
"""Decorator that validates ib_mcp_* Bearer tokens.
43+
44+
Sets on flask.g:
45+
g.mcp_token_id – token_id (16-char prefix)
46+
g.mcp_token_user_id – user uuid who owns the token
47+
g.mcp_enabled_projects – dict {project_id: expires_at_iso_or_None}
48+
g.mcp_allow_trigger – bool
49+
"""
50+
@wraps(f)
51+
def decorated(*args, **kwargs):
52+
auth = request.headers.get('Authorization', '')
53+
54+
if not auth.startswith('Bearer ' + _MCP_TOKEN_PREFIX):
55+
# not an MCP token — fall through to OPA auth (already done in before_request)
56+
return f(*args, **kwargs)
57+
58+
raw_token = auth[len('Bearer '):]
59+
60+
# MCP tokens are only valid on /api/v1/mcp/* paths
61+
if not request.path.startswith(_MCP_PATH_PREFIX):
62+
_reject(403, 'MCP token can only be used on /api/v1/mcp/* endpoints')
63+
64+
token_suffix = raw_token[len(_MCP_TOKEN_PREFIX):]
65+
if len(token_suffix) != 48:
66+
_reject(401, 'invalid MCP token format')
67+
68+
token_id = token_suffix[:16]
69+
token_hash = _hash_token(raw_token)
70+
71+
row = g.db.execute_one_dict('''
72+
SELECT token_id, user_id, enabled_projects, allow_trigger, expires_at, revoked_at
73+
FROM mcp_token
74+
WHERE token_id = %s AND token_hash = %s
75+
''', [token_id, token_hash])
76+
77+
if not row:
78+
_reject(401, 'invalid or unknown MCP token')
79+
80+
if row['revoked_at'] is not None:
81+
_reject(401, 'MCP token has been revoked')
82+
83+
if row['expires_at'] < _utcnow_naive():
84+
_reject(401, 'MCP token has expired')
85+
86+
# Update last_used_at (best-effort, non-fatal)
87+
try:
88+
g.db.execute(
89+
'UPDATE mcp_token SET last_used_at = NOW() WHERE token_id = %s',
90+
[token_id]
91+
)
92+
g.db.commit()
93+
except Exception as exc:
94+
logger.warning('failed to update last_used_at: %s', exc)
95+
96+
g.mcp_token_id = token_id
97+
g.mcp_token_user_id = str(row['user_id'])
98+
g.mcp_enabled_projects = row['enabled_projects'] or {}
99+
g.mcp_allow_trigger = bool(row['allow_trigger'])
100+
# Suppress OPA check for MCP-authed requests
101+
g.token = {'type': 'mcp', 'user': {'id': str(row['user_id']), 'role': 'user'}}
102+
103+
return f(*args, **kwargs)
104+
return decorated
105+
106+
107+
def check_project_access_mcp(project_id: str) -> bool:
108+
"""Return True if the current request may access project_id.
109+
110+
MCP token path: project must be in g.mcp_enabled_projects and not past
111+
its per-project expiry (if set).
112+
Session path: delegates to OPA (already checked in before_request).
113+
"""
114+
if not hasattr(g, 'mcp_enabled_projects'):
115+
# session / OPA path — access already verified
116+
return True
117+
118+
enabled = g.mcp_enabled_projects
119+
if project_id not in enabled:
120+
return False
121+
122+
per_project_expiry = enabled[project_id]
123+
if per_project_expiry:
124+
try:
125+
exp = datetime.fromisoformat(per_project_expiry)
126+
# fromisoformat() on a naive string (no UTC offset) returns a naive
127+
# datetime; compare against naive UTC to avoid TypeError.
128+
now = _utcnow_naive() if exp.tzinfo is None else _utcnow()
129+
if exp < now:
130+
return False
131+
except (ValueError, TypeError):
132+
# Malformed expiry — treat as expired rather than silently granting access.
133+
return False
134+
135+
return True
136+
137+
138+
def check_trigger_access_mcp() -> bool:
139+
"""Return True if the current request may trigger builds."""
140+
if not hasattr(g, 'mcp_allow_trigger'):
141+
return True
142+
return bool(g.mcp_allow_trigger)
143+
144+
145+
def get_mcp_user_id() -> str:
146+
"""Return user id string regardless of auth path."""
147+
if hasattr(g, 'mcp_token_user_id'):
148+
return g.mcp_token_user_id
149+
token = getattr(g, 'token', None)
150+
if token and 'user' in token:
151+
return str(token['user'].get('id', ''))
152+
return request.remote_addr or 'unknown'

src/api/handlers/mcp/rate_limit.py

Lines changed: 101 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,101 @@
1+
"""
2+
Redis sliding-window rate limiter for MCP endpoints.
3+
4+
Key: ib_mcp_rl:{user_id}:{endpoint}
5+
Window: 60 seconds
6+
TTL: 70 seconds (slightly larger than window)
7+
8+
Fail-open: if Redis is unavailable, the request is allowed and a warning is logged.
9+
"""
10+
import logging
11+
import os
12+
import uuid
13+
from functools import wraps
14+
15+
from flask import g, jsonify, request
16+
17+
logger = logging.getLogger('mcp_rate_limit')
18+
19+
_WINDOW_MS = 60_000
20+
_KEY_TTL_S = 70
21+
22+
_ENDPOINT_LIMITS = {
23+
'get_job_log': int(os.environ.get('MCP_RATE_LIMIT_LOG_RPM', 10)),
24+
'list_job_artifacts': int(os.environ.get('MCP_RATE_LIMIT_ARTIFACT_RPM', 10)),
25+
'list_builds': int(os.environ.get('MCP_RATE_LIMIT_BUILDS_RPM', 30)),
26+
'list_jobs': int(os.environ.get('MCP_RATE_LIMIT_JOBS_RPM', 30)),
27+
'list_projects': int(os.environ.get('MCP_RATE_LIMIT_PROJECTS_RPM', 30)),
28+
'trigger_build': int(os.environ.get('MCP_RATE_LIMIT_TRIGGER_RPM', 5)),
29+
}
30+
_DEFAULT_RPM = int(os.environ.get('MCP_RATE_LIMIT_DEFAULT_RPM', 30))
31+
32+
_redis_client = None
33+
34+
35+
def _get_redis():
36+
global _redis_client
37+
if _redis_client is not None:
38+
return _redis_client
39+
40+
redis_url = os.environ.get('REDIS_URL', '')
41+
if not redis_url:
42+
return None
43+
44+
try:
45+
import redis
46+
_redis_client = redis.from_url(redis_url, socket_connect_timeout=1, socket_timeout=1)
47+
_redis_client.ping()
48+
except Exception as exc:
49+
logger.warning('MCP rate limiter: Redis unavailable (%s), fail-open', exc)
50+
_redis_client = None
51+
52+
return _redis_client
53+
54+
55+
def _check_rate_limit(user_id: str, endpoint: str) -> bool:
56+
"""Return True (allow) or False (deny). Fail-open on Redis errors."""
57+
rpm = _ENDPOINT_LIMITS.get(endpoint, _DEFAULT_RPM)
58+
59+
try:
60+
r = _get_redis()
61+
if r is None:
62+
return True
63+
64+
import time
65+
now_ms = int(time.time() * 1000)
66+
key = f'ib_mcp_rl:{user_id}:{endpoint}'
67+
68+
pipe = r.pipeline()
69+
pipe.zremrangebyscore(key, 0, now_ms - _WINDOW_MS)
70+
pipe.zadd(key, {str(uuid.uuid4()): now_ms})
71+
pipe.zcard(key)
72+
pipe.expire(key, _KEY_TTL_S)
73+
results = pipe.execute()
74+
75+
count = results[2]
76+
return count <= rpm
77+
78+
except Exception as exc:
79+
global _redis_client
80+
_redis_client = None # force reconnect attempt on next request
81+
logger.warning('MCP rate limiter Redis error (fail-open): %s', exc)
82+
return True
83+
84+
85+
def mcp_rate_limit(endpoint: str):
86+
"""Decorator factory: apply rate limiting for the given endpoint name."""
87+
def decorator(f):
88+
@wraps(f)
89+
def decorated(*args, **kwargs):
90+
from api.handlers.mcp.auth import get_mcp_user_id
91+
user_id = get_mcp_user_id()
92+
93+
if not _check_rate_limit(user_id, endpoint):
94+
return jsonify({
95+
'message': 'rate limit exceeded, please slow down',
96+
'status': 429,
97+
}), 429
98+
99+
return f(*args, **kwargs)
100+
return decorated
101+
return decorator

src/api/handlers/mcp/routes/__init__.py

Whitespace-only changes.

0 commit comments

Comments
 (0)