Skip to content

SQL-414: Do allowed_roles check on internal endpoints#37260

Open
SangJunBak wants to merge 10 commits into
MaterializeInc:mainfrom
SangJunBak:jun/fix-internal-authorization-bug
Open

SQL-414: Do allowed_roles check on internal endpoints#37260
SangJunBak wants to merge 10 commits into
MaterializeInc:mainfrom
SangJunBak:jun/fix-internal-authorization-bug

Conversation

@SangJunBak

@SangJunBak SangJunBak commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

There exists a security bug where self managed environments with password/oidc auth can have
any non-system user access our internal HTTP routes. These include injecting audit events,
leader promotion, and etc. This PR fixes this by extending the schema of our server listener configurations.

Motivation

Fixes https://linear.app/materializeinc/issue/SQL-414/http-endpoints-only-authenticate-but-dont-authorize

Description

To review, I'd recommend reading the plan in the commit "Outline the problem and solution"
769b7bd (this PR) to understand the problem. The solution proposed there is quite 1:1 with the implementation. The implementation aims to simplify the authorization code in our web server handling, so much of the code added in this PR are simple refactors. I'd recommend reviewing commit by commit.

Verification

We base it off the regression test Dennis initially created and make sure it passes our existing test suite.

@SangJunBak SangJunBak force-pushed the jun/fix-internal-authorization-bug branch 6 times, most recently from f3062db to 1294913 Compare June 26, 2026 07:30
@SangJunBak SangJunBak removed the request for review from alex-hunt-materialize June 26, 2026 07:35
@SangJunBak SangJunBak force-pushed the jun/fix-internal-authorization-bug branch from 1294913 to 7be0604 Compare June 26, 2026 07:52
@SangJunBak SangJunBak marked this pull request as ready for review June 26, 2026 07:53
@SangJunBak SangJunBak requested review from a team and ggevay as code owners June 26, 2026 07:53
@SangJunBak SangJunBak requested review from alex-hunt-materialize and jubrad and removed request for jubrad June 26, 2026 07:53

@def- def- left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nightly failures look very related! https://buildkite.com/materialize/nightly/builds/16892

# Older environmentd requires a listeners config but only
# parses the legacy bool-route schema, so swap to the matching file
# under `listener_configs/v0_147_0/` for those versions.
if image_version is not None and image_version < "v26.31.0-dev":

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're on v26.32.0-dev, rebase and bump

@def- def- Jun 26, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we ok with this file allowing full access to internal routes for the emulator? Edit:

diff --git a/test/http-auth/mzcompose.py b/test/http-auth/mzcompose.py
index 74cb1a914c..ba5f737f23 100644
--- a/test/http-auth/mzcompose.py
+++ b/test/http-auth/mzcompose.py
@@ -111,6 +111,12 @@ def workflow_default(c: Composition) -> None:

     check_livez_coordinator_coupling(c)

+    check_emulator_baked_in_password_config(c)
+
+
+def workflow_emulator_password_config(c: Composition) -> None:
+    check_emulator_baked_in_password_config(c)
+

 def check_livez_coordinator_coupling(c: Composition) -> None:
     """Regression test for SQL-343: previously having group_claim_for above
@@ -198,3 +204,36 @@ def check_internal_endpoints_require_authorization(
             f"/api/catalog/inject-audit-events: status={r.status_code}, "
             f"audit rows={forged_rows}"
         )
+
+
+def check_emulator_baked_in_password_config(c: Composition) -> None:
+    with c.override(
+        Materialized(
+            listeners_config_path=f"{MZ_ROOT}/src/materialized/ci/listener_configs/v26_31_0/password.json"
+        )
+    ):
+        c.up("materialized")
+        ext = f"http://localhost:{c.port('materialized', 6876)}"
+
+        # mz_system authenticates with the baked-in login password
+        # (MZ_EXTERNAL_LOGIN_PASSWORD_MZ_SYSTEM) to provision a normal login role.
+        admin = ("mz_system", "password")
+        normal = ("sql414_normal", "sql414_normal_pw")
+        r = requests.post(
+            f"{ext}/api/sql",
+            auth=admin,
+            json={"query": f"CREATE ROLE {normal[0]} LOGIN PASSWORD '{normal[1]}'"},
+        )
+        assert r.status_code == 200, f"role setup failed: {r.status_code}: {r.text}"
+
+        # `/api/catalog/dump` and `/api/coordinator/dump` are the `internal`
+        # group, `/prof/` the `profiling` group. Coordinator dump carries pgwire
+        # cancellation secret keys.
+        for path in ("/api/catalog/dump", "/api/coordinator/dump", "/prof/"):
+            slug = path.strip("/").replace("/", "_")
+            with c.test_case(f"emulator_password_{slug}_blocks_normal_user"):
+                r = requests.get(f"{ext}{path}", auth=normal)
+                assert r.status_code in (401, 403), (
+                    f"normal role {normal[0]} reached {path} on the external "
+                    f"emulator port: status={r.status_code}, {len(r.text)} bytes"
+                )

Running bin/mzcompose --find http-auth run emulator-password-config fails with:

==> Running test case emulator_password_prof_blocks_normal_user
==> mzcompose: test case emulator_password_prof_blocks_normal_user failed: builtins.AssertionError: normal role sql414_normal reached /prof/ on the external emulator port: status=200, 1564 bytes
Traceback (most recent call last):
  File "/home/deen/git/materialize/misc/python/materialize/mzcompose/composition.py", line 740, in test_case
    yield
  File "/home/deen/git/materialize/test/http-auth/mzcompose.py", line 250, in check_emulator_baked_in_password_config
    assert r.status_code in (401, 403), (
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: normal role sql414_normal reached /prof/ on the external emulator port: status=200, 1564 bytes

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I can see what happens if I make it more strict, but I essentially copied over the same configuration as the old listener config for password.json.

@SangJunBak SangJunBak Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may follow up for this however in a followup PR given my changes doesn't change the existent behavior. Also I can see how it'd be convenient to relax the access control in the listener configs for our tests

@SangJunBak

Copy link
Copy Markdown
Contributor Author

Converting into draft to address nightlies

@SangJunBak SangJunBak marked this pull request as draft June 26, 2026 16:29
Copy over the test Dennis created for this bug
I've outlined the root cause of the bug. It would be easy to do a one-off fix and do an authorization check for the internal routes, however that would violate the semantics of the listener config and make our implementation more confusing than it already is. The solution I propose aims to make our server code simpler while preserving the config's semantics, at the cost of a bigger refactor.
We modify the schema of the listener config:
- Creates new RouteGroup struct rather than a bool representing if a group of routes is enabled
- Lowers `allowed_roles` to each RouteGroup
- Creates a separate JSON serialization representing Rust's associated enum type

We also implement a single middleware for all authorization and decouple it from authentication
- We take advantage that all authentication creates an AuthedUser. Since Authz/authorization is just a function of the user name and the route's policy, We can create a middleware to keep concerns separate. We already pass along the AuthedUser as an extension, so we can attach another extension to carry over the route's policy.
We update test utils and misc to follow the new schema
- Version gate the new listener config
- Implement a migration from the old listener config to the current one to avoid having to redefine one in orchestratord when gating. 
- This is the change that will actually have an effect for all deployments in Cloud and self managed
We now match what we'd deploy in production and set allowed_roles to Internal for the internal/profiling routes
Almost all of these changes are simple one line refactors / moving files around.

- For Password/SASL/OIDC listener configs, set internal from NormalAndInternal to Internal to maintain consistency
- Move v0_145_0 configs to its own folder, as well as v26_31_0
- Update materialized docker image to copy a version's folder
@SangJunBak SangJunBak force-pushed the jun/fix-internal-authorization-bug branch from 7be0604 to cec701a Compare June 26, 2026 23:54
@SangJunBak SangJunBak force-pushed the jun/fix-internal-authorization-bug branch from cec701a to c85efd3 Compare June 27, 2026 01:08
We create a separate function for resolving the listener config depending on the version / whether there's an upgrade. This is important such that our upgrade tests don't break.

For anything that above that overrides these listener configs, we override these too
We can now remove the plan. It was mostly for the purpose of documentation / helping the reviewer.
@SangJunBak SangJunBak force-pushed the jun/fix-internal-authorization-bug branch from c85efd3 to 9472d30 Compare June 27, 2026 01:14
@SangJunBak SangJunBak requested review from aljoscha and removed request for ggevay June 27, 2026 01:15
@SangJunBak

SangJunBak commented Jun 27, 2026

Copy link
Copy Markdown
Contributor Author

@def- I looked into why many of the nightlies (scalability, feature, workload replay) are failing, and it seems to be because each are comparing the branch's version against current main. However because main is at the same dev version (with a different sha) but doesn't contain my listener config changes, my version guard fails and the "ancestor" version panics. It seems to me like this is self healing however once this PR gets merged since future dev versions of v26.32.0 will know how to support the new listener config.

For fast lookup, this my commit for version guarding my change in Materialized: 27f5f57 (this PR)

Other nightlies (like the platform checks) I've fixed as well.

@SangJunBak SangJunBak requested a review from def- June 27, 2026 02:23
@SangJunBak SangJunBak marked this pull request as ready for review June 27, 2026 02:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants