Matrix Stack Bugfixes & Lessons Learned

This document captures the critical issues encountered during deployment and their solutions. Most of them are not clearly covered by the official documentation.

Table of Contents

Cookie Domain on Public Suffix List
MAS Missing Assets Resource
MAS Not Fetching Userinfo
SSL Certificate Trust Issues
Authelia Redirect URI Configuration
Claims Template Compatibility
MAS Database Caching
MAS Discovery URL for Internal Communication
PostgreSQL Data Persistence Across Deployments
DNS Resolution and TLS Certificate Trust Issues
Configuration Changes Checklist
Common Pitfalls
Security Advisories

1. Cookie Domain on Public Suffix List

Problem

Authelia rejects cookie domains that are on the Public Suffix List, including:

.localhost
.local
.localdev

Error Message

level=error msg="Configuration: session: domain config #1 (domain '.localhost'): option 'domain' is not a valid cookie domain: the domain is part of the special public suffix list"

Solution

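Use a domain that is not itself a public suffix, ideally under a TLD reserved for testing, such as:

example.test (recommended for local development; .test is reserved for testing by RFC 2606)
example.internal
example.dev (note that .dev is a real TLD on the HSTS preload list, so browsers always require HTTPS)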

Configuration

# authelia/config/configuration.yml
session:
  cookies:
    - domain: 'example.test'  # No leading dot (not .example.test); subdomains are still covered
      authelia_url: 'https://authelia.example.test'
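The rule can be approximated with a small pre-flight check before starting Authelia. This is a minimal Python sketch, not Authelia's actual validation: the REJECTED_SUFFIXES set is a hypothetical stand-in containing only the suffixes listed above, whereas Authelia consults the full Public Suffix List.

```python
"""Simplified pre-flight check for an Authelia session cookie domain.

REJECTED_SUFFIXES is a hypothetical stand-in for the Public Suffix List;
Authelia performs its own, more complete validation at startup.
"""

# Hypothetical subset: the development suffixes this document saw rejected.
REJECTED_SUFFIXES = {"localhost", "local", "localdev"}


def cookie_domain_problems(domain: str) -> list[str]:
    """Return a list of problems with `domain`; an empty list means it looks OK."""
    problems = []
    if domain.startswith("."):
        problems.append("drop the leading dot")
        domain = domain.lstrip(".")
    name = domain.lower()
    if name in REJECTED_SUFFIXES:
        problems.append(f"{name!r} is a bare suffix, not a registrable domain")
    elif "." not in name:
        problems.append(f"{name!r} has a single label; use something like example.test")
    return problems


if __name__ == "__main__":
    for candidate in (".localhost", ".example.test", "example.test"):
        print(candidate, cookie_domain_problems(candidate) or "OK")
```

A check like this only catches the obvious cases quickly; Authelia's own startup error remains the authority.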

Why Not Documented

The official Authelia docs mention the public suffix list but don't clearly list which common development TLDs are affected.

2. MAS Missing Assets Resource

Problem

MAS serves HTML pages but CSS/JS assets return 404, causing unstyled pages.

Error Message

WARN http.server.response GET-22 - "GET /assets/shared-CVCHz34K.css HTTP/1.1" 404 Not Found
WARN http.server.response GET-23 - "GET /assets/templates-CyDybuwN.css HTTP/1.1" 404 Not Found

Root Cause

The MAS HTTP listener configuration is missing the assets resource.

Solution

Add the assets resource to the MAS configuration:

# mas/config/config.yaml
http:
  listeners:
    - name: web
      resources:
        - name: discovery
        - name: human
        - name: oauth
        - name: compat
        - name: graphql
          playground: true
        - name: assets    # ← Critical: This is required!
        - name: adminapi  # ← Required for Element Admin panel
      binds:
        - address: '[::]:8080'
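To catch a missing resource before deploying, a parsed config can be checked for the listener resources this stack depends on. The sketch below assumes config.yaml has already been loaded into a dict (for example with PyYAML's yaml.safe_load); the REQUIRED set and the default listener name web mirror the configuration above rather than any rule enforced by MAS.

```python
"""Check a parsed MAS config for the listener resources this stack relies on.

Assumes config.yaml was already loaded into a dict (e.g. via yaml.safe_load);
REQUIRED reflects this document's setup, not a rule enforced by MAS.
"""

REQUIRED = {"assets", "adminapi"}  # assets: UI styling; adminapi: Element Admin


def missing_resources(config: dict, listener_name: str = "web") -> set[str]:
    """Return the required resources absent from the named HTTP listener."""
    for listener in config.get("http", {}).get("listeners", []):
        if listener.get("name") == listener_name:
            present = {res.get("name") for res in listener.get("resources", [])}
            return REQUIRED - present
    raise KeyError(f"no HTTP listener named {listener_name!r}")


if __name__ == "__main__":
    example = {"http": {"listeners": [{"name": "web", "resources": [
        {"name": "discovery"}, {"name": "human"}, {"name": "oauth"}]}]}}
    print(missing_resources(example))  # reports both assets and adminapi
```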

Verification

Test asset availability:

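curl -I https://auth.example.test/assets/shared-CVCHz34K.css
# Should return: HTTP/2 200
# Asset file names include a content hash that changes between MAS releases; use a name from your own 404 log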

Why Not Documented

The MAS documentation mentions the assets resource but doesn't emphasize that it is mandatory for proper UI rendering. Many configuration examples omit it.

Assets Location

Container path: /usr/local/share/mas-cli/assets/ (MAS is configured to use this path automatically, so no path setting is needed)

3. MAS Not Fetching Userinfo

Problem

Templates render to empty strings even though claims are configured correctly.

Error Message

ERROR mas_handlers::upstream_oauth2::link:131 POST-102 - Template "{{ user.preferred_username }}" rendered to an empty string

Root Cause

MAS defaults to reading claims only from the ID token, not from the userinfo endpoint. Authelia provides most user claims via userinfo, not in the ID token.

Solution

Enable userinfo fetching in MAS upstream OAuth2 provider configuration:

# mas/config/config.yaml
upstream_oauth2:
  providers:
    - id: '01HQW90Z35CMXFJWQPHC3BGZGQ'
      issuer: 'https://authelia.example.test'
      client_id: 'mas-client'
      client_secret: 'your-secret'
      scope: 'openid profile email offline_access'
      token_endpoint_auth_method: 'client_secret_basic'
      fetch_userinfo: true    # ← Critical: Must be enabled!
      claims_imports:
        localpart:
          action: force
          template: '{{ user.preferred_username }}'

Why Not Documented

The MAS documentation doesn't clearly state that fetch_userinfo defaults to false, or that many OIDC providers (including Authelia) serve most user claims via the userinfo endpoint rather than in the ID token.

Testing

Check the database to verify the setting:

SELECT upstream_oauth_provider_id, fetch_userinfo FROM upstream_oauth_providers;

The fetch_userinfo column should show t (true).

4. SSL Certificate Trust Issues

Problem

MAS cannot fetch Authelia's OIDC metadata when using self-signed certificates behind Caddy.

Error Message

ERROR mas_handlers::upstream_oauth2::cache - Failed to fetch provider metadata issuer=https://authelia.example.test error=invalid peer certificate: UnknownIssuer

Root Cause

MAS doesn't trust Caddy's self-signed CA certificate.

Solution (Local Development)

Extract Caddy's CA certificate (-T disables TTY allocation so the PEM output is written unchanged):

mkdir -p mas/certs
docker compose exec -T caddy cat /data/caddy/pki/authorities/local/root.crt > mas/certs/caddy-ca.crt

Mount the certificate in the MAS container:

# docker-compose.local.yml
services:
  mas:
    environment:
      SSL_CERT_FILE: /certs/caddy-ca.crt
    volumes:
      - ./mas/certs:/certs:ro

Restart MAS to apply:

docker compose restart mas
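To confirm that the extracted CA is the right one without going through MAS, the same HTTPS request can be reproduced from the host. This is a minimal Python sketch, assuming the host resolves authelia.example.test to Caddy (for example via /etc/hosts); fetch_metadata is an illustrative name.

```python
"""Fetch OIDC discovery metadata while trusting only a given CA file.

A sketch for reproducing MAS's UnknownIssuer error from the host; the URL
and CA path used in the usage note are this document's example values.
"""
import json
import ssl
import urllib.request


def fetch_metadata(url: str, ca_file: str, timeout: float = 10.0) -> dict:
    # With cafile= set, Python loads only this CA and skips the system store,
    # so success means the server certificate chains to the CA in ca_file.
    context = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(url, context=context, timeout=timeout) as response:
        return json.load(response)
```

For example, fetch_metadata("https://authelia.example.test/.well-known/openid-configuration", "mas/certs/caddy-ca.crt")["issuer"] should return the issuer URL once the CA file is correct; a URLError whose reason is an ssl.SSLCertVerificationError points to the same trust problem MAS reports as UnknownIssuer.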

Solution (Production with Let's Encrypt)

Not needed - production deployments use Let's Encrypt certificates which are already trusted.

Alternative (Local Development)

Use an internal HTTP endpoint via discovery_url:

upstream_oauth2:
  providers:
    - issuer: 'https://authelia.example.test'
      discovery_url: 'http://authelia:9091/.well-known/openid-configuration'

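Note: This only works if the discovery document served on the internal endpoint reports the public issuer URL. The other endpoints it lists (token, userinfo, JWKS) are still called at the addresses it advertises, so if they point at the public HTTPS URL, MAS must still trust that certificate.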

Why Not Documented

The MAS documentation doesn't mention the SSL_CERT_FILE environment variable or how to handle self-signed certificates in development.

5. Authelia Redirect URI Configuration

Problem

The OAuth flow fails with a redirect URI mismatch error.

Error Message

Error: invalid_request
Description: The 'redirect_uri' parameter does not match any of the OAuth 2.0 Client's pre-registered 'redirect_uris'.

Root Cause

Authelia requires the exact redirect URI to be pre-registered, but MAS generates different callback URIs depending on context:

Standard: https://auth.example.test/callback
OAuth2: https://auth.example.test/oauth2/callback
Upstream provider: https://auth.example.test/upstream/callback/{provider_id}

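The provider ID also appears in two forms: config.yaml uses a ULID (01HQW90Z35CMXFJWQPHC3BGZGQ), while psql shows the same 128-bit value as a UUID (018df890-7c65-653a-f972-f68b06b87e17), because the database stores IDs as UUIDs. Both refer to the same provider.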

Solution

Add ALL possible redirect URIs to Authelia client configuration:

# authelia/config/configuration.yml
identity_providers:
  oidc:
    clients:
      - client_id: 'mas-client'
        redirect_uris:
          - 'https://auth.example.test/callback'
          - 'https://auth.example.test/oauth2/callback'
          - 'https://auth.example.test/upstream/callback/01HQW90Z35CMXFJWQPHC3BGZGQ'  # Provider ID from config.yaml (ULID form)
          - 'https://auth.example.test/upstream/callback/018df890-7c65-653a-f972-f68b06b87e17'  # Same ID in the UUID form shown by psql (likely redundant)

Finding the Provider ID

# Connect to the MAS database (from the host)
docker compose exec postgres psql -U synapse -d mas

-- Inside psql: get the provider ID
SELECT upstream_oauth_provider_id FROM upstream_oauth_providers;
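The config-file ID and the database ID in the example above are the same 128-bit value in two encodings (ULID and UUID), so either can be computed from the other. This minimal Python sketch converts between them using only the standard library; the function names are illustrative, and the decoder is strict (it does not apply Crockford's I/L/O aliases).

```python
"""Convert an MAS provider ID between its ULID form (config.yaml) and UUID form (psql)."""
import uuid

# Crockford base32 alphabet used by ULIDs (no I, L, O or U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_to_uuid(ulid: str) -> uuid.UUID:
    if len(ulid) != 26:
        raise ValueError(f"a ULID has 26 characters, got {len(ulid)}")
    value = 0
    for ch in ulid.upper():
        value = value * 32 + CROCKFORD.index(ch)  # ValueError on invalid characters
    if value >= 1 << 128:
        raise ValueError("ULID overflows 128 bits")
    return uuid.UUID(int=value)


def uuid_to_ulid(u: uuid.UUID) -> str:
    value = u.int
    chars = []
    for _ in range(26):  # 26 base32 digits cover 130 bits
        value, digit = divmod(value, 32)
        chars.append(CROCKFORD[digit])
    return "".join(reversed(chars))


if __name__ == "__main__":
    print(ulid_to_uuid("01HQW90Z35CMXFJWQPHC3BGZGQ"))  # 018df890-7c65-653a-f972-f68b06b87e17
```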

Why Not Documented

Neither MAS nor Authelia documentation clearly explains that the upstream callback URL contains the provider ID, or that the database displays that ID in UUID form rather than the ULID used in config.yaml.

6. Claims Template Compatibility

Problem

Claims templates render to empty strings when using Authelia as the upstream provider.

Root Cause

Authelia provides different claims than expected. Testing in this setup revealed:

{{ user.name }}: not returned by Authelia
{{ user.preferred_username }}: works (contains the username)
{{ user.email }}: works

Solution

Use preferred_username for localpart and displayname:

# mas/config/config.yaml
upstream_oauth2:
  providers:
    - claims_imports:
        localpart:
          action: force
          template: '{{ user.preferred_username }}'  # ← Use this
        displayname:
          action: suggest
          template: '{{ user.preferred_username }}'  # ← Not {{ user.name }}
        email:
          action: force
          template: '{{ user.email }}'
          set_email_verification: always

Testing Claims

To discover available claims, temporarily enable debug logging in MAS or check Authelia's userinfo endpoint:

curl -H "Authorization: Bearer YOUR_TOKEN" https://authelia.example.test/api/oidc/userinfo
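Rather than reading the userinfo JSON by eye, the templates can be checked against it. This is a minimal Python sketch: it only recognises user.<claim> references, which is enough for the templates in this document but is not a full Jinja parser; missing_claims is an illustrative name.

```python
"""Find claims referenced by MAS claims-import templates that userinfo lacks.

Simplified: only `user.<claim>` references are recognised; this is not a
Jinja parser. Feed it the JSON body returned by the userinfo endpoint.
"""
import json
import re

CLAIM_REF = re.compile(r"\buser\.([A-Za-z_][A-Za-z0-9_]*)")


def missing_claims(templates: dict[str, str], userinfo: dict) -> dict[str, list[str]]:
    """Map each template field to the claims it uses that are absent or empty."""
    report = {}
    for field, template in templates.items():
        absent = [claim for claim in CLAIM_REF.findall(template) if not userinfo.get(claim)]
        if absent:
            report[field] = absent
    return report


if __name__ == "__main__":
    userinfo = json.loads('{"sub": "abc", "preferred_username": "alice", "email": "alice@example.test"}')
    templates = {"localpart": "{{ user.preferred_username }}", "displayname": "{{ user.name }}"}
    print(missing_claims(templates, userinfo))  # {'displayname': ['name']}
```

Empty strings count as missing because MAS would render them to an empty string, which is exactly the error this section describes.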

Why Not Documented

The MAS documentation uses {{ user.name }} in examples. Although name is a standard OIDC claim (OpenID Connect Core 1.0, section 5.1), providers are not required to return it, and Authelia did not return it in this setup.

7. MAS Database Caching

Problem

After updating MAS configuration, changes to upstream OAuth2 providers don't take effect even after restart.

Root Cause

MAS caches provider configuration in PostgreSQL. Changes to config.yaml are only synced when:

MAS starts for the first time
The provider doesn't exist in the database
Explicit sync is forced

Solution

Delete the provider from the database to force a re-sync. Note that this also deletes the provider's rows in upstream_oauth_links, which connect existing MAS users to their Authelia identities.

# Connect to the MAS database (from the host)
docker compose exec postgres psql -U synapse -d mas

-- Inside psql: find the provider ID
SELECT upstream_oauth_provider_id FROM upstream_oauth_providers;

-- Delete dependent rows first, then the provider itself
DELETE FROM upstream_oauth_authorization_sessions WHERE upstream_oauth_provider_id = 'provider-id-here';
DELETE FROM upstream_oauth_links WHERE upstream_oauth_provider_id = 'provider-id-here';
DELETE FROM upstream_oauth_providers WHERE upstream_oauth_provider_id = 'provider-id-here';

-- Exit psql, then restart MAS from the host
\q
docker compose restart mas
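The manual deletion can also be generated as a single transaction, so a mistake in one statement does not leave the provider half-deleted. This is a minimal Python sketch: it emits the same three DELETE statements in the same order, wrapped in BEGIN/COMMIT, after validating the ID with uuid.UUID. It expects the UUID form returned by the SELECT above; deletion_sql is an illustrative name.

```python
"""Build the provider-deletion statements from this section as one transaction.

A sketch: same tables, same order as the manual steps; the ID is validated
with uuid.UUID so nothing unexpected is interpolated into the SQL.
"""
import uuid

# Dependent rows first, provider last (same order as the manual DELETEs).
TABLES_IN_ORDER = (
    "upstream_oauth_authorization_sessions",
    "upstream_oauth_links",
    "upstream_oauth_providers",
)


def deletion_sql(provider_id: str) -> str:
    pid = uuid.UUID(provider_id)  # raises ValueError if it is not a UUID
    statements = [
        f"DELETE FROM {table} WHERE upstream_oauth_provider_id = '{pid}';"
        for table in TABLES_IN_ORDER
    ]
    return "\n".join(["BEGIN;", *statements, "COMMIT;"])
```

For example, the output of print(deletion_sql("018df890-7c65-653a-f972-f68b06b87e17")) can be pasted into the psql session, or piped into docker compose exec -T postgres psql -U synapse -d mas.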

Verification

Check that the provider was re-created:

docker compose logs mas | grep "Adding provider"
# Should show: INFO mas_cli::sync:198 Adding provider provider.id=...

Why Not Documented

The MAS documentation doesn't clearly explain that provider configuration is cached in the database and must be manually deleted to apply config changes.

Alternative

Use MAS CLI to force sync (if available):

docker compose exec mas mas-cli config sync

8. MAS Discovery URL for Internal Communication

Problem

When MAS and Authelia are in the same Docker network, MAS tries to fetch OIDC metadata over HTTPS through the external reverse proxy, adding unnecessary latency and SSL complexity.

Solution

Use discovery_url to specify an internal HTTP endpoint:

# mas/config/config.yaml
upstream_oauth2:
  providers:
    - id: '01HQW90Z35CMXFJWQPHC3BGZGQ'
      issuer: 'https://authelia.example.test'  # Public issuer URL
      discovery_url: 'http://authelia:9091/.well-known/openid-configuration'  # Internal discovery
      client_id: 'mas-client'

Benefits

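Faster metadata fetching (internal network)
No TLS trust issues for the metadata request itself (endpoints listed in the document, such as the token endpoint, are still called at the URLs it advertises)
Less external traffic through the reverse proxy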

Requirements

Authelia must be accessible via Docker network (authelia:9091)
The issuer claim in the discovery document must match the public issuer URL
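The requirements above can be checked against the document Authelia actually serves on the internal URL. This is a minimal Python sketch, assuming the JSON has already been fetched (for example with curl from a container on the same network) and parsed; it also flags the server-to-server endpoints that still use HTTPS, since those requests go to the addresses the document lists. check_discovery is an illustrative name.

```python
"""Check a discovery document fetched from an internal discovery_url.

Assumes the JSON was already fetched and parsed; the issuer passed in is
the public issuer URL configured in MAS.
"""

# Endpoints MAS calls itself (the authorization endpoint is used by the browser).
SERVER_ENDPOINT_KEYS = ("token_endpoint", "userinfo_endpoint", "jwks_uri")


def check_discovery(doc: dict, public_issuer: str) -> list[str]:
    """Return warnings; an empty list means nothing stood out."""
    warnings = []
    # OIDC Discovery requires the issuer to match exactly, so compare verbatim.
    if doc.get("issuer") != public_issuer:
        warnings.append(f"issuer is {doc.get('issuer')!r}, expected {public_issuer!r}")
    for key in SERVER_ENDPOINT_KEYS:
        url = doc.get(key) or ""
        if url.startswith("https://"):
            warnings.append(f"{key} is still {url}; MAS must trust that certificate")
    return warnings
```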

Why Not Documented

The MAS documentation mentions discovery_url but doesn't emphasize its use for internal communication or bypassing SSL issues in development.

9. PostgreSQL Data Persistence Across Deployments

Severity

CRITICAL - Causes complete deployment failure

Problem

PostgreSQL data directories persist between deployments, even after cleanup attempts. When you re-run deploy.sh, it generates NEW passwords in .env, but PostgreSQL continues using the OLD password from when the data directory was first initialized. This causes authentication failures for all services (MAS, Synapse, Authelia).

Symptoms:

Error: could not connect to the database
Caused by:
    0: error returned from database: password authentication failed for user "synapse"

Root Cause

PostgreSQL only initializes the database on first run (when postgres/data is empty)
Once initialized, PostgreSQL ignores the POSTGRES_PASSWORD environment variable
Role password hashes are stored inside the data directory (in the pg_authid system catalog), not read from .env
Even if you delete .env and configs, the postgres/data directory may survive
New deployment generates new passwords, but PostgreSQL still expects old passwords

Detection

Check the timestamps:

# Check when PostgreSQL data was created
ls -la postgres/data/

# Check when current .env was generated
head -2 .env

# If postgres/data is OLDER than .env, you have a mismatch!
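The timestamp comparison can be scripted as a pre-flight check. This is a minimal Python sketch under one assumption: PG_VERSION is written once by initdb, so its modification time approximates when the cluster was initialized. The default paths match this stack's layout; the data directory is often unreadable for normal users, in which case the function raises PermissionError rather than guessing.

```python
"""Heuristic for the stale-password problem: is the cluster older than .env?

Assumption: PG_VERSION is written once by initdb, so its mtime approximates
when postgres/data was initialised.
"""
from pathlib import Path


def password_mismatch_likely(data_dir: str = "postgres/data", env_file: str = ".env") -> bool:
    pg_version = Path(data_dir) / "PG_VERSION"
    if not pg_version.is_file():
        return False  # no cluster yet: the next start initialises it from .env
    return pg_version.stat().st_mtime < Path(env_file).stat().st_mtime
```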

Solution 1: Clean Slate (Recommended)

# Stop all services
docker compose down

# Delete all data directories
sudo rm -rf postgres/data synapse/data mas/data mas/certs caddy/data caddy/config

# Re-run deployment
./quickstart.sh   # or ./deploy.sh

Solution 2: Manual Password Update (Preserves Data)

# Get the new password from .env
source .env

# Connect to PostgreSQL
docker compose exec postgres psql -U synapse

-- Inside psql: set the password to the value from .env
ALTER USER synapse WITH PASSWORD 'new_password_from_env';
\q

# Restart all services
docker compose restart

Prevention

Both quickstart.sh and deploy.sh detect existing data directories and warn before proceeding. quickstart.sh explicitly wipes postgres/data when the user confirms, preventing the mismatch.

Why This Happens

Docker volumes and bind mounts persist even after docker compose down
sudo operations (used to fix permissions) may leave directories owned by root, which a cleanup without sudo cannot delete
Partial cleanup (e.g., deleting .env but not postgres/data) creates inconsistencies
Once the cluster exists, the password stored inside it stays authoritative until changed with ALTER USER

Impact on All Deployment Variants

This issue affects:

Local deployment (fixed with data directory check)
Production deployment (fixed with data directory check)
With Authelia (affected)
Without Authelia (affected)

All deployment modes now include the pre-flight check to prevent this issue.

Official Documentation Gap

PostgreSQL documentation explains initialization behavior, but doesn't emphasize:

The persistence of data across container recreations
The implications for scripted deployments that generate dynamic passwords
The need to either preserve passwords OR ensure clean data directories

10. DNS Resolution and TLS Certificate Trust Issues

Severity

CRITICAL - Prevents authentication and causes complete login failure

Problem

Two related issues prevent proper HTTPS communication:

IPv6 DNS Priority: System DNS resolver returns IPv6 addresses for *.example.test domains instead of using /etc/hosts IPv4 (127.0.0.1) entries. This causes connections to route to external IPs instead of localhost.
Missing CA Certificate Trust: Synapse container cannot validate HTTPS connections to MAS because:
- Caddy uses self-signed certificates (local CA)
- Synapse doesn't have the Caddy CA certificate mounted
- Synapse doesn't have SSL_CERT_FILE environment variable set
- Synapse can't resolve domain names to reach Caddy from within Docker network

Root Cause

DNS Resolution Issue:

/etc/hosts only contained IPv4 (127.0.0.1) entries
System prefers IPv6 when available
DNS lookups for *.example.test return public IPv6 addresses
Connections timeout or fail when reaching external IPs

Certificate Trust Issue:

With MSC3861 enabled, Synapse must connect to MAS over HTTPS
MAS issuer URL is https://auth.example.test/
Synapse needs to fetch OIDC discovery metadata from MAS
Without CA certificate trust, Synapse gets SSL routines::tlsv1 alert internal error

Docker Network Resolution:

Containers use the host's DNS servers by default, but not the host's /etc/hosts entries
Domain names therefore resolve to external IPs from inside containers
Containers need extra_hosts to route these domains back to the host machine
The host machine forwards to Caddy via published port 443

Symptoms

User cannot log in via Element
Synapse logs show no auth-related errors (because the failure happens while establishing the HTTPS connection)
MAS logs show no connection attempts from Synapse
Testing from host with curl https://matrix.example.test/ hangs/times out
Testing with curl --resolve matrix.example.test:443:127.0.0.1 works fine
Testing from Synapse container: curl https://auth.example.test/ fails with SSL error
getent hosts matrix.example.test returns IPv6 address instead of 127.0.0.1

Detection

# Check DNS resolution (should return 127.0.0.1 or ::1, not external IP)
getent hosts matrix.example.test

# Test from host (should work)
curl -k https://matrix.example.test/_matrix/client/versions

# Test from Synapse container (should work after fix)
docker exec matrix-synapse curl -sS https://auth.example.test/.well-known/openid-configuration

# Check if CA cert is mounted
docker exec matrix-synapse ls -la /certs/

# Check SSL_CERT_FILE environment variable
docker exec matrix-synapse env | grep SSL_CERT_FILE
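The resolution check can also be made explicit by asking the system resolver directly, which shows both the address order and whether every answer is loopback. This is a minimal Python sketch; resolved_addresses and all_loopback are illustrative names.

```python
"""Show what a host name resolves to and whether every address is loopback.

Uses the system resolver via socket.getaddrinfo, the same path most clients
take, so the result reflects /etc/hosts and the IPv6/IPv4 preference order.
"""
import ipaddress
import socket


def resolved_addresses(host: str, port: int = 443) -> list[str]:
    """Return unique addresses in the resolver's preference order."""
    addresses = []
    for *_, sockaddr in socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP):
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def all_loopback(addresses: list[str]) -> bool:
    """True only if there is at least one address and all are loopback."""
    return bool(addresses) and all(ipaddress.ip_address(a).is_loopback for a in addresses)
```

For example, all_loopback(resolved_addresses("matrix.example.test")) should be True on the host once both the IPv4 and IPv6 /etc/hosts lines are in place; an external IPv6 answer makes it False.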

Solution Applied

1. Fix /etc/hosts (Host Machine)

Added IPv6 localhost entries alongside the IPv4 ones:

# /etc/hosts
127.0.0.1  matrix.example.test element.example.test auth.example.test authelia.example.test
::1  matrix.example.test element.example.test auth.example.test authelia.example.test

2. Mount Caddy CA Certificate in Synapse

# docker-compose.local.yml - synapse service
volumes:
  - ./synapse/data:/data
  - ./mas/certs:/certs:ro  # Mount CA certificate directory
environment:
  SYNAPSE_CONFIG_PATH: /data/homeserver.yaml
  SSL_CERT_FILE: /certs/caddy-ca.crt  # Trust Caddy's self-signed CA

3. Configure Domain Resolution in Synapse

# docker-compose.local.yml - synapse service
extra_hosts:
  - "auth.example.test:host-gateway"
  - "matrix.example.test:host-gateway"

This allows Synapse to:

Resolve auth.example.test to the host machine
Connect via host's port 443 (forwarded to Caddy)
Trust the connection using the mounted CA certificate

Impact on All Deployment Variants

This issue affects:

Local deployment (fixed with IPv6 hosts entries and Synapse CA config)
Production deployment (not affected in practice: real DNS handles resolution and Let's Encrypt certificates are trusted by default)
With Authelia (affected - Synapse needs to reach MAS)
Without Authelia (affected - Synapse still needs to reach MAS)

Why This Happens

Local Development Environment: Uses self-signed certificates requiring explicit trust
Docker Networking: Containers don't automatically use host's /etc/hosts file
MSC3861 Architecture: Synapse MUST be able to reach MAS via HTTPS (issuer URL) to validate tokens
IPv6 Priority: Modern systems prefer IPv6 over IPv4 when both protocols are available

Production Deployment Notes

In production with real DNS and Let's Encrypt certificates:

IPv6 DNS resolution works correctly (points to your actual server)
Let's Encrypt certificates are trusted by default
extra_hosts not needed (real DNS works)
SSL_CERT_FILE not needed (system trusts Let's Encrypt CA)

This issue is specific to local development with:

Self-signed certificates
/etc/hosts-based DNS
Docker networking

Official Documentation Gap

Neither Matrix/Synapse nor Caddy documentation clearly explains:

The requirement for Synapse to trust the CA when using MSC3861
The need to configure DNS resolution from containers to host
The IPv6 priority behavior with /etc/hosts
The TLS requirements for MSC3861 delegated authentication

Configuration Changes Checklist

When making changes to the stack, follow this checklist to avoid common issues:

Changing Domains

Update Authelia cookie domain (no leading dot!)
Update all service URLs in MAS config
Update Authelia OIDC client redirect URIs
Update Element Web config.json
Update Synapse homeserver.yaml (MSC3861 issuer)
Update Caddyfile domains
Update /etc/hosts (local) or DNS records (production)
Restart all services

Changing OAuth Configuration

Update MAS config.yaml
Delete provider from MAS database
Restart MAS to re-sync
Verify provider was re-created in logs
Test authentication flow

Updating Claims Templates

Ensure fetch_userinfo: true is set
Use preferred_username not name for Authelia
Delete provider from MAS database
Restart MAS
Test registration/login flow

Debugging TLS Issues

Check if MAS has Caddy CA certificate mounted
Verify SSL_CERT_FILE environment variable
Consider using discovery_url with HTTP for internal calls
Check Caddy logs for SSL errors

Common Pitfalls

1. Forgetting to Add Assets Resource

Symptom: MAS pages load but have no styling
Fix: Add - name: assets to MAS HTTP listener resources

2. Using `.localhost` Domain

Symptom: Authelia fails to start with cookie domain error
Fix: Use example.test or another non-public-suffix domain

3. Not Enabling Userinfo Fetching

Symptom: "rendered to an empty string" template error
Fix: Add fetch_userinfo: true to MAS upstream provider

4. Not Restarting After Config Changes

Symptom: Changes don't take effect
Fix: Always restart the affected service: docker compose restart service-name

5. Forgetting to Delete Cached Provider

Symptom: MAS still uses old configuration after restart
Fix: Delete provider from database before restarting

6. Missing Redirect URIs in Authelia

Symptom: OAuth flow fails with invalid_request
Fix: Add all possible redirect URI patterns to Authelia client config

7. Using `{{ user.name }}` Template

Symptom: Template renders to empty string
Fix: Use {{ user.preferred_username }} instead for Authelia

8. SSL Certificate Trust Issues

Symptom: MAS can't fetch Authelia metadata
Fix: Mount Caddy CA certificate or use internal discovery_url

Security Advisories

CVE-2025-49090 — Synapse room version default

Synapse versions before 1.130.0 default to room version 10, which has a known vulnerability. This stack sets the default to room version 12 in homeserver.yaml:

default_room_version: "12"

This is applied automatically by deploy.sh and quickstart.sh. If you have an existing homeserver.yaml generated before this fix, add the line manually and restart Synapse.

Updating images

The compose file uses the latest tag for all images. To apply any security update, pull the new images and recreate the containers:

docker compose pull
docker compose up -d
