Deployment & Operations

Rain Zhang edited this page Nov 6, 2025 · 2 revisions

Table of Contents

  1. Containerization Strategy
  2. Cloud Deployment with Render
  3. CI/CD Pipeline Configuration
  4. Operational Guidelines
  5. Infrastructure Requirements
  6. Scaling Considerations
  7. Disaster Recovery and Backup
  8. Deployment Examples

Containerization Strategy

The Post-Quantum WebAuthn Platform employs a multi-stage Docker build process to create optimized container images for deployment. The strategy focuses on security, performance, and efficient image size management while accommodating the specialized requirements of post-quantum cryptography operations.

The Docker build process consists of two stages: a build stage and a runtime stage. The build stage uses the python:3.12-slim base image plus the development tools needed to compile and install the Python packages: build-essential, cmake, git, libssl-dev, and ninja-build. The prebuilt liboqs library bundle is copied to /opt/liboqs and added to the system's library path so that the post-quantum cryptographic bindings link correctly.

graph TD
A[Build Stage] --> B[Install Build Dependencies]
B --> C[Copy liboqs Bundle]
C --> D[Install Python Dependencies]
D --> E[Clean Build Artifacts]
E --> F[Runtime Stage]
F --> G[Install Runtime Dependencies]
G --> H[Copy liboqs Bundle]
H --> I[Copy Installed Packages]
I --> J[Copy Application Code]
J --> K[Configure Environment]
K --> L[Set Startup Command]

Diagram sources

  • Dockerfile

The runtime stage creates a minimal container by copying only the necessary installed packages and application code from the build stage, significantly reducing the attack surface and image size. The final image includes only the essential runtime dependencies, specifically libssl3, and the prebuilt liboqs library. The container is configured to run the application using Gunicorn as the WSGI server, binding to the port specified by the PORT environment variable (defaulting to 8000).
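The port fallback described above can be sketched as a Gunicorn-style configuration module. This is a minimal illustration, not the project's actual startup command: the helper name resolve_bind is hypothetical, and only the PORT variable and its default of 8000 come from the text.

```python
import os


def resolve_bind(environ=os.environ, default_port="8000"):
    """Return the host:port string Gunicorn should bind to."""
    port = environ.get("PORT", default_port)
    # Fail early on values such as "" or "http" instead of at bind time.
    if not port.isdigit() or not 0 < int(port) < 65536:
        raise ValueError(f"invalid PORT value: {port!r}")
    # Bind on all interfaces so the container's published port is reachable.
    return f"0.0.0.0:{port}"


# Gunicorn picks up a module-level `bind` setting from a config file
# passed with `-c`; the file name is up to the deployment.
bind = resolve_bind()
```

Validating the value at startup makes a misconfigured PORT fail immediately rather than surfacing later as an unreachable service.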

The container optimization strategy includes several key elements:

  • Multi-stage builds to separate build and runtime dependencies
  • Removal of build tools and development packages after installation
  • Use of slim base images to minimize footprint
  • Prebuilt liboqs binaries to avoid compilation in the container
  • Proper library path configuration for cryptographic operations

Section sources

  • Dockerfile

Cloud Deployment with Render

The platform is configured for deployment on the Render cloud platform using the render.yaml configuration file. This declarative configuration specifies the deployment parameters and runtime environment for the web service.

The deployment configuration defines a web service with the following specifications:

  • Service type: web
  • Name: python-fido2-webauthn-demo
  • Runtime: docker
  • Plan: free
  • Dockerfile path: ./Dockerfile
  • Docker context: .
  • Auto-deploy: enabled

The configuration leverages Docker deployment, allowing for consistent environments across development and production. The auto-deploy feature ensures that new deployments are automatically triggered when changes are pushed to the repository, facilitating continuous delivery.

graph TD
A[GitHub Repository] --> B{Push to Main Branch}
B --> C[Render Platform]
C --> D[Build Docker Image]
D --> E[Deploy Web Service]
E --> F[Running Application]
F --> G[External Access]
G --> H[Users]

Diagram sources

  • render.yaml

The deployment strategy on Render provides several operational benefits:

  • Free tier availability for development and testing
  • Automatic HTTPS and custom domain support
  • Zero-downtime deployments
  • Built-in logging and monitoring
  • Environment variable management
  • Global CDN for improved performance

For production deployments, the free plan would typically be upgraded to a standard or higher plan to ensure adequate resources and reliability. The Docker-based deployment ensures that the application runs in a consistent environment regardless of the underlying infrastructure.

Section sources

  • render.yaml

CI/CD Pipeline Configuration

The platform implements a continuous integration and deployment pipeline using Google Cloud Build, configured through the cloudbuild.yaml file. This automated pipeline builds, publishes, and deploys the container image, ensuring consistent and repeatable releases.

The CI/CD pipeline consists of three main steps:

  1. Build Docker Image: The pipeline uses Docker BuildKit to build the container image with detailed logging enabled. The image name includes the project ID (gcr.io/$PROJECT_ID/pqc-webauthn), which ties each build to its Google Cloud project.

  2. Push Image: The built Docker image is pushed to the project's private registry under gcr.io/$PROJECT_ID (hosted by Google Artifact Registry), where deployment targets can pull it.

  3. Deploy to Cloud Run: The pipeline deploys the container image to Google Cloud Run, a fully managed compute platform that automatically scales applications. The deployment configuration includes specific resource allocations and runtime parameters.

graph TD
A[Code Commit] --> B[Cloud Build Trigger]
B --> C[Build Docker Image]
C --> D[Push to Artifact Registry]
D --> E[Deploy to Cloud Run]
E --> F[Production Environment]
F --> G[Health Checks]
G --> H[Monitoring]

Diagram sources

  • cloudbuild.yaml

The Cloud Run deployment configuration includes the following parameters:

  • Image: gcr.io/$PROJECT_ID/pqc-webauthn
  • Region: us-central1
  • Platform: managed
  • CPU: 0.5
  • Memory: 512Mi
  • Port: 8080
  • Authentication: allow-unauthenticated
  • Scaling: min-instances=0, max-instances=3
  • Execution environment: gen2

The pipeline also sets a timeout of 1500 seconds (25 minutes) to accommodate the build and deployment process. Built-in Cloud Build substitutions such as $PROJECT_ID let the same pipeline run in different Google Cloud projects without modification.
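To make the mapping from the parameter list to command-line flags concrete, the sketch below assembles a `gcloud run deploy` argument vector. It is illustrative only: the dictionary keys and function names are hypothetical, the values are copied from the list above, and the flags are the ones that appear in this page's production example.

```python
import shlex

# Values copied from the parameter list above; the dictionary keys are
# illustrative names, not fields read from cloudbuild.yaml.
DOCUMENTED_PARAMS = {
    "region": "us-central1",
    "platform": "managed",
    "cpu": "0.5",
    "memory": "512Mi",
    "port": 8080,
    "min_instances": 0,
    "max_instances": 3,
    "execution_environment": "gen2",
    "allow_unauthenticated": True,
}


def cloud_run_deploy_args(service, image, params):
    """Build the argument vector for `gcloud run deploy`."""
    args = [
        "gcloud", "run", "deploy", service,
        "--image", image,
        "--region", params["region"],
        "--platform", params["platform"],
        f"--cpu={params['cpu']}",
        f"--memory={params['memory']}",
        f"--port={params['port']}",
        f"--min-instances={params['min_instances']}",
        f"--max-instances={params['max_instances']}",
        f"--execution-environment={params['execution_environment']}",
    ]
    if params.get("allow_unauthenticated"):
        args.append("--allow-unauthenticated")
    return args


def as_shell_command(args):
    """Render the argv as a safely quoted, copy-pasteable shell command."""
    return shlex.join(args)
```

Keeping the parameters in one mapping makes it easier to review differences between environments, for example the larger CPU and memory values used in the production example later on this page.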

This CI/CD strategy provides several advantages:

  • Automated build and deployment
  • Consistent build environment
  • Versioned container images
  • Rollback capability through image versioning
  • Integration with Google Cloud's security and monitoring tools
  • Scalable deployment infrastructure

Section sources

  • cloudbuild.yaml

Operational Guidelines

Monitoring and Logging

The Post-Quantum WebAuthn Platform implements comprehensive monitoring and logging to ensure operational visibility and facilitate troubleshooting. The system leverages both application-level logging and platform-level monitoring to provide a complete operational picture.

Application logging is implemented through Python's logging module, with log messages output to standard output for collection by the container runtime. The logging configuration includes:

  • Structured log messages with timestamps and severity levels
  • Contextual information for debugging
  • Performance metrics for critical operations
  • Security-related events and authentication attempts
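A logging setup along these lines would satisfy the first two points. This is a minimal sketch using only the standard library; the logger name pqc_webauthn and the exact format string are assumptions, not the project's actual configuration.

```python
import logging
import sys


def configure_logging(stream=None, level=logging.INFO):
    """Send timestamped, levelled log lines to a stream (stdout by default)."""
    target = stream if stream is not None else sys.stdout
    handler = logging.StreamHandler(target)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s %(name)s %(message)s"
    ))
    logger = logging.getLogger("pqc_webauthn")  # hypothetical logger name
    logger.handlers.clear()    # avoid duplicate lines if called twice
    logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False   # keep output on the configured stream only
    return logger
```

Calling `configure_logging().info("registration completed in %.1f ms", 12.3)` writes one line to standard output, where the container runtime can collect it.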

The platform supports optional credential logging to GitHub through the device_logs.py module. This feature can be enabled by setting the ENABLE_GITHUB_LOGGING environment variable. When enabled, registration and authentication events are uploaded to a GitHub repository for long-term storage and analysis.

graph TD
A[Application] --> B[Log Events]
B --> C{Logging Enabled?}
C --> |Yes| D[Format Log Data]
C --> |No| E[Discard Log]
D --> F[Upload to GitHub]
F --> G[GitHub Repository]
B --> H[Standard Output]
H --> I[Container Runtime]
I --> J[Log Aggregation]

Diagram sources

  • server/server/device_logs.py
  • server/server/github_client.py

Alerting

The platform does not include built-in alerting functionality but is designed to integrate with external monitoring and alerting systems. Key metrics that should be monitored include:

  • Request latency and response times
  • Error rates and failure patterns
  • Authentication success/failure ratios
  • Resource utilization (CPU, memory)
  • PQC algorithm performance metrics

These metrics can be collected through the application logs and container monitoring interfaces, then fed into external alerting systems like Google Cloud Monitoring, Prometheus with Alertmanager, or third-party services like Datadog or New Relic.
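As one example of turning such a metric into an alert condition, the sketch below tracks authentication outcomes and flags a window with an unusually high failure ratio. The class, the function, and the threshold values are all hypothetical; a real deployment would normally express this rule in the external alerting system rather than in application code.

```python
from dataclasses import dataclass


@dataclass
class AuthCounters:
    """Running totals an external monitor could scrape or derive from logs."""
    successes: int = 0
    failures: int = 0

    def record(self, ok):
        if ok:
            self.successes += 1
        else:
            self.failures += 1

    def failure_ratio(self):
        total = self.successes + self.failures
        # With no traffic there is nothing to alert on.
        return self.failures / total if total else 0.0


def should_alert(counters, threshold=0.2, min_samples=20):
    """Flag a window whose failure ratio exceeds the assumed threshold."""
    total = counters.successes + counters.failures
    # Require a minimum sample size so a single failure cannot page anyone.
    return total >= min_samples and counters.failure_ratio() > threshold
```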

Runtime Configuration

The application supports several environment variables for runtime configuration:

  • FIDO_SERVER_SECRET_KEY: Secret key for Flask session management
  • FIDO_SERVER_RP_NAME: Relying party name
  • FIDO_SERVER_RP_ID: Relying party identifier
  • FIDO_SERVER_GCS_ENABLED: Enable Google Cloud Storage integration
  • FIDO_SERVER_GCS_BUCKET: GCS bucket name for credential storage
  • ENABLE_GITHUB_LOGGING: Enable credential logging to GitHub

The configuration system implements a fallback mechanism for the secret key: it reads the key from an environment variable or a file, and generates a random key if neither is provided.
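The fallback chain can be sketched as follows. The lookup order (environment variable first, then file) and the key_file parameter are assumptions made for illustration; the actual logic lives in server/server/config.py and may differ.

```python
import os
import secrets
from pathlib import Path


def load_secret_key(environ=os.environ, key_file=None):
    """Return (key, source): env var first, then file, then a random key."""
    value = environ.get("FIDO_SERVER_SECRET_KEY")
    if value:
        return value, "env"
    if key_file is not None:
        path = Path(key_file)
        if path.is_file():
            text = path.read_text(encoding="utf-8").strip()
            if text:
                return text, "file"
    # A random key is regenerated per process, so sessions signed with it
    # do not survive restarts and are not shared between instances.
    return secrets.token_hex(32), "random"
```

Returning the source alongside the key lets the caller log a warning when the random fallback is used, which matters once more than one instance is running.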

Section sources

  • server/server/config.py
  • server/server/device_logs.py
  • server/server/github_client.py

Infrastructure Requirements

Compute Resources for PQC Operations

The Post-Quantum WebAuthn Platform has specific compute requirements due to the intensive nature of post-quantum cryptographic operations. The prebuilt liboqs library provides optimized implementations of various PQC algorithms, but these operations still require more computational resources than classical cryptography.

The platform should be deployed on instances that meet the following specifications:

  • CPU: 2 vCPUs recommended for production
  • Memory: 2 GB RAM minimum, 4 GB recommended
  • Storage: SSD storage for optimal performance
  • Network: High-bandwidth, low-latency connectivity

The current Cloud Run configuration specifies 0.5 CPU and 512Mi memory, which is suitable for development and low-traffic environments but may need to be increased for production workloads with high authentication request volumes.

The liboqs library supports multiple post-quantum algorithms, including:

  • Key encapsulation mechanisms (KEM): Kyber, NTRU, Classic McEliece
  • Digital signatures: Dilithium, Falcon, SPHINCS+

Each algorithm has different performance characteristics and resource requirements. The system automatically detects available PQC algorithms and logs their selection during authentication operations.
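Detection and selection could look roughly like the sketch below. It assumes the liboqs-python binding (imported as oqs) is what exposes liboqs to the application, which this page does not confirm; the function names and the preference list are hypothetical, and the real logic is in server/server/pqc.py.

```python
import logging

log = logging.getLogger("pqc_webauthn.pqc")  # hypothetical logger name


def enabled_signature_algorithms():
    """List signature mechanisms reported by liboqs, or [] if unavailable."""
    try:
        import oqs  # liboqs-python binding; an assumption of this sketch
    except ImportError:
        return []
    return list(oqs.get_enabled_sig_mechanisms())


def select_algorithm(available, preferred):
    """Return the first preferred algorithm that the runtime actually offers."""
    offered = set(available)
    for name in preferred:
        if name in offered:
            log.info("selected PQC algorithm %s", name)
            return name
    log.warning("no preferred PQC algorithm is available")
    return None
```

A caller would pass the detected list and its own ordering, for example `select_algorithm(enabled_signature_algorithms(), ["ML-DSA-44", "Falcon-512"])`, and fall back to a classical algorithm when the result is None.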

graph TD
A[PQC Algorithm Selection] --> B{Available Algorithms}
B --> C[Kyber]
B --> D[NTRU]
B --> E[Classic McEliece]
B --> F[Dilithium]
B --> G[Falcon]
B --> H[SPHINCS+]
C --> I[Performance Metrics]
D --> I
E --> I
F --> I
G --> I
H --> I
I --> J[Logging]

Diagram sources

  • server/server/pqc.py
  • prebuilt_liboqs/linux-x86_64/include/oqs/

Network Configuration

The platform requires specific network configuration to ensure proper operation and security:

  • HTTPS/TLS termination (handled by Render or Cloud Run)
  • WebAuthn API endpoints accessible over secure connections
  • Outbound connectivity for metadata downloads (FIDO Alliance MDS)
  • Optional connectivity to Google Cloud Storage
  • Optional connectivity to GitHub API for credential logging

The application binds to port 8000 by default (configurable via the PORT environment variable; Cloud Run sets PORT to the configured container port, 8080 in the configuration above) and must be served over HTTPS. Browsers expose the WebAuthn API only in secure contexts, so HTTPS is essential everywhere except localhost during development.

Storage Provisioning

The platform supports multiple storage backends for credential data:

  • Local file storage (default, for development)
  • Google Cloud Storage (recommended for production)

The storage configuration is controlled by environment variables:

  • FIDO_SERVER_GCS_ENABLED: Enables GCS integration
  • FIDO_SERVER_GCS_BUCKET: Specifies the GCS bucket name
  • FIDO_SERVER_GCS_CREDENTIALS_FILE: Path to service account key file
  • FIDO_SERVER_GCS_CREDENTIALS_JSON: JSON credentials as environment variable

The local storage backend saves credential data in the session-credentials directory relative to the application root. For production deployments, Google Cloud Storage is recommended for its durability, scalability, and built-in redundancy.
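The backend choice can be derived from the environment along these lines. This is a sketch under stated assumptions: StorageConfig and storage_config_from_env are hypothetical names, and the set of values treated as "enabled" is a guess rather than the behavior of server/server/storage.py.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageConfig:
    backend: str                          # "gcs" or "local"
    bucket: Optional[str] = None          # set only for the GCS backend
    local_dir: str = "session-credentials"


def storage_config_from_env(environ=os.environ):
    """Pick the credential storage backend from environment variables."""
    enabled = environ.get("FIDO_SERVER_GCS_ENABLED", "").strip().lower()
    # The accepted truthy spellings are an assumption of this sketch.
    if enabled in {"1", "true", "yes", "on"}:
        bucket = environ.get("FIDO_SERVER_GCS_BUCKET", "").strip()
        if not bucket:
            raise ValueError(
                "FIDO_SERVER_GCS_BUCKET is required when GCS is enabled"
            )
        return StorageConfig(backend="gcs", bucket=bucket)
    return StorageConfig(backend="local")
```

Failing fast on a missing bucket name avoids silently falling back to local files in an environment that was meant to use durable storage.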

Section sources

  • server/server/storage.py
  • server/server/cloud_storage.py
  • server/server/pqc.py

Scaling Considerations

The Post-Quantum WebAuthn Platform is designed to handle varying volumes of authentication requests, with specific considerations for scaling in production environments.

Horizontal Scaling

The application is stateless when credential data is kept in an external backend such as Google Cloud Storage, which makes it suitable for horizontal scaling. Multiple instances can be deployed behind a load balancer to distribute authentication requests. The current Cloud Run configuration supports automatic scaling from 0 to 3 instances based on traffic.

For high-volume deployments, the following scaling parameters should be adjusted:

  • Increase maximum instances to handle peak loads
  • Configure minimum instances to reduce cold start latency
  • Implement request queuing for burst protection
  • Use regional or multi-regional deployments for global availability
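A rough way to choose the maximum instance count is Little's law: the average number of in-flight requests equals the arrival rate multiplied by the mean latency. The sketch below applies it with a safety margin; the function name, the headroom default, and the per-instance concurrency are assumptions to be replaced with measured values.

```python
import math


def instances_needed(peak_rps, mean_latency_s, concurrency_per_instance,
                     headroom=0.3):
    """Estimate instance count via Little's law plus a safety margin."""
    if concurrency_per_instance <= 0:
        raise ValueError("concurrency_per_instance must be positive")
    in_flight = peak_rps * mean_latency_s      # average concurrent requests
    padded = in_flight * (1.0 + headroom)      # room for traffic spikes
    return max(1, math.ceil(padded / concurrency_per_instance))
```

For instance, 200 requests per second at 0.25 s mean latency with 8 concurrent requests per instance gives `instances_needed(200, 0.25, 8)`, which evaluates to 9 and would exceed the current maximum of 3 instances.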

Performance Optimization

Several strategies can be employed to optimize performance for high-volume authentication:

  • Implement caching for frequently accessed data (metadata, public keys)
  • Optimize PQC algorithm selection based on performance characteristics
  • Use connection pooling for database/storage operations
  • Implement request batching where appropriate
  • Optimize TLS configuration for performance

The platform already includes some performance optimizations:

  • Metadata caching to reduce external API calls
  • Efficient credential storage and retrieval
  • Prebuilt liboqs library for optimized PQC operations
  • Gunicorn worker configuration for concurrent request handling
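Metadata caching of this kind is commonly implemented as a time-to-live cache. The class below is a generic sketch of that technique, not the platform's own cache; the class name and the injected clock parameter are illustrative.

```python
import time


class TTLCache:
    """Cache one loader result per key and refresh it after `ttl` seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]            # still fresh: skip the external call
        value = loader()               # expired or missing: reload
        self._entries[key] = (now + self._ttl, value)
        return value
```

Because each Gunicorn worker is a separate process, a cache like this is per worker; sharing it across workers or instances would need an external store.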

Load Testing and Capacity Planning

Before deploying to production, load testing should be conducted to determine the platform's capacity and identify bottlenecks. Key metrics to monitor during load testing include:

  • Requests per second (RPS)
  • Latency percentiles (p50, p90, p99)
  • Error rates under load
  • Resource utilization at different load levels
  • PQC operation performance

Based on load testing results, appropriate scaling parameters can be configured, and infrastructure can be provisioned to handle expected traffic volumes with adequate headroom for traffic spikes.
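The latency percentiles mentioned above can be computed from raw samples with the standard library. This is a small sketch; the function name is hypothetical and the choice of the inclusive interpolation method is an assumption, since load testing tools differ in how they interpolate.

```python
import statistics


def latency_percentiles(samples_ms):
    """Return p50, p90 and p99 for a list of latency samples."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # n=100 yields 99 cut points; index k-1 holds the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p90": cuts[89], "p99": cuts[98]}
```

Comparing p99 against p50 at each load level shows whether tail latency degrades before the median does, which is often the first sign of CPU saturation in signature-heavy workloads.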

Section sources

  • cloudbuild.yaml
  • server/server/storage.py
  • server/server/cloud_storage.py

Disaster Recovery and Backup

Backup Strategies

Several backup strategies help ensure data durability and availability.

For credential data stored in Google Cloud Storage:

  • GCS provides built-in redundancy across multiple locations
  • Versioning can be enabled to protect against accidental deletion
  • Cross-region replication can be configured for disaster recovery
  • Regular backups can be copied to separate buckets or regions

For local storage (development environments):

  • Regular snapshots of the container or host system
  • External backup solutions for persistent volumes
  • Manual export of credential data

The metadata service also includes backup capabilities:

  • Local caching of FIDO MDS metadata
  • Regular updates from the FIDO Alliance endpoint
  • Fallback to packaged metadata if downloads fail
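The fallback behavior can be sketched with injected callables so that it stays independent of any particular HTTP client or file layout. The function name, the callables, and the exact order (download, then local cache, then packaged copy) are assumptions for illustration.

```python
def load_metadata(download, read_cache, read_packaged, write_cache=None):
    """Return (blob, source), preferring fresh data over cached or packaged."""
    try:
        blob = download()
    except OSError:
        blob = None                      # network failure: fall through
    if blob:
        if write_cache is not None:
            write_cache(blob)            # refresh the local copy
        return blob, "download"
    cached = read_cache()
    if cached:
        return cached, "cache"
    return read_packaged(), "packaged"
```

Catching OSError covers the network errors raised by the standard library's urllib, whose URLError derives from it; a different HTTP client would need its own exception type here.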

Rollback Procedures

The containerized deployment model facilitates straightforward rollback procedures:

  1. Maintain previous container image versions in the registry
  2. Update the deployment configuration to reference the previous image
  3. Redeploy the service with the previous image
  4. Verify functionality and monitor for issues

For Render deployments, rollback can be accomplished by:

  • Accessing the deployment history in the Render dashboard
  • Selecting a previous deployment
  • Rolling back to the selected version

For Cloud Run deployments, rollback can be performed using the gcloud command:

# Send 100% of traffic to a previous revision (replace REVISION_ID)
gcloud run services update-traffic pqc-webauthn --region us-central1 --to-revisions=REVISION_ID=100

High Availability

To ensure high availability, the following strategies should be implemented:

  • Deploy across multiple regions or availability zones
  • Use managed services with built-in redundancy (Cloud Run, GCS)
  • Implement health checks and automatic recovery
  • Configure appropriate monitoring and alerting
  • Establish clear incident response procedures

The platform's stateless design and support for external storage backends make it well-suited for high-availability deployments.

Section sources

  • server/server/storage.py
  • server/server/cloud_storage.py
  • cloudbuild.yaml

Deployment Examples

Development Environment

For local development, the application can be run directly using Python:

# Install dependencies
pip install -r requirements.txt

# Run the development server
python -m server.server

Or using Docker:

# Build the container image
docker build -t pqc-webauthn .

# Run the container
docker run -p 8000:8000 pqc-webauthn

Staging Environment

For staging deployments on Render:

# render.yaml
services:
  - type: web
    name: pqc-webauthn-staging
    runtime: docker
    plan: standard
    envVars:
      - key: FIDO_SERVER_RP_NAME
        value: Staging Demo Server
      - key: FIDO_SERVER_RP_ID
        value: staging.example.com
      - key: FIDO_SERVER_GCS_ENABLED
        value: "true"
      - key: FIDO_SERVER_GCS_BUCKET
        value: pqc-webauthn-staging-credentials
    dockerfilePath: ./Dockerfile
    autoDeploy: true

Production Environment

For production deployment using Google Cloud Build and Cloud Run:

# cloudbuild.yaml
steps:
  - name: 'gcr.io/cloud-builders/docker'
    id: 'Build Docker Image'
    env:
      - 'DOCKER_BUILDKIT=1'
    args:
      [
        'build',
        '-t', 'gcr.io/$PROJECT_ID/pqc-webauthn-prod',
        '.'
      ]

  - name: 'gcr.io/cloud-builders/docker'
    id: 'Push Image'
    args:
      [
        'push',
        'gcr.io/$PROJECT_ID/pqc-webauthn-prod'
      ]

  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    id: 'Deploy to Cloud Run'
    entrypoint: 'bash'
    args:
      - '-c'
      - |
        gcloud run deploy pqc-webauthn-prod \
          --image gcr.io/$PROJECT_ID/pqc-webauthn-prod \
          --region us-central1 \
          --platform managed \
          --cpu=1 \
          --memory=2Gi \
          --port=8080 \
          --allow-unauthenticated \
          --min-instances=1 \
          --max-instances=10 \
          --execution-environment=gen2 \
          --project $PROJECT_ID \
          --set-env-vars=FIDO_SERVER_RP_NAME="Production Server",FIDO_SERVER_RP_ID="auth.example.com",FIDO_SERVER_GCS_ENABLED=true,FIDO_SERVER_GCS_BUCKET=pqc-webauthn-prod-credentials

These examples demonstrate the flexibility of the deployment configuration, allowing the same codebase to be deployed in different environments with appropriate configuration for each use case.

Section sources

  • Dockerfile
  • render.yaml
  • cloudbuild.yaml
  • server/server/config.py
