Deployment & Operations

Containerization Strategy

The Post-Quantum WebAuthn Platform employs a multi-stage Docker build process to create optimized container images for deployment. The strategy focuses on security, performance, and efficient image size management while accommodating the specialized requirements of post-quantum cryptography operations.

The Docker build process consists of two distinct stages: a build stage and a runtime stage. The build stage uses the python:3.12-slim base image with additional development tools and dependencies required to compile and install the necessary packages. This includes build-essential, cmake, git, libssl-dev, and ninja-build. The prebuilt liboqs library bundle is copied to /opt/liboqs and configured in the system's library path to ensure proper linking for post-quantum cryptographic operations.

graph TD
A[Build Stage] --> B[Install Build Dependencies]
B --> C[Copy liboqs Bundle]
C --> D[Install Python Dependencies]
D --> E[Clean Build Artifacts]
E --> F[Runtime Stage]
F --> G[Install Runtime Dependencies]
G --> H[Copy liboqs Bundle]
H --> I[Copy Installed Packages]
I --> J[Copy Application Code]
J --> K[Configure Environment]
K --> L[Set Startup Command]

Diagram sources

Dockerfile

The runtime stage creates a minimal container by copying only the necessary installed packages and application code from the build stage, significantly reducing the attack surface and image size. The final image includes only the essential runtime dependencies, specifically libssl3, and the prebuilt liboqs library. The container is configured to run the application using Gunicorn as the WSGI server, binding to the port specified by the PORT environment variable (defaulting to 8000).
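The port fallback described above can be sketched in Python. This is a minimal illustration rather than the project's actual startup code: the helper name `resolve_bind` and the `0.0.0.0` listen address are assumptions, and only the `PORT` variable and its default of 8000 come from the text.

```python
import os


def resolve_bind(environ=None, default_port=8000):
    """Return a host:port bind string, falling back to the default port.

    Hypothetical helper: only PORT and the 8000 default come from the text.
    """
    env = os.environ if environ is None else environ
    raw = env.get("PORT", "").strip()
    port = int(raw) if raw else default_port
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    # 0.0.0.0 is an assumption so the container accepts outside traffic.
    return f"0.0.0.0:{port}"
```

A Gunicorn configuration file could assign the result to its `bind` setting, which keeps the same image usable on hosts that inject a different port.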

The container optimization strategy includes several key elements:

Multi-stage builds to separate build and runtime dependencies
Removal of build tools and development packages after installation
Use of slim base images to minimize footprint
Prebuilt liboqs binaries to avoid compilation in the container
Proper library path configuration for cryptographic operations
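A quick way to confirm that the library path is set up as intended is to look for the shared library at startup. The sketch below is an assumption-laden illustration: the function name `find_liboqs` is hypothetical, and the exact subdirectory under /opt/liboqs that holds the library is not stated in this document.

```python
from pathlib import Path


def find_liboqs(search_dirs):
    """Return the first liboqs shared library found, or None.

    Hypothetical helper; the directories to search are supplied by the caller.
    """
    for directory in search_dirs:
        base = Path(directory)
        if not base.is_dir():
            continue
        # Keep shared objects (liboqs.so, liboqs.so.5, liboqs.dylib),
        # skip static archives such as liboqs.a.
        matches = sorted(
            p for p in base.glob("liboqs.*")
            if p.is_file() and (".so" in p.name or p.suffix == ".dylib")
        )
        if matches:
            return matches[0]
    return None
```

Calling `find_liboqs(["/opt/liboqs/lib", "/opt/liboqs"])` and failing fast when it returns `None` would surface a broken image before the first cryptographic operation; the `lib` subdirectory here is a guess.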

Section sources

Dockerfile

Cloud Deployment with Render

The platform is configured for deployment on the Render cloud platform using the render.yaml configuration file. This declarative configuration specifies the deployment parameters and runtime environment for the web service.

The deployment configuration defines a web service with the following specifications:

Service type: web
Name: python-fido2-webauthn-demo
Runtime: docker
Plan: free
Dockerfile path: ./Dockerfile
Docker context: .
Auto-deploy: enabled

The configuration leverages Docker deployment, allowing for consistent environments across development and production. The auto-deploy feature ensures that new deployments are automatically triggered when changes are pushed to the repository, facilitating continuous delivery.

graph TD
A[GitHub Repository] --> B{Push to Main Branch}
B --> C[Render Platform]
C --> D[Build Docker Image]
D --> E[Deploy Web Service]
E --> F[Running Application]
F --> G[External Access]
G --> H[Users]

Diagram sources

render.yaml

The deployment strategy on Render provides several operational benefits:

Free tier availability for development and testing
Automatic HTTPS and custom domain support
Zero-downtime deployments
Built-in logging and monitoring
Environment variable management
Global CDN for improved performance

For production deployments, the free plan would typically be upgraded to a standard or higher plan to ensure adequate resources and reliability. The Docker-based deployment ensures that the application runs in a consistent environment regardless of the underlying infrastructure.

Section sources

render.yaml

CI/CD Pipeline Configuration

The platform implements a continuous integration and deployment pipeline using Google Cloud Build, configured through the cloudbuild.yaml file. The pipeline automates building, publishing, and deploying the container image so that releases are consistent and repeatable. The steps listed below do not include a dedicated test step.

The CI/CD pipeline consists of three main steps:

Build Docker Image: The pipeline uses Docker BuildKit to build the container image with detailed logging enabled. The image name is qualified with the project ID (gcr.io/$PROJECT_ID/pqc-webauthn) so that it is stored under the correct project.
Push Image: The built Docker image is pushed to the project's private gcr.io registry path. This step ensures that the built image is stored in a reliable repository and can be accessed by deployment targets.
Deploy to Cloud Run: The pipeline deploys the container image to Google Cloud Run, a fully managed compute platform that automatically scales applications. The deployment configuration includes specific resource allocations and runtime parameters.

graph TD
A[Code Commit] --> B[Cloud Build Trigger]
B --> C[Build Docker Image]
C --> D[Push to Artifact Registry]
D --> E[Deploy to Cloud Run]
E --> F[Production Environment]
F --> G[Health Checks]
G --> H[Monitoring]

Diagram sources

cloudbuild.yaml

The Cloud Run deployment configuration includes the following parameters:

Image: gcr.io/$PROJECT_ID/pqc-webauthn
Region: us-central1
Platform: managed
CPU: 0.5
Memory: 512Mi
Port: 8080
Authentication: allow-unauthenticated
Scaling: min-instances=0, max-instances=3
Execution environment: gen2
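The parameters above map directly onto `gcloud run deploy` flags. The following sketch assembles that command as an argument list; the function name `build_deploy_command` is hypothetical, and the flag values are taken from the list above rather than read from cloudbuild.yaml.

```python
def build_deploy_command(service, project_id, image_name="pqc-webauthn"):
    """Assemble gcloud arguments for the Cloud Run parameters listed above.

    Hypothetical helper: it only builds the list and runs nothing.
    """
    return [
        "gcloud", "run", "deploy", service,
        "--image", f"gcr.io/{project_id}/{image_name}",
        "--region", "us-central1",
        "--platform", "managed",
        "--cpu=0.5",
        "--memory=512Mi",
        "--port=8080",
        "--allow-unauthenticated",
        "--min-instances=0",
        "--max-instances=3",
        "--execution-environment=gen2",
        "--project", project_id,
    ]
```

Passing the list to `subprocess.run(..., check=True)` avoids shell quoting issues; building it separately also makes the deployment settings easy to review or diff.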

The pipeline sets a timeout of 1500 seconds (25 minutes) to accommodate the build and deployment process. Because the configuration refers to the project through Cloud Build's built-in $PROJECT_ID substitution rather than a hard-coded value, it can be reused in other Google Cloud projects with little change.

This CI/CD strategy provides several advantages:

Automated build and deployment
Consistent build environment
Versioned container images
Rollback capability through image versioning
Integration with Google Cloud's security and monitoring tools
Scalable deployment infrastructure

Section sources

cloudbuild.yaml

Operational Guidelines

Monitoring and Logging

The Post-Quantum WebAuthn Platform combines application-level logging with the monitoring provided by the hosting platform to give operators visibility into the running service and to support troubleshooting.

Application logging is implemented through Python's logging module, with log messages output to standard output for collection by the container runtime. The logging configuration includes:

Structured log messages with timestamps and severity levels
Contextual information for debugging
Performance metrics for critical operations
Security-related events and authentication attempts
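A logging setup of this kind might look like the sketch below, which writes timestamped, levelled records to standard output using only the standard library. The logger name `pqc_webauthn` and the format string are assumptions, not the project's actual configuration.

```python
import logging
import sys


def configure_logging(level=logging.INFO, stream=None):
    """Send timestamped, levelled log records to stdout (or a given stream)."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger = logging.getLogger("pqc_webauthn")  # hypothetical logger name
    logger.handlers.clear()  # avoid duplicate output on repeated calls
    logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False
    return logger
```

Writing to stdout rather than to files lets the container runtime collect the records, which matches how Render and Cloud Run gather logs.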

The platform supports optional credential logging to GitHub through the device_logs.py module. This feature can be enabled by setting the ENABLE_GITHUB_LOGGING environment variable. When enabled, registration and authentication events are uploaded to a GitHub repository for long-term storage and analysis.

graph TD
A[Application] --> B[Log Events]
B --> C{Logging Enabled?}
C --> |Yes| D[Format Log Data]
C --> |No| E[Discard Log]
D --> F[Upload to GitHub]
F --> G[GitHub Repository]
B --> H[Standard Output]
H --> I[Container Runtime]
I --> J[Log Aggregation]

Diagram sources

server/server/device_logs.py
server/server/github_client.py

Alerting

The platform does not include built-in alerting functionality but is designed to integrate with external monitoring and alerting systems. Key metrics that should be monitored include:

Request latency and response times
Error rates and failure patterns
Authentication success/failure ratios
Resource utilization (CPU, memory)
PQC algorithm performance metrics

These metrics can be collected through the application logs and container monitoring interfaces, then fed into external alerting systems like Google Cloud Monitoring, Prometheus with Alertmanager, or third-party services like Datadog or New Relic.

Runtime Configuration

The application supports several environment variables for runtime configuration:

FIDO_SERVER_SECRET_KEY: Secret key for Flask session management
FIDO_SERVER_RP_NAME: Relying party name
FIDO_SERVER_RP_ID: Relying party identifier
FIDO_SERVER_GCS_ENABLED: Enable Google Cloud Storage integration
FIDO_SERVER_GCS_BUCKET: GCS bucket name for credential storage
ENABLE_GITHUB_LOGGING: Enable credential logging to GitHub

The configuration system implements a fallback mechanism for the secret key: it reads the key from the environment, then from a file, and generates a random key if neither source provides one.
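The fallback chain for the secret key can be illustrated as follows. This is a sketch under assumptions: the function name `resolve_secret_key`, the `key_file` parameter, and the lookup order (environment, then file, then random) are illustrative; only the `FIDO_SERVER_SECRET_KEY` variable name comes from this document.

```python
import os
import secrets
from pathlib import Path


def resolve_secret_key(environ=None, key_file=None):
    """Return the secret key from the environment, a file, or a random value."""
    env = os.environ if environ is None else environ
    value = env.get("FIDO_SERVER_SECRET_KEY", "").strip()
    if value:
        return value
    if key_file is not None:  # key_file is a hypothetical path argument
        path = Path(key_file)
        if path.is_file():
            text = path.read_text(encoding="utf-8").strip()
            if text:
                return text
    # Last resort: a random key that changes on every process start.
    return secrets.token_hex(32)
```

A randomly generated key differs between processes, so sessions signed by one instance would not validate on another; when more than one instance runs, setting the key explicitly is the safer choice.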

Section sources

server/server/config.py
server/server/device_logs.py
server/server/github_client.py

Infrastructure Requirements

Compute Resources for PQC Operations

The Post-Quantum WebAuthn Platform has specific compute requirements due to the intensive nature of post-quantum cryptographic operations. The prebuilt liboqs library provides optimized implementations of various PQC algorithms, but these operations still require more computational resources than classical cryptography.

For production use, the platform should be deployed on instances that meet at least the following specifications:

CPU: 2 vCPUs (recommended for production)
Memory: 2GB RAM (minimum), 4GB recommended
Storage: SSD storage for optimal performance
Network: High-bandwidth, low-latency connectivity

The current Cloud Run configuration specifies 0.5 CPU and 512Mi memory, which is suitable for development and low-traffic environments but may need to be increased for production workloads with high authentication request volumes.

The liboqs library supports multiple post-quantum algorithms, including:

Key encapsulation mechanisms (KEM): Kyber, NTRU, Classic McEliece
Digital signatures: Dilithium, Falcon, SPHINCS+

Each algorithm has different performance characteristics and resource requirements. The system automatically detects available PQC algorithms and logs their selection during authentication operations.
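Selecting an algorithm from whatever the local liboqs build provides, and logging the choice, might be sketched like this. The function name, the logger name, and the idea of a caller-supplied preference order are assumptions; the list of available algorithms would come from the PQC library at runtime, and the exact identifier strings depend on the liboqs build.

```python
import logging

logger = logging.getLogger("pqc_webauthn.pqc")  # hypothetical logger name


def select_algorithm(available, preferred):
    """Return the first preferred algorithm that is available, else None.

    `available` is whatever the installed PQC library reports;
    `preferred` is the caller's ordering, most desirable first.
    """
    available_set = set(available)
    for name in preferred:
        if name in available_set:
            logger.info("Selected PQC algorithm: %s", name)
            return name
    logger.warning("No preferred PQC algorithm is available")
    return None
```

Returning `None` instead of raising leaves the caller free to fall back to a classical algorithm, which is one possible design and not necessarily what pqc.py does.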

graph TD
A[PQC Algorithm Selection] --> B{Available Algorithms}
B --> C[Kyber]
B --> D[NTRU]
B --> E[Classic McEliece]
B --> F[Dilithium]
B --> G[Falcon]
B --> H[SPHINCS+]
C --> I[Performance Metrics]
D --> I
E --> I
F --> I
G --> I
H --> I
I --> J[Logging]

Diagram sources

server/server/pqc.py
prebuilt_liboqs/linux-x86_64/include/oqs/

Network Configuration

The platform requires specific network configuration to ensure proper operation and security:

HTTPS/TLS termination (handled by Render or Cloud Run)
WebAuthn API endpoints accessible over secure connections
Outbound connectivity for metadata downloads (FIDO Alliance MDS)
Optional connectivity to Google Cloud Storage
Optional connectivity to GitHub API for credential logging

The application binds to port 8000 by default; the PORT environment variable overrides this, which is how the Cloud Run configuration runs it on port 8080. The service should be accessed through HTTPS to comply with WebAuthn security requirements. Browsers expose the WebAuthn API only in secure contexts, so HTTPS is essential outside of local development on localhost.

Storage Provisioning

The platform supports multiple storage backends for credential data:

Local file storage (default, for development)
Google Cloud Storage (recommended for production)

The storage configuration is controlled by environment variables:

FIDO_SERVER_GCS_ENABLED: Enables GCS integration
FIDO_SERVER_GCS_BUCKET: Specifies the GCS bucket name
FIDO_SERVER_GCS_CREDENTIALS_FILE: Path to service account key file
FIDO_SERVER_GCS_CREDENTIALS_JSON: JSON credentials as environment variable

The local storage backend saves credential data in the session-credentials directory relative to the application root. For production deployments, Google Cloud Storage is recommended for its durability, scalability, and built-in redundancy.
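The choice between the two backends can be pictured as a small function over the environment variables listed above. This is a sketch, not the logic in storage.py: the function name, the accepted truthy spellings, and the decision to raise when the bucket is missing are all assumptions.

```python
import os

TRUE_VALUES = {"1", "true", "yes", "on"}  # assumed truthy spellings


def select_storage_backend(environ=None):
    """Return a small dict describing which credential store to use."""
    env = os.environ if environ is None else environ
    enabled = env.get("FIDO_SERVER_GCS_ENABLED", "").strip().lower() in TRUE_VALUES
    bucket = env.get("FIDO_SERVER_GCS_BUCKET", "").strip()
    if enabled and bucket:
        return {"backend": "gcs", "bucket": bucket}
    if enabled:
        # Assumption: a missing bucket is treated as a configuration error.
        raise ValueError("FIDO_SERVER_GCS_BUCKET must be set when GCS is enabled")
    return {"backend": "local", "directory": "session-credentials"}
```

Failing loudly on a half-configured GCS setup prevents a production instance from silently writing credentials to local disk.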

Section sources

server/server/storage.py
server/server/cloud_storage.py
server/server/pqc.py

Scaling Considerations

The Post-Quantum WebAuthn Platform is designed to handle varying volumes of authentication requests, with specific considerations for scaling in production environments.

Horizontal Scaling

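The application keeps no per-instance state when credential data is stored in an external backend such as Google Cloud Storage, which makes it suitable for horizontal scaling; the local file backend ties credentials to a single instance and does not. Multiple instances can be deployed behind a load balancer to distribute authentication requests. The current Cloud Run configuration supports automatic scaling from 0 to 3 instances based on traffic.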

For high-volume deployments, the following scaling parameters should be adjusted:

Increase maximum instances to handle peak loads
Configure minimum instances to reduce cold start latency
Implement request queuing for burst protection
Use regional or multi-regional deployments for global availability
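A first estimate for the maximum instance count can be derived from Little's law: the number of requests in flight equals the arrival rate multiplied by the mean time each request spends in the system. The sketch below applies that relation; the function name, the headroom factor, and every number used with it are illustrative assumptions, not measurements of this platform.

```python
import math


def estimate_instances(peak_rps, mean_latency_s, concurrency_per_instance, headroom=0.3):
    """Estimate the instances needed at peak load using Little's law."""
    if peak_rps < 0 or mean_latency_s < 0 or headroom < 0:
        raise ValueError("rates, latencies and headroom must be non-negative")
    if concurrency_per_instance <= 0:
        raise ValueError("concurrency_per_instance must be positive")
    in_flight = peak_rps * mean_latency_s  # Little's law: L = lambda * W
    needed = in_flight * (1 + headroom) / concurrency_per_instance
    return max(1, math.ceil(needed))
```

For example, 200 requests per second at 0.25 s mean latency with 8 concurrent requests per instance and 30% headroom yields 9 instances, well above the current maximum of 3. Real values for latency and concurrency should come from load testing.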

Performance Optimization

Several strategies can be employed to optimize performance for high-volume authentication:

Implement caching for frequently accessed data (metadata, public keys)
Optimize PQC algorithm selection based on performance characteristics
Use connection pooling for database/storage operations
Implement request batching where appropriate
Optimize TLS configuration for performance

The platform already includes some performance optimizations:

Metadata caching to reduce external API calls
Efficient credential storage and retrieval
Prebuilt liboqs library for optimized PQC operations
Gunicorn worker configuration for concurrent request handling
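As an illustration of the metadata caching idea, the sketch below keeps one loaded value in memory and reloads it after a time-to-live expires. The class name `TTLCache` and the injectable clock are assumptions made for this example; the platform's own cache may work differently.

```python
import time


class TTLCache:
    """Cache a single loaded value and refresh it after ttl_seconds."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader      # callable that produces a fresh value
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so tests can control time
        self._value = None
        self._expires_at = None

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value
```

Using a monotonic clock avoids spurious refreshes or stale reads when the system wall clock is adjusted.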

Load Testing and Capacity Planning

Before deploying to production, load testing should be conducted to determine the platform's capacity and identify bottlenecks. Key metrics to monitor during load testing include:

Requests per second (RPS)
Latency percentiles (p50, p90, p99)
Error rates under load
Resource utilization at different load levels
PQC operation performance
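Latency percentiles can be computed from raw samples with the nearest-rank method, as in the sketch below. The function names are hypothetical, and dedicated load-testing tools usually report these figures directly.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile of a non-empty sequence, for 0 < p <= 100."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest rank covering at least p percent of samples.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Return the p50, p90 and p99 latencies for a list of samples."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 90, 99)}
```

Tail percentiles such as p99 matter here because a single slow PQC operation can be hidden by an average.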

Based on load testing results, appropriate scaling parameters can be configured, and infrastructure can be provisioned to handle expected traffic volumes with adequate headroom for traffic spikes.

Section sources

cloudbuild.yaml
server/server/storage.py
server/server/cloud_storage.py

Disaster Recovery and Backup

Backup Strategies

Several backup strategies are available to ensure data durability and availability. Most of them rely on features of the storage backend that must be enabled or scheduled explicitly:

For credential data stored in Google Cloud Storage:

GCS provides built-in redundancy across multiple locations
Versioning can be enabled to protect against accidental deletion
Cross-region replication can be configured for disaster recovery
Regular backups to separate buckets or regions

For local storage (development environments):

Regular snapshots of the container or host system
External backup solutions for persistent volumes
Manual export of credential data

The metadata service also includes backup capabilities:

Local caching of FIDO MDS metadata
Regular updates from the FIDO Alliance endpoint
Fallback to packaged metadata if downloads fail
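The three metadata sources above form a fallback chain, sketched below. All names are hypothetical, and JSON files are used only to keep the example self-contained; the format in which the platform actually stores FIDO MDS data is not specified here.

```python
import json
from pathlib import Path


def load_metadata(fetch, cache_path, packaged_path):
    """Try a fresh download, then the local cache, then the packaged copy.

    Returns (data, source) where source names the copy that was used.
    """
    cache = Path(cache_path)
    try:
        data = fetch()  # caller-supplied download function
        cache.write_text(json.dumps(data), encoding="utf-8")
        return data, "download"
    except Exception:
        # Any download or cache-write failure falls through to stored copies.
        pass
    for source, path in (("cache", cache), ("packaged", Path(packaged_path))):
        if path.is_file():
            return json.loads(path.read_text(encoding="utf-8")), source
    raise FileNotFoundError("no metadata source is available")
```

Returning the source alongside the data makes it possible to log when the service is running on stale metadata.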

Rollback Procedures

The containerized deployment model facilitates straightforward rollback procedures:

Maintain previous container image versions in the registry
Update the deployment configuration to reference the previous image
Redeploy the service with the previous image
Verify functionality and monitor for issues

For Render deployments, rollback can be accomplished by:

Accessing the deployment history in the Render dashboard
Selecting a previous deployment
Rolling back to the selected version

For Cloud Run deployments, rollback can be performed using the gcloud command:

gcloud run services update-traffic pqc-webauthn --region us-central1 --to-revisions=[REVISION_ID]=100

High Availability

To ensure high availability, the following strategies should be implemented:

Deploy across multiple regions or availability zones
Use managed services with built-in redundancy (Cloud Run, GCS)
Implement health checks and automatic recovery
Configure appropriate monitoring and alerting
Establish clear incident response procedures

The platform's stateless design and support for external storage backends make it well-suited for high-availability deployments.

Section sources

server/server/storage.py
server/server/cloud_storage.py
cloudbuild.yaml

Deployment Examples

Development Environment

For local development, the application can be run directly using Python:

# Install dependencies
pip install -r requirements.txt

# Run the development server
python -m server.server

Or using Docker:

# Build the container image
docker build -t pqc-webauthn .

# Run the container
docker run -p 8000:8000 pqc-webauthn

Staging Environment

For staging deployments on Render:

# render.yaml
services:
  - type: web
    name: pqc-webauthn-staging
    runtime: docker
    plan: standard
    envVars:
      - key: FIDO_SERVER_RP_NAME
        value: Staging Demo Server
      - key: FIDO_SERVER_RP_ID
        value: staging.example.com
      - key: FIDO_SERVER_GCS_ENABLED
        value: "true"  # quoted so the value is passed as a string, not a YAML boolean
      - key: FIDO_SERVER_GCS_BUCKET
        value: pqc-webauthn-staging-credentials
    dockerfilePath: ./Dockerfile
    autoDeploy: true

Production Environment

For production deployment using Google Cloud Build and Cloud Run:

# cloudbuild.yaml
steps:
  - name: 'gcr.io/cloud-builders/docker'
    id: 'Build Docker Image'
    env:
      - 'DOCKER_BUILDKIT=1'
    args:
      [
        'build',
        '-t', 'gcr.io/$PROJECT_ID/pqc-webauthn-prod',
        '.'
      ]

  - name: 'gcr.io/cloud-builders/docker'
    id: 'Push Image'
    args:
      [
        'push',
        'gcr.io/$PROJECT_ID/pqc-webauthn-prod'
      ]

  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    id: 'Deploy to Cloud Run'
    entrypoint: 'bash'
    args:
      - '-c'
      - |
        gcloud run deploy pqc-webauthn-prod \
          --image gcr.io/$PROJECT_ID/pqc-webauthn-prod \
          --region us-central1 \
          --platform managed \
          --cpu=1 \
          --memory=2Gi \
          --port=8080 \
          --allow-unauthenticated \
          --min-instances=1 \
          --max-instances=10 \
          --execution-environment=gen2 \
          --project $PROJECT_ID \
          --set-env-vars=FIDO_SERVER_RP_NAME="Production Server",FIDO_SERVER_RP_ID="auth.example.com",FIDO_SERVER_GCS_ENABLED=true,FIDO_SERVER_GCS_BUCKET=pqc-webauthn-prod-credentials

These examples demonstrate the flexibility of the deployment configuration, allowing the same codebase to be deployed in different environments with appropriate configuration for each use case.
