[Feature] implement cross-platform E2E testing for release artifacts with standalone package validation #39

@wherka-ama

Feature description

Implement comprehensive end-to-end (E2E) testing that validates actual release artifacts (.deb, .rpm, standalone binary) on all supported platforms (Linux, macOS, Windows). Tests run as part of the release pipeline: artifacts are uploaded as prerelease, E2E tests execute against real packages, and only upon success is the release promoted to latest. These tests must validate real-world multi-organization GitHub App authentication workflows using dedicated test organizations with realistic configurations.

Problem or use case

Current CI runs unit tests and builds but does not verify actual release artifacts or GitHub App authentication workflows against real GitHub infrastructure:

  • Release artifacts untested: .deb, .rpm packages and standalone binaries are not validated before users consume them
  • Standalone vs extension gap: The tool can run standalone (via packages) or as a gh CLI extension; only the latter is indirectly tested
  • Platform-specific packaging issues: Package installation paths, permissions, and git credential helper registration may vary by platform and package format
  • Risk of broken releases: Defects in packaging (e.g., nfpm configuration, missing dependencies) reach users before detection
  • Multi-org scenarios untested: Cross-organization repository access with submodules is a core use case but lacks validation
  • Private repo workflows: Authentication for private repositories (the primary use case) has no automated verification

Proposed solution

Workflow trigger: Run on release creation, after artifacts are built and uploaded as prerelease.

Release pipeline flow:

  1. Tag pushed → release workflow triggers
  2. Build artifacts (.deb, .rpm, binary archives) for all platforms
  3. Upload to GitHub Release as prerelease (not marked latest)
  4. Trigger E2E test job with matrix: ubuntu-latest, macos-latest, windows-latest
  5. E2E tests download and install actual release artifacts
  6. If E2E tests pass: promote release from prerelease to latest
  7. If E2E tests fail: release remains prerelease, alert maintainers
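The decision in steps 6 and 7 can be sketched as a small shell helper. This is only an illustration, not part of the proposed workflow: the function name gate_decision is hypothetical, and the result strings are assumed to be the values GitHub Actions exposes as needs.<job>.result (success, failure, cancelled, skipped).

```shell
#!/bin/sh
# Decide whether a prerelease may be promoted, given one result per E2E job.
# gate_decision is a hypothetical helper; result strings mirror the
# needs.<job>.result values in GitHub Actions.
gate_decision() {
  # No results at all means nothing was validated, so hold the release.
  if [ "$#" -eq 0 ]; then
    echo hold
    return 0
  fi
  for result in "$@"; do
    # Anything other than an explicit success (failure, cancelled, skipped)
    # keeps the release in prerelease state.
    if [ "$result" != "success" ]; then
      echo hold
      return 0
    fi
  done
  echo promote
}
```

Treating skipped and cancelled jobs as "hold" is a deliberate choice here: a platform that was never tested should not count as a pass.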

Platform matrix: ubuntu-latest (deb/rpm), macos-latest, windows-latest.

Test infrastructure:

  • Create 2-3 dedicated GitHub organizations solely for E2E testing
  • Each org contains 2-4 private repositories with realistic structures:
    • Main repo with submodules pointing to other test org repos
    • Cross-org access patterns matching enterprise use cases
    • Repositories named deterministically (e.g., gh-app-auth-test-{org}-{n})
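Deterministic naming makes it possible to compute the expected repository list instead of hard-coding it. A minimal sketch, assuming {org} is a short label per test organization and {n} counts from 1; the helper name expected_repo_names is hypothetical, and how labels map to actual organization names is left open.

```shell
#!/bin/sh
# Print expected repository names, one per line, following the
# gh-app-auth-test-{org}-{n} scheme.
# Usage: expected_repo_names <repos-per-org> <org-label>...
# (hypothetical helper; the labels are illustrative, not real org names)
expected_repo_names() {
  count=$1
  shift
  for org in "$@"; do
    n=1
    while [ "$n" -le "$count" ]; do
      echo "gh-app-auth-test-${org}-${n}"
      n=$((n + 1))
    done
  done
}
```

The resulting list can feed the pre-flight privacy check, so a repository that is missing or misnamed is caught before any authentication test runs.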

Pre-test validation step:

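- name: Verify test org repository visibility
  env:
    GH_TOKEN: ${{ secrets.E2E_GITHUB_TOKEN }}
  run: |
    # Ensure all expected repos are private before proceeding.
    # Name the test repo explicitly: gh expands {owner}/{repo} to the current repository.
    gh api /repos/gh-app-auth-test-org-1/main-repo --jq '.private' | grep -qx 'true'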
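To cover every test repository rather than a single one, the same check could be wrapped in a loop. The sketch below assumes gh is installed and authenticated (for example through GH_TOKEN); verify_repos_private is a hypothetical helper name, and the repository slugs passed to it are placeholders.

```shell
#!/bin/sh
# Fail unless every listed repository (owner/name) reports "private": true.
# verify_repos_private is a hypothetical helper built on `gh api`.
verify_repos_private() {
  status=0
  for slug in "$@"; do
    # A failed API call (for example a 404) leaves the value empty,
    # which is treated the same as "not private".
    private=$(gh api "/repos/${slug}" --jq '.private' 2>/dev/null) || private=""
    if [ "$private" != "true" ]; then
      echo "ERROR: ${slug} is not confirmed private (got: '${private:-no response}')" >&2
      status=1
    fi
  done
  return "$status"
}
```

All repositories are checked before returning, so a single run reports every misconfigured repository instead of stopping at the first one.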

Package-specific test phases:

| Platform | Package Format | Installation Method | Tool Invocation |
|---|---|---|---|
| Ubuntu | .deb | dpkg -i gh-app-auth_*.deb | /usr/bin/gh-app-auth (standalone) |
| Fedora (or Ubuntu with alien) | .rpm | rpm -i gh-app-auth-*.rpm | /usr/bin/gh-app-auth (standalone) |
| macOS | .tar.gz | Extract archive | ./gh-app-auth (standalone) |
| Windows | .zip | Extract archive | gh-app-auth.exe (standalone) |
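A shared test script could pick the installation step from the artifact's file extension. This is a rough sketch: install_command is a hypothetical helper that only prints the command, and tar -xzf and unzip are assumed extraction tools, since the table says only "Extract archive".

```shell
#!/bin/sh
# Print the install or extract command for an artifact, chosen by extension.
# install_command is a hypothetical helper; it prints the command rather than
# running it, so the caller decides about sudo and working directory.
install_command() {
  artifact=$1
  case "$artifact" in
    *.deb)    echo "dpkg -i $artifact" ;;
    *.rpm)    echo "rpm -i $artifact" ;;
    *.tar.gz) echo "tar -xzf $artifact" ;;   # assumed extraction tool
    *.zip)    echo "unzip $artifact" ;;      # assumed extraction tool
    *)
      echo "unsupported artifact: $artifact" >&2
      return 1
      ;;
  esac
}
```

Failing on unknown extensions keeps a newly added package format from being silently skipped by the test matrix.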

Test phases per platform:

  1. Download release artifact from prerelease URL
  2. Install package (deb/rpm) or extract archive (macOS/Windows)
  3. Standalone mode: Run gh-app-auth directly (not via gh CLI extension)
  4. Configure GitHub App credential (via secrets: E2E_APP_ID, E2E_PRIVATE_KEY)
  5. Run gh-app-auth setup with test org patterns
  6. Run gh-app-auth gitconfig --sync (validate credential helper registration)
  7. Clone private repo with submodules across orgs using generated credentials
  8. Verify submodule initialization succeeds
  9. Run gh-app-auth cleanup
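Phase 8 could be checked by inspecting the output of git submodule status, where a leading "-" marks an uninitialized submodule, "+" a checkout that differs from the recorded commit, and "U" a merge conflict. The helper name check_submodule_status below is hypothetical; it reads the status output on stdin so it can be tested without network access.

```shell
#!/bin/sh
# Read `git submodule status --recursive` output on stdin and fail if any
# submodule is uninitialized ('-'), out of sync ('+'), or conflicted ('U').
# check_submodule_status is a hypothetical helper.
check_submodule_status() {
  bad=0
  seen=0
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    seen=1
    case "$line" in
      -*|+*|U*)
        echo "submodule not ready: $line" >&2
        bad=1
        ;;
    esac
  done
  # An empty listing means no submodules were found; for a test repository
  # that is expected to have submodules, that is also a failure.
  [ "$seen" -eq 1 ] && [ "$bad" -eq 0 ]
}
```

Against a real clone it might be used as: git -C /tmp/test-clone submodule status --recursive | check_submodule_status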

Required status check: E2E tests are blocking for release promotion; a failed test prevents the release from being marked latest.

Alternative solutions

| Approach | Pros | Cons |
|---|---|---|
| Mock GitHub API (current gap) | Fast, deterministic, no secrets | Does not catch real auth failures, API drift, packaging issues |
| Test on PR merge (original plan) | Early defect detection | Tests source, not artifacts; cannot test packages that don't exist yet |
| Post-release testing only | Tests actual user experience | Broken releases reach users before detection |
| Prerelease gate (proposed) | Tests real artifacts, blocks broken releases | Slightly delayed "latest" availability |
| Use public test repos | Simpler setup | Does not validate private repo auth (the actual use case) |

Use case examples

# .github/workflows/release-e2e.yml (triggered by release workflow)
name: Release E2E Tests

on:
  workflow_call:
    inputs:
      release_tag:
        required: true
        type: string

jobs:
  e2e-linux-deb:
    runs-on: ubuntu-latest
    steps:
      - name: Download .deb artifact from prerelease
        run: |
          gh release download ${{ inputs.release_tag }} \
            --repo ${{ github.repository }} \
            --pattern "*.deb" \
            --dir ./artifacts
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: Install .deb package
        run: sudo dpkg -i ./artifacts/gh-app-auth_*.deb

      - name: Verify standalone binary path
        run: which gh-app-auth && /usr/bin/gh-app-auth --version

      - name: Verify test repository privacy
        env:
          GH_TOKEN: ${{ secrets.E2E_GITHUB_TOKEN }}
        run: |
          gh api /repos/gh-app-auth-test-org-1/main-repo --jq '.private' | grep 'true'

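      - name: Setup gh-app-auth (standalone mode)
        env:
          E2E_APP_ID: ${{ secrets.E2E_APP_ID }}
          E2E_PRIVATE_KEY: ${{ secrets.E2E_PRIVATE_KEY }}
        run: |
          # Pass secrets through env so they are not interpolated into the script text
          gh-app-auth setup \
            --app-id "$E2E_APP_ID" \
            --private-key "$E2E_PRIVATE_KEY" \
            --pattern "https://github.com/gh-app-auth-test-org-*"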

      - name: Configure git credential helper
        run: gh-app-auth gitconfig --sync --global

      - name: Clone multi-org repo with submodules (tests real auth)
        run: |
          git clone --recurse-submodules \
            https://github.com/gh-app-auth-test-org-1/main-repo \
            /tmp/test-clone
          test -f /tmp/test-clone/submodules/org-2-repo/README.md

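  e2e-linux-rpm:
    runs-on: ubuntu-latest
    container: fedora:latest  # GitHub offers no hosted Fedora runner, so use a container
    steps:
      - name: Install gh CLI (used only to download the artifact)
        # Assumes the gh package from the Fedora repositories
        run: dnf install -y gh

      - name: Download .rpm artifact
        run: |
          gh release download ${{ inputs.release_tag }} \
            --repo ${{ github.repository }} \
            --pattern "*.rpm"
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: Install .rpm package
        run: rpm -i gh-app-auth-*.rpm

      - name: Run standalone tool tests
        run: |
          gh-app-auth --version
          gh-app-auth setup --help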

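  promote-release:
    # e2e-macos and e2e-windows are omitted from this example; they follow the same pattern
    needs: [e2e-linux-deb, e2e-linux-rpm, e2e-macos, e2e-windows]
    runs-on: ubuntu-latest
    if: success()
    permissions:
      contents: write  # needed to edit the release; the calling workflow must grant it too
    steps:
      - name: Promote prerelease to latest
        run: |
          # A prerelease cannot be marked latest, so clear the prerelease flag as well
          gh release edit ${{ inputs.release_tag }} \
            --repo ${{ github.repository }} \
            --prerelease=false \
            --latest
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}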

Impact

  • Users affected: All users consuming release artifacts (.deb, .rpm, binaries)
  • Frequency: Every release (tag push)
  • Priority: Critical for preventing broken releases from being marked latest

Additional context

  • Prerelease duration: Releases remain in "prerelease" state for ~5-15 minutes during E2E validation
  • Package testing scope: Both .deb (Debian/Ubuntu) and .rpm (Fedora/RHEL) must pass before promotion
  • Standalone binary validation: Tool must work without gh CLI installed (true standalone mode)
  • Cost consideration: E2E tests run on release only (not every PR), keeping CI costs manageable

Implementation considerations

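  • Security: The E2E GitHub App needs repository permissions rather than OAuth scopes; read access to Contents (plus the mandatory Metadata read permission) should be sufficient for cloning the test repositories. Store the App ID and private key as repository secrets with restricted access; rotate keys regularly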
  • Test org isolation: Keep test orgs completely separate from production; add CODEOWNERS, disable issues/PRs to prevent confusion
  • Package installation hygiene: Test jobs should run in clean containers (fedora:latest for rpm, ubuntu:latest for deb) to detect missing dependencies
  • Self-healing tests: Pre-flight check that repos are private; if not, fail fast with clear error
  • Flakiness mitigation: Retry clone operations with a bounded number of attempts (App token generation occasionally returns 5xx from the GitHub API); do not mask failures with continue-on-error: true, so a persistent failure still fails the job
  • Breaking changes: None to CLI; additive release workflow only
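The retry idea from the flakiness bullet can be sketched as a bounded retry with exponential backoff. The helper name retry and its argument order are illustrative assumptions, not an existing tool.

```shell
#!/bin/sh
# Run a command up to <attempts> times, doubling the delay after each failure.
# Usage: retry <attempts> <initial-delay-seconds> <command> [args...]
# (hypothetical helper)
retry() {
  attempts=$1
  delay=$2
  shift 2
  try=1
  while :; do
    # Stop as soon as the command succeeds.
    "$@" && return 0
    if [ "$try" -ge "$attempts" ]; then
      echo "giving up after ${try} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${try} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    try=$((try + 1))
  done
}
```

A clone step might then read: retry 3 5 git clone --recurse-submodules https://github.com/gh-app-auth-test-org-1/main-repo /tmp/test-clone. A failed clone can leave a partial target directory behind, so the retried command may need to remove it first.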

Recommended high-level solutions

  1. Prerelease gate pattern: Upload as prerelease → run E2E → promote to latest on success. Prevents users from installing broken artifacts.
  2. Package-specific test jobs: Separate jobs for .deb, .rpm, and archive formats to isolate packaging failures.
  3. Standalone validation: Explicitly test /usr/bin/gh-app-auth works without gh CLI; validates nfpm.yaml configuration and PATH setup.
  4. Test orgs via Terraform: Manage gh-app-auth-test-{1,2,3} orgs with repositories, branch protection, and App installations as code.
  5. Outcome-based assertions: Rather than relying on exit codes alone, verify that actual file content from cloned submodules is accessible.
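An outcome-based assertion could look like the sketch below, which checks that a file from a cloned submodule exists, is non-empty, and contains a known marker. The helper name assert_file_contains is hypothetical, and any marker text would have to be planted in the test repositories beforehand.

```shell
#!/bin/sh
# Succeed only if <file> exists, is non-empty, and contains <expected-text>.
# Usage: assert_file_contains <file> <expected-text>
# (hypothetical helper)
assert_file_contains() {
  file=$1
  expected=$2
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    return 1
  fi
  # -F matches the text literally; -- protects against a leading dash.
  if ! grep -qF -- "$expected" "$file"; then
    echo "expected text not found in $file: $expected" >&2
    return 1
  fi
}
```

Checking content rather than mere existence guards against a clone that succeeds but yields an empty or placeholder working tree.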

Status

In progress
