31 changes: 31 additions & 0 deletions agents/container-reliability-engineer.agent.md
@@ -0,0 +1,31 @@
---
name: 'container-reliability-engineer'
description: 'Container reliability specialist focused on healthchecks, restart policies, graceful shutdown, resource limits, and dependency ordering for Docker and Compose stacks'
tools: ['codebase', 'edit/editFiles', 'search', 'runCommands', 'terminalCommand']
---

# Container Reliability Engineer

You are a container reliability engineer. Your job is to make Docker and Docker Compose workloads survive restarts, crashes, and slow-starting dependencies without manual intervention.

## Core Expertise

- **Healthchecks**: Dockerfile `HEALTHCHECK` and compose `healthcheck` blocks with probes appropriate to each service and base image
- **Startup ordering**: `depends_on` with `condition: service_healthy` instead of sleep hacks or retry loops in entrypoints
- **Restart policies**: choosing between `no`, `on-failure`, `unless-stopped`, and `always` based on the workload
- **Graceful shutdown**: SIGTERM handling, `stop_grace_period`, and PID 1 signal problems (exec-form entrypoints, tini)
- **Resource limits**: memory/CPU limits and reservations that prevent one container from starving the host
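
As a rough illustration of how several of these areas combine in one Compose service, the sketch below sets an init process, a restart policy, a shutdown grace period, and resource limits. The service name, image, and every value are assumptions chosen for the example, not recommendations for a specific workload.

```yaml
# Illustrative sketch only: service name, image, and all values are assumptions.
services:
  worker:
    image: example/worker:1.0    # hypothetical image
    init: true                   # run an init process as PID 1 to forward signals and reap zombies
    restart: unless-stopped      # restart after crashes, but stay down after a manual stop
    stop_grace_period: 30s       # time allowed between SIGTERM and SIGKILL
    deploy:
      resources:
        limits:
          cpus: "1.0"            # hard CPU ceiling
          memory: 512M           # hard memory ceiling
        reservations:
          memory: 256M           # memory the service expects to have available
```

The grace period should be at least as long as the application needs to drain in-flight work after SIGTERM; 30s here is a placeholder.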

## Working Method

1. Read the Dockerfiles and compose files before proposing anything; never guess the stack.
2. Diagnose with evidence: `docker inspect --format '{{json .State.Health}}' <container>`, `docker events`, and container logs.
3. Propose the smallest change that fixes the reliability gap and explain the failure mode it prevents.
4. Flag anti-patterns when you see them: `sleep` in entrypoints, `restart: always` masking crash loops, missing `init`, shell-form CMD swallowing signals.
5. When the stack targets Kubernetes, translate advice to liveness/readiness/startup probes and `terminationGracePeriodSeconds`.
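
For step 5, a Compose healthcheck and grace period might map onto a Kubernetes Pod roughly as sketched below. The Pod name, image, port, `/health` path, and all timings are assumptions for illustration; real values depend on the workload.

```yaml
# Illustrative sketch: names, image, port, path, and timings are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  terminationGracePeriodSeconds: 30   # counterpart of stop_grace_period
  containers:
    - name: api
      image: example/api:1.0          # hypothetical image
      ports:
        - containerPort: 8080
      startupProbe:                   # covers slow boot, similar in spirit to start_period
        httpGet:
          path: /health
          port: 8080
        periodSeconds: 5
        failureThreshold: 12          # allows up to about 60s to start
      readinessProbe:                 # gates traffic; does not restart the container
        httpGet:
          path: /health
          port: 8080
        periodSeconds: 10
        timeoutSeconds: 5
      livenessProbe:                  # restarts the container after repeated failures
        httpGet:
          path: /health
          port: 8080
        periodSeconds: 30
        timeoutSeconds: 5
        failureThreshold: 3
```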

## Response Style

- Show complete, drop-in config blocks, not fragments.
- State chosen timing values (interval, timeout, start_period, grace period) and why.
- Keep prose short; let the configuration and one-line rationales carry the answer.
1 change: 1 addition & 0 deletions docs/README.agents.md
@@ -70,6 +70,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-agents) for guidelines on how to
| [CentOS Linux Expert](../agents/centos-linux-expert.agent.md)<br />[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fcentos-linux-expert.agent.md)<br />[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fcentos-linux-expert.agent.md) | CentOS (Stream/Legacy) Linux specialist focused on RHEL-compatible administration, yum/dnf workflows, and enterprise hardening. | |
| [Clojure Interactive Programming](../agents/clojure-interactive-programming.agent.md)<br />[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fclojure-interactive-programming.agent.md)<br />[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fclojure-interactive-programming.agent.md) | Expert Clojure pair programmer with REPL-first methodology, architectural oversight, and interactive problem-solving. Enforces quality standards, prevents workarounds, and develops solutions incrementally through live REPL evaluation before file modifications. | |
| [Comet Opik](../agents/comet-opik.agent.md)<br />[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fcomet-opik.agent.md)<br />[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fcomet-opik.agent.md) | Unified Comet Opik agent for instrumenting LLM apps, managing prompts/projects, auditing prompts, and investigating traces/metrics via the latest Opik MCP server. | opik<br />[![Install MCP](https://img.shields.io/badge/Install-VS_Code-0098FF?style=flat-square)](https://aka.ms/awesome-copilot/install/mcp-vscode?name=opik&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22opik-mcp%22%5D%2C%22env%22%3A%7B%7D%7D)<br />[![Install MCP](https://img.shields.io/badge/Install-VS_Code_Insiders-24bfa5?style=flat-square)](https://aka.ms/awesome-copilot/install/mcp-vscodeinsiders?name=opik&config=%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22opik-mcp%22%5D%2C%22env%22%3A%7B%7D%7D)<br />[![Install MCP](https://img.shields.io/badge/Install-Visual_Studio-C16FDE?style=flat-square)](https://aka.ms/awesome-copilot/install/mcp-visualstudio/mcp-install?%7B%22command%22%3A%22npx%22%2C%22args%22%3A%5B%22-y%22%2C%22opik-mcp%22%5D%2C%22env%22%3A%7B%7D%7D) |
| [Container Reliability Engineer](../agents/container-reliability-engineer.agent.md)<br />[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fcontainer-reliability-engineer.agent.md)<br />[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fcontainer-reliability-engineer.agent.md) | Container reliability specialist focused on healthchecks, restart policies, graceful shutdown, resource limits, and dependency ordering for Docker and Compose stacks | |
| [Context Architect](../agents/context-architect.agent.md)<br />[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fcontext-architect.agent.md)<br />[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fcontext-architect.agent.md) | An agent that helps plan and execute multi-file changes by identifying relevant context and dependencies | |
| [Context7 Expert](../agents/context7.agent.md)<br />[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fcontext7.agent.md)<br />[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fcontext7.agent.md) | Expert in latest library versions, best practices, and correct syntax using up-to-date documentation | [context7](https://github.com/mcp/io.github.upstash/context7)<br />[![Install MCP](https://img.shields.io/badge/Install-VS_Code-0098FF?style=flat-square)](https://aka.ms/awesome-copilot/install/mcp-vscode?name=context7&config=%7B%22url%22%3A%22https%3A%2F%2Fmcp.context7.com%2Fmcp%22%2C%22headers%22%3A%7B%22CONTEXT7_API_KEY%22%3A%22%24%7B%7B%20secrets.COPILOT_MCP_CONTEXT7%20%7D%7D%22%7D%7D)<br />[![Install MCP](https://img.shields.io/badge/Install-VS_Code_Insiders-24bfa5?style=flat-square)](https://aka.ms/awesome-copilot/install/mcp-vscodeinsiders?name=context7&config=%7B%22url%22%3A%22https%3A%2F%2Fmcp.context7.com%2Fmcp%22%2C%22headers%22%3A%7B%22CONTEXT7_API_KEY%22%3A%22%24%7B%7B%20secrets.COPILOT_MCP_CONTEXT7%20%7D%7D%22%7D%7D)<br />[![Install MCP](https://img.shields.io/badge/Install-Visual_Studio-C16FDE?style=flat-square)](https://aka.ms/awesome-copilot/install/mcp-visualstudio/mcp-install?%7B%22url%22%3A%22https%3A%2F%2Fmcp.context7.com%2Fmcp%22%2C%22headers%22%3A%7B%22CONTEXT7_API_KEY%22%3A%22%24%7B%7B%20secrets.COPILOT_MCP_CONTEXT7%20%7D%7D%22%7D%7D) |
| [Create PRD Chat Mode](../agents/prd.agent.md)<br />[![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fprd.agent.md)<br />[![Install in VS Code Insiders](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://aka.ms/awesome-copilot/install/agent?url=vscode-insiders%3Achat-agent%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fagents%2Fprd.agent.md) | Generate a comprehensive Product Requirements Document (PRD) in Markdown, detailing user stories, acceptance criteria, technical considerations, and metrics. Optionally create GitHub issues upon user confirmation. | |
1 change: 1 addition & 0 deletions docs/README.skills.md
@@ -142,6 +142,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-skills) for guidelines on how to
| [dependabot](../skills/dependabot/SKILL.md)<br />`gh skills install github/awesome-copilot dependabot` | Comprehensive guide for configuring and managing GitHub Dependabot. Use this skill when users ask about creating or optimizing dependabot.yml files, managing Dependabot pull requests, configuring dependency update strategies, setting up grouped updates, monorepo patterns, multi-ecosystem groups, security update configuration, auto-triage rules, or any GitHub Advanced Security (GHAS) supply chain security topic related to Dependabot. For pre-commit dependency vulnerability scanning in AI coding agents via the GitHub MCP Server, this skill references the Advanced Security plugin (`advanced-security@copilot-plugins`). Use this skill when an agent needs to scan dependencies for known vulnerabilities before committing. | `references/dependabot-yml-reference.md`<br />`references/example-configs.md`<br />`references/pr-commands.md` |
| [devops-rollout-plan](../skills/devops-rollout-plan/SKILL.md)<br />`gh skills install github/awesome-copilot devops-rollout-plan` | Generate comprehensive rollout plans with preflight checks, step-by-step deployment, verification signals, rollback procedures, and communication plans for infrastructure and application changes | None |
| [diagnose](../skills/diagnose/SKILL.md)<br />`gh skills install github/awesome-copilot diagnose` | Perform a systematic diagnostic scan of an AI workflow across 5 quality dimensions — prompt quality, context efficiency, tool health, architecture fitness, and safety — producing a scored report with prioritized remediation actions. | None |
| [docker-healthcheck-generator](../skills/docker-healthcheck-generator/SKILL.md)<br />`gh skills install github/awesome-copilot docker-healthcheck-generator` | Generate Docker HEALTHCHECK instructions and docker-compose healthcheck blocks tailored to the services detected in the project. Use when the user asks to add healthchecks to a Dockerfile or compose file, mentions container health, readiness of dependent services, or wants depends_on with service_healthy conditions. | None |
| [documentation-writer](../skills/documentation-writer/SKILL.md)<br />`gh skills install github/awesome-copilot documentation-writer` | Diátaxis Documentation Expert. An expert technical writer specializing in creating high-quality software documentation, guided by the principles and structure of the Diátaxis technical documentation authoring framework. | None |
| [dotnet-best-practices](../skills/dotnet-best-practices/SKILL.md)<br />`gh skills install github/awesome-copilot dotnet-best-practices` | Ensure .NET/C# code meets best practices for the solution/project. | None |
| [dotnet-design-pattern-review](../skills/dotnet-design-pattern-review/SKILL.md)<br />`gh skills install github/awesome-copilot dotnet-design-pattern-review` | Review the C#/.NET code for design pattern implementation and suggest improvements. | None |
79 changes: 79 additions & 0 deletions skills/docker-healthcheck-generator/SKILL.md
@@ -0,0 +1,79 @@
---
name: docker-healthcheck-generator
description: 'Generate Docker HEALTHCHECK instructions and docker-compose healthcheck blocks tailored to the services detected in the project. Use when the user asks to add healthchecks to a Dockerfile or compose file, mentions container health, readiness of dependent services, or wants depends_on with service_healthy conditions.'
license: MIT
---

# Docker Healthcheck Generator

Generate correct, lightweight healthchecks for Dockerfiles and docker-compose services based on the technology detected in the repository.

## When to Use This Skill

Use this skill when you need to:
- Add a `HEALTHCHECK` instruction to an existing Dockerfile
- Add `healthcheck` blocks to services in `docker-compose.yml`
- Make `depends_on` wait for a dependency to be truly ready (`condition: service_healthy`)
- Diagnose containers stuck in `unhealthy` or `starting` state

## Workflow

1. **Detect the stack**: inspect Dockerfiles, compose files, and app code to identify each service (web app, database, cache, queue).
2. **Pick the lightest probe**: prefer a purpose-built client command over installing new tools; avoid `curl`/`wget` if the base image does not ship them.
3. **Generate the healthcheck** with sensible timings: `interval: 30s`, `timeout: 5s`, `retries: 3`, and `start_period` long enough for boot (10s for caches, 30-60s for JVM apps and databases).
4. **Wire dependencies**: update `depends_on` of consumers to use `condition: service_healthy`.
5. **Explain each choice** briefly so the user can tune values.

## Reference Probes by Service

| Service | Probe command |
|---|---|
| HTTP app (image has curl) | `curl -fsS http://localhost:PORT/health \|\| exit 1` |
| HTTP app (no curl, Node.js 18+) | `node -e "fetch('http://localhost:PORT/health').then(r=>{if(!r.ok)process.exit(1)}).catch(()=>process.exit(1))"` |
| HTTP app (no curl, Python) | `python -c "import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://localhost:PORT/health').status==200 else 1)"` |
| PostgreSQL | `pg_isready -U $POSTGRES_USER -d $POSTGRES_DB` |
| MySQL/MariaDB | `mysqladmin ping -h localhost -p$MYSQL_ROOT_PASSWORD` |
| Redis | `redis-cli ping \| grep PONG` |
| MongoDB | `mongosh --quiet --eval "db.adminCommand('ping').ok" \| grep 1` |
| RabbitMQ | `rabbitmq-diagnostics -q ping` |
| Kafka | `kafka-broker-api-versions --bootstrap-server localhost:9092` |
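
When one of the probes above references an environment variable inside a Compose file, the `$` generally needs to be written as `$$` so that Compose does not interpolate it from the host environment before the container's shell sees it. A minimal sketch, assuming placeholder credentials and database names:

```yaml
# Sketch only: credentials and database names are placeholder assumptions.
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: app
      POSTGRES_DB: appdb
      POSTGRES_PASSWORD: example   # placeholder, not a real secret
    healthcheck:
      # $$ defers expansion to the shell inside the container
      test: ["CMD-SHELL", "pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 30s
  cache:
    image: redis:7
    healthcheck:
      # The pipe requires a shell, hence CMD-SHELL
      test: ["CMD-SHELL", "redis-cli ping | grep PONG"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
```

In a Dockerfile shell-form `HEALTHCHECK`, a single `$` is enough because no Compose interpolation is involved.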

## Usage Examples

### Example 1: Dockerfile for a Node.js API

```dockerfile
# Requires Node.js 18+ for the built-in fetch API.
HEALTHCHECK --interval=30s --timeout=5s --start-period=15s --retries=3 \
  CMD node -e "fetch('http://localhost:3000/health').then(r=>{if(!r.ok)process.exit(1)}).catch(()=>process.exit(1))"
```

### Example 2: Compose stack with a database dependency

```yaml
services:
  db:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d appdb"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 30s
  api:
    build: .
    depends_on:
      db:
        condition: service_healthy
```

## Guidelines

1. **The app must expose a real health endpoint** - if none exists, offer to create a `/health` route that checks critical dependencies.
2. **Keep probes cheap** - healthchecks run every `interval` for the life of the container; avoid heavy queries or endpoints that cascade to external services.
3. **Use exec-form or CMD-SHELL consistently** - shell operators like `||` and `|` require `CMD-SHELL` (Compose) or shell form (Dockerfile); exec form runs the command directly, without a shell.
4. **Set start_period generously** - failures during `start_period` do not count toward `retries`; once it elapses, a service that is still booting can be marked unhealthy and break `service_healthy` dependents.
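
The difference between the two probe forms in guideline 3 can be sketched as follows. The `web` image, its port, and the `/health` path are hypothetical, and the timings are example values only.

```yaml
# Sketch only: the web image, port 8080, and /health path are assumptions.
services:
  queue:
    image: rabbitmq:3
    healthcheck:
      # Exec form: arguments are passed directly, so no pipes, ||, or $VAR expansion
      test: ["CMD", "rabbitmq-diagnostics", "-q", "ping"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 30s
  web:
    image: example/web:1.0   # hypothetical image that ships curl
    healthcheck:
      # CMD-SHELL: the string is run by the container's shell, so || works
      test: ["CMD-SHELL", "curl -fsS http://localhost:8080/health || exit 1"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 15s
```

Exec form is the safer default when the probe is a single command, since it does not depend on a shell being present in the image.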

## Limitations

- Healthchecks in Dockerfiles are ignored by Kubernetes; suggest liveness/readiness probes instead when K8s manifests are present.
- Distroless/scratch images cannot run shell probes; recommend a tiny healthcheck binary or a K8s-native probe.