| layout | default |
|---|---|
| title | Chapter 8: Production Operations |
| nav_order | 8 |
| parent | HAPI Tutorial |
Welcome to Chapter 8: Production Operations. In this part of HAPI Tutorial: Remote Control for Local AI Coding Sessions, you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.
This chapter closes with production reliability patterns for HAPI hub operations.
- monitor hub uptime and API/SSE health
- track session concurrency and approval latency
- back up and validate SQLite persistence lifecycle
- maintain runbooks for relay/tunnel/auth failures
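The backup item above can be sketched as a copy-then-verify step. This is a minimal sketch under stated assumptions, not HAPI's own tooling: `backupAndVerify`, `sha256OfFile`, and the file names are hypothetical, and the database is assumed to be quiescent (hub stopped) while it is copied. Copying a database file that is being written to is not safe; SQLite's online backup facility is designed for that case.

```typescript
import { copyFileSync, mkdtempSync, readFileSync, statSync, writeFileSync } from "node:fs";
import { createHash } from "node:crypto";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hash a whole file; fine for small databases, stream it for large ones.
function sha256OfFile(path: string): string {
  return createHash("sha256").update(readFileSync(path)).digest("hex");
}

interface BackupResult {
  backupPath: string;
  bytes: number;
  sha256: string;
}

// Copy the file, then confirm the copy is byte-identical to the source.
function backupAndVerify(sourcePath: string, backupPath: string): BackupResult {
  const sourceHash = sha256OfFile(sourcePath);
  copyFileSync(sourcePath, backupPath);
  const backupHash = sha256OfFile(backupPath);
  if (sourceHash !== backupHash) {
    throw new Error(`Backup checksum mismatch for ${backupPath}`);
  }
  return { backupPath, bytes: statSync(backupPath).size, sha256: backupHash };
}

// Usage: a throwaway file stands in for the real database (hypothetical name).
const backupWorkDir = mkdtempSync(join(tmpdir(), "hapi-backup-"));
const dbPath = join(backupWorkDir, "hub.sqlite");
writeFileSync(dbPath, "placeholder bytes, not a real SQLite database");
const backupResult = backupAndVerify(dbPath, join(backupWorkDir, "hub.sqlite.bak"));
console.log(`backup verified: ${backupResult.bytes} bytes`);
```

A matching checksum only proves the copy is faithful. Validating that the backup is a usable database is a separate step, for example by opening it and running SQLite's `PRAGMA integrity_check`.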
| Metric | Operational Value |
|---|---|
| active sessions | capacity planning |
| mean approval latency | responsiveness and risk signal |
| failed action relay count | transport/auth quality |
| reconnect frequency | network stability insight |
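To make the table concrete, the four metrics can be derived from a stream of timestamped events. The sketch below is illustrative only: the `HubEvent` shape and the `summarize` function are assumptions made for this example, not types or APIs that HAPI exposes.

```typescript
// Hypothetical event shapes; timestamps are milliseconds.
type HubEvent =
  | { kind: "session_started"; sessionId: string; at: number }
  | { kind: "session_ended"; sessionId: string; at: number }
  | { kind: "approval_requested"; requestId: string; at: number }
  | { kind: "approval_resolved"; requestId: string; at: number }
  | { kind: "relay_failed"; at: number }
  | { kind: "reconnected"; at: number };

interface MetricsSnapshot {
  activeSessions: number;
  meanApprovalLatencyMs: number | null; // null when no approval completed
  failedRelayCount: number;
  reconnectsPerHour: number;
}

function summarize(events: HubEvent[], windowMs: number): MetricsSnapshot {
  const active = new Set<string>();
  const pendingApprovals = new Map<string, number>();
  const latencies: number[] = [];
  let failedRelayCount = 0;
  let reconnects = 0;

  for (const event of events) {
    switch (event.kind) {
      case "session_started":
        active.add(event.sessionId);
        break;
      case "session_ended":
        active.delete(event.sessionId);
        break;
      case "approval_requested":
        pendingApprovals.set(event.requestId, event.at);
        break;
      case "approval_resolved": {
        const requestedAt = pendingApprovals.get(event.requestId);
        if (requestedAt !== undefined) {
          latencies.push(event.at - requestedAt);
          pendingApprovals.delete(event.requestId);
        }
        break;
      }
      case "relay_failed":
        failedRelayCount += 1;
        break;
      case "reconnected":
        reconnects += 1;
        break;
    }
  }

  const meanApprovalLatencyMs =
    latencies.length === 0 ? null : latencies.reduce((sum, ms) => sum + ms, 0) / latencies.length;

  return {
    activeSessions: active.size,
    meanApprovalLatencyMs,
    failedRelayCount,
    reconnectsPerHour: reconnects / (windowMs / 3600000),
  };
}

// Usage: ten events observed over a 30-minute window.
const sampleEvents: HubEvent[] = [
  { kind: "session_started", sessionId: "s1", at: 0 },
  { kind: "session_started", sessionId: "s2", at: 500 },
  { kind: "approval_requested", requestId: "r1", at: 1000 },
  { kind: "approval_resolved", requestId: "r1", at: 3000 },
  { kind: "relay_failed", at: 3500 },
  { kind: "approval_requested", requestId: "r2", at: 4000 },
  { kind: "approval_resolved", requestId: "r2", at: 5000 },
  { kind: "reconnected", at: 6000 },
  { kind: "session_ended", sessionId: "s1", at: 7000 },
  { kind: "reconnected", at: 8000 },
];
const snapshot = summarize(sampleEvents, 30 * 60 * 1000);
console.log(snapshot);
```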
When an incident occurs, the response priorities are:
- restore authenticated connectivity
- protect session state integrity
- communicate impact and expected recovery time
- perform root-cause review and tighten controls
You now have an operational model for running HAPI at production scale with controlled remote agent workflows.
Most teams struggle here because the hard part is not writing more code. It is setting clear boundaries between the hub, its transport (relay or tunnel), and persisted session state, so that behavior stays predictable as complexity grows.
In practical terms, this chapter helps you avoid three common failures:
- coupling core logic too tightly to one implementation path
- missing the handoff boundaries between setup, execution, and validation
- shipping changes without clear rollback or observability strategy
After working through this chapter, you should be able to reason about production operations as a subsystem of HAPI, with explicit contracts for inputs, state transitions, and outputs.
Use the implementation notes around execution and reliability details as your checklist when adapting these patterns to your own repository.
Under the hood, a production operation usually follows a repeatable control path:
- Context bootstrap: initialize runtime config and prerequisites for the core component.
- Input normalization: shape incoming data so the execution layer receives stable contracts.
- Core execution: run the main logic branch and propagate intermediate state through the state model.
- Policy and safety checks: enforce limits, auth scopes, and failure boundaries.
- Output composition: return canonical result payloads for downstream consumers.
- Operational telemetry: emit logs/metrics needed for debugging and performance tuning.
When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions.
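One way to give each stage an explicit success/failure condition is to have every stage return a tagged result and stop at the first failure. The following is a minimal sketch of that idea; `Stage`, `runPipeline`, and the example stages are hypothetical names chosen for illustration and do not come from the HAPI codebase.

```typescript
type StageState = Record<string, unknown>;
type StageResult = { ok: true; value: StageState } | { ok: false; reason: string };

interface Stage {
  name: string;
  run: (state: StageState) => StageResult;
}

interface PipelineReport {
  completed: string[];     // stages that succeeded, in order
  failedAt: string | null; // first failing stage, if any
  reason: string | null;
  state: StageState;       // state as of the last successful stage
}

// Walk the stages in order and stop at the first explicit failure.
function runPipeline(stages: Stage[], initial: StageState): PipelineReport {
  let state = initial;
  const completed: string[] = [];
  for (const stage of stages) {
    const result = stage.run(state);
    if (result.ok === false) {
      return { completed, failedAt: stage.name, reason: result.reason, state };
    }
    state = result.value;
    completed.push(stage.name);
  }
  return { completed, failedAt: null, reason: null, state };
}

// Example stages mirroring the first four steps of the control path.
const exampleStages: Stage[] = [
  { name: "bootstrap", run: (s) => ({ ok: true, value: { ...s, maxOutputLength: 16 } }) },
  {
    name: "normalize",
    run: (s) => {
      const payload = s.payload;
      if (typeof payload !== "string") return { ok: false, reason: "payload must be a string" };
      return { ok: true, value: { ...s, payload: payload.trim() } };
    },
  },
  { name: "execute", run: (s) => ({ ok: true, value: { ...s, output: String(s.payload).toUpperCase() } }) },
  {
    name: "policy",
    run: (s) =>
      String(s.output).length <= Number(s.maxOutputLength)
        ? { ok: true, value: s }
        : { ok: false, reason: "output exceeds limit" },
  },
];

const okReport = runPipeline(exampleStages, { payload: "  run tests  " });
const failedReport = runPipeline(exampleStages, { payload: "x".repeat(40) });
console.log(okReport.failedAt, failedReport.failedAt, failedReport.reason);
```

Because the report names the failing stage and keeps the list of completed ones, a debugging session can start at the exact point where the sequence broke instead of re-walking every stage.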
Use the following upstream sources to verify implementation details while reading this chapter:
- HAPI Repository (github.com): the authoritative source for the code discussed in this chapter.
- HAPI Releases (github.com): the authoritative record of released versions.
- HAPI Docs (hapi.run): the authoritative reference for documented behavior.
The `readRunnerState` function in `cli/src/persistence.ts` loads the runner's persisted state from a local file:

```typescript
/**
 * Read runner state from local file
 */
export async function readRunnerState(): Promise<RunnerLocallyPersistedState | null> {
  try {
    if (!existsSync(configuration.runnerStateFile)) {
      return null;
    }
    const content = await readFile(configuration.runnerStateFile, 'utf-8');
    return JSON.parse(content) as RunnerLocallyPersistedState;
  } catch (error) {
    // State corrupted somehow :(
    console.error(`[PERSISTENCE] Runner state file corrupted: ${configuration.runnerStateFile}`, error);
    return null;
  }
}
```

It returns `null` both when the file is missing and when it cannot be read or parsed, so callers have to treat `null` as "no usable runner state". The `as RunnerLocallyPersistedState` cast is a compile-time assertion only; the parsed JSON is not validated at runtime.
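Because `JSON.parse` followed by a type cast accepts any well-formed JSON, a runtime check can catch a state file that parses but has the wrong shape. The sketch below shows the general technique with a type guard. `ExampleRunnerState` and its fields are hypothetical stand-ins: the real fields of `RunnerLocallyPersistedState` are not shown in this chapter.

```typescript
// Hypothetical shape, for illustration only.
interface ExampleRunnerState {
  pid: number;
  startedAt: string;
}

// Type guard: narrows unknown JSON to the expected shape at runtime.
function isExampleRunnerState(value: unknown): value is ExampleRunnerState {
  if (typeof value !== "object" || value === null) return false;
  const { pid, startedAt } = value as Record<string, unknown>;
  return (
    typeof pid === "number" &&
    Number.isInteger(pid) &&
    pid > 0 &&
    typeof startedAt === "string"
  );
}

// Returns null for malformed JSON and for JSON with the wrong shape.
function parseRunnerState(content: string): ExampleRunnerState | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(content);
  } catch {
    return null;
  }
  return isExampleRunnerState(parsed) ? parsed : null;
}

const validState = parseRunnerState('{"pid": 123, "startedAt": "2024-01-01T00:00:00Z"}');
const wrongShape = parseRunnerState('{"pid": "123"}');
const notJson = parseRunnerState("{not json");
console.log(validState, wrongShape, notJson);
```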
The `writeRunnerState` function in `cli/src/persistence.ts` persists the state as pretty-printed JSON:

```typescript
/**
 * Write runner state to local file (synchronously for atomic operation)
 */
export function writeRunnerState(state: RunnerLocallyPersistedState): void {
  writeFileSync(configuration.runnerStateFile, JSON.stringify(state, null, 2), 'utf-8');
}
```

The write is synchronous, so no other code in the same process runs between the start and end of the call. That is not the same as filesystem-level atomicity: if the process dies mid-write, the file can be left truncated, which `readRunnerState` would then report as corrupted.
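If a partially written state file is a concern, a common alternative is to write to a temporary file and rename it over the target. This is a sketch of that general technique, not what `writeRunnerState` does; `writeJsonAtomically` and the paths are hypothetical. It relies on rename being atomic when source and target are on the same POSIX filesystem.

```typescript
import { existsSync, mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Write to a sibling temp file, then rename over the target.
// Readers see either the old content or the new content, never a partial file.
function writeJsonAtomically(targetPath: string, value: unknown): void {
  const tempPath = `${targetPath}.${process.pid}.tmp`;
  writeFileSync(tempPath, JSON.stringify(value, null, 2), "utf-8");
  renameSync(tempPath, targetPath);
}

// Usage against a throwaway directory (hypothetical file name).
const stateDir = mkdtempSync(join(tmpdir(), "hapi-state-"));
const statePath = join(stateDir, "runner.state.json");
writeJsonAtomically(statePath, { pid: 4242 });
writeJsonAtomically(statePath, { pid: 4243 }); // replaces the previous file
const stored = JSON.parse(readFileSync(statePath, "utf-8")) as { pid: number };
console.log(stored.pid, existsSync(`${statePath}.${process.pid}.tmp`));
```

This sketch does not call `fsync`, so it protects against torn writes from a process crash but makes no durability guarantee across power loss.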
The `clearRunnerState` function in `cli/src/persistence.ts` removes the state file and, where possible, the lock file:

```typescript
/**
 * Clean up runner state file and lock file
 */
export async function clearRunnerState(): Promise<void> {
  if (existsSync(configuration.runnerStateFile)) {
    await unlink(configuration.runnerStateFile);
  }
  // Also clean up lock file if it exists (for stale cleanup)
  if (existsSync(configuration.runnerLockFile)) {
    try {
      await unlink(configuration.runnerLockFile);
    } catch {
      // Lock file might be held by running runner, ignore error
    }
  }
}
```

Failure to remove the lock file is deliberately ignored, because a running runner may still hold it. Failure to remove the state file is not caught, so that error propagates to the caller.
The `acquireRunnerLock` function in `cli/src/persistence.ts` enforces a single running runner through an exclusive lock file:

```typescript
/**
 * Acquire an exclusive lock file for the runner.
 * The lock file proves the runner is running and prevents multiple instances.
 * Returns the file handle to hold for the runner's lifetime, or null if locked.
 */
export async function acquireRunnerLock(
  maxAttempts: number = 5,
  delayIncrementMs: number = 200
): Promise<FileHandle | null> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // 'wx' ensures we only create if it doesn't exist (atomic lock acquisition)
      const fileHandle = await open(configuration.runnerLockFile, 'wx');
      // Write PID to lock file for debugging
      await fileHandle.writeFile(String(process.pid));
      return fileHandle;
    } catch (error: any) {
      if (error.code === 'EEXIST') {
        // Lock file exists, check if process is still running
        try {
          const lockPid = readFileSync(configuration.runnerLockFile, 'utf-8').trim();
          if (lockPid && !isNaN(Number(lockPid))) {
            if (!isProcessAlive(Number(lockPid))) {
              // Process doesn't exist, remove stale lock
              unlinkSync(configuration.runnerLockFile);
              continue; // Retry acquisition
            }
          }
        } catch {
          // Can't read lock file, might be corrupted
        }
      }
      if (attempt === maxAttempts) {
        return null;
        // (excerpt ends here; the remainder of the function is not shown)
```

Opening the lock file with the `wx` flag fails with `EEXIST` if the file already exists, so only one runner can create it. When the lock exists, the function reads the PID stored in it, removes the lock if that process is no longer alive, and retries. Once the last attempt fails, it returns `null`.
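The stale-lock check depends on an `isProcessAlive` helper whose body is not shown in the excerpt. A common way to implement such a check in Node.js is to send signal `0`, which performs the existence and permission checks without delivering a signal. The function below is a sketch of that technique under a different name; it is an assumption about how such a helper can work, not the HAPI implementation.

```typescript
// Hypothetical stand-in for the isProcessAlive helper referenced above.
function isProcessAliveSketch(pid: number): boolean {
  try {
    // Signal 0 delivers nothing; it only checks that the target exists
    // and that we are allowed to signal it.
    process.kill(pid, 0);
    return true;
  } catch (error) {
    // EPERM: the process exists but belongs to another user.
    // ESRCH (or anything else): treat as not running.
    return (error as NodeJS.ErrnoException).code === "EPERM";
  }
}

// Usage: the current process is, by definition, alive.
const selfAlive = isProcessAliveSketch(process.pid);
console.log(selfAlive);
```

PIDs can be reused by the operating system, so a live PID in a lock file does not prove that the process holding it is a runner. That limitation applies to any PID-based stale-lock check.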
```mermaid
flowchart TD
    A[readRunnerState]
    B[writeRunnerState]
    C[clearRunnerState]
    D[acquireRunnerLock]
    E[releaseRunnerLock]
    A --> B
    B --> C
    C --> D
    D --> E
```
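The diagram names `releaseRunnerLock`, which is not shown in the excerpts. To illustrate the acquire/hold/release lifecycle as a whole, here is a simplified synchronous sketch. `withLockFile` is a hypothetical helper, and it uses `openSync` rather than the asynchronous `FileHandle` API that `acquireRunnerLock` uses.

```typescript
import { closeSync, existsSync, mkdtempSync, openSync, unlinkSync, writeSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Acquire an exclusive lock file, run the work, and always release the lock.
function withLockFile<T>(lockPath: string, work: () => T): T {
  // 'wx' fails with EEXIST if the file already exists.
  const fd = openSync(lockPath, "wx");
  try {
    writeSync(fd, String(process.pid)); // PID recorded for debugging
    return work();
  } finally {
    closeSync(fd);
    unlinkSync(lockPath);
  }
}

// Usage: a second acquisition attempt while the lock is held is rejected.
const lockDir = mkdtempSync(join(tmpdir(), "hapi-lock-"));
const lockPath = join(lockDir, "runner.lock");
const secondAttempt = withLockFile(lockPath, () => {
  try {
    withLockFile(lockPath, () => undefined);
    return "acquired";
  } catch (error) {
    return (error as NodeJS.ErrnoException).code ?? "unknown";
  }
});
console.log(secondAttempt, existsSync(lockPath));
```

Releasing in a `finally` block covers exceptions thrown by the work, but not a hard kill of the process. That gap is why a stale-lock check is still needed on the acquiring side.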