Date: 2025-11-15
Issue: Orphaned processes creating LSL files in wrong directories
Resolution: PIDs 78503, 78502, and 33998 killed; folder recreation stopped
The coding infrastructure has a Process State Manager (PSM) that tracks running services in .live-process-registry.json, but not all spawned processes are registered, leading to orphaned processes that cannot be monitored or cleaned up.
Root Cause of coding-tmp Issue:
- Enhanced-transcript-monitor (PID 78503, 33998) was spawned WITHOUT project path parameter
- Without project path, it used process.cwd(), which resolved to different directories
- These processes were NOT registered in PSM, making them invisible to monitoring systems
- When parent sessions ended, these orphans continued running and creating files
/Users/q284340/Agentic/coding/.live-process-registry.json
{
  "version": "3.0.0",
  "lastChange": 1763186615258,
  "sessions": {},
  "services": {
    "global": {
      "service-name": {
        "pid": 12345,
        "script": "path/to/script.js",
        "type": "global",
        "startTime": 1763186615258,
        "lastHealthCheck": 1763186615258,
        "status": "running",
        "metadata": {}
      }
    },
    "projects": {
      "/path/to/project": {
        "service-name": {
          "pid": 67890,
          "script": "script.js",
          "type": "per-project",
          "startTime": 1763186615258,
          "lastHealthCheck": 1763186615258,
          "status": "running",
          "metadata": {}
        }
      }
    }
  }
}

These processes correctly register with PSM:
- GlobalServiceCoordinator.startService() (scripts/global-service-coordinator.js:248)
  - Spawns services and registers them with PSM
  - Proper cleanup on shutdown
  - Health checking enabled
- enhanced-transcript-monitor.js (scripts/enhanced-transcript-monitor.js:2130)
  - Registers itself via this.processStateManager.registerService()
  - Unregisters on shutdown (line 2284)
- global-lsl-coordinator.js (scripts/global-lsl-coordinator.js:309)
  - Registers itself via this.processStateManager.registerService()
  - Unregisters on shutdown (line 355)
- system-monitor-watchdog.js (scripts/system-monitor-watchdog.js:161)
  - Registers spawned coordinator via PSM
- start-services-robust.js (scripts/start-services-robust.js:350)
  - Registers services via PSM
These wrappers are registered by their parent, but their CHILDREN are not:
- dashboard-service.js (scripts/dashboard-service.js:32)
    const child = spawn('npm', ['run', 'dev'], { ... });
  - Wrapper PID is registered by GlobalServiceCoordinator
  - BUT: npm spawns child processes (Next.js) that are NOT tracked
  - Gap: Child process tree invisible to PSM
- api-service.js (scripts/api-service.js:33)
    const child = spawn('node', [API_SERVER_PATH], { ... });
  - Wrapper PID registered, but spawned API server not explicitly tracked
  - Relies on parent wrapper for lifecycle
- combined-status-line-wrapper.js (scripts/combined-status-line-wrapper.js:19)
    const child = spawn('node', [join(__dirname, 'combined-status-line.js')], { ... });
  - Wrapper registered, child not explicitly tracked
- tool-interaction-hook-wrapper.js (scripts/tool-interaction-hook-wrapper.js:19)
  - Similar wrapper pattern without child registration
These spawn processes WITHOUT PSM registration:
- combined-status-line.js (scripts/combined-status-line.js:1014)
    const monitor = spawn('node', [monitorScript], { ... });
  - Spawns enhanced-transcript-monitor dynamically
  - CRITICAL: No PSM registration
  - Result: Orphaned processes like PID 78503, 33998
- health-prompt-hook.js (scripts/health-prompt-hook.js:130)
    const child = spawn('node', [VERIFIER_SCRIPT, '--auto-heal'], { ... });
  - Spawns verifier without registration
  - Short-lived process, but still untracked
- process-state-manager.js (scripts/process-state-manager.js:466)
    const lsof = spawn('lsof', [levelDBLockPath]);
  - Utility spawn for diagnostics
  - Not a long-running service, acceptable to not register
- combined-status-line.js spawns monitors based on runtime detection
- These spawned processes don't register themselves
- No parent-child relationship tracking

- Wrappers like dashboard-service.js spawn npm/node
- Those children spawn more processes (Next.js, webpack, etc.)
- Only the wrapper PID is tracked
- When wrapper dies, orphaned children may remain

- Processes spawned without explicit project path use process.cwd()
- Can resolve to wrong directories (e.g., coding-tmp instead of coding)
- LSL files created in wrong locations

- When parent process crashes/exits unexpectedly
- Children not automatically cleaned up
- Accumulation of orphaned processes
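The process.cwd() fallback described above can be closed by failing fast at startup. The following is a minimal sketch, not the monitor's actual code: requireProjectPath and its injected dirExists parameter are hypothetical names, and the absolute-path check assumes a POSIX system.

```javascript
// Hypothetical helper: return a validated project path or throw.
// `dirExists` is injected (for example fs.existsSync) so the check is easy to test.
function requireProjectPath(candidate, dirExists) {
  if (typeof candidate !== 'string' || candidate.trim() === '') {
    throw new Error('Project path is required; refusing to fall back to process.cwd()');
  }
  // POSIX-only check (assumption); path.isAbsolute() is the portable alternative.
  if (!candidate.startsWith('/')) {
    throw new Error(`Project path must be absolute: ${candidate}`);
  }
  if (!dirExists(candidate)) {
    throw new Error(`Project path does not exist: ${candidate}`);
  }
  return candidate;
}
```

A monitor entry point could call this once, for example with process.argv[2] and fs.existsSync, and exit with a non-zero status if it throws.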
/Users/q284340/Agentic/coding/.global-lsl-registry.json
Current State:
{
  "version": "1.0.0",
  "lastUpdated": 1763186616057,
  "projects": {
    "coding": {
      "projectPath": "/Users/q284340/Agentic/coding",
      "monitorPid": 51568,
      "startTime": 1763186616056,
      "parentPid": null,
      "lastHealthCheck": 1763186616056,
      "status": "active",
      "exchanges": 0
    }
  },
  "coordinator": {
    "pid": 34855,
    "startTime": 1763186162997,
    "healthCheckInterval": 30000
  }
}

Problem:
- Dead processes (78503, 33998) were never in this registry
- They were spawned by untracked mechanisms
- No health checking or cleanup possible
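One way to surface processes that appear in neither registry is to compare a list of candidate processes against the PIDs both files record. The sketch below assumes the two JSON shapes shown in this document; the function names and the candidate list format ({ pid, command }) are illustrative assumptions, not existing project code.

```javascript
// Collect every PID recorded in a PSM registry object
// (shape as in .live-process-registry.json).
function collectPsmPids(registry) {
  const pids = new Set();
  const services = (registry && registry.services) || {};
  for (const entry of Object.values(services.global || {})) {
    pids.add(entry.pid);
  }
  for (const projectServices of Object.values(services.projects || {})) {
    for (const entry of Object.values(projectServices)) {
      pids.add(entry.pid);
    }
  }
  return pids;
}

// Collect monitor and coordinator PIDs from the LSL registry
// (shape as in .global-lsl-registry.json).
function collectLslPids(lslRegistry) {
  const pids = new Set();
  for (const project of Object.values((lslRegistry && lslRegistry.projects) || {})) {
    pids.add(project.monitorPid);
  }
  if (lslRegistry && lslRegistry.coordinator) {
    pids.add(lslRegistry.coordinator.pid);
  }
  return pids;
}

// Return the candidate processes whose PID appears in neither registry.
function findUnregistered(candidates, psmRegistry, lslRegistry) {
  const known = new Set([...collectPsmPids(psmRegistry), ...collectLslPids(lslRegistry)]);
  return candidates.filter((proc) => !known.has(proc.pid));
}
```

The result is only a list of suspects. Whether an unregistered process is safe to kill still needs separate checks.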
- Add PSM Registration to All Dynamic Spawns
  - Modify combined-status-line.js to register spawned monitors
  - Add registration wrapper function for all spawn() calls
- Implement Child Process Tree Tracking
  - Track PIDs of all child processes
  - Use ps or /proc to discover child tree
  - Add cleanup logic for entire process tree
- Mandatory Project Path for LSL Monitors
  - Never allow enhanced-transcript-monitor to run without explicit project path
  - Add validation that fails startup if project path missing
  - Remove fallback to process.cwd()
- Add Dead Process Cleanup Cron
  - Regular scan for processes matching coding patterns
  - Cross-reference with PSM registry
  - Kill orphans not in registry (with safety checks)
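The registration wrapper mentioned in the first item could look roughly like the sketch below. It is an illustration under assumptions: spawnRegistered is a hypothetical name, spawnFn stands in for child_process.spawn, and psm is any object exposing the registerService() and unregisterService() methods listed in this document. How per-project context is passed to registerService() is not shown in the source, so the caller supplies serviceInfo as-is.

```javascript
// Hypothetical helper (not part of the existing scripts): spawn a child and
// register it with PSM before handing it back to the caller.
//   spawnFn - a function with the same call shape as child_process.spawn
//   psm     - an object exposing registerService() and unregisterService()
async function spawnRegistered({ spawnFn, psm, serviceInfo, context = {}, command, args = [], options = {} }) {
  const child = spawnFn(command, args, options);
  if (typeof child.pid !== 'number') {
    // spawn() leaves pid undefined when the process could not be started.
    // Absorb the follow-up 'error' event, since this function throws instead.
    child.once('error', () => {});
    throw new Error(`Failed to spawn "${command}": no PID assigned`);
  }

  await psm.registerService({
    ...serviceInfo,
    pid: child.pid,
    metadata: { ...(serviceInfo.metadata || {}), parentPid: process.pid }
  });

  // Drop the registry entry as soon as the child exits.
  child.once('exit', () => {
    psm.unregisterService(serviceInfo.name, serviceInfo.type, context)
      .catch((err) => console.error(`Failed to unregister ${serviceInfo.name}:`, err));
  });

  return child;
}
```

Recording parentPid in metadata is a design choice of this sketch. It gives a later cleanup pass a way to tell which registered children belonged to a parent that has since died.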
- Process Manager as Single Source of Truth
  - All spawns MUST go through PSM
  - No direct spawn() calls outside PSM
  - Centralized process lifecycle management
- Health Monitoring for All Services
  - Regular heartbeat checks
  - Auto-restart on failure
  - Alerting on repeated failures
- Graceful Shutdown Protocol
  - SIGTERM handlers in all services
  - Proper cleanup and unregistration
  - Timeout-based SIGKILL fallback
- Process Hierarchy Visualization
  - Tool to show process tree
  - Identify parent-child relationships
  - Highlight orphans and zombies
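Child tree discovery and hierarchy visualization both start from a pid/ppid listing. The sketch below assumes text in the shape produced by a command such as ps -A -o pid=,ppid=,args= (one process per line, no header). Both function names are hypothetical.

```javascript
// Parse "pid ppid command..." lines into { pid, ppid, command } records.
// Lines that do not start with two integers are skipped.
function parsePsOutput(text) {
  const records = [];
  for (const line of text.split('\n')) {
    const match = line.trim().match(/^(\d+)\s+(\d+)\s+(.*)$/);
    if (match) {
      records.push({ pid: Number(match[1]), ppid: Number(match[2]), command: match[3] });
    }
  }
  return records;
}

// Return all descendant PIDs of rootPid (children, grandchildren, ...),
// in breadth-first order. The root itself is not included.
function descendantsOf(rootPid, records) {
  const childrenByParent = new Map();
  for (const { pid, ppid } of records) {
    if (!childrenByParent.has(ppid)) childrenByParent.set(ppid, []);
    childrenByParent.get(ppid).push(pid);
  }
  const result = [];
  const seen = new Set([rootPid]);
  const queue = [rootPid];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const child of childrenByParent.get(current) || []) {
      if (!seen.has(child)) {
        seen.add(child);
        result.push(child);
        queue.push(child);
      }
    }
  }
  return result;
}
```

Given a registered wrapper PID, descendantsOf() yields the PIDs that would need to be tracked or cleaned up along with it, such as the Next.js process started by npm run dev.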
- scripts/combined-status-line.js - Add PSM registration for spawned monitors
- scripts/enhanced-transcript-monitor.js - Enforce project path requirement
- scripts/global-service-coordinator.js - Track child process trees

- scripts/dashboard-service.js - Register child PIDs
- scripts/api-service.js - Register child PIDs
- scripts/health-prompt-hook.js - Add short-lived process tracking

- Create PSM developer guide
- Document spawn() best practices
- Add process troubleshooting runbook
Current Purpose:
- Track running services (global and per-project)
- Enable health checks
- Facilitate graceful shutdown
- Prevent duplicate service instances
Methods:
- registerService(serviceInfo) - Add service to registry
- unregisterService(name, type, context) - Remove service
- isServiceRunning(name, type, context) - Check if service alive
- cleanupDeadProcesses() - Remove entries for dead PIDs
- getHealthStatus() - Get comprehensive system health
Access Pattern:
import ProcessStateManager from './process-state-manager.js';

const psm = new ProcessStateManager();
await psm.initialize();

// Register service
await psm.registerService({
  name: 'my-service',
  pid: process.pid,
  type: 'global',
  script: 'my-service.js',
  metadata: { custom: 'data' }
});

// Check if running
const running = await psm.isServiceRunning('my-service', 'global');

// Unregister on shutdown
await psm.unregisterService('my-service', 'global');

Issue: Duplicate transcript monitor instances were running simultaneously for the same project, bypassing the singleton check that should prevent this.
Root Cause:
In scripts/enhanced-transcript-monitor.js:2123, the singleton check was calling ProcessStateManager's getService() method with incorrect parameters:
// BEFORE (BROKEN):
const existingService = await this.processStateManager.getService(serviceName, 'per-project', projectPath);

The third argument passed projectPath as a string, but ProcessStateManager.getService() expects it inside a context object:

// From ProcessStateManager.js:184-200
async getService(name, type, context = {}) {
  if (type === 'per-project' && context.projectPath) { // ← Expects context.projectPath!
    const projectServices = registry.services.projects[context.projectPath];
    return projectServices ? projectServices[name] || null : null;
  }
}

Because context.projectPath was always undefined, the singleton check always returned null, allowing duplicate instances.

Fix Applied: Changed line 2123 to pass an object instead of a string:

// AFTER (CORRECT):
const existingService = await this.processStateManager.getService(serviceName, 'per-project', { projectPath });

Verification: Tested by attempting to start duplicate instances; the second start is now rejected with an error message:
❌ Another instance of enhanced-transcript-monitor is already running for project coding
PID: 41120, Started: 2025-11-19T05:52:57.324Z
To fix: Kill the existing instance with: kill 41120
Impact:
- Prevents duplicate transcript monitor instances per project
- Ensures proper singleton enforcement for per-project services
- Reduces orphaned processes and resource waste
Related Issue: This was identified after a user reported that duplicate ProcessStateManager instances from the previous day's session were still running alongside the new session's instance.
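The bug went unnoticed because a string context fails silently: reading .projectPath from a string yields undefined, so the lookup simply reports "not found". A defensive variant could reject the wrong argument type outright. The sketch below is a simplified stand-in, not the project's getService(): lookupService is a hypothetical name, it takes the registry as an explicit parameter, and the global branch is an assumption based on the registry shape shown earlier.

```javascript
// Simplified lookup modeled on the getService() excerpt, plus guards
// (an addition for illustration) that turn a silent miss into a loud error.
function lookupService(registry, name, type, context = {}) {
  if (context === null || typeof context !== 'object') {
    throw new TypeError('context must be an object such as { projectPath }');
  }
  if (type === 'per-project') {
    if (!context.projectPath) {
      throw new TypeError('per-project lookups require context.projectPath');
    }
    const projectServices = registry.services.projects[context.projectPath];
    return projectServices ? projectServices[name] || null : null;
  }
  // Assumed behavior for global services, based on the registry layout.
  return registry.services.global[name] || null;
}
```

With a guard like this, the original call that passed projectPath as a bare string would have thrown on first use instead of allowing a second monitor to start.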
Problem: Orphaned node processes from previous sessions accumulate over time, consuming resources and potentially causing conflicts. These include:
- Transcript monitors without valid project paths
- Stuck ukb/vkb operations that never completed
- Orphaned qdrant-sync processes
- Old shell snapshot processes
Solution: Created automated cleanup utility that intelligently detects and removes orphaned processes.
Implementation:
- Cleanup Script (scripts/cleanup-orphaned-processes.js)
  - Detects processes matching patterns: vkb, ukb, enhanced-transcript, sync-graph, zsh snapshots
  - Validates transcript monitors have absolute paths to existing directories
  - Identifies stuck operations (running longer than expected)
  - Provides --dry-run mode for safety
- Wrapper Command (bin/cleanup-orphans)
  - Convenient executable wrapper
  - Pass-through for all arguments
Usage:
# Preview what would be cleaned
./bin/cleanup-orphans --dry-run

# Clean up orphaned processes
./bin/cleanup-orphans

Detection Logic:

// Transcript monitors: Only valid if project path exists
if (command.includes('enhanced-transcript-monitor.js')) {
  const projectPath = extractProjectPath(command);
  if (!existsSync(projectPath)) {
    return { kill: true, reason: 'Invalid project path' };
  }
}

// ukb/vkb operations: Stuck if still running (except vkb server)
if (command.includes('ukb-database/cli.js') || command.includes('vkb-cli.js')) {
  if (!command.includes('server start')) {
    return { kill: true, reason: 'Stuck ukb/vkb operation' };
  }
}

// Qdrant sync and shell snapshots: Always orphans if found
if (command.includes('sync-graph-to-qdrant.js')) {
  return { kill: true, reason: 'Orphaned qdrant sync' };
}

Example Output:
🧹 Orphaned Process Cleanup
═══════════════════════════
Found 4 potentially orphaned process(es)
Cleaning up 2 orphaned process(es):
✅ Killed PID 78503: Invalid transcript monitor (path does not exist)
✅ Killed PID 41120: Invalid transcript monitor (no path argument)
✅ Cleanup complete: 2/2 processes cleaned up
Integration:
The cleanup utility complements the automatic pre-startup cleanup in start-services-robust.js by providing:
- Manual cleanup between sessions
- Dry-run capability for safety
- More granular control over what gets killed
- Detailed reporting of cleanup actions
Files:
- scripts/cleanup-orphaned-processes.js - Main cleanup logic (186 lines)
- bin/cleanup-orphans - Executable wrapper

The Process State Manager infrastructure exists but is incompletely implemented. Critical gaps in process registration led to orphaned processes creating LSL files in incorrect locations.
Immediate Fix Applied:
- Killed orphaned PIDs: 78503, 78502, 33998
- Verified folder recreation stopped
Required Next Steps:
- Enforce PSM registration for ALL spawned processes
- Implement child process tree tracking
- Add project path validation to LSL monitors
- Create automated orphan detection and cleanup
Critical Rule:
🚨 NEVER spawn a process without registering it in PSM 🚨