diff --git a/README.md b/README.md index 432c6ff..f4a862e 100644 --- a/README.md +++ b/README.md @@ -289,6 +289,24 @@ All operations can be run in "dry-run" mode first: - Maintains accurate infrastructure state - Detects drift from expected configuration +### v2.0 Enhanced Safety Layer +- **Simulation Modes**: Full simulation, validation-only, or disabled +- **Validation Framework**: Parameter validation, dependency validation, quota checks +- **Audit Logging**: Complete operation audit trail +- **Dry-Run Decorator**: Transparent operation wrapping for safety + +### v2.0 Plugin Architecture +- **Dynamic Registration**: Register and unregister resource handlers at runtime +- **Middleware Chain**: Cross-cutting concerns (logging, recovery, metrics, timeouts) +- **Hook System**: Before/after execution callbacks +- **Lifecycle Management**: Dependency-aware initialization and health monitoring + +### v2.0 Enhanced Reasoning +- **Multi-Step Verification**: Pre-flight, dependency, safety, and impact phases +- **Risk Assessment**: Automatic risk level evaluation with mitigations +- **Rollback Planning**: Each operation includes rollback strategies +- **Cost Estimation**: Automatic cost calculation for planned resources + ### Contributing 1. Fork the repository @@ -402,6 +420,14 @@ agent: - ✅ ReAct Agent - ✅ Better UX/UI +### v2.0 Enhancements (Latest) +- ✅ Plugin Architecture for resource handlers +- ✅ Safety Layer with simulation mode and validation +- ✅ Enhanced reasoning with multi-step verification +- ✅ Dry-run decorator pattern +- ✅ Audit logging framework +- ✅ Cost estimation integration + ### Upcoming Features (v0.1.*) - 🔄 Cost optimization recommendations - 🔄 Enhanced conflict resolution diff --git a/docs/architecture/architecture-overview.md b/docs/architecture/architecture-overview.md index 6edce8a..c455945 100644 --- a/docs/architecture/architecture-overview.md +++ b/docs/architecture/architecture-overview.md @@ -345,6 +345,291 @@ The testing architecture includes a sophisticated mock system that provides real - Secure state file storage with restricted access - Comprehensive audit logging for compliance requirements +## Plugin Architecture (v2.0) + +### Overview + +The Plugin Architecture provides a modular, extensible system for managing AWS resource handlers. It enables dynamic registration, lifecycle management, and middleware chaining for all resource operations. + +### Core Components + +#### Plugin Registry + +The central registry manages all resource handlers: + +```go +type PluginRegistry struct { + handlers map[string]ResourceHandler + metadata map[string]PluginMetadata + hooks map[string][]PluginHook + middlewares map[string][]Middleware +} +``` + +**Key Features:** +- Dynamic plugin registration and unregistration +- Thread-safe operations with RWMutex +- Hook system for before/after execution callbacks +- Middleware chaining for cross-cutting concerns + +#### Resource Handler Interface + +All resource handlers implement a unified interface: + +```go +type ResourceHandler interface { + Plugin + Create(ctx context.Context, params interface{}) (*types.AWSResource, error) + List(ctx context.Context) ([]*types.AWSResource, error) + Get(ctx context.Context, id string) (*types.AWSResource, error) + Update(ctx context.Context, id string, params interface{}) (*types.AWSResource, error) + Delete(ctx context.Context, id string) error + ExecuteSpecial(ctx context.Context, operation string, params interface{}) (*types.AWSResource, error) + ValidateParams(operation string, params interface{}) error +} +``` + +#### Lifecycle Management + +The LifecycleManager handles plugin initialization order based on dependencies: + +```go +type LifecycleManager struct { + registry *PluginRegistry + statuses map[string]*PluginStatus + healthTick time.Duration +} +``` + +**Features:** +- Topological sorting for dependency-aware initialization +- Continuous health monitoring +- Graceful shutdown with error handling +- Hot reload support for development + +### Built-in Middleware + +| Middleware | Purpose | +|------------|---------| +| `LoggingMiddleware` | Operation logging with duration tracking | +| `RecoveryMiddleware` | Panic recovery with stack traces | +| `MetricsMiddleware` | Performance metrics collection | +| `TimeoutMiddleware` | Configurable operation timeouts | + +### Usage Example + +```go +registry := plugins.NewPluginRegistry(logger) +lifecycle := plugins.NewLifecycleManager(registry, logger) + +// Register plugin with middleware +registry.Register(ec2Handler, &PluginConfig{ + AWSClient: awsClient, + Logger: logger, +}) + +registry.AddMiddleware("ec2-instance", plugins.LoggingMiddleware(logger)) +registry.AddMiddleware("ec2-instance", plugins.TimeoutMiddleware(30*time.Second)) + +// Execute with hooks and middleware +result, err := registry.ExecuteWithHooks(ctx, "ec2-instance", "create", params) +``` + +## Safety Layer Architecture (v2.0) + +### Overview + +The Safety Layer provides comprehensive protection mechanisms including simulation mode, validation, and dry-run capabilities. It ensures safe infrastructure operations through multiple validation checkpoints. + +### Core Components + +#### Simulator + +The Simulator enables safe testing of infrastructure operations: + +```go +type Simulator struct { + config SimulationConfig + validators map[string]SafetyValidator + rules map[string][]ValidationRule +} +``` + +**Simulation Modes:** +| Mode | Description | +|------|-------------| +| `SimulationDisabled` | Normal execution without simulation | +| `SimulationDryRun` | Full simulation with no actual changes | +| `SimulationValidate` | Validation only, no resource changes | + +**Simulation Results:** +```go +type SimulationResult struct { + Success bool + WouldCreate []*types.AWSResource + WouldModify []*types.AWSResource + WouldDelete []string + ValidationErrors []ValidationError + Warnings []string + EstimatedCost *CostEstimate +} +``` + +#### Dry-Run Decorator + +The decorator pattern wraps operations for safety: + +```go +type DryRunDecorator struct { + simulator *Simulator + auditLog *AuditLog + config DryRunConfig +} +``` + +**Features:** +- Transparent dry-run mode activation +- Pre-execution validation +- Audit trail generation +- Blocked operation enforcement + +#### Validation Layer + +Multi-level validation for infrastructure operations: + +```go +type ValidationLayer struct { + validators map[string]ResourceValidator + rules map[string][]ValidationRule +} +``` + +**Validation Levels:** +- **Error**: Blocks execution +- **Warning**: Logs but allows execution +- **Info**: Informational only + +**Built-in Validators:** +| Validator | Scope | +|-----------|-------| +| `ValidatorRegistry` | Parameter validation with rules | +| `DependencyValidator` | Dependency order verification | +| `QuotaValidator` | Resource quota enforcement | + +### Validation Rules + +Pre-configured rules for common AWS resources: + +```yaml +ec2-instance: + - imageId: required + - instanceType: required, pattern("^[a-z][0-9][a-z]?\\.[a-z0-9]+$") + +rds-instance: + - engine: required + - dbInstanceClass: required + - allocatedStorage: min(20) + +vpc: + - cidrBlock: required, cidr +``` + +### Usage Example + +```go +simulator := safety.NewSimulator(logger, safety.SimulationConfig{ + Mode: safety.SimulationDryRun, + EnableCostEstimate: true, + EnableQuotaCheck: true, +}) + +decorator := safety.NewDryRunDecorator(logger, simulator, safety.DryRunConfig{ + EnabledByDefault: true, + RequireConfirmation: true, + AuditEnabled: true, +}) + +// Wrap operation with dry-run support +safeOperation := decorator.Decorate(createInstance, &safety.ExecutionContext{ + Operation: "create", + ResourceType: "ec2-instance", + DryRun: true, +}) + +result, err := safeOperation(ctx, params) +``` + +## Enhanced Reasoning System (v2.0) + +### Overview + +The reasoning system has been enhanced with multi-step verification protocols, providing structured safety checks at every stage of infrastructure planning. + +### Verification Phases + +#### Phase 1: Pre-Flight Validation +1. Parse and validate user intent +2. Identify all required AWS resources +3. Check MANAGED RESOURCES section +4. Verify tool availability +5. Assess operation risk level + +#### Phase 2: Dependency Resolution +1. Map resource dependencies +2. Identify cross-service dependencies +3. Verify prerequisite resources +4. Detect circular dependencies +5. Generate execution order + +#### Phase 3: Safety Validation +1. Validate parameter completeness +2. Check naming conflicts +3. Verify CIDR availability +4. Validate security group rules +5. Check IAM requirements + +#### Phase 4: Impact Assessment +1. Identify resources to be created +2. Identify resources to be modified +3. Identify affected resources +4. Estimate cost implications +5. Assess rollback complexity + +### Enhanced Output Format + +```json +{ + "action": "create_infrastructure", + "reasoning": "Explanation with verification approach", + "riskAssessment": { + "level": "medium", + "factors": ["New VPC creation", "Multiple subnets"], + "mitigations": ["Rollback plan included"] + }, + "verificationSummary": { + "preFlightChecks": ["VPC exists", "Subnet available"], + "dependencyOrder": ["VPC", "Subnet", "SecurityGroup", "Instance"], + "postFlightChecks": ["Instance running", "Correct subnet"] + }, + "executionPlan": [{ + "id": "step-create-instance", + "verification": { + "preConditions": ["subnetId exists"], + "postConditions": ["instanceId returned", "state is running"], + "rollbackStrategy": "terminate-instance" + } + }] +} +``` + +### Rollback Considerations + +Each operation includes rollback planning: +- Can this operation be easily undone? +- What resources depend on this resource? +- Is there a rollback path if subsequent steps fail? +- Should this be split into multiple plans? + ## Conclusion The AI Infrastructure Agent represents a sophisticated approach to infrastructure automation, combining the power of Large Language Models with robust engineering practices. The system's layered architecture provides separation of concerns while maintaining flexibility and extensibility. @@ -358,4 +643,10 @@ Key architectural strengths include: - **Performance**: Optimized for concurrent operations and resource efficiency - **Security**: Multiple security layers with defense-in-depth approach +The v2.0 enhancements add: + +- **Plugin Architecture**: Dynamic, middleware-enabled resource handlers +- **Safety Layer**: Comprehensive simulation and validation framework +- **Enhanced Reasoning**: Multi-step verification with rollback planning + The system is designed to scale from proof-of-concept deployments to production environments while maintaining the simplicity of natural language infrastructure management. diff --git a/pkg/plugins/interfaces.go b/pkg/plugins/interfaces.go new file mode 100644 index 0000000..1b3b17e --- /dev/null +++ b/pkg/plugins/interfaces.go @@ -0,0 +1,71 @@ +package plugins + +import ( + "context" + + "github.com/versus-control/ai-infrastructure-agent/pkg/types" +) + +type PluginMetadata struct { + Name string + Version string + Description string + ResourceType string + SupportedOps []string + Dependencies []string + Priority int + Author string +} + +type ResourceHandler interface { + Plugin + Create(ctx context.Context, params interface{}) (*types.AWSResource, error) + List(ctx context.Context) ([]*types.AWSResource, error) + Get(ctx context.Context, id string) (*types.AWSResource, error) + Update(ctx context.Context, id string, params interface{}) (*types.AWSResource, error) + Delete(ctx context.Context, id string) error + ExecuteSpecial(ctx context.Context, operation string, params interface{}) (*types.AWSResource, error) + ValidateParams(operation string, params interface{}) error + GetSupportedOperations() []string +} + +type Plugin interface { + Initialize(config *PluginConfig) error + Shutdown() error + GetMetadata() PluginMetadata + HealthCheck(ctx context.Context) error +} + +type PluginConfig struct { + AWSClient interface{} + Logger interface{} + Config map[string]interface{} +} + +type PluginHook interface { + BeforeExecute(ctx context.Context, operation string, params interface{}) (context.Context, error) + AfterExecute(ctx context.Context, operation string, params interface{}, result interface{}, err error) error + OnFailure(ctx context.Context, operation string, params interface{}, err error) error +} + +type CompositePlugin struct { + Metadata PluginMetadata + Handler ResourceHandler + Hooks []PluginHook + Middlewares []Middleware +} + +type Middleware func(next ExecuteFunc) ExecuteFunc + +type ExecuteFunc func(ctx context.Context, operation string, params interface{}) (*types.AWSResource, error) + +type PluginEvent struct { + Type string + Plugin string + Timestamp int64 + Data interface{} +} + +type PluginEventListener interface { + OnPluginEvent(event PluginEvent) +} diff --git a/pkg/plugins/lifecycle.go b/pkg/plugins/lifecycle.go new file mode 100644 index 0000000..18ca6cb --- /dev/null +++ b/pkg/plugins/lifecycle.go @@ -0,0 +1,338 @@ +package plugins + +import ( + "context" + "fmt" + "reflect" + "runtime" + "sync" + "time" + + "github.com/versus-control/ai-infrastructure-agent/internal/logging" + "github.com/versus-control/ai-infrastructure-agent/pkg/types" +) + +type PluginState string + +const ( + StateUninitialized PluginState = "uninitialized" + StateInitializing PluginState = "initializing" + StateReady PluginState = "ready" + StateRunning PluginState = "running" + StateError PluginState = "error" + StateShuttingDown PluginState = "shutting_down" + StateStopped PluginState = "stopped" +) + +type PluginStatus struct { + Name string + State PluginState + LastHealth time.Time + Error error + Metadata PluginMetadata +} + +type LifecycleManager struct { + registry *PluginRegistry + logger *logging.Logger + statuses map[string]*PluginStatus + mu sync.RWMutex + stopCh chan struct{} + healthTick time.Duration +} + +func NewLifecycleManager(registry *PluginRegistry, logger *logging.Logger) *LifecycleManager { + return &LifecycleManager{ + registry: registry, + logger: logger, + statuses: make(map[string]*PluginStatus), + stopCh: make(chan struct{}), + healthTick: 30 * time.Second, + } +} + +func (lm *LifecycleManager) InitializeAll(configs map[string]*PluginConfig) error { + lm.mu.Lock() + defer lm.mu.Unlock() + + plugins := lm.registry.ListPlugins() + order := lm.resolveInitializationOrder(plugins) + + for _, metadata := range order { + resourceType := metadata.ResourceType + lm.statuses[resourceType] = &PluginStatus{ + Name: metadata.Name, + State: StateInitializing, + } + + config := configs[resourceType] + if config == nil { + config = &PluginConfig{Config: make(map[string]interface{})} + } + + if err := lm.registry.Register(nil, config); err != nil { + lm.statuses[resourceType].State = StateError + lm.statuses[resourceType].Error = err + lm.logger.WithError(err).WithField("plugin", metadata.Name).Error("Failed to initialize plugin") + continue + } + + lm.statuses[resourceType].State = StateReady + lm.statuses[resourceType].Metadata = metadata + lm.logger.WithField("plugin", metadata.Name).Info("Plugin initialized successfully") + } + + return nil +} + +func (lm *LifecycleManager) resolveInitializationOrder(plugins []PluginMetadata) []PluginMetadata { + graph := make(map[string][]string) + inDegree := make(map[string]int) + + for _, p := range plugins { + graph[p.ResourceType] = p.Dependencies + inDegree[p.ResourceType] = 0 + } + + for _, p := range plugins { + for _, dep := range p.Dependencies { + inDegree[p.ResourceType]++ + } + } + + var queue []PluginMetadata + for _, p := range plugins { + if inDegree[p.ResourceType] == 0 { + queue = append(queue, p) + } + } + + var result []PluginMetadata + for len(queue) > 0 { + current := queue[0] + queue = queue[1:] + result = append(result, current) + + for _, p := range plugins { + for _, dep := range p.Dependencies { + if dep == current.ResourceType { + inDegree[p.ResourceType]-- + if inDegree[p.ResourceType] == 0 { + queue = append(queue, p) + } + } + } + } + } + + return result +} + +func (lm *LifecycleManager) StartHealthMonitoring(ctx context.Context) { + ticker := time.NewTicker(lm.healthTick) + defer ticker.Stop() + + for { + select { + case <-ctx.Done(): + return + case <-lm.stopCh: + return + case <-ticker.C: + lm.performHealthChecks(ctx) + } + } +} + +func (lm *LifecycleManager) performHealthChecks(ctx context.Context) { + results := lm.registry.HealthCheck(ctx) + + lm.mu.Lock() + defer lm.mu.Unlock() + + for resourceType, err := range results { + if status, exists := lm.statuses[resourceType]; exists { + status.LastHealth = time.Now() + if err != nil { + status.State = StateError + status.Error = err + } else if status.State == StateError { + status.State = StateReady + status.Error = nil + } + } + } +} + +func (lm *LifecycleManager) ShutdownAll() error { + lm.mu.Lock() + defer lm.mu.Unlock() + + close(lm.stopCh) + + var errors []error + for resourceType, status := range lm.statuses { + if status.State == StateReady || status.State == StateRunning { + status.State = StateShuttingDown + if err := lm.registry.Unregister(resourceType); err != nil { + errors = append(errors, err) + status.State = StateError + status.Error = err + } else { + status.State = StateStopped + } + } + } + + if len(errors) > 0 { + return fmt.Errorf("errors during shutdown: %v", errors) + } + return nil +} + +func (lm *LifecycleManager) GetStatus(resourceType string) (*PluginStatus, error) { + lm.mu.RLock() + defer lm.mu.RUnlock() + + status, exists := lm.statuses[resourceType] + if !exists { + return nil, fmt.Errorf("no status for resource type: %s", resourceType) + } + return status, nil +} + +func (lm *LifecycleManager) GetAllStatuses() map[string]*PluginStatus { + lm.mu.RLock() + defer lm.mu.RUnlock() + + result := make(map[string]*PluginStatus) + for k, v := range lm.statuses { + result[k] = v + } + return result +} + +func (lm *LifecycleManager) ReloadPlugin(resourceType string, config *PluginConfig) error { + lm.mu.Lock() + defer lm.mu.Unlock() + + if status, exists := lm.statuses[resourceType]; exists { + if status.State == StateRunning { + return fmt.Errorf("cannot reload running plugin: %s", resourceType) + } + } + + if err := lm.registry.Unregister(resourceType); err != nil { + lm.logger.WithError(err).Debug("Plugin not registered, proceeding with reload") + } + + lm.statuses[resourceType] = &PluginStatus{ + Name: resourceType, + State: StateInitializing, + } + + if config == nil { + config = &PluginConfig{Config: make(map[string]interface{})} + } + + if err := lm.registry.Register(nil, config); err != nil { + lm.statuses[resourceType].State = StateError + lm.statuses[resourceType].Error = err + return err + } + + lm.statuses[resourceType].State = StateReady + return nil +} + +func CreateChainMiddleware(middlewares ...Middleware) Middleware { + return func(next ExecuteFunc) ExecuteFunc { + for i := len(middlewares) - 1; i >= 0; i-- { + next = middlewares[i](next) + } + return next + } +} + +func LoggingMiddleware(logger *logging.Logger) Middleware { + return func(next ExecuteFunc) ExecuteFunc { + return func(ctx context.Context, operation string, params interface{}) (*types.AWSResource, error) { + start := time.Now() + result, err := next(ctx, operation, params) + duration := time.Since(start) + + logger.WithFields(map[string]interface{}{ + "operation": operation, + "duration_ms": duration.Milliseconds(), + "has_error": err != nil, + "params_type": reflect.TypeOf(params).String(), + }).Info("Operation executed") + + return result, err + } + } +} + +func RecoveryMiddleware(logger *logging.Logger) Middleware { + return func(next ExecuteFunc) ExecuteFunc { + return func(ctx context.Context, operation string, params interface{}) (result *types.AWSResource, err error) { + defer func() { + if r := recover(); r != nil { + buf := make([]byte, 4096) + n := runtime.Stack(buf, false) + err = fmt.Errorf("panic recovered: %v\n%s", r, buf[:n]) + logger.WithField("stack", string(buf[:n])).Error("Panic in operation") + } + }() + return next(ctx, operation, params) + } + } +} + +func MetricsMiddleware() Middleware { + metrics := make(map[string]int64) + var mu sync.Mutex + + return func(next ExecuteFunc) ExecuteFunc { + return func(ctx context.Context, operation string, params interface{}) (*types.AWSResource, error) { + start := time.Now() + result, err := next(ctx, operation, params) + duration := time.Since(start).Microseconds() + + mu.Lock() + metrics[operation] = duration + mu.Unlock() + + return result, err + } + } +} + +func TimeoutMiddleware(timeout time.Duration) Middleware { + return func(next ExecuteFunc) ExecuteFunc { + return func(ctx context.Context, operation string, params interface{}) (*types.AWSResource, error) { + ctx, cancel := context.WithTimeout(ctx, timeout) + defer cancel() + + resultCh := make(chan struct { + result *types.AWSResource + err error + }) + + go func() { + result, err := next(ctx, operation, params) + resultCh <- struct { + result *types.AWSResource + err error + }{result, err} + }() + + select { + case <-ctx.Done(): + return nil, fmt.Errorf("operation %s timed out after %v", operation, timeout) + case res := <-resultCh: + return res.result, res.err + } + } + } +} diff --git a/pkg/plugins/registry.go b/pkg/plugins/registry.go new file mode 100644 index 0000000..0173772 --- /dev/null +++ b/pkg/plugins/registry.go @@ -0,0 +1,253 @@ +package plugins + +import ( + "context" + "fmt" + "sort" + "sync" + + "github.com/versus-control/ai-infrastructure-agent/internal/logging" +) + +type PluginRegistry struct { + mu sync.RWMutex + handlers map[string]ResourceHandler + metadata map[string]PluginMetadata + hooks map[string][]PluginHook + middlewares map[string][]Middleware + logger *logging.Logger + initialized bool +} + +func NewPluginRegistry(logger *logging.Logger) *PluginRegistry { + return &PluginRegistry{ + handlers: make(map[string]ResourceHandler), + metadata: make(map[string]PluginMetadata), + hooks: make(map[string][]PluginHook), + middlewares: make(map[string][]Middleware), + logger: logger, + initialized: false, + } +} + +func (r *PluginRegistry) Register(plugin ResourceHandler, config *PluginConfig) error { + r.mu.Lock() + defer r.mu.Unlock() + + metadata := plugin.GetMetadata() + resourceType := metadata.ResourceType + + if _, exists := r.handlers[resourceType]; exists { + return fmt.Errorf("plugin already registered for resource type: %s", resourceType) + } + + if err := plugin.Initialize(config); err != nil { + return fmt.Errorf("failed to initialize plugin %s: %w", metadata.Name, err) + } + + r.handlers[resourceType] = plugin + r.metadata[resourceType] = metadata + r.hooks[resourceType] = make([]PluginHook, 0) + r.middlewares[resourceType] = make([]Middleware, 0) + + r.logger.WithFields(map[string]interface{}{ + "plugin_name": metadata.Name, + "resource_type": resourceType, + "version": metadata.Version, + "supported_ops": metadata.SupportedOps, + }).Info("Plugin registered successfully") + + return nil +} + +func (r *PluginRegistry) Unregister(resourceType string) error { + r.mu.Lock() + defer r.mu.Unlock() + + plugin, exists := r.handlers[resourceType] + if !exists { + return fmt.Errorf("no plugin registered for resource type: %s", resourceType) + } + + if err := plugin.Shutdown(); err != nil { + r.logger.WithError(err).Warn("Plugin shutdown encountered error") + } + + delete(r.handlers, resourceType) + delete(r.metadata, resourceType) + delete(r.hooks, resourceType) + delete(r.middlewares, resourceType) + + r.logger.WithField("resource_type", resourceType).Info("Plugin unregistered") + return nil +} + +func (r *PluginRegistry) GetHandler(resourceType string) (ResourceHandler, error) { + r.mu.RLock() + defer r.mu.RUnlock() + + handler, exists := r.handlers[resourceType] + if !exists { + return nil, fmt.Errorf("no handler for resource type: %s", resourceType) + } + return handler, nil +} + +func (r *PluginRegistry) GetMetadata(resourceType string) (PluginMetadata, error) { + r.mu.RLock() + defer r.mu.RUnlock() + + metadata, exists := r.metadata[resourceType] + if !exists { + return PluginMetadata{}, fmt.Errorf("no metadata for resource type: %s", resourceType) + } + return metadata, nil +} + +func (r *PluginRegistry) ListPlugins() []PluginMetadata { + r.mu.RLock() + defer r.mu.RUnlock() + + plugins := make([]PluginMetadata, 0, len(r.metadata)) + for _, m := range r.metadata { + plugins = append(plugins, m) + } + + sort.Slice(plugins, func(i, j int) bool { + return plugins[i].Priority < plugins[j].Priority + }) + + return plugins +} + +func (r *PluginRegistry) AddHook(resourceType string, hook PluginHook) error { + r.mu.Lock() + defer r.mu.Unlock() + + if _, exists := r.handlers[resourceType]; !exists { + return fmt.Errorf("no plugin registered for resource type: %s", resourceType) + } + + r.hooks[resourceType] = append(r.hooks[resourceType], hook) + return nil +} + +func (r *PluginRegistry) AddMiddleware(resourceType string, middleware Middleware) error { + r.mu.Lock() + defer r.mu.Unlock() + + if _, exists := r.handlers[resourceType]; !exists { + return fmt.Errorf("no plugin registered for resource type: %s", resourceType) + } + + r.middlewares[resourceType] = append(r.middlewares[resourceType], middleware) + return nil +} + +func (r *PluginRegistry) ExecuteWithHooks( + ctx context.Context, + resourceType string, + operation string, + params interface{}, +) (interface{}, error) { + r.mu.RLock() + handler, handlerExists := r.handlers[resourceType] + hooks := r.hooks[resourceType] + middlewares := r.middlewares[resourceType] + r.mu.RUnlock() + + if !handlerExists { + return nil, fmt.Errorf("no handler for resource type: %s", resourceType) + } + + for _, hook := range hooks { + var err error + ctx, err = hook.BeforeExecute(ctx, operation, params) + if err != nil { + return nil, fmt.Errorf("before hook failed: %w", err) + } + } + + executeFunc := r.buildExecuteFunc(handler, operation) + + for i := len(middlewares) - 1; i >= 0; i-- { + executeFunc = middlewares[i](executeFunc) + } + + result, err := executeFunc(ctx, operation, params) + + for _, hook := range hooks { + if hookErr := hook.AfterExecute(ctx, operation, params, result, err); hookErr != nil { + r.logger.WithError(hookErr).Warn("After hook failed") + } + } + + if err != nil { + for _, hook := range hooks { + if hookErr := hook.OnFailure(ctx, operation, params, err); hookErr != nil { + r.logger.WithError(hookErr).Warn("OnFailure hook failed") + } + } + } + + return result, err +} + +func (r *PluginRegistry) buildExecuteFunc(handler ResourceHandler, _ string) ExecuteFunc { + return func(ctx context.Context, operation string, params interface{}) (*types.AWSResource, error) { + switch operation { + case "create": + return handler.Create(ctx, params) + case "list": + resources, err := handler.List(ctx) + if err != nil { + return nil, err + } + if len(resources) > 0 { + return resources[0], nil + } + return nil, nil + case "get": + if id, ok := params.(string); ok { + return handler.Get(ctx, id) + } + return nil, fmt.Errorf("invalid params for get operation") + case "update": + if p, ok := params.(map[string]interface{}); ok { + if id, exists := p["id"].(string); exists { + return handler.Update(ctx, id, p) + } + } + return nil, fmt.Errorf("invalid params for update operation") + case "delete": + if id, ok := params.(string); ok { + return nil, handler.Delete(ctx, id) + } + return nil, fmt.Errorf("invalid params for delete operation") + default: + return handler.ExecuteSpecial(ctx, operation, params) + } + } +} + +func (r *PluginRegistry) HealthCheck(ctx context.Context) map[string]error { + r.mu.RLock() + defer r.mu.RUnlock() + + results := make(map[string]error) + for resourceType, handler := range r.handlers { + results[resourceType] = handler.HealthCheck(ctx) + } + return results +} + +func (r *PluginRegistry) GetSupportedResourceTypes() []string { + r.mu.RLock() + defer r.mu.RUnlock() + + types := make([]string, 0, len(r.handlers)) + for resourceType := range r.handlers { + types = append(types, resourceType) + } + return types +} diff --git a/pkg/safety/dryrun_decorator.go b/pkg/safety/dryrun_decorator.go new file mode 100644 index 0000000..07ea606 --- /dev/null +++ b/pkg/safety/dryrun_decorator.go @@ -0,0 +1,485 @@ +package safety + +import ( + "context" + "fmt" + "reflect" + "sync" + "time" + + "github.com/versus-control/ai-infrastructure-agent/internal/logging" + "github.com/versus-control/ai-infrastructure-agent/pkg/types" +) + +type ExecutionContext struct { + Operation string + ResourceType string + Params interface{} + DryRun bool + SimulatedOnly bool + StartTime time.Time + CorrelationID string +} + +type OperationFunc func(ctx context.Context, params interface{}) (*types.AWSResource, error) + +type DryRunDecorator struct { + logger *logging.Logger + simulator *Simulator + auditLog *AuditLog + config DryRunConfig + mu sync.RWMutex +} + +type DryRunConfig struct { + EnabledByDefault bool + RequireConfirmation bool + AllowedOperations []string + BlockedOperations []string + AuditEnabled bool +} + +func NewDryRunDecorator(logger *logging.Logger, simulator *Simulator, config DryRunConfig) *DryRunDecorator { + return &DryRunDecorator{ + logger: logger, + simulator: simulator, + auditLog: NewAuditLog(logger), + config: config, + } +} + +func (d *DryRunDecorator) Decorate(operation OperationFunc, execCtx *ExecutionContext) OperationFunc { + return func(ctx context.Context, params interface{}) (*types.AWSResource, error) { + if execCtx == nil { + execCtx = &ExecutionContext{ + DryRun: d.config.EnabledByDefault, + StartTime: time.Now(), + } + } + + if d.isBlocked(execCtx.Operation) { + return nil, fmt.Errorf("operation %s is blocked by safety policy", execCtx.Operation) + } + + if execCtx.DryRun || d.simulator.IsSimulationEnabled() { + return d.executeDryRun(ctx, operation, params, execCtx) + } + + if err := d.preExecuteValidation(ctx, execCtx, params); err != nil { + return nil, fmt.Errorf("pre-execution validation failed: %w", err) + } + + d.auditLog.Record(AuditEntry{ + CorrelationID: execCtx.CorrelationID, + Operation: execCtx.Operation, + ResourceType: execCtx.ResourceType, + Params: params, + DryRun: false, + Status: "started", + Timestamp: time.Now(), + }) + + result, err := operation(ctx, params) + + status := "completed" + if err != nil { + status = "failed" + } + + d.auditLog.Record(AuditEntry{ + CorrelationID: execCtx.CorrelationID, + Operation: execCtx.Operation, + ResourceType: execCtx.ResourceType, + Params: params, + Result: result, + Error: err, + DryRun: false, + Status: status, + Duration: time.Since(execCtx.StartTime), + Timestamp: time.Now(), + }) + + return result, err + } +} + +func (d *DryRunDecorator) executeDryRun(ctx context.Context, operation OperationFunc, params interface{}, execCtx *ExecutionContext) (*types.AWSResource, error) { + d.logger.WithFields(map[string]interface{}{ + "operation": execCtx.Operation, + "resourceType": execCtx.ResourceType, + "dryRun": true, + }).Info("Executing in dry-run mode") + + var simResult *SimulationResult + var simErr error + + switch execCtx.Operation { + case "create": + simResult, simErr = d.simulator.SimulateCreate(ctx, execCtx.ResourceType, params) + case "update": + var id string + if m, ok := params.(map[string]interface{}); ok { + id, _ = m["id"].(string) + } + simResult, simErr = d.simulator.SimulateUpdate(ctx, execCtx.ResourceType, id, params) + case "delete": + var id string + if m, ok := params.(map[string]interface{}); ok { + id, _ = m["id"].(string) + } else { + id, _ = params.(string) + } + simResult, simErr = d.simulator.SimulateDelete(ctx, execCtx.ResourceType, id) + default: + simResult, simErr = d.simulator.SimulateSpecial(ctx, execCtx.ResourceType, execCtx.Operation, params) + } + + if simErr != nil { + return nil, fmt.Errorf("simulation failed: %w", simErr) + } + + d.auditLog.Record(AuditEntry{ + CorrelationID: execCtx.CorrelationID, + Operation: execCtx.Operation, + ResourceType: execCtx.ResourceType, + Params: params, + SimResult: simResult, + DryRun: true, + Status: "simulated", + Duration: time.Since(execCtx.StartTime), + Timestamp: time.Now(), + }) + + if !simResult.Success { + return nil, fmt.Errorf("dry-run validation failed: %s", simResult.Message) + } + + return &types.AWSResource{ + ID: fmt.Sprintf("dry-run-%s", execCtx.CorrelationID), + Type: execCtx.ResourceType, + State: "simulated", + Region: "dry-run", + Details: map[string]interface{}{ + "simulation_result": simResult, + "dry_run": true, + }, + }, nil +} + +func (d *DryRunDecorator) preExecuteValidation(ctx context.Context, execCtx *ExecutionContext, params interface{}) error { + errors, valid := d.simulator.ValidateBeforeExecute(execCtx.Operation, execCtx.ResourceType, params) + if !valid { + var errMsgs []string + for _, e := range errors { + if e.Level == ValidationErrorLevel { + errMsgs = append(errMsgs, fmt.Sprintf("%s: %s", e.Field, e.Message)) + } + } + return fmt.Errorf("validation errors: %v", errMsgs) + } + + for _, e := range errors { + if e.Level == ValidationWarningLevel { + d.logger.WithFields(map[string]interface{}{ + "field": e.Field, + "message": e.Message, + "rule": e.Rule, + }).Warn("Validation warning") + } + } + + return nil +} + +func (d *DryRunDecorator) isBlocked(operation string) bool { + d.mu.RLock() + defer d.mu.RUnlock() + + for _, blocked := range d.config.BlockedOperations { + if blocked == operation { + return true + } + } + return false +} + +func (d *DryRunDecorator) SetDryRunMode(enabled bool) { + d.mu.Lock() + defer d.mu.Unlock() + d.config.EnabledByDefault = enabled + d.logger.WithField("dry_run_enabled", enabled).Info("Dry-run mode updated") +} + +func (d *DryRunDecorator) GetAuditLog() []AuditEntry { + return d.auditLog.GetEntries() +} + +type AuditEntry struct { + CorrelationID string + Operation string + ResourceType string + Params interface{} + Result interface{} + SimResult *SimulationResult + Error error + DryRun bool + Status string + Duration time.Duration + Timestamp time.Time +} + +type AuditLog struct { + logger *logging.Logger + entries []AuditEntry + mu sync.RWMutex + maxSize int +} + +func NewAuditLog(logger *logging.Logger) *AuditLog { + return &AuditLog{ + logger: logger, + entries: make([]AuditEntry, 0), + maxSize: 10000, + } +} + +func (a *AuditLog) Record(entry AuditEntry) { + a.mu.Lock() + defer a.mu.Unlock() + + if len(a.entries) >= a.maxSize { + a.entries = a.entries[1:] + } + + a.entries = append(a.entries, entry) + + a.logger.WithFields(map[string]interface{}{ + "correlation_id": entry.CorrelationID, + "operation": entry.Operation, + "resource_type": entry.ResourceType, + "dry_run": entry.DryRun, + "status": entry.Status, + "duration_ms": entry.Duration.Milliseconds(), + }).Info("Audit record") +} + +func (a *AuditLog) GetEntries() []AuditEntry { + a.mu.RLock() + defer a.mu.RUnlock() + + result := make([]AuditEntry, len(a.entries)) + copy(result, a.entries) + return result +} + +func (a *AuditLog) GetEntriesByOperation(operation string) []AuditEntry { + a.mu.RLock() + defer a.mu.RUnlock() + + var result []AuditEntry + for _, entry := range a.entries { + if entry.Operation == operation { + result = append(result, entry) + } + } + return result +} + +func (a *AuditLog) GetEntriesByCorrelationID(correlationID string) []AuditEntry { + a.mu.RLock() + defer a.mu.RUnlock() + + var result []AuditEntry + for _, entry := range a.entries { + if entry.CorrelationID == correlationID { + result = append(result, entry) + } + } + return result +} + +type ValidationLayer struct { + logger *logging.Logger + validators map[string]ResourceValidator + rules map[string][]ValidationRule + mu sync.RWMutex +} + +type ResourceValidator func(ctx context.Context, params interface{}) []ValidationError + +func NewValidationLayer(logger *logging.Logger) *ValidationLayer { + return &ValidationLayer{ + logger: logger, + validators: make(map[string]ResourceValidator), + rules: make(map[string][]ValidationRule), + } +} + +func (v *ValidationLayer) RegisterValidator(resourceType string, validator ResourceValidator) { + v.mu.Lock() + defer v.mu.Unlock() + v.validators[resourceType] = validator +} + +func (v *ValidationLayer) AddRule(resourceType string, rule ValidationRule) { + v.mu.Lock() + defer v.mu.Unlock() + v.rules[resourceType] = append(v.rules[resourceType], rule) +} + +func (v *ValidationLayer) Validate(ctx context.Context, resourceType string, operation string, params interface{}) (*ValidationResult, error) { + v.mu.RLock() + validator, hasValidator := v.validators[resourceType] + rules := v.rules[resourceType] + v.mu.RUnlock() + + result := &ValidationResult{ + Valid: true, + Errors: make([]ValidationError, 0), + Warnings: make([]ValidationError, 0), + } + + if hasValidator { + errors := validator(ctx, params) + for _, e := range errors { + if e.Level == ValidationErrorLevel { + result.Errors = append(result.Errors, e) + } else { + result.Warnings = append(result.Warnings, e) + } + } + } + + for _, rule := range rules { + if err := v.applyRule(rule, params); err != nil { + if err.Level == ValidationErrorLevel { + result.Errors = append(result.Errors, *err) + } else { + result.Warnings = append(result.Warnings, *err) + } + } + } + + result.Valid = len(result.Errors) == 0 + + if !result.Valid { + return result, fmt.Errorf("validation failed with %d errors", len(result.Errors)) + } + + return result, nil +} + +type ValidationResult struct { + Valid bool + Errors []ValidationError + Warnings []ValidationError +} + +func (v *ValidationLayer) applyRule(rule ValidationRule, params interface{}) *ValidationError { + val := reflect.ValueOf(params) + + if val.Kind() == reflect.Map { + if m, ok := params.(map[string]interface{}); ok { + value, exists := m[rule.Field] + + switch rule.RuleType { + case "required": + if !exists || value == nil || value == "" { + return &ValidationError{ + Field: rule.Field, + Message: fmt.Sprintf("Field %s is required", rule.Field), + Rule: rule.Name, + Level: rule.Level, + } + } + case "enum": + if allowed, ok := rule.Params["values"].([]string); ok { + strVal, _ := value.(string) + found := false + for _, a := range allowed { + if a == strVal { + found = true + break + } + } + if !found && exists { + return &ValidationError{ + Field: rule.Field, + Message: fmt.Sprintf("Field %s must be one of: %v", rule.Field, allowed), + Rule: rule.Name, + Level: rule.Level, + } + } + } + case "range": + min, hasMin := rule.Params["min"].(float64) + max, hasMax := rule.Params["max"].(float64) + if numVal, ok := value.(float64); ok && exists { + if hasMin && numVal < min { + return &ValidationError{ + Field: rule.Field, + Message: fmt.Sprintf("Field %s must be >= %v", rule.Field, min), + Rule: rule.Name, + Level: rule.Level, + } + } + if hasMax && numVal > max { + return &ValidationError{ + Field: rule.Field, + Message: fmt.Sprintf("Field %s must be <= %v", rule.Field, max), + Rule: rule.Name, + Level: rule.Level, + } + } + } + } + } + } + + return nil +} + +func (v *ValidationLayer) ValidatePlan(plan []*types.ExecutionPlanStep) (*PlanValidationResult, error) { + result := &PlanValidationResult{ + Valid: true, + StepResults: make(map[string]*ValidationResult), + GlobalErrors: make([]ValidationError, 0), + } + + seenResources := make(map[string]bool) + for _, step := range plan { + if step.ResourceID != "" { + if seenResources[step.ResourceID] { + result.GlobalErrors = append(result.GlobalErrors, ValidationError{ + Field: step.ID, + Message: fmt.Sprintf("Duplicate resource ID: %s", step.ResourceID), + Rule: "unique_resource_id", + Level: ValidationErrorLevel, + }) + } + seenResources[step.ResourceID] = true + } + + stepResult, _ := v.Validate(context.Background(), step.ResourceID, step.Action, step.ToolParameters) + result.StepResults[step.ID] = stepResult + if !stepResult.Valid { + result.Valid = false + } + } + + if len(result.GlobalErrors) > 0 { + result.Valid = false + } + + if !result.Valid { + return result, fmt.Errorf("plan validation failed") + } + + return result, nil +} + +type PlanValidationResult struct { + Valid bool + StepResults map[string]*ValidationResult + GlobalErrors []ValidationError +} diff --git a/pkg/safety/simulation.go b/pkg/safety/simulation.go new file mode 100644 index 0000000..5fb362f --- /dev/null +++ b/pkg/safety/simulation.go @@ -0,0 +1,567 @@ +package safety + +import ( + "context" + "fmt" + "reflect" + "strings" + "sync" + "time" + + "github.com/versus-control/ai-infrastructure-agent/internal/logging" + "github.com/versus-control/ai-infrastructure-agent/pkg/types" +) + +type SimulationMode string + +const ( + SimulationDisabled SimulationMode = "disabled" + SimulationDryRun SimulationMode = "dry_run" + SimulationValidate SimulationMode = "validate_only" +) + +type SimulationResult struct { + Success bool + Message string + WouldCreate []*types.AWSResource + WouldModify []*types.AWSResource + WouldDelete []string + ValidationErrors []ValidationError + Warnings []string + Duration time.Duration + EstimatedCost *CostEstimate +} + +type ValidationError struct { + Field string + Message string + Rule string + Level ValidationLevel +} + +type ValidationLevel string + +const ( + ValidationErrorLevel ValidationLevel = "error" + ValidationWarningLevel ValidationLevel = "warning" + ValidationInfoLevel ValidationLevel = "info" +) + +type CostEstimate struct { + Hourly float64 + Monthly float64 + Currency string + Items []CostItem +} + +type CostItem struct { + Resource string + Type string + HourlyCost float64 + MonthlyCost float64 +} + +type OperationSimulator interface { + SimulateCreate(ctx context.Context, resourceType string, params interface{}) (*SimulationResult, error) + SimulateUpdate(ctx context.Context, resourceType string, id string, params interface{}) (*SimulationResult, error) + SimulateDelete(ctx context.Context, resourceType string, id string) (*SimulationResult, error) + SimulateSpecial(ctx context.Context, resourceType string, operation string, params interface{}) (*SimulationResult, error) +} + +type SafetyValidator interface { + ValidateParams(operation string, params interface{}) []ValidationError + ValidateDependencies(operation string, params interface{}, state map[string]interface{}) []ValidationError + ValidateQuotas(ctx context.Context, resourceType string, params interface{}) []ValidationError + ValidateNaming(resourceType string, name string) []ValidationError +} + +type SimulationConfig struct { + Mode SimulationMode + EnableCostEstimate bool + EnableQuotaCheck bool + EnableDependencyCheck bool + MaxWarnings int + StrictMode bool +} + +type Simulator struct { + logger *logging.Logger + config SimulationConfig + validators map[string]SafetyValidator + rules map[string][]ValidationRule + mu sync.RWMutex +} + +type ValidationRule struct { + Name string + Field string + RuleType string + Params map[string]interface{} + Level ValidationLevel + Description string +} + +func NewSimulator(logger *logging.Logger, config SimulationConfig) *Simulator { + return &Simulator{ + logger: logger, + config: config, + validators: make(map[string]SafetyValidator), + rules: make(map[string][]ValidationRule), + } +} + +func (s *Simulator) RegisterValidator(resourceType string, validator SafetyValidator) { + s.mu.Lock() + defer s.mu.Unlock() + s.validators[resourceType] = validator +} + +func (s *Simulator) AddRule(resourceType string, rule ValidationRule) { + s.mu.Lock() + defer s.mu.Unlock() + s.rules[resourceType] = append(s.rules[resourceType], rule) +} + +func (s *Simulator) SimulateCreate(ctx context.Context, resourceType string, params interface{}) (*SimulationResult, error) { + start := time.Now() + result := &SimulationResult{ + Success: true, + WouldCreate: make([]*types.AWSResource, 0), + WouldModify: make([]*types.AWSResource, 0), + WouldDelete: make([]string, 0), + ValidationErrors: make([]ValidationError, 0), + Warnings: make([]string, 0), + } + + if s.config.Mode == SimulationDisabled { + return result, nil + } + + s.mu.RLock() + validator, hasValidator := s.validators[resourceType] + rules := s.rules[resourceType] + s.mu.RUnlock() + + if hasValidator { + result.ValidationErrors = append(result.ValidationErrors, validator.ValidateParams("create", params)...) + } + + for _, rule := range rules { + if err := s.applyRule(rule, params); err != nil { + result.ValidationErrors = append(result.ValidationErrors, *err) + } + } + + if len(result.ValidationErrors) > 0 { + for _, e := range result.ValidationErrors { + if e.Level == ValidationErrorLevel { + result.Success = false + break + } + } + } + + simulatedResource := s.createSimulatedResource(resourceType, params) + result.WouldCreate = append(result.WouldCreate, simulatedResource) + + if s.config.EnableCostEstimate { + result.EstimatedCost = s.estimateCost(resourceType, params) + } + + result.Duration = time.Since(start) + result.Message = s.buildSimulationMessage(result) + + return result, nil +} + +func (s *Simulator) SimulateUpdate(ctx context.Context, resourceType string, id string, params interface{}) (*SimulationResult, error) { + start := time.Now() + result := &SimulationResult{ + Success: true, + WouldCreate: make([]*types.AWSResource, 0), + WouldModify: make([]*types.AWSResource, 0), + WouldDelete: make([]string, 0), + ValidationErrors: make([]ValidationError, 0), + Warnings: make([]string, 0), + } + + if s.config.Mode == SimulationDisabled { + return result, nil + } + + s.mu.RLock() + validator, hasValidator := s.validators[resourceType] + s.mu.RUnlock() + + if hasValidator { + result.ValidationErrors = append(result.ValidationErrors, validator.ValidateParams("update", params)...) + } + + if id == "" { + result.ValidationErrors = append(result.ValidationErrors, ValidationError{ + Field: "id", + Message: "Resource ID is required for update operation", + Rule: "required", + Level: ValidationErrorLevel, + }) + } + + if len(result.ValidationErrors) > 0 { + for _, e := range result.ValidationErrors { + if e.Level == ValidationErrorLevel { + result.Success = false + break + } + } + } + + modifiedResource := &types.AWSResource{ + ID: id, + Type: resourceType, + State: "would_be_modified", + Region: "simulated", + } + result.WouldModify = append(result.WouldModify, modifiedResource) + + result.Duration = time.Since(start) + result.Message = s.buildSimulationMessage(result) + + return result, nil +} + +func (s *Simulator) SimulateDelete(ctx context.Context, resourceType string, id string) (*SimulationResult, error) { + start := time.Now() + result := &SimulationResult{ + Success: true, + WouldCreate: make([]*types.AWSResource, 0), + WouldModify: make([]*types.AWSResource, 0), + WouldDelete: make([]string, 0), + ValidationErrors: make([]ValidationError, 0), + Warnings: make([]string, 0), + } + + if s.config.Mode == SimulationDisabled { + return result, nil + } + + if id == "" { + result.ValidationErrors = append(result.ValidationErrors, ValidationError{ + Field: "id", + Message: "Resource ID is required for delete operation", + Rule: "required", + Level: ValidationErrorLevel, + }) + result.Success = false + } else { + result.WouldDelete = append(result.WouldDelete, id) + result.Warnings = append(result.Warnings, + fmt.Sprintf("Resource %s of type %s would be permanently deleted", id, resourceType)) + } + + result.Duration = time.Since(start) + result.Message = s.buildSimulationMessage(result) + + return result, nil +} + +func (s *Simulator) SimulateSpecial(ctx context.Context, resourceType string, operation string, params interface{}) (*SimulationResult, error) { + start := time.Now() + result := &SimulationResult{ + Success: true, + WouldCreate: make([]*types.AWSResource, 0), + WouldModify: make([]*types.AWSResource, 0), + WouldDelete: make([]string, 0), + ValidationErrors: make([]ValidationError, 0), + Warnings: make([]string, 0), + } + + if s.config.Mode == SimulationDisabled { + return result, nil + } + + s.mu.RLock() + validator, hasValidator := s.validators[resourceType] + s.mu.RUnlock() + + if hasValidator { + result.ValidationErrors = append(result.ValidationErrors, validator.ValidateParams(operation, params)...) + } + + switch operation { + case "start": + result.Message = "Resource would be started" + case "stop": + result.Message = "Resource would be stopped" + result.Warnings = append(result.Warnings, "Stopping may cause service interruption") + case "reboot": + result.Message = "Resource would be rebooted" + result.Warnings = append(result.Warnings, "Reboot will cause temporary unavailability") + default: + result.Message = fmt.Sprintf("Special operation '%s' would be executed", operation) + } + + if len(result.ValidationErrors) > 0 { + for _, e := range result.ValidationErrors { + if e.Level == ValidationErrorLevel { + result.Success = false + break + } + } + } + + result.Duration = time.Since(start) + return result, nil +} + +func (s *Simulator) ValidateBeforeExecute(operation string, resourceType string, params interface{}) ([]ValidationError, bool) { + errors := make([]ValidationError, 0) + + s.mu.RLock() + validator, hasValidator := s.validators[resourceType] + rules := s.rules[resourceType] + s.mu.RUnlock() + + if hasValidator { + errors = append(errors, validator.ValidateParams(operation, params)...) + } + + for _, rule := range rules { + if err := s.applyRule(rule, params); err != nil { + errors = append(errors, *err) + } + } + + hasBlockingErrors := false + for _, e := range errors { + if e.Level == ValidationErrorLevel { + hasBlockingErrors = true + break + } + } + + return errors, !hasBlockingErrors +} + +func (s *Simulator) applyRule(rule ValidationRule, params interface{}) *ValidationError { + v := reflect.ValueOf(params) + if v.Kind() == reflect.Ptr { + v = v.Elem() + } + + if v.Kind() == reflect.Struct { + field := v.FieldByName(rule.Field) + if !field.IsValid() { + if rule.RuleType == "required" { + return &ValidationError{ + Field: rule.Field, + Message: fmt.Sprintf("Field %s is required", rule.Field), + Rule: rule.Name, + Level: rule.Level, + } + } + return nil + } + + switch rule.RuleType { + case "required": + if field.IsZero() { + return &ValidationError{ + Field: rule.Field, + Message: fmt.Sprintf("Field %s is required", rule.Field), + Rule: rule.Name, + Level: rule.Level, + } + } + case "min_length": + if minLen, ok := rule.Params["min"].(int); ok { + if field.Len() < minLen { + return &ValidationError{ + Field: rule.Field, + Message: fmt.Sprintf("Field %s must be at least %d characters", rule.Field, minLen), + Rule: rule.Name, + Level: rule.Level, + } + } + } + case "max_length": + if maxLen, ok := rule.Params["max"].(int); ok { + if field.Len() > maxLen { + return &ValidationError{ + Field: rule.Field, + Message: fmt.Sprintf("Field %s must be at most %d characters", rule.Field, maxLen), + Rule: rule.Name, + Level: rule.Level, + } + } + } + case "pattern": + if pattern, ok := rule.Params["pattern"].(string); ok { + if !strings.Contains(field.String(), pattern) { + return &ValidationError{ + Field: rule.Field, + Message: fmt.Sprintf("Field %s does not match required pattern", rule.Field), + Rule: rule.Name, + Level: rule.Level, + } + } + } + } + } + + if m, ok := params.(map[string]interface{}); ok { + value, exists := m[rule.Field] + + switch rule.RuleType { + case "required": + if !exists || value == nil || value == "" { + return &ValidationError{ + Field: rule.Field, + Message: fmt.Sprintf("Field %s is required", rule.Field), + Rule: rule.Name, + Level: rule.Level, + } + } + case "min": + if minVal, ok := rule.Params["min"].(float64); ok { + if numVal, ok := value.(float64); ok && numVal < minVal { + return &ValidationError{ + Field: rule.Field, + Message: fmt.Sprintf("Field %s must be at least %v", rule.Field, minVal), + Rule: rule.Name, + Level: rule.Level, + } + } + } + case "max": + if maxVal, ok := rule.Params["max"].(float64); ok { + if numVal, ok := value.(float64); ok && numVal > maxVal { + return &ValidationError{ + Field: rule.Field, + Message: fmt.Sprintf("Field %s must be at most %v", rule.Field, maxVal), + Rule: rule.Name, + Level: rule.Level, + } + } + } + } + } + + return nil +} + +func (s *Simulator) createSimulatedResource(resourceType string, params interface{}) *types.AWSResource { + resource := &types.AWSResource{ + Type: resourceType, + State: "would_create", + Region: "simulated", + Details: map[string]interface{}{ + "simulated": true, + }, + } + + if m, ok := params.(map[string]interface{}); ok { + if name, exists := m["name"].(string); exists { + resource.ID = name + } else if id, exists := m["instanceId"].(string); exists { + resource.ID = id + } else { + resource.ID = fmt.Sprintf("simulated-%s-%d", resourceType, time.Now().Unix()) + } + + for k, v := range m { + resource.Details[k] = v + } + } + + return resource +} + +func (s *Simulator) estimateCost(resourceType string, params interface{}) *CostEstimate { + estimate := &CostEstimate{ + Currency: "USD", + Items: make([]CostItem, 0), + } + + costTable := map[string]float64{ + "ec2-instance.t2.micro": 0.0116, + "ec2-instance.t2.small": 0.023, + "ec2-instance.t2.medium": 0.0464, + "ec2-instance.t3.micro": 0.0104, + "ec2-instance.t3.small": 0.0208, + "ec2-instance.t3.medium": 0.0416, + "rds-instance.db.t3.micro": 0.022, + "rds-instance.db.t3.small": 0.044, + "rds-instance.db.t3.medium": 0.088, + } + + var instanceType string + if m, ok := params.(map[string]interface{}); ok { + if t, exists := m["instanceType"].(string); exists { + instanceType = t + } else if t, exists := m["dbInstanceClass"].(string); exists { + instanceType = t + } + } + + key := fmt.Sprintf("%s.%s", resourceType, instanceType) + if hourly, exists := costTable[key]; exists { + item := CostItem{ + Resource: resourceType, + Type: instanceType, + HourlyCost: hourly, + MonthlyCost: hourly * 24 * 30, + } + estimate.Items = append(estimate.Items, item) + estimate.Hourly += hourly + estimate.Monthly += hourly * 24 * 30 + } + + return estimate +} + +func (s *Simulator) buildSimulationMessage(result *SimulationResult) string { + var parts []string + + if len(result.WouldCreate) > 0 { + parts = append(parts, fmt.Sprintf("Would create %d resource(s)", len(result.WouldCreate))) + } + if len(result.WouldModify) > 0 { + parts = append(parts, fmt.Sprintf("Would modify %d resource(s)", len(result.WouldModify))) + } + if len(result.WouldDelete) > 0 { + parts = append(parts, fmt.Sprintf("Would delete %d resource(s)", len(result.WouldDelete))) + } + + if len(result.ValidationErrors) > 0 { + errorCount := 0 + for _, e := range result.ValidationErrors { + if e.Level == ValidationErrorLevel { + errorCount++ + } + } + if errorCount > 0 { + parts = append(parts, fmt.Sprintf("Found %d validation error(s)", errorCount)) + } + } + + if len(parts) == 0 { + return "Simulation completed with no changes" + } + + return strings.Join(parts, "; ") +} + +func (s *Simulator) IsSimulationEnabled() bool { + return s.config.Mode != SimulationDisabled +} + +func (s *Simulator) GetMode() SimulationMode { + return s.config.Mode +} + +func (s *Simulator) SetMode(mode SimulationMode) { + s.mu.Lock() + defer s.mu.Unlock() + s.config.Mode = mode + s.logger.WithField("mode", mode).Info("Simulation mode changed") +} diff --git a/pkg/safety/validator.go b/pkg/safety/validator.go new file mode 100644 index 0000000..fc670bc --- /dev/null +++ b/pkg/safety/validator.go @@ -0,0 +1,384 @@ +package safety + +import ( + "context" + "fmt" + "regexp" + "strings" + "sync" + "time" + + "github.com/versus-control/ai-infrastructure-agent/internal/logging" +) + +type ValidatorRegistry struct { + logger *logging.Logger + rules map[string][]ValidationRule + providers map[string]ValidationProvider + mu sync.RWMutex +} + +type ValidationProvider interface { + Validate(ctx context.Context, resourceType string, operation string, params interface{}) []ValidationError + GetRules() []ValidationRule +} + +func NewValidatorRegistry(logger *logging.Logger) *ValidatorRegistry { + vr := &ValidatorRegistry{ + logger: logger, + rules: make(map[string][]ValidationRule), + providers: make(map[string]ValidationProvider), + } + + vr.registerDefaultRules() + return vr +} + +func (vr *ValidatorRegistry) registerDefaultRules() { + ec2Rules := []ValidationRule{ + {Name: "ec2_image_required", Field: "imageId", RuleType: "required", Level: ValidationErrorLevel, Description: "AMI ID is required"}, + {Name: "ec2_instance_type_required", Field: "instanceType", RuleType: "required", Level: ValidationErrorLevel, Description: "Instance type is required"}, + {Name: "ec2_instance_type_valid", Field: "instanceType", RuleType: "pattern", Level: ValidationErrorLevel, Params: map[string]interface{}{"pattern": "^[a-z][0-9][a-z]?\\.[a-z0-9]+$"}, Description: "Invalid instance type format"}, + } + + rdsRules := []ValidationRule{ + {Name: "rds_engine_required", Field: "engine", RuleType: "required", Level: ValidationErrorLevel, Description: "Database engine is required"}, + {Name: "rds_class_required", Field: "dbInstanceClass", RuleType: "required", Level: ValidationErrorLevel, Description: "DB instance class is required"}, + {Name: "rds_identifier_required", Field: "dbInstanceIdentifier", RuleType: "required", Level: ValidationErrorLevel, Description: "DB identifier is required"}, + {Name: "rds_storage_min", Field: "allocatedStorage", RuleType: "min", Level: ValidationWarningLevel, Params: map[string]interface{}{"min": 20.0}, Description: "Storage should be at least 20GB"}, + } + + vpcRules := []ValidationRule{ + {Name: "vpc_cidr_required", Field: "cidrBlock", RuleType: "required", Level: ValidationErrorLevel, Description: "CIDR block is required"}, + {Name: "vpc_cidr_valid", Field: "cidrBlock", RuleType: "cidr", Level: ValidationErrorLevel, Description: "Valid CIDR format required"}, + } + + securityGroupRules := []ValidationRule{ + {Name: "sg_name_required", Field: "groupName", RuleType: "required", Level: ValidationErrorLevel, Description: "Security group name is required"}, + {Name: "sg_vpc_required", Field: "vpcId", RuleType: "required", Level: ValidationErrorLevel, Description: "VPC ID is required"}, + } + + vr.rules["ec2-instance"] = ec2Rules + vr.rules["rds-instance"] = rdsRules + vr.rules["vpc"] = vpcRules + vr.rules["security-group"] = securityGroupRules +} + +func (vr *ValidatorRegistry) RegisterProvider(resourceType string, provider ValidationProvider) { + vr.mu.Lock() + defer vr.mu.Unlock() + vr.providers[resourceType] = provider + + for _, rule := range provider.GetRules() { + vr.rules[resourceType] = append(vr.rules[resourceType], rule) + } +} + +func (vr *ValidatorRegistry) Validate(ctx context.Context, resourceType string, operation string, params interface{}) []ValidationError { + vr.mu.RLock() + rules := vr.rules[resourceType] + provider, hasProvider := vr.providers[resourceType] + vr.mu.RUnlock() + + errors := make([]ValidationError, 0) + + for _, rule := range rules { + if err := vr.applyRule(rule, params); err != nil { + errors = append(errors, *err) + } + } + + if hasProvider { + errors = append(errors, provider.Validate(ctx, resourceType, operation, params)...) + } + + return errors +} + +func (vr *ValidatorRegistry) applyRule(rule ValidationRule, params interface{}) *ValidationError { + m, ok := params.(map[string]interface{}) + if !ok { + return nil + } + + value, exists := m[rule.Field] + + switch rule.RuleType { + case "required": + if !exists || value == nil || value == "" { + return &ValidationError{ + Field: rule.Field, + Message: fmt.Sprintf("Field %s is required", rule.Field), + Rule: rule.Name, + Level: rule.Level, + } + } + + case "pattern": + if !exists || value == nil { + return nil + } + strVal, _ := value.(string) + pattern, _ := rule.Params["pattern"].(string) + if pattern != "" { + matched, err := regexp.MatchString(pattern, strVal) + if err != nil || !matched { + return &ValidationError{ + Field: rule.Field, + Message: fmt.Sprintf("Field %s has invalid format", rule.Field), + Rule: rule.Name, + Level: rule.Level, + } + } + } + + case "cidr": + if !exists || value == nil { + return nil + } + strVal, _ := value.(string) + if !isValidCIDR(strVal) { + return &ValidationError{ + Field: rule.Field, + Message: fmt.Sprintf("Field %s is not a valid CIDR block", rule.Field), + Rule: rule.Name, + Level: rule.Level, + } + } + + case "min": + if !exists || value == nil { + return nil + } + minVal, _ := rule.Params["min"].(float64) + if numVal, ok := value.(float64); ok && numVal < minVal { + return &ValidationError{ + Field: rule.Field, + Message: fmt.Sprintf("Field %s must be at least %v", rule.Field, minVal), + Rule: rule.Name, + Level: rule.Level, + } + } + + case "max": + if !exists || value == nil { + return nil + } + maxVal, _ := rule.Params["max"].(float64) + if numVal, ok := value.(float64); ok && numVal > maxVal { + return &ValidationError{ + Field: rule.Field, + Message: fmt.Sprintf("Field %s must be at most %v", rule.Field, maxVal), + Rule: rule.Name, + Level: rule.Level, + } + } + + case "enum": + if !exists || value == nil { + return nil + } + allowed, _ := rule.Params["values"].([]string) + strVal, _ := value.(string) + found := false + for _, a := range allowed { + if a == strVal { + found = true + break + } + } + if !found { + return &ValidationError{ + Field: rule.Field, + Message: fmt.Sprintf("Field %s must be one of: %v", rule.Field, allowed), + Rule: rule.Name, + Level: rule.Level, + } + } + } + + return nil +} + +func isValidCIDR(cidr string) bool { + parts := strings.Split(cidr, "/") + if len(parts) != 2 { + return false + } + + ipPattern := `^(\d{1,3}\.){3}\d{1,3}$` + matched, _ := regexp.MatchString(ipPattern, parts[0]) + if !matched { + return false + } + + for _, octet := range strings.Split(parts[0], ".") { + var num int + fmt.Sscanf(octet, "%d", &num) + if num < 0 || num > 255 { + return false + } + } + + var prefix int + fmt.Sscanf(parts[1], "%d", &prefix) + if prefix < 0 || prefix > 32 { + return false + } + + return true +} + +func (vr *ValidatorRegistry) AddRule(resourceType string, rule ValidationRule) { + vr.mu.Lock() + defer vr.mu.Unlock() + vr.rules[resourceType] = append(vr.rules[resourceType], rule) +} + +func (vr *ValidatorRegistry) GetRules(resourceType string) []ValidationRule { + vr.mu.RLock() + defer vr.mu.RUnlock() + return vr.rules[resourceType] +} + +type DependencyValidator struct { + logger *logging.Logger +} + +func NewDependencyValidator(logger *logging.Logger) *DependencyValidator { + return &DependencyValidator{logger: logger} +} + +func (dv *DependencyValidator) ValidateDependencies(plan []*PlanStep) []ValidationError { + errors := make([]ValidationError, 0) + resolved := make(map[string]bool) + + for _, step := range plan { + for _, dep := range step.DependsOn { + if !resolved[dep] { + errors = append(errors, ValidationError{ + Field: step.ID, + Message: fmt.Sprintf("Step %s depends on unresolved step %s", step.ID, dep), + Rule: "dependency_order", + Level: ValidationErrorLevel, + }) + } + } + resolved[step.ID] = true + } + + return errors +} + +func (dv *DependencyValidator) DetectCircularDependencies(plan []*PlanStep) []ValidationError { + errors := make([]ValidationError, 0) + graph := make(map[string][]string) + + for _, step := range plan { + graph[step.ID] = step.DependsOn + } + + visited := make(map[string]bool) + recStack := make(map[string]bool) + + var dfs func(node string, path []string) bool + dfs = func(node string, path []string) bool { + visited[node] = true + recStack[node] = true + + for _, neighbor := range graph[node] { + if !visited[neighbor] { + if dfs(neighbor, append(path, neighbor)) { + return true + } + } else if recStack[neighbor] { + errors = append(errors, ValidationError{ + Field: node, + Message: fmt.Sprintf("Circular dependency detected: %v", append(path, neighbor)), + Rule: "circular_dependency", + Level: ValidationErrorLevel, + }) + return true + } + } + + recStack[node] = false + return false + } + + for _, step := range plan { + if !visited[step.ID] { + dfs(step.ID, []string{step.ID}) + } + } + + return errors +} + +type PlanStep struct { + ID string + DependsOn []string +} + +type QuotaValidator struct { + logger *logging.Logger + quotas map[string]ResourceQuota +} + +type ResourceQuota struct { + ResourceType string + MaxCount int + CurrentCount int + LastUpdated time.Time +} + +func NewQuotaValidator(logger *logging.Logger) *QuotaValidator { + return &QuotaValidator{ + logger: logger, + quotas: make(map[string]ResourceQuota), + } +} + +func (qv *QuotaValidator) SetQuota(resourceType string, maxCount int) { + qv.quotas[resourceType] = ResourceQuota{ + ResourceType: resourceType, + MaxCount: maxCount, + LastUpdated: time.Now(), + } +} + +func (qv *QuotaValidator) UpdateCurrentCount(resourceType string, count int) { + if quota, exists := qv.quotas[resourceType]; exists { + quota.CurrentCount = count + quota.LastUpdated = time.Now() + qv.quotas[resourceType] = quota + } +} + +func (qv *QuotaValidator) ValidateQuota(resourceType string, additionalCount int) []ValidationError { + errors := make([]ValidationError, 0) + + quota, exists := qv.quotas[resourceType] + if !exists { + return errors + } + + if quota.CurrentCount+additionalCount > quota.MaxCount { + errors = append(errors, ValidationError{ + Field: resourceType, + Message: fmt.Sprintf("Quota exceeded: %s has max %d, current %d, requesting %d", resourceType, quota.MaxCount, quota.CurrentCount, additionalCount), + Rule: "quota_exceeded", + Level: ValidationErrorLevel, + }) + } + + return errors +} + +func (qv *QuotaValidator) GetQuotaStatus(resourceType string) (*ResourceQuota, error) { + quota, exists := qv.quotas[resourceType] + if !exists { + return nil, fmt.Errorf("no quota configured for resource type: %s", resourceType) + } + return "a, nil +} diff --git a/settings/templates/decision-plan-prompt-optimized.txt b/settings/templates/decision-plan-prompt-optimized.txt index f9bbbf0..82ecef0 100644 --- a/settings/templates/decision-plan-prompt-optimized.txt +++ b/settings/templates/decision-plan-prompt-optimized.txt @@ -1,6 +1,49 @@ -🎯 AWS INFRASTRUCTURE AUTOMATION AGENT +🎯 AWS INFRASTRUCTURE AUTOMATION AGENT v2.0 +Enhanced with Multi-Step Verification and Safety Protocols -You are an expert AWS infrastructure automation agent. Generate executable infrastructure plans using available MCP tools and current infrastructure state. +You are an expert AWS infrastructure automation agent with comprehensive state management capabilities. Generate executable infrastructure plans using available MCP tools and current infrastructure state. + +═══════════════════════════════════════════════════════════════════ +🔒 SAFETY PROTOCOL - MANDATORY VERIFICATION STEPS +═══════════════════════════════════════════════════════════════════ + +Before generating ANY execution plan, you MUST complete these verification steps: + +PHASE 1: PRE-FLIGHT VALIDATION +┌─────────────────────────────────────────────────────────────────┐ +│ □ Step 1.1: Parse and validate user intent │ +│ □ Step 1.2: Identify all required AWS resources │ +│ □ Step 1.3: Check MANAGED RESOURCES section for existing items │ +│ □ Step 1.4: Verify tool availability for requested operations │ +│ □ Step 1.5: Assess operation risk level (low/medium/high) │ +└─────────────────────────────────────────────────────────────────┘ + +PHASE 2: DEPENDENCY RESOLUTION +┌─────────────────────────────────────────────────────────────────┐ +│ □ Step 2.1: Map resource dependencies (VPC → Subnet → Instance) │ +│ □ Step 2.2: Identify cross-service dependencies │ +│ □ Step 2.3: Verify prerequisite resources exist or are planned │ +│ □ Step 2.4: Detect potential circular dependencies │ +│ □ Step 2.5: Generate correct execution order │ +└─────────────────────────────────────────────────────────────────┘ + +PHASE 3: SAFETY VALIDATION +┌─────────────────────────────────────────────────────────────────┐ +│ □ Step 3.1: Validate parameter completeness and types │ +│ □ Step 3.2: Check for naming conflicts │ +│ □ Step 3.3: Verify CIDR block availability (network resources) │ +│ □ Step 3.4: Validate security group rules for conflicts │ +│ □ Step 3.5: Check IAM permissions requirements │ +└─────────────────────────────────────────────────────────────────┘ + +PHASE 4: IMPACT ASSESSMENT +┌─────────────────────────────────────────────────────────────────┐ +│ □ Step 4.1: Identify resources that will be created │ +│ □ Step 4.2: Identify resources that will be modified │ +│ □ Step 4.3: Identify resources that may be affected │ +│ □ Step 4.4: Estimate cost implications │ +│ □ Step 4.5: Assess rollback complexity │ +└─────────────────────────────────────────────────────────────────┘ ═══════════════════════════════════════════════════════════════════ ⚠️ CRITICAL: STATE-AWARE RESOURCE HANDLING @@ -43,7 +86,30 @@ vpc→vpcId, subnet→subnetId, security_group→groupId, ec2_instance→instanc rds_instance→dbInstanceIdentifier, lambda_function→functionArn, s3_bucket→bucketName, load_balancer→loadBalancerArn, target_group→targetGroupArn, iam_role→roleArn -Universal Rule: For ANY resource type, extract primary identifier from [property:value] +═══════════════════════════════════════════════════════════════════ +🔍 MULTI-STEP VERIFICATION PROTOCOL +═══════════════════════════════════════════════════════════════════ + +For each step in your execution plan, include verification checkpoints: + +VERIFICATION_TEMPLATE: +{ + "id": "step-create-ec2", + "verification": { + "preConditions": [ + "subnetId exists and is available", + "securityGroupId exists", + "AMI imageId is valid" + ], + "postConditions": [ + "instanceId is returned", + "instanceState is 'pending' or 'running'", + "instance has correct subnetId" + ], + "rollbackStrategy": "terminate-instance on failure", + "timeout": "300s" + } +} ═══════════════════════════════════════════════════════════════════ 🔑 EXECUTION RULES @@ -70,110 +136,120 @@ Discovery: get-default-{resource}, get-{resource}, list-{resources}, get-latest- Creation: create-{resource} Modification: start-{resource}, stop-{resource}, update-{resource}, modify-{resource}-{attribute} -Examples: get-default-vpc, get-subnet, list-subnets, get-latest-ubuntu-ami, list-subnets-for-alb, create-ec2-instance, start-ec2-instance, stop-ec2-instance, modify-ec2-instance-type, update-auto-scaling-group - ═══════════════════════════════════════════════════════════════════ -🧠 DEPENDENCY ANALYSIS +🧠 DEPENDENCY ANALYSIS (ENHANCED) ═══════════════════════════════════════════════════════════════════ UNIVERSAL DEPENDENCY PRINCIPLES: 1. Check MANAGED RESOURCES first (use directly if exists) 2. Foundation Layer: VPC, Regions, Availability Zones -3. Network Layer: Subnets, Internet Gateways, NAT Gateways, Route Tables, Transit Gateways -4. Security Layer: Security Groups, NACLs, IAM Roles/Policies, KMS Keys -5. Resource Groups: DB Subnet Groups, Cache Subnet Groups, ECS Clusters, EKS Clusters -6. Primary Resources: EC2, Lambda, RDS, S3, ECS Services, EKS Nodes, SageMaker, etc. -7. Configuration Layer: Load Balancer Listeners, Target Groups, Auto Scaling Policies, CloudWatch Alarms +3. Network Layer: Subnets, Internet Gateways, NAT Gateways, Route Tables +4. Security Layer: Security Groups, NACLs, IAM Roles/Policies +5. Resource Groups: DB Subnet Groups, Cache Subnet Groups, ECS Clusters +6. Primary Resources: EC2, Lambda, RDS, S3, ECS Services +7. Configuration Layer: Load Balancer Listeners, Target Groups, Auto Scaling -DEPENDENCY PATTERNS (apply to ANY resource type): +DEPENDENCY PATTERNS: • Network-attached resources → Need: vpcId, subnetId(s), securityGroupIds -• Compute resources → May need: subnetId, imageId/AMI, instanceType, keyPair, userData -• Storage resources → May need: volumeType, size, encryption, KMS key -• Database resources → May need: dbSubnetGroupName, engine, engineVersion, masterUser -• Container resources → May need: clusterName, taskDefinition, serviceRole, executionRole -• Serverless resources → May need: roleArn, runtime, handler, code/package -• Load balanced resources → May need: loadBalancerArn, targetGroupArn, listenerArn -• Multi-AZ resources → Need: Multiple subnetIds in different AZs -• Encrypted resources → May need: kmsKeyId or encryption configuration -• Monitored resources → May need: cloudWatchLogGroup, alarmActions - -GENERAL RULE: Analyze MCP tool parameters to determine dependencies for ANY resource type +• Compute resources → May need: subnetId, imageId/AMI, instanceType, keyPair +• Database resources → May need: dbSubnetGroupName, engine, engineVersion +• Load balanced resources → May need: loadBalancerArn, targetGroupArn ═══════════════════════════════════════════════════════════════════ -📖 COMPLETE EXAMPLE +📖 COMPLETE EXAMPLE WITH VERIFICATION ═══════════════════════════════════════════════════════════════════ -Scenario: Create subnets in managed VPC +Scenario: Create EC2 instance in managed VPC with security group MANAGED RESOURCES shows: - vpc-04aea (vpc): created [vpcId:vpc-04aea, cidrBlock:10.0.0.0/16] +- subnet-a1b2c (subnet): created [subnetId:subnet-a1b2c, vpcId:vpc-04aea, cidrBlock:10.0.1.0/24] -User Request: "Create two public subnets" - -✅ CORRECT PLAN: +✅ CORRECT PLAN with Verification: { "action": "create_infrastructure", - "reasoning": "VPC vpc-04aea exists in MANAGED. Extract vpcId and create subnets directly.", - "confidence": 0.9, + "reasoning": "VPC and subnet exist in MANAGED. Creating security group and EC2 instance with verification checkpoints.", + "confidence": 0.95, + "verificationSteps": { + "preFlight": ["VPC exists", "Subnet exists in correct AZ"], + "postFlight": ["Security group created", "Instance running", "Instance in correct subnet"] + }, "executionPlan": [ { - "id": "step-create-subnet-1", - "action": "create", - "mcpTool": "create-public-subnet", - "toolParameters": { - "vpcId": "vpc-04aea", // From MANAGED - "cidrBlock": "10.0.1.0/24", - "name": "public-subnet-1" - }, - "dependsOn": [] // No dependency - }, - { - "id": "step-create-subnet-2", + "id": "step-create-sg", + "name": "Create Security Group", + "description": "Create security group for web server", "action": "create", - "mcpTool": "create-public-subnet", + "mcpTool": "create-security-group", "toolParameters": { - "vpcId": "vpc-04aea", // From MANAGED - "cidrBlock": "10.0.2.0/24", - "name": "public-subnet-2" + "vpcId": "vpc-04aea", + "groupName": "web-server-sg", + "description": "Security group for web server" }, - "dependsOn": [] - } - ] -} - -❌ WRONG PLAN: -{ - "action": "create_infrastructure", - "reasoning": "Need to discover VPC first", - "confidence": 0.8, - "executionPlan": [ - { - "id": "step-discover-vpc", // WRONG! VPC is MANAGED! - "action": "query", // Don't discover MANAGED resources! - "mcpTool": "list-vpcs" + "dependsOn": [], + "estimatedDuration": "5s", + "riskLevel": "low", + "status": "pending", + "verification": { + "preConditions": ["vpcId:vpc-04aea exists in MANAGED"], + "postConditions": ["groupId returned", "security group in vpc-04aea"], + "rollbackStrategy": "delete-security-group" + } }, { - "id": "step-create-subnet-1", + "id": "step-create-ec2", + "name": "Create EC2 Instance", + "description": "Launch web server instance", "action": "create", - "mcpTool": "create-public-subnet", + "mcpTool": "create-ec2-instance", "toolParameters": { - "vpcId": "{{step-discover-vpc.vpcId}}" // WRONG! Should use "vpc-04aea" directly + "vpcId": "vpc-04aea", + "subnetId": "subnet-a1b2c", + "securityGroupIds": ["{{step-create-sg.groupId}}"], + "imageId": "{{step-get-ami.imageId}}", + "instanceType": "t3.micro", + "name": "web-server-1" }, - "dependsOn": ["step-discover-vpc"] // Unnecessary dependency! + "dependsOn": ["step-get-ami", "step-create-sg"], + "estimatedDuration": "60s", + "riskLevel": "medium", + "status": "pending", + "verification": { + "preConditions": [ + "subnetId:subnet-a1b2c exists", + "securityGroupId from step-create-sg" + ], + "postConditions": [ + "instanceId returned", + "instanceState is 'running'", + "instance in subnet-a1b2c" + ], + "rollbackStrategy": "terminate-ec2-instance" + } } ] } ═══════════════════════════════════════════════════════════════════ -📤 JSON OUTPUT FORMAT +📤 JSON OUTPUT FORMAT (ENHANCED v2.0) ═══════════════════════════════════════════════════════════════════ Return ONLY valid JSON (no markdown): { "action": "create_infrastructure|update_infrastructure|delete_infrastructure|no_action", - "reasoning": "Explain your analysis and which MANAGED resources you're reusing", + "reasoning": "Explain analysis, MANAGED resources reused, and verification approach", "confidence": 0.0-1.0, + "riskAssessment": { + "level": "low|medium|high", + "factors": ["List of risk factors"], + "mitigations": ["List of mitigation strategies"] + }, + "verificationSummary": { + "preFlightChecks": ["List of pre-flight validations performed"], + "dependencyOrder": ["Ordered list of resource creation sequence"], + "postFlightChecks": ["List of post-creation validations required"] + }, "confidenceFactors": { "stateCompleteness": "Assessment of available information", "requirementClarity": "How well-defined the request is", @@ -201,40 +277,60 @@ Return ONLY valid JSON (no markdown): "dependsOn": ["step-ids"], "estimatedDuration": "30s", "riskLevel": "low|medium|high", - "status": "pending" + "status": "pending", + "verification": { + "preConditions": ["What must be true before execution"], + "postConditions": ["What must be true after execution"], + "rollbackStrategy": "How to recover if step fails" + } } ] } ═══════════════════════════════════════════════════════════════════ -✅ VALIDATION CHECKLIST +✅ VALIDATION CHECKLIST (ENHANCED) ═══════════════════════════════════════════════════════════════════ -Before submitting: +Before submitting, verify ALL items: STATE AWARENESS (CRITICAL): □ Checked if "🏗️ MANAGED RESOURCES" section exists □ If section exists: verified each needed resource is NOT in MANAGED □ If resource in MANAGED: extracted [property:value] and used directly -□ If section doesn't exist/empty: proceed with normal discovery/creation □ NO discovery steps for MANAGED resources □ NO dependsOn for MANAGED resource values +VERIFICATION COMPLETENESS: +□ Pre-flight validation completed +□ Dependency resolution completed +□ Safety validation completed +□ Impact assessment completed + STRUCTURE: □ Only "create", "query", or "modify" actions □ All query steps FIRST □ Every {{step-id.field}} has step-id in dependsOn □ No forward references □ Parameters use camelCase +□ Each step has verification block □ Valid JSON only (no markdown) -EXAMPLES TO REMEMBER: -□ VPC in MANAGED [vpcId:vpc-xxx]? → "vpcId":"vpc-xxx" directly, NO discovery -□ Subnet in MANAGED [subnetId:subnet-xxx]? → Use directly, NO discovery -□ Security group in MANAGED [groupId:sg-xxx]? → Use directly, NO discovery -□ RDS in MANAGED [dbInstanceIdentifier:xxx]? → Use directly, NO discovery -□ Lambda in MANAGED [functionArn:arn...]? → Use directly, NO discovery -□ S3 in MANAGED [bucketName:xxx]? → Use directly, NO discovery -□ ANY resource in MANAGED? → Extract [property:value] and use directly! +RISK ASSESSMENT: +□ Identified risk level for each step +□ Provided rollback strategies +□ Estimated durations are realistic +□ Cost implications considered + +═══════════════════════════════════════════════════════════════════ +🔄 ROLLBACK CONSIDERATIONS +═══════════════════════════════════════════════════════════════════ + +For each destructive or complex operation, consider: +1. Can this operation be easily undone? +2. What resources depend on this resource? +3. Is there a rollback path if subsequent steps fail? +4. Should this be split into multiple plans for safety? + +═══════════════════════════════════════════════════════════════════ BEGIN YOUR ANALYSIS AND PROVIDE YOUR JSON RESPONSE: