[refactor] Semantic Function Clustering Analysis - Code Organization Improvements #7063

@github-actions

Executive Summary

This analysis examined 331 non-test Go source files across the repository to identify refactoring opportunities through semantic function clustering. The codebase demonstrates generally good organization with well-structured packages, but several opportunities for improvement were identified:

  • 8 validation functions located outside dedicated validation files
  • Multiple parse function clusters scattered across non-parser files (30+ functions in 10+ files)
  • 11 helper files with overlapping responsibilities
  • Similar render/generate function patterns across MCP configuration files (14 render functions)
  • Opportunities to consolidate parsing logic for configuration extraction

The analysis focused on the largest packages: pkg/workflow (175 files) and pkg/cli (114 files), where the most significant refactoring opportunities exist.

Full Analysis Report

Repository Structure Overview

Package Distribution

  • pkg/workflow: 175 files (52.9% of analyzed files) - Core workflow compilation and execution
  • pkg/cli: 114 files (34.4%) - Command-line interface implementation
  • pkg/parser: 20 files (6.0%) - Parsing and frontmatter extraction
  • pkg/campaign: 8 files (2.4%) - Campaign orchestration
  • Other packages: 14 files (4.2%) - Utilities, logging, console, styles

File Organization Assessment

The repository follows Go best practices with files organized by feature:

Well-Organized Clusters:

  • Create operations: 6 create_*.go files for entity creation (issues, PRs, discussions, etc.)
  • Update operations: 7 update_*.go files for entity updates
  • Compiler modules: 15 compiler_*.go files for workflow compilation stages
  • Engine implementations: 5 *_engine.go files for different AI engines
  • Safe output files: 13 safe_output*.go files for output validation
  • MCP configuration: 24 mcp*.go files for MCP server integration
  • Logs operations: 11 logs_*.go files in CLI for log analysis
  • Update CLI commands: 9 update_*.go files in CLI for update operations

Identified Refactoring Opportunities

1. Validation Functions in Non-Validation Files

Issue: Validation functions scattered outside dedicated validation files

Outliers Found:

File: pkg/workflow/repo_memory.go

  • Function: validateNoDuplicateMemoryIDs
  • Issue: Memory validation function in memory implementation file
  • Recommendation: Move to pkg/workflow/validation.go or create pkg/workflow/repo_memory_validation.go
  • Impact: Improved separation of concerns

File: pkg/workflow/sandbox.go

  • Functions: validateMountsSyntax, validateSandboxConfig
  • Issue: Sandbox validation in main sandbox file
  • Recommendation: Create pkg/workflow/sandbox_validation.go to match pattern of docker_validation.go, npm_validation.go, etc.
  • Impact: Consistent validation file organization across features

Pattern Analysis: The codebase has 22 dedicated validation files following the *_validation.go naming pattern. These outliers break this established convention.
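
As an illustration of what a dedicated validation file could hold, here is a minimal sketch of a duplicate-ID check as it might look in a new repo_memory_validation.go. The MemoryEntry type and the function signature are assumptions made for this example; the real validateNoDuplicateMemoryIDs may take different arguments. In the repository the file would declare the workflow package; package main is used here only so the sketch runs on its own.

```go
package main

import "fmt"

// MemoryEntry is a hypothetical stand-in for the real repo-memory config type.
type MemoryEntry struct {
	ID string
}

// validateNoDuplicateMemoryIDs returns an error naming the first repeated ID.
// The signature is assumed for illustration.
func validateNoDuplicateMemoryIDs(entries []MemoryEntry) error {
	seen := make(map[string]struct{}, len(entries))
	for _, e := range entries {
		if _, dup := seen[e.ID]; dup {
			return fmt.Errorf("duplicate memory ID %q", e.ID)
		}
		seen[e.ID] = struct{}{}
	}
	return nil
}

func main() {
	entries := []MemoryEntry{{ID: "notes"}, {ID: "cache"}, {ID: "notes"}}
	if err := validateNoDuplicateMemoryIDs(entries); err != nil {
		fmt.Println("validation failed:", err)
	}
}
```

Because Go resolves identifiers per package rather than per file, moving a function like this between files of the same package leaves every caller untouched.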


2. Parse Functions Scattered Across Files

Issue: 30+ parse functions distributed across 10+ non-parser files instead of being consolidated

Major Concentrations:

File: pkg/workflow/config_helpers.go (268 lines)
Parse functions:

  • parseLabelsFromConfig
  • parseTitlePrefixFromConfig
  • parseTargetRepoFromConfig
  • parseTargetRepoWithValidation
  • parseParticipantsFromConfig
  • parseAllowedReposFromConfig
  • parseAllowedLabelsFromConfig
  • parseExpiresFromConfig
  • parseRelativeTimeSpec

Analysis: These are config parsing utilities, but the file is named "helpers" rather than "config_parser" which would be more accurate.

File: pkg/workflow/dependabot.go (685 lines)
Parse functions:

  • parseNpmPackage
  • parsePipPackage
  • parseGoPackage

Issue: Package dependency parsing logic embedded in dependabot feature file
Recommendation: Extract to pkg/workflow/package_parser.go for reusability
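
A rough sketch of what an extracted package parser could look like follows. The PackageRef type, the return values, and the accepted spec formats are assumptions for illustration only; the actual functions in dependabot.go may parse different inputs or return different types.

```go
package main

import (
	"fmt"
	"strings"
)

// PackageRef is a hypothetical result type shared by all three parsers.
type PackageRef struct {
	Name    string
	Version string // empty when the spec pins no version
}

// parseNpmPackage splits "name@version". A leading "@" marks an npm scope,
// so only an "@" after position 0 is treated as the version separator.
func parseNpmPackage(spec string) PackageRef {
	if i := strings.LastIndex(spec, "@"); i > 0 {
		return PackageRef{Name: spec[:i], Version: spec[i+1:]}
	}
	return PackageRef{Name: spec}
}

// parsePipPackage splits "name==version". Only the exact-pin operator is
// handled in this sketch; other specifiers stay part of the name.
func parsePipPackage(spec string) PackageRef {
	name, version, _ := strings.Cut(spec, "==")
	return PackageRef{Name: name, Version: version}
}

// parseGoPackage splits "module/path@version".
func parseGoPackage(spec string) PackageRef {
	name, version, _ := strings.Cut(spec, "@")
	return PackageRef{Name: name, Version: version}
}

func main() {
	fmt.Printf("%+v\n", parseNpmPackage("@scope/pkg@1.2.3"))
	fmt.Printf("%+v\n", parsePipPackage("requests==2.31.0"))
	fmt.Printf("%+v\n", parseGoPackage("golang.org/x/tools@v0.20.0"))
}
```

Keeping the three parsers behind one small result type is what makes them reusable outside the dependabot feature; callers no longer need to know which ecosystem a spec came from once it is parsed.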

File: pkg/workflow/time_delta.go (370 lines)
Parse functions:

  • parseTimeDelta
  • parseTimeDeltaForStopAfter
  • parseTimeDeltaWithMinutes
  • parseAbsoluteDateTime
  • parseRelativeDate

Analysis: This file is well-organized - all time parsing in one place ✅

Other Files with Parse Functions:

  • pkg/workflow/map_helpers.go: parseIntValue
  • pkg/workflow/reactions.go: parseReactionValue
  • pkg/workflow/safe_inputs.go: parseSafeInputsMap
  • pkg/workflow/safe_output_builder.go: parseRequiredLabelsFromConfig, parseRequiredTitlePrefixFromConfig
  • pkg/workflow/safe_outputs_app.go: parseAppConfig
  • pkg/workflow/safe_outputs_config.go: parseMessagesConfig, parseMentionsConfig
  • pkg/workflow/update_entity_helpers.go: parseUpdateEntityBoolField

Recommendation: Consider consolidating related parsing functions:

  1. Config parsing utilities → Rename config_helpers.go to config_parser.go or keep as-is but document purpose
  2. Package parsing → Extract from dependabot.go to dedicated package_parser.go
  3. Generic parsing utilities → Consolidate parseIntValue and similar generic parsers
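
For the generic parsers in item 3, a consolidated helper might look like the sketch below. It assumes that config values arrive as the loosely typed numbers that YAML and JSON decoders commonly produce; the (int, bool) signature is hypothetical and may not match the existing parseIntValue in map_helpers.go.

```go
package main

import "fmt"

// parseIntValue converts common decoded numeric types into an int.
// The second return value reports whether the conversion was possible.
func parseIntValue(v any) (int, bool) {
	switch n := v.(type) {
	case int:
		return n, true
	case int64:
		return int(n), true
	case uint64:
		return int(n), true
	case float64:
		return int(n), true // truncates any fractional part
	default:
		return 0, false
	}
}

func main() {
	config := map[string]any{"max": 5, "timeout": 30.0, "name": "x"}
	for _, key := range []string{"max", "timeout", "name"} {
		n, ok := parseIntValue(config[key])
		fmt.Println(key, n, ok)
	}
}
```

A single helper of this kind gives every config parser the same handling of numeric types, which is the duplication the recommendation is aimed at.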

3. Helper File Proliferation

Issue: 11 different helper files with potentially overlapping responsibilities

Helper Files Inventory:

Workflow Package (10 helper files):

  1. compiler_yaml_helpers.go (102 lines) - YAML generation utilities
  2. compiler_test_helpers.go (69 lines) - Test utilities (appropriate location)
  3. config_helpers.go (268 lines) - Config parsing (largest helper file)
  4. engine_helpers.go (250 lines) - Engine utilities (2nd largest)
  5. git_helpers.go (35 lines) - Git operations
  6. map_helpers.go (43 lines) - Generic map utilities
  7. prompt_step_helper.go (97 lines) - Prompt step utilities
  8. close_entity_helpers.go (183 lines) - Close entity config parsing
  9. update_entity_helpers.go (176 lines) - Update entity config parsing
  10. safe_outputs_env_helpers.go (147 lines) - Safe outputs environment setup

CLI Package (1 helper file):

  11. compile_helpers.go - Compilation utilities

Analysis:

Well-Named Helpers (purpose-specific):

  • git_helpers.go - Clear purpose: Git operations
  • map_helpers.go - Clear purpose: Map manipulation utilities
  • compiler_test_helpers.go - Clear purpose: Test utilities
  • prompt_step_helper.go - Clear purpose: Prompt step processing

Poorly Named Helpers (too generic or misleading):

  • ⚠️ config_helpers.go - Actually a config parser, not generic helpers
  • ⚠️ engine_helpers.go - 250 lines, needs investigation of contents
  • ⚠️ close_entity_helpers.go - Actually a parser for close entity configs
  • ⚠️ update_entity_helpers.go - Actually a parser for update entity configs

Recommendation:

  1. Rename misleading helper files to reflect their true purpose:
    • close_entity_helpers.go → close_entity_parser.go or merge with close entity files
    • update_entity_helpers.go → update_entity_parser.go or merge with update entity files
  2. Review engine_helpers.go to determine if it should be split or renamed
  3. Consider whether safe_outputs_env_helpers.go should be safe_outputs_env_builder.go

4. MCP Configuration Render Functions

Issue: 14 render functions across MCP files with similar patterns

Render Function Cluster:

File: pkg/workflow/mcp-config.go - Central MCP configuration rendering

  • renderPlaywrightMCPConfig
  • renderPlaywrightMCPConfigWithOptions
  • renderSerenaMCPConfigWithOptions
  • renderBuiltinMCPServerBlock
  • renderSafeOutputsMCPConfig
  • renderSafeOutputsMCPConfigWithOptions
  • renderAgenticWorkflowsMCPConfigWithOptions
  • renderPlaywrightMCPConfigTOML
  • renderSafeOutputsMCPConfigTOML
  • renderAgenticWorkflowsMCPConfigTOML
  • renderCustomMCPConfigWrapper
  • renderSharedMCPConfig

File: pkg/workflow/mcp_renderer.go - Additional rendering functions

  • renderSafeInputsMCPConfigWithOptions
  • renderMCPFetchServerConfig

Pattern: Many functions follow the pattern render(Tool)MCPConfigWithOptions or render(Tool)MCPConfigTOML

Analysis:

  • ✅ Functions are well-organized by purpose (YAML vs TOML rendering)
  • ✅ Clear naming convention
  • ⚠️ Potential for abstraction to reduce duplication (common YAML/TOML generation patterns)

Recommendation: Consider extracting common rendering patterns into a generic MCP config renderer that can be configured per tool type, reducing code duplication while maintaining clarity.
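
One possible shape for such a generic renderer is sketched below: each tool is described by data, and each output format is a small function over that data. Everything here is hypothetical, including the serverSpec fields, the function names, and the exact YAML and TOML layouts; the existing render functions may emit different structures.

```go
package main

import (
	"fmt"
	"strings"
)

// serverSpec is a hypothetical per-tool description of one MCP server entry.
type serverSpec struct {
	Name    string
	Command string
	Args    []string
}

// renderFunc writes one server entry in a specific output format.
type renderFunc func(b *strings.Builder, s serverSpec)

// quoteAll double-quotes each argument. Go's %q escaping is close to, but not
// identical to, YAML and TOML string escaping, so this sketch is only safe
// for plain ASCII values.
func quoteAll(args []string) string {
	quoted := make([]string, len(args))
	for i, a := range args {
		quoted[i] = fmt.Sprintf("%q", a)
	}
	return strings.Join(quoted, ", ")
}

func renderYAML(b *strings.Builder, s serverSpec) {
	fmt.Fprintf(b, "%s:\n  command: %q\n  args: [%s]\n", s.Name, s.Command, quoteAll(s.Args))
}

func renderTOML(b *strings.Builder, s serverSpec) {
	fmt.Fprintf(b, "[mcp_servers.%s]\ncommand = %q\nargs = [%s]\n", s.Name, s.Command, quoteAll(s.Args))
}

// renderServers applies one format to every server spec in order.
func renderServers(render renderFunc, specs []serverSpec) string {
	var b strings.Builder
	for _, s := range specs {
		render(&b, s)
	}
	return b.String()
}

func main() {
	specs := []serverSpec{
		{Name: "demo", Command: "npx", Args: []string{"example-mcp-server", "--verbose"}},
	}
	fmt.Print(renderServers(renderYAML, specs))
	fmt.Print(renderServers(renderTOML, specs))
}
```

With this split, adding a tool means adding a serverSpec value and adding a format means adding one renderFunc, instead of writing a new render(Tool)MCPConfig function for every tool and format pair.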


5. Extract Function Patterns

Issue: Two distinct "extract" patterns serving different purposes

Pattern 1: Workflow Package Extractors (24 functions)

Purpose: Extract data/metadata from runtime objects
File: Scattered across workflow package

Examples:

  • extractActionRepo, extractActionVersion (action_pins.go)
  • extractNpxPackages, extractPipPackages, extractGoPackages (various files)
  • extractToolsFromFrontmatter, extractMCPServersFromFrontmatter (various files)
  • extractStringFromMap, extractMapFromFrontmatter (map_helpers.go, various)

Pattern 2: Parser Package Extractors (17 functions)

Purpose: Extract structured data from content/text
File: pkg/parser/content_extractor.go and related

Examples:

  • extractToolsFromContent
  • extractSafeOutputsFromContent
  • extractStepsFromContent
  • extractFrontmatterField

Analysis:

  • ✅ Parser package extractors are well-organized in dedicated files
  • ⚠️ Workflow package extractors are scattered across 10+ files
  • These serve different purposes but use the same naming pattern, which could be confusing

Recommendation:

  1. Keep parser extractors as-is (well-organized)
  2. Consider prefixing workflow extractors with their domain (e.g., extractActionRepo → extractRepoFromActionUses)
  3. OR consolidate workflow extractors into domain-specific files (action extractors, package extractors, etc.)

6. Generate Function Patterns

Issue: 60+ generate functions across workflow package

Major Clusters:

Runtime Setup (10 functions) - pkg/workflow/runtime_setup.go

  • generateSetupStep
  • generateRuntimeSetupSteps
  • generateSerenaLanguageServiceSteps

Cache Generation (4 functions) - pkg/workflow/cache.go

  • generateCacheSteps
  • generateCacheMemorySteps
  • generateCacheMemoryArtifactUpload
  • generateCacheMemoryPromptSection

Repository Memory (4 functions) - pkg/workflow/repo_memory.go, pkg/workflow/repo_memory_prompt.go

  • generateRepoMemorySteps
  • generateRepoMemoryPushSteps
  • generateRepoMemoryPromptSection
  • generateRepoMemoryArtifactUpload

Safe Inputs/Outputs (10+ functions) - Various safe_*.go files

  • generateSafeInputsMCPServerScript
  • generateSafeInputsToolsConfig
  • generateSafeOutputsConfig
  • generateSafeInputShellToolScript
  • generateSafeInputJavaScriptToolScript
  • generateSafeInputPythonToolScript

SRT/Copilot (5 functions) - pkg/workflow/copilot_srt.go

  • generateSRTInstallationStep
  • generateSRTConfigJSON
  • generateSRTSystemConfigStep

Prompt Generation (2 functions) - pkg/workflow/prompt_step_helper.go

  • generateStaticPromptStep
  • generateStaticPromptStepWithExpressions

Analysis:

  • ✅ Most generate functions are well-organized by feature
  • ✅ Clear naming pattern indicates purpose
  • No major issues - this is good organization

Recommendation: No changes needed. This is a good example of semantic clustering.


7. Build Function Patterns (Expression Builder)

Issue: 30+ Build functions for expression building

File: pkg/workflow/expression_builder.go and related files

Pattern: All functions build expression AST nodes

  • BuildAnd, BuildOr, BuildComparison
  • BuildStringLiteral, BuildNumberLiteral, BuildBooleanLiteral
  • BuildPropertyAccess, BuildFunctionCall
  • BuildLabelContains, BuildEventTypeEquals

Analysis:

  • ✅ Excellent organization - all expression builders in one conceptual area
  • ✅ Clear naming convention
  • ✅ Single responsibility

Recommendation: No changes needed. This is exemplary organization.


8. Format Function Consolidation Opportunity

Issue: Multiple error formatting functions across packages

Error Formatting Functions:

Package: campaign

  • formatValidationErrors - pkg/campaign/validation.go:221

Package: parser

  • FormatImportError - pkg/parser/import_error.go:30

Package: console

  • FormatError - pkg/console/console.go:73
  • FormatErrorMessage - pkg/console/console.go:321
  • FormatErrorWithSuggestions - pkg/console/console.go:326

Package: logger

  • Error formatting in logger/error_formatting.go

Analysis:

  • Error formatting is distributed across packages
  • Each package formats errors differently
  • No consistent error formatting strategy

Recommendation:

  • If these formatters serve different purposes (which they likely do), document the distinction
  • Consider creating a shared error formatting package if patterns emerge
  • Current organization may be acceptable if each formatter is domain-specific

9. Large File Analysis

Largest Files in Workflow Package (11 listed):

| Lines | File | Analysis |
|-------|------|----------|
| 1,259 | compiler_safe_outputs_consolidated.go | ⚠️ Very large - consider splitting |
| 1,169 | safe_outputs_config.go | ⚠️ Large config file - may need modularization |
| 1,167 | copilot_engine.go | ⚠️ Engine implementation - potentially acceptable |
| 1,090 | frontmatter_extraction.go | ⚠️ Recently refactored from 1,294 lines (issue #7051) |
| 990 | mcp-config.go | ⚠️ MCP configuration - could be split by tool type |
| 982 | runtime_setup.go | ⚠️ Runtime detection - consider splitting detection vs setup |
| 947 | mcp_servers.go | ⚠️ MCP server management |
| 945 | permissions.go | ⚠️ Permission handling |
| 914 | js.go | ⚠️ JavaScript bundling/validation |
| 847 | codex_engine.go | ✅ Engine implementation |
| 837 | safe_inputs.go | ⚠️ Safe inputs handling |


Recommendation: Files over 800 lines should be reviewed for potential splitting, especially:

  • compiler_safe_outputs_consolidated.go - Despite "consolidated" name, at 1,259 lines it may benefit from further modularization
  • safe_outputs_config.go - Config parsing could be extracted
  • mcp-config.go - Split by tool type (playwright, serena, github, etc.)

10. CLI Package Organization

Well-Organized Feature Groups:

Compile commands: 8 files with compile_ prefix

  • Orchestrator, validation, stats, watch, config, helpers, command, campaign

Update commands: 9 files with update_ prefix

  • Actions, check, command, display, extension check, git, merge, types, workflows

MCP commands: 18 files with mcp_ prefix

  • Add, inspect, list, registry, secrets, server, validation, etc.

Logs commands: 11 files with logs_ prefix

  • Cache, command, display, download, github API, metrics, models, orchestrator, parsing, report, utils

Analysis: CLI package demonstrates excellent semantic clustering by feature area. This is the gold standard for organization.


Detailed Refactoring Recommendations

Priority 1: High-Impact, Low-Risk Changes

1.1 Move Validation Functions to Validation Files

Effort: 1-2 hours
Files affected: 3
Impact: Consistency with established validation file pattern

Actions:

  • Move validateNoDuplicateMemoryIDs from repo_memory.go to validation.go or create repo_memory_validation.go
  • Create sandbox_validation.go and move validateMountsSyntax and validateSandboxConfig
  • Run the existing tests to confirm behavior is unchanged; functions moved between files of the same Go package need no caller or import changes

1.2 Rename Misleading Helper Files

Effort: 1-2 hours
Files affected: 4
Impact: Improved code navigation and understanding

Actions:

  • Rename close_entity_helpers.go to close_entity_parser.go (or merge into close entity feature files)
  • Rename update_entity_helpers.go to update_entity_parser.go (or merge into update entity feature files)
  • Consider renaming config_helpers.go to config_parser.go for accuracy
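  • Rename any matching _test.go files and update references to the old file names in comments and documentation (renaming a file inside a Go package does not change import paths)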

Priority 2: Medium-Impact Refactoring

2.1 Extract Package Parsing from Dependabot

Effort: 2-3 hours
Files affected: 2-3
Impact: Improved reusability of package parsing logic

Actions:

  • Create pkg/workflow/package_parser.go
  • Move parseNpmPackage, parsePipPackage, parseGoPackage from dependabot.go
  • Update dependabot.go to use new package parser
  • Update tests

2.2 Consolidate Scattered Parse Functions

Effort: 3-4 hours
Files affected: 5-7
Impact: Reduced duplication, improved maintainability

Actions:

  • Review parse functions in safe_output_builder.go and safe_outputs_config.go
  • Identify duplicates with config_helpers.go functions
  • Consolidate to single location per function type
  • Document parsing utilities clearly

Priority 3: Long-Term Improvements

3.1 Modularize Large Files

Effort: 8-12 hours per file
Files affected: 3-5
Impact: Improved code navigation and maintainability

Targets:

  • compiler_safe_outputs_consolidated.go (1,259 lines)
  • safe_outputs_config.go (1,169 lines)
  • mcp-config.go (990 lines)

Approach: Split by functional area while maintaining clear APIs

3.2 Abstract Common MCP Rendering Patterns

Effort: 6-8 hours
Files affected: 2-3
Impact: Reduced code duplication in MCP config rendering

Approach:

  • Identify common YAML/TOML generation patterns
  • Create generic renderer with per-tool configuration
  • Refactor existing render functions to use generic renderer

Function Clustering Summary

Excellent Clustering (No Changes Needed) ✅

  1. Create operations: 6 well-organized files (create_issue.go, create_pr.go, etc.)
  2. Update operations: 7 well-organized files (update_issue.go, update_pr.go, etc.)
  3. Compiler stages: 15 modular compiler files
  4. Engine implementations: 5 separate engine files per engine type
  5. Expression builders: All Build* functions well-organized
  6. Generate functions: Mostly well-organized by feature
  7. CLI feature groups: Exemplary organization (compile_, update_, mcp_, logs_)

Needs Improvement ⚠️

  1. Validation functions: 8 functions in wrong files
  2. Parse functions: 30+ scattered across non-parser files
  3. Helper file naming: 4 files with misleading "helper" names
  4. Large files: 9 files over 900 lines need review (see Large File Analysis)

Testing Impact Assessment

Low Risk Changes:

  • Moving validation functions (existing tests should still pass)
  • Renaming files (no code changes required: Go import paths refer to packages, not individual files)

Medium Risk Changes:

  • Extracting package parsing (need to verify dependabot functionality)
  • Consolidating parse functions (need comprehensive testing)

High Risk Changes:

  • Splitting large files (extensive testing required)
  • Refactoring MCP renderers (integration tests critical)

Recommendation: Implement Priority 1 changes first to validate the refactoring approach, then proceed to Priority 2 and 3.


Implementation Checklist

Phase 1: Validation Reorganization

  • Create or update validation files for scattered validation functions
  • Move validateNoDuplicateMemoryIDs to appropriate validation file
  • Create sandbox_validation.go and move sandbox validation functions
  • Run full test suite to verify no breakage
  • Update documentation if needed

Phase 2: Helper File Renaming

  • Rename close_entity_helpers.go to more accurate name
  • Rename update_entity_helpers.go to more accurate name
  • Review and potentially rename config_helpers.go
  • Update references to the old file names in comments and documentation (import statements are unaffected by file renames within a package)
  • Run full test suite

Phase 3: Parse Function Consolidation

  • Create package_parser.go and extract package parsing from dependabot
  • Identify duplicate parse functions across safe_outputs files
  • Consolidate duplicate parsing logic
  • Update callers to use consolidated functions
  • Add unit tests for extracted parsers

Phase 4: Large File Modularization (Future)

  • Analyze and plan split for compiler_safe_outputs_consolidated.go
  • Analyze and plan split for safe_outputs_config.go
  • Analyze and plan split for mcp-config.go
  • Execute splits incrementally with comprehensive testing

Analysis Metadata

  • Total Go Files Analyzed: 331 (excluding tests)
  • Packages Examined: 12
  • Function Clusters Identified: 15 major clusters
  • Outliers Found: 8 validation functions, 30+ parse functions
  • Well-Organized Patterns: 7 exemplary patterns
  • Files Needing Attention: 12 files for various improvements
  • Detection Method: Static analysis + semantic pattern matching
  • Analysis Date: 2025-12-20

Conclusion

The codebase demonstrates strong overall organization with excellent semantic clustering in most areas, particularly:

  • CLI command organization (compile, update, mcp, logs groups)
  • Entity operations (create/update patterns)
  • Compiler modularization
  • Engine separation

The identified refactoring opportunities are focused and actionable, primarily involving:

  1. Moving validation functions to correct files (consistency)
  2. Renaming misleading "helper" files (clarity)
  3. Consolidating scattered parsing logic (DRY principle)
  4. Considering modularization of large files (maintainability)

Recommended Approach: Start with low-risk Priority 1 changes to establish the refactoring process, then progressively tackle higher-impact items as time and testing resources permit.

AI generated by Semantic Function Refactoring
