Architecture Analyzer

A static analysis tool that extracts architecture data from Kubernetes/OpenShift component repositories and generates diagrams, security reports, and code property graphs. Works with any Go-based K8s operator ecosystem.

Features

  • 26 architecture extractors covering CRDs, RBAC, deployments, services, network policies, controller watches, dependencies, secrets, Helm charts, Dockerfiles, webhooks, configmaps, HTTP endpoints, ingress, external connections (Go + Python), feature gates, cache architecture, operator config constants, reconciliation sequences, Prometheus metrics, status conditions, platform detection, Go CRD extraction, webhook behavioral analysis, and programmatic resource operations
  • Go AST extraction via go/packages for operators that .gitignore generated manifests. Extracts CRDs from Go types with kubebuilder markers, analyzes webhook method bodies for field-level mutations and validations, and detects programmatic client.Create/Update/Patch/Delete calls in reconcile methods. Security-hardened for untrusted repo analysis (CGO_ENABLED=0, module isolation, boundedFileSystem).
  • Code property graph with multi-language parsing (Go, Python, TypeScript, Rust), typed node model, edge confidence classification, intraprocedural data flow, control flow graphs, Python class hierarchy extraction (NodeClass with BaseClasses and EdgeContains), and two-phase taint propagation
  • 24 security queries across 4 domains (security, testing, upgrade, architecture) detecting webhook gaps, RBAC bugs, secret leaks, taint paths, complexity hotspots, class hierarchies, factory patterns, external API surfaces, and more
  • SARIF ingestion mapping external scanner findings (Semgrep, gosec, etc.) to CPG nodes for unified analysis
  • Structural diff engine comparing code graphs across versions to detect regressions
  • 7 renderers producing Mermaid diagrams, Structurizr C4 DSL, ASCII security views, and structured markdown reports
  • CycloneDX SBOM generation from extracted data (Go modules, Python deps, Dockerfile base images, deployment container images, operator image constants) with full operational metadata (security contexts, resource limits, health probes)
  • Image & container analysis report covering GPU/CUDA dependencies, base image registries, multi-arch support, Dockerfile issues, container security contexts, resource limits, health probes, sidecar inventory, and deployment issues
  • CRD contract validation detecting breaking schema changes across repos
  • Platform aggregation merging multiple component analyses into a cross-repo view

Architecture

graph LR
    subgraph Inputs
        REPO[Git Repository]
        SARIF[SARIF Files]
    end

    subgraph "Architecture Extractors (26)"
        E1[CRDs & RBAC]
        E2[Services & Deployments]
        E3[Network Policies & Ingress]
        E4[Controller Watches & Dependencies]
        E5[Cache Config & Operator Config]
        E6[Reconcile Sequences & Status Conditions]
        E7[Prometheus Metrics & Platform Detection]
        E8[Secrets, Helm, Dockerfiles, Webhooks, ConfigMaps, HTTP Endpoints, External Connections, Feature Gates]
    end

    subgraph "Code Property Graph"
        PARSE[Multi-Language Parsers<br/>Go, Python, TS, Rust]
        CPG[Typed Node Model<br/>Edge Confidence]
        DF[Data Flow Analysis]
        CFG[Control Flow Graphs]
        TAINT[Taint Propagation Engine]
        DOMAINS[Domain Queries<br/>Security, Testing, Upgrade, Architecture]
    end

    subgraph Outputs
        JSON[component-architecture.json]
        GRAPH[code-graph.json]
        FINDINGS[security-findings.json/sarif]
        DIAGRAMS[Diagrams & Reports]
    end

    REPO --> E1 & E2 & E3 & E4 & E5 & E6 & E7 & E8 --> JSON --> DIAGRAMS
    REPO --> PARSE --> CPG --> DF --> CFG --> TAINT --> DOMAINS --> FINDINGS
    SARIF --> CPG
    CPG --> GRAPH

    classDef extractor fill:#3498db,stroke:#2980b9,color:#fff
    classDef cpg fill:#9b59b6,stroke:#8e44ad,color:#fff
    classDef output fill:#2ecc71,stroke:#27ae60,color:#fff

    class E1,E2,E3,E4,E5,E6,E7,E8 extractor
    class PARSE,CPG,DF,CFG,TAINT,DOMAINS cpg
    class JSON,GRAPH,FINDINGS,DIAGRAMS output

Requirements

  • Go 1.25+

Installation

git clone https://github.com/ugiordan/architecture-analyzer.git
cd architecture-analyzer
go build -o arch-analyzer ./cmd/arch-analyzer/

Usage

Analyze a repository (extract + render)

./arch-analyzer analyze /path/to/repo --output-dir output/

Produces:

  • output/component-architecture.json (extracted architecture data)
  • output/diagrams/rbac.mmd (Mermaid RBAC graph)
  • output/diagrams/component.mmd (Mermaid component diagram)
  • output/diagrams/dependencies.mmd (Mermaid dependency graph)
  • output/diagrams/dataflow.mmd (Mermaid sequence diagram)
  • output/diagrams/security-network.txt (ASCII security/network diagram)
  • output/diagrams/c4-context.dsl (Structurizr C4 DSL)
  • output/diagrams/report.md (structured markdown report)

Extract only (no diagrams)

./arch-analyzer extract /path/to/repo --output component-architecture.json

Generate SBOM (CycloneDX 1.5)

# From existing extraction
./arch-analyzer sbom component-architecture.json --output sbom.json

# Pipe to stdout
./arch-analyzer sbom component-architecture.json | jq '.components | length'

Includes Go modules, Python deps, Dockerfile base images, deployment container images, and operator image constants. Each component carries operational metadata: security context, resource limits, health probes, Dockerfile issues.
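
The jq one-liner above counts components; the same can be done from Go. The sketch below relies only on the standard CycloneDX JSON fields `components[].type`, `name`, and `version`. The sample document and the `countByType` helper are invented for illustration, and which component type the tool assigns to container images is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// component mirrors the CycloneDX fields used here; other fields are ignored.
type component struct {
	Type    string `json:"type"`
	Name    string `json:"name"`
	Version string `json:"version"`
}

// bom models the top level of a CycloneDX JSON document.
type bom struct {
	BomFormat   string      `json:"bomFormat"`
	SpecVersion string      `json:"specVersion"`
	Components  []component `json:"components"`
}

// countByType parses a CycloneDX JSON document and tallies components per type.
func countByType(data []byte) (map[string]int, error) {
	var b bom
	if err := json.Unmarshal(data, &b); err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, c := range b.Components {
		counts[c.Type]++
	}
	return counts, nil
}

// sampleSBOM is invented for illustration; real output carries more fields.
const sampleSBOM = `{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {"type": "library", "name": "sigs.k8s.io/controller-runtime", "version": "v0.17.0"},
    {"type": "library", "name": "k8s.io/client-go", "version": "v0.29.0"},
    {"type": "container", "name": "registry.example.com/base", "version": "1.0"}
  ]
}`

func main() {
	counts, err := countByType([]byte(sampleSBOM))
	if err != nil {
		panic(err)
	}
	types := make([]string, 0, len(counts))
	for t := range counts {
		types = append(types, t)
	}
	sort.Strings(types) // map order is random; sort for stable output
	for _, t := range types {
		fmt.Printf("%s: %d\n", t, counts[t])
	}
}
```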

Image & Container Analysis Report

# Single component
./arch-analyzer report component-architecture.json --output report.md

# Cross-component analysis (multiple inputs)
./arch-analyzer report results/*/component-architecture.json --output platform-report.md

The report has 10 sections: GPU/CUDA dependencies, base image registries, multi-arch support, Dockerfile issues, security contexts, resource limits, health probes, sidecars, deployment issues, and operator image constants.

Code graph security scan

./arch-analyzer scan /path/to/repo --format json --output findings.json
./arch-analyzer scan /path/to/repo --format sarif --output findings.sarif

# With specific domains
./arch-analyzer scan /path/to/repo --domains security,testing,upgrade

# Import SARIF from external scanners alongside the scan
./arch-analyzer scan /path/to/repo --import-sarif gosec.sarif,semgrep.sarif

# With architecture context for richer queries
./arch-analyzer scan /path/to/repo --with-arch

Export code property graph

./arch-analyzer graph /path/to/repo --output code-graph.json
./arch-analyzer graph /path/to/repo --format dot --output code-graph.dot

Structural diff between code graphs

./arch-analyzer diff base.json head.json --format text
./arch-analyzer diff base.json head.json --format json --output diff.json

Ingest external SARIF findings

./arch-analyzer ingest gosec.sarif --graph code-graph.json --output enriched-graph.json

Full analysis (architecture + code graph + schemas)

./arch-analyzer full-analysis /path/to/repo --output-dir output/
./arch-analyzer full-analysis /path/to/repo --import-sarif gosec.sarif --domains security

CRD contract validation

# Extract schemas as baseline
./arch-analyzer extract-schema /path/to/repo --output-dir contracts/schemas

# Validate changes against baseline
./arch-analyzer validate /path/to/repo --contracts-dir contracts
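
As a rough illustration of what a breaking schema change means, the sketch below compares a baseline and a current schema and reports removed fields and type changes. The flattened `schema` map and the `breakingChanges` helper are hypothetical; the tool's actual schema format and rule set are not shown in this README.

```go
package main

import (
	"fmt"
	"sort"
)

// schema maps a field path to its type. This is a flattened stand-in for a
// nested OpenAPI v3 schema (hypothetical representation).
type schema map[string]string

// breakingChanges lists fields that were removed or changed type between the
// baseline and the current schema. Added fields are not reported.
func breakingChanges(baseline, current schema) []string {
	var out []string
	for path, oldType := range baseline {
		newType, ok := current[path]
		switch {
		case !ok:
			out = append(out, fmt.Sprintf("removed: %s", path))
		case newType != oldType:
			out = append(out, fmt.Sprintf("type changed: %s (%s -> %s)", path, oldType, newType))
		}
	}
	sort.Strings(out) // map iteration order is random; sort for stable output
	return out
}

func main() {
	baseline := schema{"spec.replicas": "integer", "spec.image": "string", "spec.mode": "string"}
	current := schema{"spec.replicas": "string", "spec.image": "string", "spec.logLevel": "string"}
	for _, c := range breakingChanges(baseline, current) {
		fmt.Println(c)
	}
}
```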

Aggregate multiple components

./arch-analyzer analyze /path/to/repo-a --output-dir results/repo-a
./arch-analyzer analyze /path/to/repo-b --output-dir results/repo-b
./arch-analyzer aggregate results/ --output-dir platform-output/

Platform discovery

./arch-analyzer discover /path/to/operator-repo --format json
./arch-analyzer build-config /path/to/operator-repo

Extractors

| Extractor | Source Patterns | Data Extracted |
|---|---|---|
| CRDs | config/crd/**, deploy/crds/, charts/**/crds/, manifests/**/crd* | Group, version, kind, scope, field count, CEL rules |
| RBAC | config/rbac/, deploy/rbac/, Go kubebuilder markers | ClusterRoles, bindings, rules, kubebuilder RBAC markers |
| Services | **/service*.yaml | Name, type, ports, selector |
| Deployments | **/deployment*.yaml, **/manager*.yaml, **/statefulset*.yaml | Containers, security context, env vars, volumes, resources, probes |
| Network Policies | **/*networkpolicy*, **/*network-polic*, **/*netpol*, **/network-policies/** | Pod selector, ingress/egress rules |
| Controller Watches | **/*_controller.go, **/setup.go, **/*reconciler*.go | For/Owns/Watches with GVK resolution |
| Dependencies | go.mod | Go version, toolchain, modules (direct only), internal ODH deps, replace directives |
| Secrets | Deployments, services | Secret names, types, references (never values) |
| Helm | Chart.yaml, values.yaml | Chart metadata, security-relevant defaults |
| Dockerfiles | Dockerfile*, Containerfile* | Base image, stages, USER, EXPOSE, FIPS indicators |
| Webhooks | **/webhook*.yaml, **/mutating*, **/validating* | Webhook rules, failure policy, side effects |
| ConfigMaps | **/configmap*.yaml | ConfigMap names, data keys |
| HTTP Endpoints | Go source (http.HandleFunc, mux.Route, gin.Engine) | Method, path, handler, middleware |
| Ingress | **/ingress*, **/virtualservice*, **/httproute* | Gateway API, Istio, K8s Ingress resources |
| External Connections (Go) | Go source (sql.Open, redis.NewClient, grpc.Dial, sarama.New*) | Database, object storage, gRPC, messaging references with credential redaction |
| External Connections (Python) | Python source (psycopg2, sqlalchemy, boto3, requests, httpx, grpc, openai, chromadb, etc.) | Database, object storage, gRPC, messaging, HTTP clients, LLM/ML SDK references |
| Feature Gates | Go source (DefaultMutableFeatureGate.Add, featuregate.Feature consts) | Gate name, default state, pre-release stage, source location |
| Cache Config | Go source (ctrl.NewManager, cache.Options) | Cache scope, filtered types, disabled types, implicit informers, GOMEMLIMIT |
| Operator Config | Go source (const/var blocks in controllers, pkg/config) | Classified constants: images, ports, timeouts, env vars, resources, name patterns |
| Reconcile Sequences | Go source (Reconcile() methods) | Ordered sub-resource reconciliation steps with conditional guards |
| Prometheus Metrics | Go source (prometheus.New*, promauto.New*) | Metric name, type (gauge/counter/histogram/summary), help, labels, namespace |
| Status Conditions | Go source (const blocks in controllers, API types) | Condition type constants, associated reason constants, source location |
| Platform Detection | Go source (controllers, reconcilers, config packages) | Capability structs (IsOpenShift, HasRoute), API discovery checks, conditional resource creation |
| Go CRD Extraction | Go types with +kubebuilder:object:root=true markers | Group, version, kind, scope, storage version, hub/spoke conversion, field count, CEL rules |
| Webhook Behavioral Analysis | Webhook Default() and Validate*() method bodies | Field-level mutations, field-level validations, same-receiver method call following |
| Programmatic Resource Ops | Go reconcile methods (client.Create/Update/Patch/Delete) | Operation type, target kind, API group, type-resolved via go/packages |
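
To give a feel for the Go CRD Extraction row, the sketch below finds type declarations that follow a `+kubebuilder:object:root=true` marker comment. It uses the standard library `go/parser` on an in-memory source string instead of `go/packages`, so it is a simplified stand-in and not the tool's implementation; `findRootTypes` and the `Widget` type are hypothetical names. The association rule (first type declaration after the marker comment) is a simplification.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

const rootMarker = "+kubebuilder:object:root=true"

// hasMarker reports whether any line in the comment group carries the marker.
func hasMarker(cg *ast.CommentGroup) bool {
	for _, c := range cg.List {
		if strings.Contains(c.Text, rootMarker) {
			return true
		}
	}
	return false
}

// findRootTypes returns the names of the type declarations that directly
// follow a comment group containing the root-object marker.
func findRootTypes(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "types.go", src, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, cg := range file.Comments {
		if !hasMarker(cg) {
			continue
		}
		// Declarations are in source order; take the first type declaration
		// whose "type" keyword comes after the marker comment.
		for _, decl := range file.Decls {
			gd, ok := decl.(*ast.GenDecl)
			if !ok || gd.Tok != token.TYPE || gd.Pos() < cg.End() {
				continue
			}
			for _, spec := range gd.Specs {
				names = append(names, spec.(*ast.TypeSpec).Name.Name)
			}
			break
		}
	}
	return names, nil
}

// sample is an invented API types file.
const sample = `package v1

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status

// Widget is a hypothetical custom resource.
type Widget struct {
	Spec WidgetSpec
}

// WidgetSpec carries no marker and is not reported.
type WidgetSpec struct {
	Replicas int
}
`

func main() {
	names, err := findRootTypes(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```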

Cache Architecture Analysis

The cache analyzer cross-references controller-runtime cache configuration against controller watches and deployment memory limits. It detects:

  • Cluster-wide informers for types that should be namespace-scoped or filtered
  • Missing cache filters on watched types (potential OOM risk at scale)
  • Implicit informers created by client.Get calls for unwatched types
  • Missing DefaultTransform (cached objects retain managedFields, wasting memory)
  • Missing GOMEMLIMIT in deployment (the Go GC has no soft memory limit to tune against)
  • GOMEMLIMIT exceeding 90% of container memory limit

This catches real bugs like opendatahub-io/data-science-pipelines-operator#992 and opendatahub-io/model-registry-operator#457.
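
The two GOMEMLIMIT checks in the list above reduce to a small comparison. The following is a minimal sketch, assuming both values have already been converted to bytes; `checkMemLimit` and its messages are hypothetical and do not reflect the analyzer's actual output.

```go
package main

import "fmt"

const mib = int64(1) << 20

// checkMemLimit compares a GOMEMLIMIT value with the container memory limit,
// both in bytes. A zero goMemLimit means the variable is not set.
func checkMemLimit(goMemLimit, containerLimit int64) string {
	switch {
	case containerLimit <= 0:
		return "no container memory limit to compare against"
	case goMemLimit == 0:
		return "GOMEMLIMIT is not set"
	case goMemLimit*10 > containerLimit*9:
		// Integer form of goMemLimit > 0.9 * containerLimit.
		return "GOMEMLIMIT exceeds 90% of the container memory limit"
	default:
		return "ok"
	}
}

func main() {
	fmt.Println(checkMemLimit(0, 512*mib))
	fmt.Println(checkMemLimit(500*mib, 512*mib))
	fmt.Println(checkMemLimit(450*mib, 512*mib))
}
```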

Code Property Graph

The CPG pipeline builds a multi-language code graph from source using tree-sitter (no compilation required) and runs layered analysis on top of it.

Multi-Language Parsing

Four language parsers extract AST-level nodes (functions, call sites, struct literals, HTTP endpoints, DB operations) and edges (calls, contains):

| Language | Parser | CFG | Data Flow | Taint |
|---|---|---|---|---|
| Go | tree-sitter-go | Yes | Yes | Yes |
| Python | tree-sitter-python | Yes | Yes | Yes |
| TypeScript | tree-sitter-typescript | Yes | Yes | Yes |
| Rust | tree-sitter-rust | Yes | Yes | Yes |

Typed Node Model

Nodes carry typed fields instead of string maps, covering function signatures (params, return types), call targets, HTTP routes, DB operations, struct types, class definitions (with base classes for inheritance tracking), cyclomatic complexity, and entrypoint trust level.

Edge Confidence

Call edges are classified by resolution confidence:

| Confidence | Meaning | Example |
|---|---|---|
| CERTAIN | Exact match, same package | Direct function call doWork() |
| INFERRED | Cross-package short-name match | utils.Validate() matched heuristically |
| UNCERTAIN | Multiple candidates, interface dispatch | handler.Process() with multiple implementations |

Security queries never filter out UNCERTAIN edges; they use confidence to prioritize review order.
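
One way to picture the classification is the sketch below. The `classify` rule (a single same-package candidate is CERTAIN, a single cross-package candidate is INFERRED, anything else is UNCERTAIN) is a simplification of the three confidence levels, and the ordering in `prioritize` (most confident first) is an assumption; all type and function names are hypothetical. The rule IDs paired with confidences in `main` are purely illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// Confidence ranks how reliably a call edge was resolved.
type Confidence int

const (
	Certain Confidence = iota
	Inferred
	Uncertain
)

func (c Confidence) String() string {
	return [...]string{"CERTAIN", "INFERRED", "UNCERTAIN"}[c]
}

// candidate is a function that a call site might resolve to.
type candidate struct {
	Package string
	Name    string
}

// classify assigns a confidence to a call made from callerPkg.
func classify(callerPkg string, candidates []candidate) Confidence {
	if len(candidates) != 1 {
		return Uncertain // zero or several possible targets
	}
	if candidates[0].Package == callerPkg {
		return Certain
	}
	return Inferred
}

// finding pairs a rule with the confidence of the edge it relies on.
type finding struct {
	Rule string
	Edge Confidence
}

// prioritize orders findings from most to least confident without dropping any.
func prioritize(fs []finding) []finding {
	out := append([]finding(nil), fs...)
	sort.SliceStable(out, func(i, j int) bool { return out[i].Edge < out[j].Edge })
	return out
}

func main() {
	fmt.Println(classify("controllers", []candidate{{"controllers", "doWork"}}))
	fmt.Println(classify("controllers", []candidate{{"utils", "Validate"}}))
	fmt.Println(classify("controllers", []candidate{{"a", "Process"}, {"b", "Process"}}))

	findings := []finding{{"CGA-006", Uncertain}, {"CGA-008", Certain}, {"CGA-011", Inferred}}
	for _, f := range prioritize(findings) {
		fmt.Printf("%s %s\n", f.Rule, f.Edge)
	}
}
```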

Intraprocedural Data Flow

Per-function analysis tracks variable assignments, reads, argument passing, field access, and return values. Produces assigns, reads, passes_to, field_access, and returns edges within function bodies.
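
The sketch below shows what three of these edge kinds (assigns, passes_to, returns) look like for a tiny function. It walks the standard library `go/ast` tree as a stand-in for the tree-sitter based parsers, and the `edge` struct, `exprName`, and `dataFlowEdges` are hypothetical names, so treat it as an illustration of the idea and not as the analyzer's code.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// edge is one data flow fact inside a function body.
type edge struct {
	Kind string // "assigns", "passes_to", or "returns"
	From string
	To   string
}

// exprName names identifiers and calls; anything else becomes "<expr>".
func exprName(e ast.Expr) string {
	switch x := e.(type) {
	case *ast.Ident:
		return x.Name
	case *ast.CallExpr:
		return exprName(x.Fun) + "()"
	}
	return "<expr>"
}

// dataFlowEdges parses src and emits edges for assignments, call arguments,
// and return values, in source order.
func dataFlowEdges(src string) ([]edge, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "f.go", src, 0)
	if err != nil {
		return nil, err
	}
	var edges []edge
	ast.Inspect(file, func(n ast.Node) bool {
		switch s := n.(type) {
		case *ast.AssignStmt:
			if len(s.Lhs) == len(s.Rhs) {
				for i := range s.Lhs {
					edges = append(edges, edge{"assigns", exprName(s.Rhs[i]), exprName(s.Lhs[i])})
				}
			}
		case *ast.CallExpr:
			for _, arg := range s.Args {
				edges = append(edges, edge{"passes_to", exprName(arg), exprName(s.Fun)})
			}
		case *ast.ReturnStmt:
			for _, r := range s.Results {
				edges = append(edges, edge{"returns", exprName(r), "<caller>"})
			}
		}
		return true
	})
	return edges, nil
}

// sampleFn is an invented function used only to drive the example.
const sampleFn = `package p

func handle(input string) string {
	name := input
	cleaned := sanitize(name)
	return cleaned
}
`

func main() {
	edges, err := dataFlowEdges(sampleFn)
	if err != nil {
		panic(err)
	}
	for _, e := range edges {
		fmt.Printf("%-9s %s -> %s\n", e.Kind, e.From, e.To)
	}
}
```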

Control Flow Graphs

Basic blocks are constructed within each function and connected by typed edges (true_branch, false_branch, fallthrough, loop_back, loop_exit, exception, entry, exit). This enables path-sensitive analysis, such as distinguishing "validation guards the dangerous operation" from "validation sits on an independent path."
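
The guard distinction can be phrased as a reachability question: a validation block guards a sink if the sink cannot be reached from the entry block once the validation block is removed. The sketch below is one possible formulation under that assumption; the `cfg` map, block names, and helper functions are hypothetical, and edge kinds are omitted for brevity.

```go
package main

import "fmt"

// cfg maps a basic block ID to the IDs of its successor blocks.
type cfg map[string][]string

// reachable reports whether target can be reached from start without
// passing through the skip block.
func reachable(g cfg, start, target, skip string) bool {
	if start == skip {
		return false
	}
	seen := map[string]bool{start: true}
	stack := []string{start}
	for len(stack) > 0 {
		b := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if b == target {
			return true
		}
		for _, next := range g[b] {
			if next != skip && !seen[next] {
				seen[next] = true
				stack = append(stack, next)
			}
		}
	}
	return false
}

// guards reports whether every path from entry to sink passes through guard.
// It is vacuously true when the sink is unreachable.
func guards(g cfg, entry, guard, sink string) bool {
	return !reachable(g, entry, sink, guard)
}

func main() {
	// Validation sits on the only path to the dangerous operation.
	guarded := cfg{"entry": {"validate"}, "validate": {"exec", "exit"}, "exec": {"exit"}}
	// Validation and the dangerous operation sit on independent branches.
	independent := cfg{"entry": {"validate", "exec"}, "validate": {"exit"}, "exec": {"exit"}}

	fmt.Println(guards(guarded, "entry", "validate", "exec"))
	fmt.Println(guards(independent, "entry", "validate", "exec"))
}
```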

Taint Propagation

Two-phase taint engine:

  1. Intraprocedural (Phase A): per-function taint propagation along data flow edges, filtered by CFG block reachability. Produces function summaries.
  2. Interprocedural (Phase B): walks the call graph using Phase A summaries to trace taint across function boundaries and storage links.

Sources: user input handlers, deserialization calls. Sinks: SQL execution, subprocess calls, command execution, template rendering, HTML output, file access, eval usage. Propagation is bounded by configurable depth (20), path (100), and visit (10K) limits, with truncation diagnostics.
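
The interplay between the two phases can be sketched as follows. Phase A is represented by hand-written per-function summaries, and Phase B walks the call graph over them with a depth bound and reports whether the walk was truncated. The `summary` shape, `trace`, and the function names in `main` are all hypothetical; the real engine also tracks storage links and path and visit limits, which this sketch leaves out.

```go
package main

import "fmt"

// summary is a Phase A result for one function. The summaries here are
// hand-written; a real engine would derive them from data flow edges.
type summary struct {
	Sink    string   // non-empty if the tainted parameter reaches a sink of this kind
	Callees []string // functions that receive the tainted value as an argument
}

// contains reports whether fn is already on the current path.
func contains(path []string, fn string) bool {
	for _, p := range path {
		if p == fn {
			return true
		}
	}
	return false
}

// trace is Phase B: it follows tainted arguments across calls, records every
// path that ends in a sink, and returns true if the depth bound cut a path short.
func trace(summaries map[string]summary, fn string, path []string, maxDepth int, found *[][]string) bool {
	path = append(path, fn)
	s := summaries[fn]
	if s.Sink != "" {
		hit := append(append([]string(nil), path...), "sink:"+s.Sink)
		*found = append(*found, hit)
	}
	truncated := false
	for _, callee := range s.Callees {
		if contains(path, callee) {
			continue // skip recursion cycles
		}
		if len(path) >= maxDepth {
			truncated = true
			continue
		}
		if trace(summaries, callee, path, maxDepth, found) {
			truncated = true
		}
	}
	return truncated
}

func main() {
	summaries := map[string]summary{
		"handleRequest": {Callees: []string{"parseBody"}},
		"parseBody":     {Callees: []string{"runQuery"}},
		"runQuery":      {Sink: "sql"},
	}
	var found [][]string
	truncated := trace(summaries, "handleRequest", nil, 20, &found)
	fmt.Println(found, truncated)
}
```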

SARIF Ingestion

Ingest SARIF 2.1.0 output from external static analyzers (Semgrep, gosec, Trivy, etc.) and map findings to CPG nodes. Enriches external findings with architecture context: "Semgrep found SQL injection at handler.go:42" becomes "that function is an untrusted webhook handler with RBAC for secrets."

Validation: schema validation, path normalization, annotation sanitization, 50K result size limit.
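
A minimal sketch of the mapping step is shown below. It reads the SARIF 2.1.0 fields `runs[].results[].ruleId` and `locations[].physicalLocation` (artifact URI and start line) and matches each result to a function node by file and line range. The `funcNode` struct, `mapFindings`, and the sample data are hypothetical, and the validation steps listed above (schema checks, path normalization, size limits) are deliberately omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sarifLog models only the SARIF 2.1.0 fields needed to locate a result.
type sarifLog struct {
	Runs []struct {
		Results []struct {
			RuleID    string `json:"ruleId"`
			Locations []struct {
				PhysicalLocation struct {
					ArtifactLocation struct {
						URI string `json:"uri"`
					} `json:"artifactLocation"`
					Region struct {
						StartLine int `json:"startLine"`
					} `json:"region"`
				} `json:"physicalLocation"`
			} `json:"locations"`
		} `json:"results"`
	} `json:"runs"`
}

// funcNode is a hypothetical CPG function node with its source span.
type funcNode struct {
	Name      string
	File      string
	StartLine int
	EndLine   int
}

// mapped links an external rule ID to the function that contains the finding.
type mapped struct {
	RuleID   string
	Function string
}

// mapFindings matches each SARIF result location to the enclosing function node.
func mapFindings(data []byte, nodes []funcNode) ([]mapped, error) {
	var doc sarifLog
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	var out []mapped
	for _, run := range doc.Runs {
		for _, res := range run.Results {
			for _, loc := range res.Locations {
				pl := loc.PhysicalLocation
				line := pl.Region.StartLine
				for _, n := range nodes {
					if n.File == pl.ArtifactLocation.URI && line >= n.StartLine && line <= n.EndLine {
						out = append(out, mapped{res.RuleID, n.Name})
					}
				}
			}
		}
	}
	return out, nil
}

// sampleSARIF is an invented single-result log.
const sampleSARIF = `{"version":"2.1.0","runs":[{"tool":{"driver":{"name":"gosec"}},
"results":[{"ruleId":"G201","message":{"text":"SQL string formatting"},
"locations":[{"physicalLocation":{"artifactLocation":{"uri":"handler.go"},
"region":{"startLine":42}}}]}]}]}`

func main() {
	nodes := []funcNode{{"HandleCreate", "handler.go", 30, 60}, {"helper", "util.go", 1, 20}}
	findings, err := mapFindings([]byte(sampleSARIF), nodes)
	if err != nil {
		panic(err)
	}
	fmt.Println(findings)
}
```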

Structural Diff

Compare two code-graph.json files to detect regressions: new functions, removed functions, changed complexity, new call edges, trust level changes. Useful for PR review automation.
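
As an illustration of the comparison, the sketch below diffs two maps from function name to cyclomatic complexity and reports added functions, removed functions, and complexity changes. The map representation and `diffFunctions` are hypothetical simplifications; the layout of code-graph.json is not described here, and call edges and trust levels are left out.

```go
package main

import (
	"fmt"
	"sort"
)

// diffFunctions compares function complexity between a base and a head graph.
// Each map goes from function name to cyclomatic complexity.
func diffFunctions(base, head map[string]int) []string {
	var out []string
	for name, c := range head {
		old, ok := base[name]
		switch {
		case !ok:
			out = append(out, "added: "+name)
		case old != c:
			out = append(out, fmt.Sprintf("complexity changed: %s (%d -> %d)", name, old, c))
		}
	}
	for name := range base {
		if _, ok := head[name]; !ok {
			out = append(out, "removed: "+name)
		}
	}
	sort.Strings(out) // map iteration order is random; sort for stable output
	return out
}

func main() {
	base := map[string]int{"Reconcile": 12, "validate": 3, "legacyInit": 4}
	head := map[string]int{"Reconcile": 19, "validate": 3, "applyDefaults": 2}
	for _, line := range diffFunctions(base, head) {
		fmt.Println(line)
	}
}
```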

Security Queries

Security Domain (12 rules)

| Rule | ID | Severity | Description |
|---|---|---|---|
| Webhook Missing Update | CGA-003 | High | Webhooks intercepting CREATE but not UPDATE |
| RBAC Precedence Bug | CGA-004 | High | Conflicting RBAC rules across bindings |
| Cert as CA | CGA-005 | High | Certificate used as CA without proper validation |
| Cross-Namespace Secret | CGA-006 | High | Secret access crossing namespace boundaries |
| Unfiltered Cache | CGA-007 | Medium | Watched types without cache filters (OOM risk) |
| Plaintext Secrets | CGA-008 | Medium | Hardcoded secrets or credentials in source |
| Weak Serial Entropy | CGA-009 | Medium | Weak randomness in security-sensitive contexts |
| Complexity Hotspot | CGA-010 | Medium | High-complexity functions with security annotations |
| Untrusted Endpoint | CGA-011 | Info | HTTP endpoints without recognized auth middleware |
| Unprotected Ingress | CGA-012 | High | Ingress routes without TLS or auth |
| Overprivileged Secret Access | CGA-013 | Medium | Broad secret access beyond what's needed |
| Uncontrolled Egress | CGA-014 | Medium | Outbound connections without network policy |
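
To make one of these rules concrete, the sketch below shows the kind of check CGA-003 describes: a webhook rule whose operations include CREATE but neither UPDATE nor the `*` wildcard. The `missingUpdate` helper is hypothetical and works on a plain list of operation strings; how the analyzer actually evaluates webhook configurations is not specified here.

```go
package main

import "fmt"

// missingUpdate reports whether a webhook rule intercepts CREATE but not
// UPDATE. The "*" wildcard covers every operation.
func missingUpdate(operations []string) bool {
	hasCreate, hasUpdate := false, false
	for _, op := range operations {
		switch op {
		case "*":
			return false
		case "CREATE":
			hasCreate = true
		case "UPDATE":
			hasUpdate = true
		}
	}
	return hasCreate && !hasUpdate
}

func main() {
	fmt.Println(missingUpdate([]string{"CREATE"}))
	fmt.Println(missingUpdate([]string{"CREATE", "UPDATE"}))
	fmt.Println(missingUpdate([]string{"*"}))
}
```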

Testing Domain (4 rules)

| Rule | ID | Severity | Description |
|---|---|---|---|
| Untested Security Function | CGA-T01 | Medium | Security-annotated functions without test coverage |
| Fake-Only Integration | CGA-T02 | Low | Integration tests using only fakes/mocks |
| Missing Error Paths | CGA-T03 | Medium | Error return paths without test coverage |
| Consolidation Opportunity | CGA-T04 | Low | Duplicate test patterns that could be consolidated |

Upgrade Domain (4 rules)

| Rule | ID | Severity | Description |
|---|---|---|---|
| Unconverted CRD | CGA-U01 | Medium | CRDs still using v1beta1 |
| Pre-Release API Usage | CGA-U02 | Low | Usage of alpha/beta Kubernetes APIs |
| Ungated Feature | CGA-U03 | Medium | Features without feature gate protection |
| Unchecked Version Access | CGA-U04 | Low | Version-dependent code without version checks |

Architecture Domain (4 rules)

| Rule | ID | Severity | Description |
|---|---|---|---|
| Abstraction Layers | CGA-A01 | Info | Surfaces class hierarchies with abstract bases and implementations |
| External API Surface | CGA-A02 | Info | Functions using external SDK clients (openai, boto3, chromadb, etc.) |
| Factory Dispatch | CGA-A03 | Info | Factory functions dispatching to multiple implementation types |
| Unimplemented Interface | CGA-A04 | Low | Abstract bases with no implementations found in analyzed sources |

Renderers

| Renderer | Output | Description |
|---|---|---|
| RBAC | rbac.mmd | Mermaid graph: ServiceAccounts -> Bindings -> Roles -> Resources |
| Component | component.mmd | Mermaid diagram: CRDs watched, owned, and dependency relationships |
| Security/Network | security-network.txt | ASCII layered view: network, RBAC, secrets, security contexts |
| Dependencies | dependencies.mmd | Mermaid graph: Go module dependencies (internal ODH highlighted) |
| C4 | c4-context.dsl | Structurizr C4 context diagram |
| Dataflow | dataflow.mmd | Mermaid sequence diagram: controller watches and service connections |
| Report | report.md | Structured markdown with tables for all extracted data and cache issues |

Project Structure

architecture-analyzer/
  cmd/arch-analyzer/
    main.go                # CLI entry point with subcommands
  pkg/
    extractor/             # 26 architecture extractors
    renderer/              # 7 diagram/report renderers
    aggregator/            # Platform-wide aggregation
    validator/             # CRD contract validation
    parser/                # Multi-language parsers (Go, Python, TypeScript, Rust)
                           # with CFG construction per language
    builder/               # Code property graph builder (call resolution, edge confidence)
    graph/                 # CPG data structures (typed nodes, edges, basic blocks)
    dataflow/              # Taint propagation engine (intraprocedural + interprocedural)
    diff/                  # Structural diff engine for code graph comparison
    sarif/                 # SARIF 2.1.0 ingestion and node mapping
    linker/                # Storage linker (DB operations to schemas)
    annotator/             # Security annotation engine
    query/                 # Security query engine (base queries + taint-to-sink)
    domains/               # Domain framework with registered query rules
      security/            # 12 security queries
      testing/             # 4 testing queries
      upgrade/             # 4 upgrade queries
      architecture/        # 4 architecture queries
    arch/                  # Architecture data structures
    config/                # Configuration types
  contracts/
    schemas/               # CRD baseline schemas for validation
  scripts/
    analyze-repo.sh        # Clone + analyze + cleanup
  site/
    docs/                  # MkDocs Material documentation
    mkdocs.yml             # Docs site configuration
  .github/workflows/
    analyze-all.yml        # Scheduled analysis workflow
    extract-schemas.yml    # CRD schema extraction workflow
    validate-contracts.yml # CRD contract validation on PRs
    docs.yml               # Deploy docs to GitHub Pages

Running Tests

go test ./...

Documentation

Full documentation is published at ugiordan.github.io/architecture-analyzer and covers installation, guides, CLI reference, architecture, and contributing.

GitHub Actions

  • analyze-all.yml: runs weekly (Monday 06:00 UTC) or on manual dispatch, analyzes all configured platform repos and uploads artifacts
  • extract-schemas.yml: extracts CRD schemas weekly and opens automated PRs for changes
  • validate-contracts.yml: validates CRD contract changes on PRs to the contracts/ directory
  • docs.yml: deploys documentation to GitHub Pages on pushes to main
