A static analysis tool that extracts architecture data from Kubernetes/OpenShift component repositories and generates diagrams, security reports, and code property graphs. Works with any Go-based K8s operator ecosystem.
- 26 architecture extractors covering CRDs, RBAC, deployments, services, network policies, controller watches, dependencies, secrets, Helm charts, Dockerfiles, webhooks, configmaps, HTTP endpoints, ingress, external connections (Go + Python), feature gates, cache architecture, operator config constants, reconciliation sequences, Prometheus metrics, status conditions, platform detection, Go CRD extraction, webhook behavioral analysis, and programmatic resource operations
- Go AST extraction via `go/packages` for operators that `.gitignore` generated manifests. Extracts CRDs from Go types with kubebuilder markers, analyzes webhook method bodies for field-level mutations and validations, and detects programmatic `client.Create/Update/Patch/Delete` calls in reconcile methods. Security-hardened for untrusted repo analysis (`CGO_ENABLED=0`, module isolation, bounded `FileSystem`).
- Code property graph with multi-language parsing (Go, Python, TypeScript, Rust), typed node model, edge confidence classification, intraprocedural data flow, control flow graphs, Python class hierarchy extraction (`NodeClass` with `BaseClasses` and `EdgeContains`), and two-phase taint propagation
- 24 queries across 4 domains (security, testing, upgrade, architecture) detecting webhook gaps, RBAC bugs, secret leaks, taint paths, complexity hotspots, class hierarchies, factory patterns, external API surfaces, and more
- SARIF ingestion mapping external scanner findings (Semgrep, gosec, etc.) to CPG nodes for unified analysis
- Structural diff engine comparing code graphs across versions to detect regressions
- 7 renderers producing Mermaid diagrams, Structurizr C4 DSL, ASCII security views, and structured markdown reports
- CycloneDX SBOM generation from extracted data (Go modules, Python deps, Dockerfile base images, deployment container images, operator image constants) with full operational metadata (security contexts, resource limits, health probes)
- Image & container analysis report covering GPU/CUDA dependencies, base image registries, multi-arch support, Dockerfile issues, container security contexts, resource limits, health probes, sidecar inventory, and deployment issues
- CRD contract validation detecting breaking schema changes across repos
- Platform aggregation merging multiple component analyses into a cross-repo view
graph LR
subgraph Inputs
REPO[Git Repository]
SARIF[SARIF Files]
end
subgraph "Architecture Extractors (26)"
E1[CRDs & RBAC]
E2[Services & Deployments]
E3[Network Policies & Ingress]
E4[Controller Watches & Dependencies]
E5[Cache Config & Operator Config]
E6[Reconcile Sequences & Status Conditions]
E7[Prometheus Metrics & Platform Detection]
E8[Secrets, Helm, Dockerfiles, Webhooks, ConfigMaps, HTTP Endpoints, External Connections, Feature Gates]
end
subgraph "Code Property Graph"
PARSE[Multi-Language Parsers<br/>Go, Python, TS, Rust]
CPG[Typed Node Model<br/>Edge Confidence]
DF[Data Flow Analysis]
CFG[Control Flow Graphs]
TAINT[Taint Propagation Engine]
DOMAINS[Domain Queries<br/>Security, Testing, Upgrade, Architecture]
end
subgraph Outputs
JSON[component-architecture.json]
GRAPH[code-graph.json]
FINDINGS[security-findings.json/sarif]
DIAGRAMS[Diagrams & Reports]
end
REPO --> E1 & E2 & E3 & E4 & E5 & E6 & E7 & E8 --> JSON --> DIAGRAMS
REPO --> PARSE --> CPG --> DF --> CFG --> TAINT --> DOMAINS --> FINDINGS
SARIF --> CPG
CPG --> GRAPH
classDef extractor fill:#3498db,stroke:#2980b9,color:#fff
classDef cpg fill:#9b59b6,stroke:#8e44ad,color:#fff
classDef output fill:#2ecc71,stroke:#27ae60,color:#fff
class E1,E2,E3,E4,E5,E6,E7,E8 extractor
class PARSE,CPG,DF,CFG,TAINT,DOMAINS cpg
class JSON,GRAPH,FINDINGS,DIAGRAMS output
- Go 1.25+
git clone https://github.com/ugiordan/architecture-analyzer.git
cd architecture-analyzer
go build -o arch-analyzer ./cmd/arch-analyzer/

./arch-analyzer analyze /path/to/repo --output-dir output/

Produces:

- `output/component-architecture.json` (extracted architecture data)
- `output/diagrams/rbac.mmd` (Mermaid RBAC graph)
- `output/diagrams/component.mmd` (Mermaid component diagram)
- `output/diagrams/dependencies.mmd` (Mermaid dependency graph)
- `output/diagrams/dataflow.mmd` (Mermaid sequence diagram)
- `output/diagrams/security-network.txt` (ASCII security/network diagram)
- `output/diagrams/c4-context.dsl` (Structurizr C4 DSL)
- `output/diagrams/report.md` (structured markdown report)

./arch-analyzer extract /path/to/repo --output component-architecture.json

# From existing extraction
./arch-analyzer sbom component-architecture.json --output sbom.json
# Pipe to stdout
./arch-analyzer sbom component-architecture.json | jq '.components | length'

Includes Go modules, Python deps, Dockerfile base images, deployment container images, and operator image constants. Each component carries operational metadata: security context, resource limits, health probes, Dockerfile issues.

# Single component
./arch-analyzer report component-architecture.json --output report.md
# Cross-component analysis (multiple inputs)
./arch-analyzer report results/*/component-architecture.json --output platform-report.md

10-section report: GPU/CUDA dependencies, base image registries, multi-arch support, Dockerfile issues, security contexts, resource limits, health probes, sidecars, deployment issues, operator image constants.

./arch-analyzer scan /path/to/repo --format json --output findings.json
./arch-analyzer scan /path/to/repo --format sarif --output findings.sarif
# With specific domains
./arch-analyzer scan /path/to/repo --domains security,testing,upgrade
# Import SARIF from external scanners alongside the scan
./arch-analyzer scan /path/to/repo --import-sarif gosec.sarif,semgrep.sarif
# With architecture context for richer queries
./arch-analyzer scan /path/to/repo --with-arch

./arch-analyzer graph /path/to/repo --output code-graph.json
./arch-analyzer graph /path/to/repo --format dot --output code-graph.dot

./arch-analyzer diff base.json head.json --format text
./arch-analyzer diff base.json head.json --format json --output diff.json

./arch-analyzer ingest gosec.sarif --graph code-graph.json --output enriched-graph.json

./arch-analyzer full-analysis /path/to/repo --output-dir output/
./arch-analyzer full-analysis /path/to/repo --import-sarif gosec.sarif --domains security

# Extract schemas as baseline
./arch-analyzer extract-schema /path/to/repo --output-dir contracts/schemas
# Validate changes against baseline
./arch-analyzer validate /path/to/repo --contracts-dir contracts

./arch-analyzer analyze /path/to/repo-a --output-dir results/repo-a
./arch-analyzer analyze /path/to/repo-b --output-dir results/repo-b
./arch-analyzer aggregate results/ --output-dir platform-output/

./arch-analyzer discover /path/to/operator-repo --format json
./arch-analyzer build-config /path/to/operator-repo

| Extractor | Source Patterns | Data Extracted |
|---|---|---|
| CRDs | `config/crd/**`, `deploy/crds/`, `charts/**/crds/`, `manifests/**/crd*` | Group, version, kind, scope, field count, CEL rules |
| RBAC | `config/rbac/`, `deploy/rbac/`, Go kubebuilder markers | ClusterRoles, bindings, rules, kubebuilder RBAC markers |
| Services | `**/service*.yaml` | Name, type, ports, selector |
| Deployments | `**/deployment*.yaml`, `**/manager*.yaml`, `**/statefulset*.yaml` | Containers, security context, env vars, volumes, resources, probes |
| Network Policies | `**/*networkpolicy*`, `**/*network-polic*`, `**/*netpol*`, `**/network-policies/**` | Pod selector, ingress/egress rules |
| Controller Watches | `**/*_controller.go`, `**/setup.go`, `**/*reconciler*.go` | For/Owns/Watches with GVK resolution |
| Dependencies | `go.mod` | Go version, toolchain, modules (direct only), internal ODH deps, replace directives |
| Secrets | Deployments, services | Secret names, types, references (never values) |
| Helm | `Chart.yaml`, `values.yaml` | Chart metadata, security-relevant defaults |
| Dockerfiles | `Dockerfile*`, `Containerfile*` | Base image, stages, USER, EXPOSE, FIPS indicators |
| Webhooks | `**/webhook*.yaml`, `**/mutating*`, `**/validating*` | Webhook rules, failure policy, side effects |
| ConfigMaps | `**/configmap*.yaml` | ConfigMap names, data keys |
| HTTP Endpoints | Go source (`http.HandleFunc`, `mux.Route`, `gin.Engine`) | Method, path, handler, middleware |
| Ingress | `**/ingress*`, `**/virtualservice*`, `**/httproute*` | Gateway API, Istio, K8s Ingress resources |
| External Connections (Go) | Go source (`sql.Open`, `redis.NewClient`, `grpc.Dial`, `sarama.New*`) | Database, object storage, gRPC, messaging references with credential redaction |
| External Connections (Python) | Python source (psycopg2, sqlalchemy, boto3, requests, httpx, grpc, openai, chromadb, etc.) | Database, object storage, gRPC, messaging, HTTP clients, LLM/ML SDK references |
| Feature Gates | Go source (`DefaultMutableFeatureGate.Add`, `featuregate.Feature` consts) | Gate name, default state, pre-release stage, source location |
| Cache Config | Go source (`ctrl.NewManager`, `cache.Options`) | Cache scope, filtered types, disabled types, implicit informers, GOMEMLIMIT |
| Operator Config | Go source (const/var blocks in controllers, pkg/config) | Classified constants: images, ports, timeouts, env vars, resources, name patterns |
| Reconcile Sequences | Go source (`Reconcile()` methods) | Ordered sub-resource reconciliation steps with conditional guards |
| Prometheus Metrics | Go source (`prometheus.New*`, `promauto.New*`) | Metric name, type (gauge/counter/histogram/summary), help, labels, namespace |
| Status Conditions | Go source (const blocks in controllers, API types) | Condition type constants, associated reason constants, source location |
| Platform Detection | Go source (controllers, reconcilers, config packages) | Capability structs (IsOpenShift, HasRoute), API discovery checks, conditional resource creation |
| Go CRD Extraction | Go types with `+kubebuilder:object:root=true` markers | Group, version, kind, scope, storage version, hub/spoke conversion, field count, CEL rules |
| Webhook Behavioral Analysis | Webhook `Default()` and `Validate*()` method bodies | Field-level mutations, field-level validations, same-receiver method call following |
| Programmatic Resource Ops | Go reconcile methods (`client.Create/Update/Patch/Delete`) | Operation type, target kind, API group, type-resolved via `go/packages` |

The cache analyzer cross-references controller-runtime cache configuration against controller watches and deployment memory limits. It detects:
- Cluster-wide informers for types that should be namespace-scoped or filtered
- Missing cache filters on watched types (potential OOM risk at scale)
- Implicit informers created by `client.Get` calls for unwatched types
- Missing DefaultTransform (managedFields wasting memory)
- Missing GOMEMLIMIT in deployment (Go GC cannot pressure-tune)
- GOMEMLIMIT exceeding 90% of container memory limit
This catches real bugs like opendatahub-io/data-science-pipelines-operator#992 and opendatahub-io/model-registry-operator#457.
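
The GOMEMLIMIT threshold check from the list above can be sketched as follows. This is a minimal illustration, not the analyzer's actual code: the function name `gomemlimitTooHigh` and its signature are hypothetical, and both values are assumed to be already parsed into bytes.

```go
package main

import "fmt"

// gomemlimitTooHigh reports whether GOMEMLIMIT exceeds 90% of the
// container memory limit. Both values are in bytes.
// Hypothetical helper for illustration only.
func gomemlimitTooHigh(gomemlimit, containerLimit int64) bool {
	if containerLimit <= 0 {
		return false // no container limit set, nothing to compare against
	}
	// Integer form of gomemlimit > 0.9 * containerLimit,
	// which avoids floating-point rounding.
	return gomemlimit*10 > containerLimit*9
}

func main() {
	const mib = int64(1) << 20
	// With a 1 GiB container limit, the 90% threshold is 921.6 MiB.
	fmt.Println(gomemlimitTooHigh(950*mib, 1024*mib)) // true
	fmt.Println(gomemlimitTooHigh(800*mib, 1024*mib)) // false
}
```

Leaving headroom between GOMEMLIMIT and the container limit matters because the Go runtime's soft limit does not account for memory outside the Go heap, so a value too close to the hard limit can still end in an OOM kill.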
The CPG pipeline builds a multi-language code graph from source using tree-sitter (no compilation required) and runs layered analysis on top of it.
Four language parsers extract AST-level nodes (functions, call sites, struct literals, HTTP endpoints, DB operations) and edges (calls, contains):
| Language | Parser | CFG | Data Flow | Taint |
|---|---|---|---|---|
| Go | tree-sitter-go | Yes | Yes | Yes |
| Python | tree-sitter-python | Yes | Yes | Yes |
| TypeScript | tree-sitter-typescript | Yes | Yes | Yes |
| Rust | tree-sitter-rust | Yes | Yes | Yes |
Nodes carry typed fields instead of string maps, covering function signatures (params, return types), call targets, HTTP routes, DB operations, struct types, class definitions (with base classes for inheritance tracking), cyclomatic complexity, and entrypoint trust level.
Call edges are classified by resolution confidence:
| Confidence | Meaning | Example |
|---|---|---|
| `CERTAIN` | Exact match, same package | Direct function call `doWork()` |
| `INFERRED` | Cross-package short-name match | `utils.Validate()` matched heuristically |
| `UNCERTAIN` | Multiple candidates, interface dispatch | `handler.Process()` with multiple implementations |

Security queries never filter out UNCERTAIN edges; they use confidence to prioritize review order.
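
A rough sketch of how confidence could be assigned and then used for ordering rather than filtering. All type and function names here (`Confidence`, `CallEdge`, `classify`, `prioritize`) are hypothetical, the classification rule is a simplification of the table above, and sorting the most confident edges first is an assumption about the review order.

```go
package main

import (
	"fmt"
	"sort"
)

// Confidence mirrors the three levels in the table above.
// The Go names are illustrative, not the analyzer's actual types.
type Confidence int

const (
	Certain Confidence = iota
	Inferred
	Uncertain
)

func (c Confidence) String() string {
	return [...]string{"CERTAIN", "INFERRED", "UNCERTAIN"}[c]
}

// CallEdge is a hypothetical call edge carrying its resolution confidence.
type CallEdge struct {
	Caller, Callee string
	Confidence     Confidence
}

// classify applies a simplified rule: one candidate in the caller's package
// is certain, one candidate elsewhere is inferred, and anything else
// (zero or several candidates) is uncertain.
func classify(candidates int, samePackage bool) Confidence {
	switch {
	case candidates == 1 && samePackage:
		return Certain
	case candidates == 1:
		return Inferred
	default:
		return Uncertain
	}
}

// prioritize returns a copy of edges ordered by confidence.
// No edge is dropped: UNCERTAIN edges are kept and sorted last.
func prioritize(edges []CallEdge) []CallEdge {
	out := append([]CallEdge(nil), edges...)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Confidence < out[j].Confidence
	})
	return out
}

func main() {
	edges := []CallEdge{
		{"Reconcile", "handler.Process", classify(3, false)},
		{"Reconcile", "doWork", classify(1, true)},
		{"Reconcile", "utils.Validate", classify(1, false)},
	}
	for _, e := range prioritize(edges) {
		fmt.Printf("%-9s %s -> %s\n", e.Confidence, e.Caller, e.Callee)
	}
}
```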
Per-function analysis tracks variable assignments, reads, argument passing, field access, and return values. Produces assigns, reads, passes_to, field_access, and returns edges within function bodies.
Basic block construction within each function with branching edges (true_branch, false_branch, fallthrough, loop_back, loop_exit, exception, entry, exit). Enables path-sensitive analysis: distinguishing "validation guards the dangerous operation" from "validation on independent path."
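
One way to express the "guards" distinction is a dominance test on the basic-block graph: the validation block guards the sink if the sink becomes unreachable once the validation block is removed. The sketch below assumes a hypothetical successor map keyed by block name and ignores edge labels such as true_branch and false_branch, so it is a simplification of a real path-sensitive check.

```go
package main

import "fmt"

// guards reports whether every path from entry to sink passes through guard.
// succ maps each basic block to its successors (edge labels are ignored).
// If sink is not reachable at all, the result is vacuously true.
func guards(succ map[string][]string, entry, guard, sink string) bool {
	// Marking guard as already seen is equivalent to removing it from the graph.
	seen := map[string]bool{guard: true}
	stack := []string{entry}
	for len(stack) > 0 {
		b := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[b] {
			continue
		}
		seen[b] = true
		if b == sink {
			return false // reached the sink without going through guard
		}
		stack = append(stack, succ[b]...)
	}
	return true
}

func main() {
	// Validation sits on the only path to the dangerous operation.
	guarded := map[string][]string{
		"entry":    {"validate"},
		"validate": {"exec", "reject"},
	}
	// Validation sits on an independent path: "skip" bypasses it.
	bypass := map[string][]string{
		"entry":    {"validate", "skip"},
		"validate": {"exec"},
		"skip":     {"exec"},
	}
	fmt.Println(guards(guarded, "entry", "validate", "exec")) // true
	fmt.Println(guards(bypass, "entry", "validate", "exec"))  // false
}
```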
Two-phase taint engine:
- Intraprocedural (Phase A): per-function taint propagation along data flow edges, filtered by CFG block reachability. Produces function summaries.
- Interprocedural (Phase B): walks the call graph using Phase A summaries to trace taint across function boundaries and storage links.
Sources: user input handlers, deserialization calls. Sinks: SQL execution, subprocess calls, command execution, template rendering, HTML output, file access, eval usage. Bounded by configurable depth (20), path (100), and visit (10K) limits with truncation diagnostics.
Ingest SARIF 2.1.0 output from external static analyzers (Semgrep, gosec, Trivy, etc.) and map findings to CPG nodes. Enriches external findings with architecture context: "Semgrep found SQL injection at handler.go:42" becomes "that function is an untrusted webhook handler with RBAC for secrets."
Validation: schema validation, path normalization, annotation sanitization, 50K result size limit.
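
As an illustration of the mapping step, the sketch below decodes the subset of SARIF 2.1.0 needed to locate a finding (`ruleId`, `artifactLocation.uri`, `region.startLine`) and attaches it to the function whose line range contains it. The `FuncNode` type, the `mapFindings` function, and the rule ID in the sample are hypothetical, and URIs are compared verbatim here, without the path normalization the analyzer performs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sarifLog covers only the SARIF 2.1.0 fields this sketch needs.
type sarifLog struct {
	Runs []struct {
		Results []struct {
			RuleID    string `json:"ruleId"`
			Locations []struct {
				PhysicalLocation struct {
					ArtifactLocation struct {
						URI string `json:"uri"`
					} `json:"artifactLocation"`
					Region struct {
						StartLine int `json:"startLine"`
					} `json:"region"`
				} `json:"physicalLocation"`
			} `json:"locations"`
		} `json:"results"`
	} `json:"runs"`
}

// FuncNode is a hypothetical, reduced view of a CPG function node.
type FuncNode struct {
	Name               string
	File               string
	StartLine, EndLine int
}

// mapFindings returns, per function name, the rule IDs of the findings
// whose start line falls inside that function.
func mapFindings(data []byte, nodes []FuncNode) (map[string][]string, error) {
	var doc sarifLog
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, err
	}
	out := map[string][]string{}
	for _, run := range doc.Runs {
		for _, r := range run.Results {
			for _, loc := range r.Locations {
				pl := loc.PhysicalLocation
				line := pl.Region.StartLine
				for _, n := range nodes {
					if n.File == pl.ArtifactLocation.URI && line >= n.StartLine && line <= n.EndLine {
						out[n.Name] = append(out[n.Name], r.RuleID)
					}
				}
			}
		}
	}
	return out, nil
}

const sample = `{
  "version": "2.1.0",
  "runs": [{
    "results": [{
      "ruleId": "example.sql-injection",
      "locations": [{
        "physicalLocation": {
          "artifactLocation": {"uri": "handler.go"},
          "region": {"startLine": 42}
        }
      }]
    }]
  }]
}`

func main() {
	nodes := []FuncNode{{Name: "HandleCreate", File: "handler.go", StartLine: 30, EndLine: 60}}
	findings, err := mapFindings([]byte(sample), nodes)
	if err != nil {
		panic(err)
	}
	fmt.Println(findings)
}
```

Once a finding is attached to a node, whatever the graph already knows about that node (trust level, annotations) can be reported alongside it.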
Compare two code-graph.json files to detect regressions: new functions, removed functions, changed complexity, new call edges, trust level changes. Useful for PR review automation.
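
A minimal sketch of the comparison, reduced to functions and their cyclomatic complexity. The `FuncInfo` and `Diff` types and the `diffGraphs` function are hypothetical, and the real code-graph.json schema is not reproduced here; each graph is assumed to be already loaded into a map keyed by function name.

```go
package main

import (
	"fmt"
	"sort"
)

// FuncInfo is a reduced, hypothetical view of a function node.
type FuncInfo struct{ Complexity int }

// Diff lists the structural changes between two graphs.
type Diff struct {
	Added, Removed, ComplexityChanged []string
}

// diffGraphs compares base and head by function name.
func diffGraphs(base, head map[string]FuncInfo) Diff {
	var d Diff
	for name, h := range head {
		b, ok := base[name]
		switch {
		case !ok:
			d.Added = append(d.Added, name)
		case b.Complexity != h.Complexity:
			d.ComplexityChanged = append(d.ComplexityChanged, name)
		}
	}
	for name := range base {
		if _, ok := head[name]; !ok {
			d.Removed = append(d.Removed, name)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(d.Added)
	sort.Strings(d.Removed)
	sort.Strings(d.ComplexityChanged)
	return d
}

func main() {
	base := map[string]FuncInfo{"Reconcile": {12}, "validate": {3}, "legacy": {5}}
	head := map[string]FuncInfo{"Reconcile": {18}, "validate": {3}, "applyDefaults": {2}}
	d := diffGraphs(base, head)
	fmt.Println("added:", d.Added)
	fmt.Println("removed:", d.Removed)
	fmt.Println("complexity changed:", d.ComplexityChanged)
}
```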
| Rule | ID | Severity | Description |
|---|---|---|---|
| Webhook Missing Update | CGA-003 | High | Webhooks intercepting CREATE but not UPDATE |
| RBAC Precedence Bug | CGA-004 | High | Conflicting RBAC rules across bindings |
| Cert as CA | CGA-005 | High | Certificate used as CA without proper validation |
| Cross-Namespace Secret | CGA-006 | High | Secret access crossing namespace boundaries |
| Unfiltered Cache | CGA-007 | Medium | Watched types without cache filters (OOM risk) |
| Plaintext Secrets | CGA-008 | Medium | Hardcoded secrets or credentials in source |
| Weak Serial Entropy | CGA-009 | Medium | Weak randomness in security-sensitive contexts |
| Complexity Hotspot | CGA-010 | Medium | High-complexity functions with security annotations |
| Untrusted Endpoint | CGA-011 | Info | HTTP endpoints without recognized auth middleware |
| Unprotected Ingress | CGA-012 | High | Ingress routes without TLS or auth |
| Overprivileged Secret Access | CGA-013 | Medium | Broad secret access beyond what's needed |
| Uncontrolled Egress | CGA-014 | Medium | Outbound connections without network policy |
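
To make one of these rules concrete, here is a rough sketch of the idea behind CGA-003: collect the admission operations each resource is intercepted on and report resources covered on CREATE but not on UPDATE. The `WebhookRule` type and `missingUpdate` function are hypothetical, and the sketch treats only the `*` operation as a wildcard, not `*` resources.

```go
package main

import (
	"fmt"
	"sort"
)

// WebhookRule is a reduced, hypothetical view of one admission webhook rule.
type WebhookRule struct {
	Operations []string // e.g. "CREATE", "UPDATE", "DELETE", "*"
	Resources  []string
}

// missingUpdate returns the resources intercepted on CREATE but never on UPDATE,
// aggregated across all rules.
func missingUpdate(rules []WebhookRule) []string {
	created := map[string]bool{}
	updated := map[string]bool{}
	for _, r := range rules {
		for _, op := range r.Operations {
			for _, res := range r.Resources {
				switch op {
				case "CREATE":
					created[res] = true
				case "UPDATE":
					updated[res] = true
				case "*": // wildcard covers both operations
					created[res] = true
					updated[res] = true
				}
			}
		}
	}
	var out []string
	for res := range created {
		if !updated[res] {
			out = append(out, res)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	rules := []WebhookRule{
		{Operations: []string{"CREATE"}, Resources: []string{"notebooks"}},
		{Operations: []string{"CREATE", "UPDATE"}, Resources: []string{"pipelines"}},
	}
	fmt.Println(missingUpdate(rules)) // [notebooks]
}
```

The gap matters because a field rejected at creation time can otherwise be introduced later by editing the object.
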
| Rule | ID | Severity | Description |
|---|---|---|---|
| Untested Security Function | CGA-T01 | Medium | Security-annotated functions without test coverage |
| Fake-Only Integration | CGA-T02 | Low | Integration tests using only fakes/mocks |
| Missing Error Paths | CGA-T03 | Medium | Error return paths without test coverage |
| Consolidation Opportunity | CGA-T04 | Low | Duplicate test patterns that could be consolidated |
| Rule | ID | Severity | Description |
|---|---|---|---|
| Unconverted CRD | CGA-U01 | Medium | CRDs still using v1beta1 |
| Pre-Release API Usage | CGA-U02 | Low | Usage of alpha/beta Kubernetes APIs |
| Ungated Feature | CGA-U03 | Medium | Features without feature gate protection |
| Unchecked Version Access | CGA-U04 | Low | Version-dependent code without version checks |
| Rule | ID | Severity | Description |
|---|---|---|---|
| Abstraction Layers | CGA-A01 | Info | Surfaces class hierarchies with abstract bases and implementations |
| External API Surface | CGA-A02 | Info | Functions using external SDK clients (openai, boto3, chromadb, etc.) |
| Factory Dispatch | CGA-A03 | Info | Factory functions dispatching to multiple implementation types |
| Unimplemented Interface | CGA-A04 | Low | Abstract bases with no implementations found in analyzed sources |
| Renderer | Output | Description |
|---|---|---|
| RBAC | `rbac.mmd` | Mermaid graph: ServiceAccounts -> Bindings -> Roles -> Resources |
| Component | `component.mmd` | Mermaid diagram: CRDs watched, owned, and dependency relationships |
| Security/Network | `security-network.txt` | ASCII layered view: network, RBAC, secrets, security contexts |
| Dependencies | `dependencies.mmd` | Mermaid graph: Go module dependencies (internal ODH highlighted) |
| C4 | `c4-context.dsl` | Structurizr C4 context diagram |
| Dataflow | `dataflow.mmd` | Mermaid sequence diagram: controller watches and service connections |
| Report | `report.md` | Structured markdown with tables for all extracted data and cache issues |

architecture-analyzer/
  cmd/arch-analyzer/
    main.go                  # CLI entry point with subcommands
  pkg/
    extractor/               # 26 architecture extractors
    renderer/                # 7 diagram/report renderers
    aggregator/              # Platform-wide aggregation
    validator/               # CRD contract validation
    parser/                  # Multi-language parsers (Go, Python, TypeScript, Rust)
                             # with CFG construction per language
    builder/                 # Code property graph builder (call resolution, edge confidence)
    graph/                   # CPG data structures (typed nodes, edges, basic blocks)
    dataflow/                # Taint propagation engine (intraprocedural + interprocedural)
    diff/                    # Structural diff engine for code graph comparison
    sarif/                   # SARIF 2.1.0 ingestion and node mapping
    linker/                  # Storage linker (DB operations to schemas)
    annotator/               # Security annotation engine
    query/                   # Security query engine (base queries + taint-to-sink)
    domains/                 # Domain framework with registered query rules
      security/              # 12 security queries
      testing/               # 4 testing queries
      upgrade/               # 4 upgrade queries
      architecture/          # 4 architecture queries
    arch/                    # Architecture data structures
    config/                  # Configuration types
  contracts/
    schemas/                 # CRD baseline schemas for validation
  scripts/
    analyze-repo.sh          # Clone + analyze + cleanup
  site/
    docs/                    # MkDocs Material documentation
    mkdocs.yml               # Docs site configuration
  .github/workflows/
    analyze-all.yml          # Scheduled analysis workflow
    extract-schemas.yml      # CRD schema extraction workflow
    validate-contracts.yml   # CRD contract validation on PRs
    docs.yml                 # Deploy docs to GitHub Pages
go test ./...

Full documentation is published at ugiordan.github.io/architecture-analyzer and covers installation, guides, CLI reference, architecture, and contributing.

- `analyze-all.yml`: runs weekly (Monday 06:00 UTC) or on manual dispatch, analyzes all configured platform repos, and uploads artifacts
- `extract-schemas.yml`: extracts CRD schemas weekly and opens automated PRs for changes
- `validate-contracts.yml`: validates CRD contract changes on PRs to the `contracts/` directory
- `docs.yml`: deploys documentation to GitHub Pages on pushes to main
