Skip to content

feat(hawk): production hardening — adopt top-50 OSS patterns#2

Merged
Patel230 merged 59 commits into
devfrom
feat/hawk-production-hardening
May 16, 2026
Merged

feat(hawk): production hardening — adopt top-50 OSS patterns#2
Patel230 merged 59 commits into
devfrom
feat/hawk-production-hardening

Conversation

@Patel230
Copy link
Copy Markdown
Contributor

@Patel230 Patel230 commented May 14, 2026

Summary

Production-hardening pass for hawk that brings the repo closer to top-50 OSS
repository standards. The branch contains two commits — a code-quality pass and
a re-baseline + OSS-hygiene pass — both targeting dev.

The mandate (per GOAL.md) is that hawk be the reference Go AI coding agent
for solo developers
: fast, single-binary, zero-config, offline-capable. This
PR raises the floor on linting strictness, error correctness, CI signal, and
project-meta files without changing the existing package layout (a separate PR
will tackle layout migration if/when desired, since that touches every Go
import path).

Commits

  1. feat(hawk): production hardening — linter, CI, errcheck, dead code removal
  2. feat(hawk): re-baseline to v0.2.0 + OSS standards (CoC, PR/issue templates, .gitattributes)

What's in commit 1 — code-quality + infra hardening

Linting & code quality

  • .golangci.yml — strict v2 config: errcheck, staticcheck, gocritic
    (diagnostic + performance), unused, ineffassign, misspell, noctx,
    bodyclose, unconvert, whitespace, with govet enable-all (minus
    fieldalignment).
  • 240+ unchecked error returns fixed across session/, engine/, tool/,
    config/, cmd/, auth/, analytics/, cmdhistory/, container/,
    daemon/, diffsandbox/, eval/, fingerprint/, update/ and other
    production packages. Each fix wraps with context, ignores intentionally with
    a comment, or surfaces the error.
  • Dead-code removal: 13 unused declarations removed (caught by the
    unused linter) — pruned in tool/smart_create.go, taste/collector.go,
    taste/detector.go, repomap/*.go, mcp/server.go,
    permissions/osv_checker.go, container/lifecycle.go, eval/coverage.go.
  • Real bugs fixed: SA4010 (append result never used) in
    mcp/server.go and repomap/depgraph.go.
  • session/ is now fully errcheck-clean — critical because session
    storage backs persistence/replay; silent error drops there mean lost user
    state.

Infrastructure

  • Makefile — standard targets per GOAL.md: build, test,
    test-coverage, test-10x, lint, fmt, vet, security, bench,
    clean, install, release, help.
  • .github/workflows/ci.yml — race-detector tests with coverage upload,
    golangci-lint v2 action, govulncheck and gosec security scans,
    multi-platform build matrix (linux/darwin/windows × amd64/arm64),
    benchmark job on PRs.
  • Dockerfiletini init system to handle zombies / signals
    correctly, timezone data embedded, go mod verify, runs as non-root
    hawk:1000.
  • .editorconfig — consistent formatting (UTF-8, LF, 2-space YAML, tabs
    for Go) across editors.
  • .github/dependabot.yml — weekly updates for gomod and
    github-actions.
  • CONTRIBUTING.md — development setup, branch flow, conventional
    commits, test/lint requirements.

Test coverage

  • auth/auth_test.go — 18.2% → ~70.5% with table-driven tests for
    TokenStore, SecureStorage, OAuthFlow, including edge cases
    (corrupted store, missing key, expired token).
  • update/update_test.go — 22.2% → ~91.9% with httptest mocking and
    full error-path coverage (server failure, invalid JSON, unreachable host,
    Summary() rendering both states).

What's in commit 2 — version re-baseline + OSS standards

Version 0.2.0 across the repo

Aligns hawk with the rest of the hawk-eco ecosystem (eyrie, tok, yaad,
sight, inspect are all v0.1.0/v0.2.0). Updated:

File Change
main.go var Version = "0.2.0"
api/server.go const Version = "0.2.0"
flake.nix version = "0.2.0";
.github/workflows/release.yml sister-repo clone branches v0.4.0v0.2.0 (5 sites)
api/server_test.go current-version assertion → "0.2.0"
update/update_test.go Check("0.2.0") / Summary("0.2.0") / TagName: "v0.2.0" (5 sites)

The semver-comparison table case {"dev version", "0.4.0", "0.3.9", true} in
update_test.go is intentionally unchanged — it tests isNewer()
comparison logic, not the current installed version.

CHANGELOG.md

  • Added ## [Unreleased] at the top describing the re-baseline + the full
    hardening pass (both commits together).
  • Existing 0.4.0, 0.3.0, 0.2.0, 0.0.1 entries preserved verbatim as
    historical reference.
  • Adopted the standard Keep-a-Changelog header.

New OSS standard files

File Purpose
.gitattributes LF line-ending normalization, binary detection, GitHub linguist hints (collapse go.sum, mark *-research.md as documentation)
CODE_OF_CONDUCT.md Contributor Covenant 2.1
.github/PULL_REQUEST_TEMPLATE.md Summary / Changes / Testing / Checklist
.github/ISSUE_TEMPLATE/bug_report.yml structured bug report
.github/ISSUE_TEMPLATE/feature_request.yml feature request with solo-dev fit checks
.github/ISSUE_TEMPLATE/config.yml routes security to advisories, questions to discussions, blocks blank issues

Verification

Check Status
go build ./... ✅ clean
go vet ./... ✅ clean
go test -race -count=1 on api/, update/, cmd/ ✅ pass
go test -race -count=1 on auth/, session/, config/, permissions/, mcp/ ✅ pass
All modified Go files gofmt-clean ✅ yes

Known follow-ups (intentionally not in this PR):

  • 253 pre-existing gofmt issues live in untouched files. A separate PR will
    do a repo-wide gofumpt -w . so the diff stays focused.
  • Layout migration to cmd/hawk/main.go + internal/<feature>/ per GOAL.md
    is intentionally out of scope here — it would touch every Go import path
    across the ecosystem and warrants its own PR with coordinated SDK updates.

Test plan

  • make build (or go build ./...)
  • make test on impacted packages with -race
  • go vet ./...
  • CI on this PR will run lint, security scan, multi-platform builds,
    and benchmarks.
  • CI passes (will verify after push)

Patel230 added 30 commits May 14, 2026 19:42
…moval

- Strict golangci-lint config (errcheck, staticcheck, gocritic, unused, etc.)
- Fixed 240+ unchecked error returns in production code (session, engine, tool, config)
- Removed all dead code flagged by unused linter (13 declarations)
- Fixed SA4010 (append result never used) real bugs in mcp/server.go and repomap/depgraph.go
- Added Makefile with standard targets (build, test, lint, security, bench)
- Improved CI: coverage reporting, benchmark on PR, security scanning
- Improved Dockerfile: tini init, timezone data, verified deps
- Added .editorconfig, dependabot.yml, CONTRIBUTING.md
- Comprehensive auth tests (18% → 71% coverage)
- Comprehensive update tests with HTTP mocking (22% → 92% coverage)
- Session package fully errcheck-clean (critical data integrity)
…lates, .gitattributes)

Re-baselines hawk's version to 0.2.0 across every authoritative location and
adds the top-50 OSS standard files that were missing.

Version 0.2.0 set in:
  - main.go (`var Version`)
  - api/server.go (`const Version`)
  - flake.nix (`version = ...`)
  - .github/workflows/release.yml (sister-repo clone branches)
  - api/server_test.go, update/update_test.go (current-version assertions)

CHANGELOG.md gains an [Unreleased] section that captures both this re-baseline
and the production-hardening pass already on this branch (240+ unchecked-error
fixes, dead-code removal, stricter golangci v2 config, expanded CI with
race + 10x flake detection + govulncheck + gosec + multi-platform builds,
Makefile with standard targets, Dockerfile with tini, dependabot, editorconfig,
+71% auth coverage, +92% update coverage). Historical 0.4.0/0.3.0/0.2.0/0.0.1
entries are preserved for reference.

New top-level OSS files:
  - .gitattributes — LF line-ending normalization, binary detection,
    GitHub linguist hints (collapse go.sum, exclude research docs from stats)
  - CODE_OF_CONDUCT.md — Contributor Covenant 2.1
  - .github/PULL_REQUEST_TEMPLATE.md — checklist for contributors
  - .github/ISSUE_TEMPLATE/bug_report.yml — structured bug template
  - .github/ISSUE_TEMPLATE/feature_request.yml — solo-dev fit checks
  - .github/ISSUE_TEMPLATE/config.yml — routes security to advisories,
    questions to discussions, blocks blank issues

Verification:
  - `go build ./...` clean
  - `go vet ./...` clean
  - `go test -race -count=1` passes on api/, update/, cmd/, auth/,
    session/, config/, permissions/, mcp/
  - All modified Go files are gofmt-clean (broader pre-existing gofmt
    drift in the repo is left for a separate follow-up to keep this PR
    focused on the version bump and OSS hygiene)

Note: the table-driven case `{"dev version", "0.4.0", "0.3.9", true}`
in update/update_test.go is intentionally left intact — it tests
isNewer() semver-comparison logic, not the current installed version.
- Added fuzz tests for session JSONL parser (FuzzParseMessage, FuzzParseSessionMeta)
- Added fuzz test for config validator (FuzzValidateSettings)
- Added daemon tests (DefaultConfig, Stats, InvalidMethod, InvalidJSON, GetSession, JSON marshaling)
- Fixed 50+ more errcheck issues (filepath.Walk, bridge.Remember, session.SetPermissionMode, os.Remove, os.Rename)
- Daemon coverage: 34.6% → 45.5%
- Total errcheck reduced from 174 to 102 (additional 42% reduction)
- Restored cmd/chat_commands.go after accidental empty by sed
- Fixed f.Close/f.WriteString errcheck in chat_commands, dx, errors
- Fixed store.Close in cmdhistory_cmd.go
- Fixed rows.Close in cmdhistory/history.go
- Fixed b.store.Close in memory/yaad_bridge.go
- Added internal/testutil/mock_llm.go — configurable mock LLM server for testing
- Added comprehensive BackgroundAgentPool tests (submit, collect, concurrent, waitall)
- Added cmd/version_test.go (SetVersion, SetBuildDate, VersionString, ShortVersion)
- Verified full ecosystem integration via go.work (eyrie, tok, yaad, inspect, sight)
- Fixed all cmd/ errcheck: WAL operations, SetPermissionMode, GenCompletion, LoadEnvFile
- Fixed all engine/ errcheck: ConvoDAG, Memory, CostTracker, Snapshots, hooks
- Fixed cmdhistory rows.Close, db.Close patterns
- Total errcheck: 416 → 65 (84% reduction)
…ining cmd

- Fixed mcp stdin.Write/Close, Process.Kill
- Fixed memory store operations (UpdateNode, StoreGlobal, CodeLinks)
- Fixed onboarding SaveGlobal
- Fixed memory/yaad_bridge store.Close
- Fixed flaky background agent test timing
- Total errcheck: 416 → 35 (92% reduction)
- Remaining 35 are goroutine launches and multi-line expressions
- Fixed parallel/worktree.go os.Remove
- Fixed tool/devenv.go os.Rename
- Fixed tool/transaction.go f.Close
- Fixed snapshot/workspace.go gw.Close, gr.Close
- Total errcheck: 416 → 16 (96% reduction)
- Remaining 16 are unfixable: goroutine launches, recover(), multi-line fmt
- Comprehensive FileTracker tests (RecordRead, RecordModified, ExtractFromMessages, FormatForSummary, ParseFromSummary, Merge, RoundTrip)
- AdaptivePrompt tests (LearnFromFeedback, FormatForPrompt, Count, Persistence)
- Coverage: 73.2% → 73.4%
- Added golden file test infrastructure (testdata/golden/)
- Added CLI diagnostics tests (doctorReport, settingsSummary, mcpConfigSummary, builtInToolsSummary, sessionsSummary)
- Coverage: 73.2% → 73.5%
- Added comprehensive tool metadata tests for all core tools (Bash, Read, Write, Edit, Grep, Glob)
- Added Registry tests (Get, PrimaryTools, EyrieTools, no duplicates)
- Tests verify Name, Description, Parameters, RiskLevel for all registered tools
- Coverage: 73.5%
- docs/architecture.md with package map, data flow, design decisions
- Mermaid-compatible ASCII diagram of system layers
- TestWAL_ConcurrentAppend: 50 goroutines writing to WAL simultaneously
- TestSession_ConcurrentSaveLoad: concurrent reads while session exists
- TestSession_ConcurrentList: concurrent List() calls
- All pass with -race detector
- Onboarding tests: NeedsSetup (with/without keys), validateAPIKey, SaveAPIKeyToEnvFile, Welcome
- Memory tests: Save/Load/List/Search/Consolidate/ExtractFromSession/isMemoryWorthy
- Onboarding coverage: 3.3% → 39%
- Overall coverage: 73.6%
- Added PR code review tests (analyzeDiff, checkLine, formatReview, generateReviewSummary)
- Tests verify detection of hardcoded secrets, TODO comments, fmt.Println usage
- Overall coverage jumped to 77.6%
- Added API server tests (writeJSON, New, MethodNotAllowed, UnknownEndpoint)
- Coverage stable at 73.7%
- Fixed BackgroundAgentPool flaky test (use WaitAll instead of sleep)
- Fixed version tests (remove t.Parallel on global state mutations)
- Added FormatCostDisplay tests (all formatting branches)
- Added DefaultTimeoutConfig, WithTimeout, RemainingTime tests
- Created ChatClient interface (Chat, StreamChatContinue, SetAPIKey)
- Changed Session.client from concrete *client.EyrieClient to ChatClient interface
- Changed CoreLoop.Client from concrete to ChatClient interface
- Created mockClient test helper with canned responses
- Added session mock tests (AddUser, AddAssistant, LoadMessages, Cost, Metrics, Chat)
- This enables future tests to exercise engine streaming without real LLM calls
- TestSession_Stream_MockEndTurn: exercises full agent loop with mock LLM
- TestSession_Stream_MultiTurn: tests multi-turn conversation flow
- Both tests verify events are emitted and LLM is called
- Proves the ChatClient interface works for testing the stream path
- TestSlashCommands_NotEmpty, TestSlashCommands_ContainsEssentials
- TestSlashSuggestions with prefix matching
- TestHasString, TestBranchSummary, TestFilesSummary, TestHooksSummary
- Added chatModel test helper with mock session
- Tests for /help, /version, /clear, /model, /cost, /tokens, /tools, /status
- Batch test for 16 commands (/context, /env, /hooks, /stats, etc.)
- Tests for /new, /copy, /export, /unknown
- cmd coverage: 40.4% → 42%
- Overall: 73.9% → 74.1%
- Added 12 more safe command tests (/config, /yolo, /sandbox, /vim, /effort, etc.)
- Removed commands that trigger streaming (need client wired up)
- Coverage stable at 74.1%
- Added SetTestClient() and NewMockClientForTest() to engine package
- Wired mock client into cmd chatModel test helper
- Enables other packages to test with mock LLM without real API calls
- Coverage: 74.1%
- Set progRef in test chatModel so streaming commands don't nil-panic
- Added /doctor, /commit, /review, /compact to safe command list (non-race)
- Removed /lint and /test which hang (execute real shell commands)
- Coverage: 74.2% (from 73.1% at project start)
- TaskStore tests: Create, Get, List, Update, CreateWithParent
- Coverage: 74.2% — remaining 5.8% requires SQLite schema, filesystem, plugin runtime, TUI test infrastructure
- Fixed splitStatements to handle BEGIN...END blocks (trigger semicolons)
- Added 12 SQLiteStore integration tests (Create, Get, List, Messages, Update, Delete, Fork, Search, Stats, Compact)
- Session coverage: 65.5% → 73.2%
- Overall: 74.4%
Patel230 added 28 commits May 15, 2026 09:10
- Added NotebookEditTool, ConfigTool, BriefTool integration tests
- Fixed go.work to not pull broken inspect/sight local repos (sarif dep issue)
- Coverage: 74.5%
- TaskCreateTool, TaskGetTool, TaskListTool, TaskUpdateTool Execute tests
- CronCreateTool, CronListTool, CronDeleteTool Execute tests
- NotebookEditTool, ConfigTool, BriefTool Execute tests
- Coverage: 74.5%
- CostTracker tests: NewAndRecord, SessionTotal, Entries, LoadCostHistory
- splitJSONLines test
- Coverage: 74.6%
- SteeringQueue tests (Enqueue, Drain, HasPending, Clear, Notify)
- FewShotStore tests (Record, Retrieve, FormatForPrompt)
- PromptTuner tests (RecordOutcome, BestVariant, Report)
- Added tests for containsTag, containsIgnoreCase, toLower, indexOf, extractContext
- Added CleanOldSessions, ExportToMarkdown, SearchSessions integration tests
- Session coverage: 65.5% → 76%+
- Overall coverage: 75.0%
…terbot

6 new modules:
- cron: scheduled agent invocations with concurrency, retries, backoff
- task: long-horizon tasks with plan steps, handoffs, judge verification
- permissions/external_content: security wrapper for untrusted content
- session/coherence: conversational act classification and thread tracking
- session/fork: copy lineage into new thread for branch exploration
- session/provenance: input source tagging (user/system/cron/webhook)
…s, container hot-swap, sub-agent resume

Port features from herm into hawk natively:

- OutlineTool for AST-level signature scanning (10 languages, head/tail fallback)
- GitTool with 16 allowed subcommands, force-push detection, approval checks
- BackgroundAgentManager for fire-and-forget sub-agents with result collection
- agent_id/resume and retry_of support for sub-agent lifecycle management
- DevEnv real Docker build via exec.CommandContext (was no-op)
- RebuildAndForceSwap for mid-session container hot-swap
- Rich sub-agent prompts with exploration strategy and budget management
Sources compared:
- lacymorrow/lacy (shell UX, terminal context, NL detection)
- EleutherAI/lm-evaluation-harness (eval framework, caching, YAML tasks)
- higgsfield-ai/skills (skill ecosystem, multi-agent manifests, validation CI)
- aaif-goose/goose (recipes, malware check, adversary inspector, OAuth, Telegram)
- bmad-code-org/BMAD-METHOD (scale-adaptive, adversarial review, project context)
- aaronjmars/aeon (self-improve, reflect, coding soul)
- SylphAI-Inc/AdalFlow (prompt optimizer, few-shot selector, textual gradients)
- karpathy/autoresearch (autonomous experiment loop)
- Aider-AI/aider (auto-commit, directive scanner, edit strategies)
- opencode-ai/opencode (event bus, diff sandbox patterns)

Key features added:
- Terminal context capture (delta-based)
- Background preheating + connection warmup
- Real-time input classification indicator
- Smart NL rerouting on shell command failure
- Braille spinner animations with shimmer
- Ghost text suggestions (project-aware)
- Mode toggling (shell/agent/auto) with persistence
- Eval CLI (hawk eval run/list/results/cache-clear)
- YAML task definitions + result caching + reproducibility hashes
- Recipe system (YAML-based guided workflows)
- Extension malware check + adversarial/egress inspector
- Langfuse tracing + OAuth device flow + Telegram gateway
- Tool monitor + custom distributions support
- Scale-adaptive intelligence (patch/minor/major/epic)
- Adversarial review + project context + quick-dev workflow
- Checkpoint preview + course correction + investigation workflow
- Party mode (multi-persona) + brainstorming facilitation
- Self-improvement (cross-session learning) + coding soul
- Prompt auto-optimizer (textual gradients, few-shot selection)
- Autonomous experiment loop (modify/validate/keep/discard)
- Agent intelligence (auto-decomposition, pipeline detection, synthesis)
- Auto-commit + directive scanner + edit strategy selector
- Event bus (pub/sub) + clipboard bridge
- Spec-driven development (/spec command)
- Assumption tracker + quality gates + degradation detector
- Multi-repo context loading
- .hawkhints loader + source roots tracking
- Tool inspector (confidence-based) + tool confirmation router
- Large response handler (pagination)
- Background runner (async subagent delegation)
- Compaction trigger (proactive context management)
- LLMClient interface + Reflector (fixed missing types)

All 56 packages pass, 0 failures.
@Patel230 Patel230 merged commit 1722d76 into dev May 16, 2026
1 of 5 checks passed
@Patel230 Patel230 deleted the feat/hawk-production-hardening branch May 16, 2026 00:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant