Second product in the Ready Armor Suite, after Battle Ready Armor (BRA). Follows the same Control in Depth philosophy.
Containment for what agents do. Provenance for what they produce.
A runtime substrate for AI agents.
ARA sits between an agent and the world it operates in. It governs what the agent is permitted to do, narrows that surface for each task, and makes the agent's work auditable end to end. Operators stay in control without standing over the agent's shoulder.
Built for the demands of professional offensive security work — where the consequences of an unconstrained or unverifiable agent are immediate and external — and applicable, by design, to any agentic deployment that needs the same guarantees.
Read more on the self-contained HTML page, which is easier on the eyes: HERE
Read about its sister software, Battle Ready Armor (BRA) HERE
Grab the free tier of BRA HERE
Agents fail in two distinct ways. ARA defends each on its own track.
| | What the agent does | What the agent claims |
|---|---|---|
| The risk | Unauthorized action against systems, data, identity, third parties | Output that looks correct but isn't grounded in real evidence |
| The control | Containment of capability | Provenance for every claim |
| The promise | The agent only does what policy allows | Every claim ties back to verifiable source |
ARA's design treats both as equally first-class. Neither is bolted on to a model, a prompt, or a wrapper script — both are properties of the runtime the agent runs inside.
Most policy systems for agents conflate three different things: identity, engagement context, and the procedure being executed. ARA keeps them separate and composes them.
Armor — the operator's posture. Persistent, hand-curated. Says who the agent is across every engagement. Hotswappable · exportable · tradable · mechanically enforced.
Trim — the engagement overlay. Generated per engagement, lives for the length of the engagement, then expires. Says where, when, against whom, under what rules of engagement. Hotswappable · exportable · tradable · mechanically enforced.
Weapon — the procedure. Forged once from source materials, sealed, replayable. Says what task this run is performing — with citations, not recall. Hotswappable · exportable · tradable · mechanically enforced.
The three layers compose deterministically. The agent's effective policy at any moment is the product of all three; no layer can soften another's denial.
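As a minimal sketch of what "no layer can soften another's denial" could mean in practice, the snippet below composes three per-layer verdicts by taking the strictest one. The `Verdict` values and the `compose` function are hypothetical names chosen for illustration; ARA's actual action set and composition rule are not specified here.

```python
from enum import IntEnum


class Verdict(IntEnum):
    """Hypothetical verdicts, ordered from most to least permissive."""
    ALLOW = 0
    REQUIRE_APPROVAL = 1
    DENY = 2


def compose(armor: Verdict, trim: Verdict, weapon: Verdict) -> Verdict:
    """Return the strictest verdict among the three layers.

    Taking the maximum means a DENY from any layer survives composition,
    so no layer can soften another's denial.
    """
    return max(armor, trim, weapon)


# The trim denies an out-of-scope target, so the call is denied even though
# the armor and the weapon would both allow it.
effective = compose(Verdict.ALLOW, Verdict.DENY, Verdict.ALLOW)
print(effective.name)  # prints "DENY"
```

Because the function is pure and order-independent, the same three inputs always yield the same effective verdict, which is one way to read "compose deterministically".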
ARA is a small constellation of named components. Operators learn them in this order:
| Name | Role |
|---|---|
| Armory | Operator console. Localhost web app, two-token auth (admin + viewer), live updates throughout. |
| Courtyard | Runtime view. What every contained agent is doing right now — calls, decisions, approvals, drift signals. |
| Engagement | Bounded unit of work. Canonical intake record + mats pool + applied trim + equipped weapon, signed-exportable. |
| Armor | Persistent identity layer. Named loadouts with their own sigil, metadata, and stackable absolutes. |
| Trim | Engagement overlay. Marker-wrapped, hot-loaded on apply, perfectly reversible on release. |
| Mats | Source-material pool. Class-partitioned by what each file is, manifested with identity + origin, posture-gated. |
| Forge | Authoring pipeline. Multi-format ingestion, calibration-gated seal, end-to-end citation, replay-safe output. |
| Weapons | Sealed procedures. Hash-stamped, equipped reversibly, with a per-weapon full-text search index. |
| Bulwark | Unified policy substrate. Eleven layers, five actions, ttl + counter + then-mode modifiers, deterministic walk. |
| Rhema | LLM mediation. Ask + reply layers attach blocks, rewrites, approvals, or shadow-tests; first-use auto-classifies. |
| Tools | Tool classification registry. OS-agnostic keys on (tool, flag-pattern), confidence + status + TTL'd review state. |
| TOD | Time-of-day enforcement. Weekly windows in operator timezone with the four-way out-of-window decision flow. |
| Canaries | Honeytokens seeded into filesystem paths. Reply-layer rule fires the moment the agent echoes one back. |
| Shadows | Policy A/B testing. Variant rules with side-by-side request + response diffs and auto-promote readiness scoring. |
| Baseline | "What one good run looked like." Snapshot a clean session; later runs diff against it for one-click rule promotion. |
| Audit | Tamper-evident ledger. Cryptographically chained, durably appended, attestable signed exports. |
Every one of these has its own surface in the Armory.
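To illustrate the "cryptographically chained" property listed for Audit, here is a small sketch of a hash-chained ledger. The entry layout, the `GENESIS` value, and the function names are assumptions made for this example, and signing of exports is left out; it shows the general technique, not ARA's ledger format.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical previous-hash value for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with this entry's payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(ledger: list, payload: dict) -> None:
    """Append an entry whose hash commits to everything before it."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"prev": prev, "payload": payload,
                   "hash": entry_hash(prev, payload)})


def verify(ledger: list) -> bool:
    """Recompute the chain from the start; any edited entry breaks it."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


ledger = []
append(ledger, {"event": "tool_call", "tool": "port-scan", "decision": "allow"})
append(ledger, {"event": "approval", "by": "operator"})
print(verify(ledger))  # True

ledger[0]["payload"]["decision"] = "deny"  # tamper with an early entry
print(verify(ledger))  # False
```

Each entry's hash covers the previous hash, so altering or removing any earlier entry invalidates every hash after it.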
Most agent frameworks let the model decide its own playbook on the fly. The agent reads the docs, draws inferences, and picks its next step. That works until it doesn't — until the agent invents a step that wasn't actually documented, cites a procedure that doesn't exist, or confidently extrapolates past what the source material supports.
ARA inverts this. The agent doesn't compose its playbook. It executes one that was authored, validated, and sealed before the engagement started.
The Forge is what produces those playbooks. It is the most operationally distinctive piece of ARA, and the one with the most original design.
The agent forges the agent's own weapons. This is the proof of the substrate. When the operator opens a forge session, ARA hot-swaps in a special posture — the blacksmith armor + the blacksmith-hammer weapon — and the agent itself runs the authoring, under full containment. Tool surface narrowed to a small allowlist of text-extraction utilities; outputs constrained to a typed set of extraction channels. No shells, no spawning, dangerous filesystem paths denied, credential env vars scrubbed. Source materials never leave the contained workspace. The blacksmith is itself an ARA armor; the hammer is itself a sealed weapon. ARA's substrate is its own first proof — the same primitives that govern an engagement also govern the work that produced the engagement's procedure.
A multi-level schema as output. What emerges from a forge session isn't a flat playbook. It's a structured weapon record with several typed layers, each derived from operator-deposited materials and cross-citing each other. Some layers carry the vocabulary the weapon operates over; some carry the procedure the agent will traverse; some carry the ground truth the engagement starts with; some carry the resources the agent draws on; some carry the guardrails the agent never crosses; one carries the calibrated voice in which findings are written.
Every layer cites the source it was derived from. Every finding the agent eventually ships traces back through this lattice to a specific location in a specific source document. There is no "emergent" claim — only retrieved-and-explained.
Designed against ten cognitive-science approaches. The Forge's schema isn't an arbitrary engineering choice. It's the output of a structured review where every published agent-memory mechanism we could find was mapped onto an ARA primitive and either approved, deferred, or skipped on its merits. Ten of them ship as foundational in v1; several more have their infrastructure prepared for post-v1 promotion. Each one produces a structural property of the sealed weapon that no model could fabricate at runtime — long-horizon retrieval, anchored ground truth, encoded vocabularies, retrievability decay, predictive expectation, cross-modal channel discipline, use-history priors, selective retention across versions.
Multi-format ingestion, no pre-processing. Source materials go in as the operator already has them — text files, logs, PDFs, CSVs, spreadsheets (XLSX / XLS), Word documents, HTML, even screenshots and scanned images that get OCR'd inline. Methodologies, internal SOPs, vendor playbooks, client runbooks, RFCs, regulatory checklists, exported tickets — whatever shape the source happens to be in. Operators don't reformat, flatten, or strip anything; the Forge does the work.
Multi-stage authoring. Whatever lands gets run through a structured ingestion and authoring pipeline and emerges as a sealed, typed procedure graph: every step has a name, a precondition, a tool surface, exit conditions, and a citation back to a specific location in the original source — including the page and region of an image, not a flattened transcript of it.
End-to-end provenance. A finding the agent ships at the end of an engagement traces back to a step in the procedure, which traces back to a paragraph in the source material. There is no claim that doesn't have a chain. Provenance isn't a logging feature — it's an authoring-time invariant the Forge enforces structurally.
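The finding-to-step-to-source chain could be modeled roughly as follows. This is a sketch under assumptions: `SourceRef`, `Step`, `Finding`, and `trace` are hypothetical names, and the real weapon schema is richer than three records. The point is only that a finding with a missing link is rejected rather than shipped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceRef:
    document: str
    location: str  # for example a page, a paragraph, or an image region


@dataclass(frozen=True)
class Step:
    name: str
    citation: Optional[SourceRef]


@dataclass(frozen=True)
class Finding:
    title: str
    step: Optional[Step]


def trace(finding: Finding) -> SourceRef:
    """Walk finding -> step -> source, raising if any link is missing."""
    if finding.step is None:
        raise ValueError(f"finding {finding.title!r} is not tied to a step")
    if finding.step.citation is None:
        raise ValueError(f"step {finding.step.name!r} has no citation")
    return finding.step.citation


# Hypothetical example data.
step = Step("check-default-credentials",
            SourceRef("vendor-playbook.pdf", "page 12, paragraph 3"))
ref = trace(Finding("Default admin password accepted", step))
print(ref.document, "|", ref.location)
```

In this sketch the check runs when a finding is traced; enforcing it at authoring time, as the text describes, would mean refusing to construct a `Step` without a citation in the first place.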
Calibration-gated seal. The Forge can't seal a weapon whose retrieval doesn't pass an operator-authored test. The operator writes a query set with expected results; before seal, the weapon's recall is measured against it. If measured recall falls below the operator's threshold, the seal blocks, and the operator finds out before an engagement runs that the weapon won't surface the right things. Most knowledge bases hope retrieval works. The Forge proves it.
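A calibration gate of this kind might look like the sketch below. It assumes the score is mean recall over the operator's query set, which is one plausible reading; the function names, the toy retriever, and the step identifiers are all hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def recall(expected: set, retrieved: set) -> float:
    """Fraction of expected items that were actually retrieved."""
    if not expected:
        return 1.0
    return len(expected & retrieved) / len(expected)


def calibration_gate(
    query_set: List[Tuple[str, Iterable[str]]],
    retrieve: Callable[[str], Iterable[str]],
    threshold: float,
) -> Tuple[bool, float]:
    """Return (may_seal, mean_recall) for an operator-authored query set."""
    if not query_set:
        raise ValueError("an empty query set cannot calibrate anything")
    scores = [recall(set(expected), set(retrieve(query)))
              for query, expected in query_set]
    mean = sum(scores) / len(scores)
    return mean >= threshold, mean


# Toy retriever standing in for a weapon's search index (hypothetical data).
index = {"default credentials": ["step-3", "step-7"], "tls downgrade": ["step-9"]}


def toy_retrieve(query: str) -> list:
    return index.get(query, [])


query_set = [
    ("default credentials", ["step-3"]),
    ("tls downgrade", ["step-9", "step-10"]),
]
ok, score = calibration_gate(query_set, toy_retrieve, threshold=0.9)
print(ok, score)  # False 0.75
```

Here the second query misses one expected step, the mean drops to 0.75, and the seal would be refused at a 0.9 threshold.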
Conflict-aware merging. When two source documents disagree on the same data point, the Forge emits a Conflict record rather than silently picking one. Operators see the competing values side-by-side and resolve before seal. No precedence rule quietly deciding what an engagement believes.
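One way to picture conflict-aware merging is a merge that returns two things: the facts every source agrees on, and a list of conflict records for the rest. The `Conflict` dataclass and `merge` function below are illustrative assumptions, not the Forge's record format.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Conflict:
    key: str
    candidates: tuple  # (value, source) pairs shown to the operator


def merge(facts: Iterable[Tuple[str, str, str]]) -> Tuple[Dict[str, str], List[Conflict]]:
    """Merge (key, value, source) facts without picking a winner on disagreement."""
    seen = defaultdict(list)
    for key, value, source in facts:
        seen[key].append((value, source))

    merged, conflicts = {}, []
    for key, entries in seen.items():
        distinct_values = {value for value, _ in entries}
        if len(distinct_values) == 1:
            merged[key] = entries[0][0]
        else:
            # Disagreement: surface every candidate instead of applying precedence.
            conflicts.append(Conflict(key, tuple(entries)))
    return merged, conflicts


# Hypothetical source facts.
facts = [
    ("ssh_port", "22", "runbook.pdf"),
    ("ssh_port", "2222", "sop.docx"),
    ("owner", "netops", "runbook.pdf"),
]
merged, conflicts = merge(facts)
print(merged)                       # {'owner': 'netops'}
print([c.key for c in conflicts])   # ['ssh_port']
```

A seal step could then refuse to proceed while `conflicts` is non-empty, which matches the "resolve before seal" behavior described above.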
Replicable, swappable extraction. The default extractors run deterministically — the same materials yield the same output on every run. No run-to-run drift, no vendor lock-in, no surprise behavior when a model version changes overnight. Operators who want model-assisted extraction can swap in an alternate under the same interface; results stay comparable across both. The substrate isn't hostage to any one model.
Knowledge-gap awareness. The Forge knows which facts an engagement must establish before each step is meaningful. Steps whose preconditions are already known auto-skip; steps whose preconditions are not yet known fire automatically once the missing facts arrive. The agent doesn't re-run reconnaissance it doesn't need, and it doesn't try to exploit something whose prerequisite was never established.
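The scheduling idea can be sketched as a classifier over steps and known facts. This example adopts one reading of the paragraph: a step is skipped when the facts it would establish are already known, and held back until the facts it needs have arrived. The `Step` fields and the `classify` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Step:
    name: str
    needs: frozenset     # facts that must be known before the step is meaningful
    provides: frozenset  # facts the step is meant to establish


def classify(steps: List[Step], known: Set[str]) -> Tuple[list, list, list]:
    """Split steps into (skip, ready, waiting) given the currently known facts."""
    skip, ready, waiting = [], [], []
    for step in steps:
        if step.provides and step.provides <= known:
            skip.append(step.name)      # nothing new to learn
        elif step.needs <= known:
            ready.append(step.name)     # prerequisites established
        else:
            waiting.append(step.name)   # fires once missing facts arrive
    return skip, ready, waiting


# Hypothetical three-step procedure.
steps = [
    Step("port-scan", frozenset(), frozenset({"open_ports"})),
    Step("service-fingerprint", frozenset({"open_ports"}), frozenset({"service_versions"})),
    Step("exploit-check", frozenset({"service_versions"}), frozenset({"finding"})),
]

# The intake record already lists open ports, so the scan is unnecessary.
print(classify(steps, {"open_ports"}))
# (['port-scan'], ['service-fingerprint'], ['exploit-check'])
```

Re-running `classify` whenever a new fact lands moves waiting steps into the ready list without any manual re-planning.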
Determinism by construction. A sealed weapon is replay-safe. The same source materials plus the same engagement produce an equivalent procedure trace, every time. No "the model was creative today." If something changed in the output, something changed in the inputs — and the Forge can tell you which.
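A simple way to answer "which input changed" is to fingerprint every source material and compare fingerprints between runs. The sketch below assumes materials are available as bytes keyed by name; `manifest` and `changed_inputs` are hypothetical helpers, not Forge APIs.

```python
import hashlib
from typing import Dict, List


def manifest(materials: Dict[str, bytes]) -> Dict[str, str]:
    """Map each material name to the SHA-256 digest of its bytes."""
    return {name: hashlib.sha256(data).hexdigest()
            for name, data in materials.items()}


def changed_inputs(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Names that were added, removed, or whose content digest differs."""
    names = before.keys() | after.keys()
    return sorted(n for n in names if before.get(n) != after.get(n))


# Hypothetical materials from two forge runs.
run_a = manifest({"sop.txt": b"rotate keys weekly", "scope.csv": b"10.0.0.0/24"})
run_b = manifest({"sop.txt": b"rotate keys daily", "scope.csv": b"10.0.0.0/24"})
print(changed_inputs(run_a, run_b))  # ['sop.txt']
```

If two runs share an identical manifest and still differ in output, the difference lies outside the source materials, for example in the engagement record.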
Reforge as typed delta. Re-forging from a predecessor weapon produces a structured, typed proposal of every addition, removal, and modification — which the operator reviews and approves before the new weapon is sealed. Weapon evolution is auditable: every change has a name, a citation, and an approver. No mystery "why does this weapon behave differently than last week."
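A typed delta between a predecessor weapon and its reforged successor could be computed along these lines. The sketch treats a weapon as a plain mapping from step name to step definition, which is a simplifying assumption; `Change` and `diff_weapons` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Change:
    kind: str          # "added", "removed", or "modified"
    step: str
    before: Any = None
    after: Any = None


def diff_weapons(old: Dict[str, Any], new: Dict[str, Any]) -> List[Change]:
    """List every addition, removal, and modification, sorted by step name."""
    changes = []
    for name in sorted(old.keys() | new.keys()):
        if name not in old:
            changes.append(Change("added", name, after=new[name]))
        elif name not in new:
            changes.append(Change("removed", name, before=old[name]))
        elif old[name] != new[name]:
            changes.append(Change("modified", name, before=old[name], after=new[name]))
    return changes


# Hypothetical step definitions.
v1 = {"enumerate": {"tool": "dns"}, "probe": {"tool": "http", "rate": 10}}
v2 = {"probe": {"tool": "http", "rate": 5}, "report": {"tool": "none"}}
for change in diff_weapons(v1, v2):
    print(change.kind, change.step)
```

Each `Change` is a reviewable unit: an approval workflow could attach an approver and a citation to it before the new weapon is sealed.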
The result: the agent gets a procedure that's known to be exhaustive against the source material, known to be cited end-to-end, and known to be the same thing it was the last time. The operator gets output they can hand to a client with confidence.
A typical engagement walks through four operator touchpoints.
- Curate an armor. Once. The armor encodes the operator's posture — what's never permitted regardless of task. Most operators maintain one or two armors and reuse them.
- Open an engagement. Intake metadata in, target / scope / time window / rules of engagement out. The Trim Wizard derives the engagement-scoped overlay; the operator reviews and applies.
- Forge or equip a weapon. If the engagement needs a procedure that doesn't exist yet, drop the source materials in and open a forge session — ARA hot-swaps in the blacksmith armor + blacksmith-hammer and the agent itself does the authoring, under containment. If a sealed weapon already fits, equip it.
- Run preflight, then run the agent. Preflight validates the engagement state across roughly a dozen axes — armor, trim, test window, time-of-day windows, rate limits, declared authorizations, posture-and-mat consistency, weapon binding. If preflight is green, the agent runs. The Armory shows everything live.
Approvals, decisions, and findings flow into the audit ledger as the agent works. At the end, the operator exports a signed session package. Done.
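The preflight step in the walkthrough above can be pictured as a table of named checks over the engagement state, where the agent runs only if none fail. The check names and state keys below are hypothetical, cover only a few of the axes mentioned, and use plain integers for time to keep the sketch short.

```python
from typing import Callable, Dict, List

# Hypothetical checks; the real preflight axes and state layout are not specified here.
CHECKS: Dict[str, Callable[[dict], bool]] = {
    "armor_equipped": lambda s: bool(s.get("armor")),
    "trim_applied": lambda s: bool(s.get("trim")),
    "weapon_bound": lambda s: bool(s.get("weapon")),
    "inside_test_window": lambda s: s["window_start"] <= s["now"] < s["window_end"],
}


def preflight(state: dict) -> List[str]:
    """Return the names of failed checks; an empty list means green."""
    return [name for name, check in CHECKS.items() if not check(state)]


state = {
    "armor": "default",
    "trim": "acme-q3",
    "weapon": None,        # no weapon equipped yet
    "window_start": 100,
    "window_end": 200,
    "now": 150,
}
print(preflight(state))  # ['weapon_bound']
```

Returning the failed check names, rather than a single boolean, lets a console show the operator exactly which axis is blocking the run.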
Pen testing is the hardest case for agentic AI today, which makes it the right shape for proving the substrate works:
- The cost of an agent doing the wrong thing is immediate and external — disruption to a target, a contract violation, an unauthorized probe of an out-of-scope asset.
- Authorization isn't binary; it's scoped, time-bounded, and conditional. Real engagements have rules of engagement. Most agent frameworks don't model them.
- Findings ship to clients. Hallucinated findings ship to clients too, unless something stops them. The cost of a fabricated vulnerability report is reputational damage that no amount of "the model's getting better" can recover.
- Engagements are ephemeral, parameterized, and compositional — exactly the surface a policy substrate has to express to be useful.
If ARA can make agents safe and accountable in offensive security, the same substrate covers softer use cases without modification.
The same primitives generalize cleanly:
- Customer-facing assistants — armor encodes brand and safety posture; trims encode per-customer scope and SLA; weapons encode approved task playbooks. Audit ships to compliance.
- Internal data platforms — armor restricts the agent to permitted data domains; trims represent per-query authorization; weapons enforce the analytic playbook the agent must follow.
- Regulated research — provenance for every cited claim; rate limits to satisfy upstream API agreements; preflight to assert authorization is in place before any external request.
- Long-running operations — TOD enforcement for change windows, stop conditions for budget caps, and a first-use registry that lets operators sleep while the agent operates within rails they pre-approved.
ARA is not a wrapper for one runtime, one model, or one provider — it is a layer the runtime runs inside.
ARA is self-hosted, single-operator-friendly, and licensed for internal use under selective preview. We are open to investment and partnership opportunities.
Preview access by invitation.