    scan           process      revalidate              enrich          export
     │              │            │                       │               │
     ▼              ▼            ▼                       ▼               ▼
    candidates  →  findings  →  TP/FP/Fixed verdict  →  +committers  →  JSON / md-dir
                                                        +ownership
Each stage is a separate CLI subcommand and reads/writes a consistent on-disk representation. Stages are idempotent: re-running merges new information rather than overwriting.
data/<projectId>/
├── project.json # rootPath, githubUrl (auto-managed)
├── INFO.md # repo context injected into AI prompts (manual or agent-written)
├── config.json # priorityPaths, promptAppend, ignorePaths (optional)
├── files/ # one JSON per scanned file (FileRecord)
│ └── path/to/file.ts.json
├── runs/ # one JSON per run (RunMeta)
│ └── 20260429-abcd.json
└── reports/ # generated reports (markdown + JSON)
data/ is gitignored by default. Each FileRecord is the source of truth
for everything deepsec knows about a single source file: candidate
matches, AI findings, analysis history, git committer info, ownership.
Full schemas for every file under data/ are documented in
data-layout.md.
The merge model is additive: every stage adds to the FileRecord. A
re-scan merges new candidates into the existing set; a re-process appends
to analysisHistory and merges new findings; revalidation tags existing
findings with verdicts. Nothing is overwritten or deleted.
- What it does: Glob the project root, run regex matchers on every matched file, and write `candidates` to each FileRecord.
- Cost: Free (no AI). ~15s for 2k files.
- Inputs: Project root, matcher set (built-ins + plugin contributions).
- Outputs: `data/<id>/files/**/*.json` with `candidates` populated and `status: "pending"`.
The matcher set is built per-run from the default registry plus any matchers contributed by active plugins. Plugin matchers can override built-ins by reusing the same slug.
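The override-by-slug rule can be sketched as a map keyed on slug, filled with built-ins first and plugin matchers second. This is a minimal illustration, not deepsec's registry code: the `Matcher` and `MatcherPlugin` shapes, the `buildMatcherSet` helper, and the example slugs and patterns are all hypothetical.

```typescript
// Hypothetical shapes: only the `slug` field is named in the text above.
interface Matcher {
  slug: string;
  pattern: RegExp;
}

interface MatcherPlugin {
  name: string;
  matchers?: Matcher[];
}

// Build the per-run matcher set: built-ins first, then plugin matchers in
// plugin order. A matcher that reuses a slug replaces the earlier entry.
function buildMatcherSet(builtIns: Matcher[], plugins: MatcherPlugin[]): Matcher[] {
  const bySlug = new Map<string, Matcher>();
  for (const m of builtIns) bySlug.set(m.slug, m);
  for (const p of plugins) {
    for (const m of p.matchers ?? []) bySlug.set(m.slug, m);
  }
  return Array.from(bySlug.values());
}

// Illustrative slugs and patterns, not deepsec's real built-ins.
const builtIns: Matcher[] = [
  { slug: "eval-call", pattern: /\beval\(/ },
  { slug: "hardcoded-secret", pattern: /api[_-]?key\s*=\s*["'][^"']+["']/i },
];
const activePlugins: MatcherPlugin[] = [
  {
    name: "my-plugin",
    matchers: [
      { slug: "eval-call", pattern: /\b(?:eval|Function)\(/ }, // overrides the built-in
      { slug: "shell-exec", pattern: /\bexec\(/ },
    ],
  },
];

const matcherSet = buildMatcherSet(builtIns, activePlugins);
console.log(matcherSet.map((m) => m.slug).join(", "));
```

Because the set is rebuilt on every run, removing a plugin restores the built-in matcher for that slug without any cleanup step.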
- What it does: Pick batches of pending files, send each batch to the configured AI agent backend with the system prompt + INFO.md, parse the agent's JSON response into `Findings`, and write them back to each FileRecord.
- Cost: $$. The expensive stage.
- Inputs: FileRecords with `status: "pending"`, `INFO.md`, the prompt template (`packages/processor/src/index.ts:DEFAULT_PROMPT_TEMPLATE`).
- Outputs: FileRecord `findings[]` populated, `status: "analyzed"`, `analysisHistory[]` appended.
Two agent backends are supported, both routed through Vercel AI Gateway by default:
| `--agent` | SDK | Default model |
|---|---|---|
| `codex` (default) | `@openai/codex-sdk` | `gpt-5.5` |
| `claude` | `@anthropic-ai/claude-agent-sdk` | `claude-opus-4-7` |
Same prompt, same JSON output schema. You can mix backends within a project — re-process a file with a different agent and the second run's findings get merged with the first.
Concurrency: --concurrency 5 --batch-size 5 means 5 batches in flight,
5 files per batch = 25 files in the air at peak. The processor claims
files atomically via lockedByRunId so multiple workers can run in
parallel without stepping on each other.
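The claim rule and the in-flight arithmetic can be illustrated with a small in-memory sketch. It only shows the semantics (a record is claimable when it is pending and has no `lockedByRunId`); how deepsec makes the claim atomic on disk is not described here, so the `QueueRecord` shape, the `claimBatch` and `peakInFlight` helpers, and the second run id are assumptions for illustration.

```typescript
// Hypothetical minimal record: `status` and `lockedByRunId` come from the text;
// the rest of the real FileRecord is omitted.
interface QueueRecord {
  path: string;
  status: "pending" | "analyzed";
  lockedByRunId?: string;
}

// Peak files in flight for a given --concurrency and --batch-size.
function peakInFlight(concurrency: number, batchSize: number): number {
  return concurrency * batchSize;
}

// Claim up to `batchSize` pending, unlocked records for `runId`.
// The check and the write happen in one synchronous step, so inside a single
// process no other caller can observe a half-claimed record.
function claimBatch(records: QueueRecord[], runId: string, batchSize: number): QueueRecord[] {
  const claimed: QueueRecord[] = [];
  for (const record of records) {
    if (claimed.length >= batchSize) break;
    if (record.status === "pending" && record.lockedByRunId === undefined) {
      record.lockedByRunId = runId;
      claimed.push(record);
    }
  }
  return claimed;
}

const queue: QueueRecord[] = ["a.ts", "b.ts", "c.ts", "d.ts", "e.ts", "f.ts", "g.ts"].map(
  (path): QueueRecord => ({ path, status: "pending" }),
);

// Two workers pull from the same queue; the second only sees what is left.
const workerA = claimBatch(queue, "20260429-abcd", 5);
const workerB = claimBatch(queue, "20260429-ef01", 5);
console.log(workerA.length, workerB.length, peakInFlight(5, 5));
```

A multi-process implementation would need an atomic primitive underneath this rule (for example an exclusive file create or a compare-and-swap on the record), which this sketch deliberately leaves out.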
- What it does: Re-check existing findings for false positives. The agent re-reads the code, consults git history (was this fixed?), and emits a verdict: `true-positive`, `false-positive`, `fixed`, or `uncertain`.
- Cost: $$. Comparable to `process`. Worth running on HIGH+ findings.
- Inputs: Findings with no `revalidation` field, or all findings when `--force` is passed.
- Outputs: `revalidation: { verdict, reasoning, … }` on each finding.
Empirically reduces FP rate by 50%+ on most repos.
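The input rule for this stage (findings without a `revalidation` field, or everything under `--force`) amounts to a simple filter. The sketch below assumes a trimmed-down finding shape; `ReviewedFinding`, `selectForRevalidation`, and the sample titles are hypothetical, and only the verdict values and the `verdict`/`reasoning` fields come from the text.

```typescript
type Verdict = "true-positive" | "false-positive" | "fixed" | "uncertain";

// Hypothetical, trimmed-down finding; the real schema has more fields.
interface ReviewedFinding {
  title: string;
  revalidation?: { verdict: Verdict; reasoning: string };
}

// Select findings that still need a verdict. With `force`, every finding is
// selected again, including those that already carry one.
function selectForRevalidation(findings: ReviewedFinding[], force = false): ReviewedFinding[] {
  return findings.filter((f) => force || f.revalidation === undefined);
}

// Illustrative data only.
const reviewed: ReviewedFinding[] = [
  {
    title: "Unescaped user input in template",
    revalidation: { verdict: "fixed", reasoning: "Escaping added in a later commit." },
  },
  { title: "Shell command built from request body" },
];

const pendingOnly = selectForRevalidation(reviewed);
const everything = selectForRevalidation(reviewed, true);
console.log(pendingOnly.length, everything.length);
```

Skipping already-judged findings by default keeps repeat runs cheap, which matters given that this stage costs about as much as processing.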
- What it does: Attach git committer info and (with a plugin) ownership data to FileRecords with findings.
- Cost: Free if no ownership plugin; otherwise one HTTP round-trip per file to the ownership provider.
- Inputs: FileRecords with findings, the project's git history.
- Outputs: `gitInfo: { recentCommitters, ownership }` on each record.
Read-only stages. Don't modify FileRecords; just shape the data for human or downstream consumption.
- export — flat list of findings as JSON or directory of markdown.
- report — per-project markdown summary + JSON.
- metrics — cross-project counts and TP rates.
Six extension points, all defined in `packages/core/src/plugin.ts`:
- `matchers`: additive
- `notifiers`: additive
- `agents`: additive
- `ownership`: single-slot (last plugin wins)
- `people`: single-slot
- `executor`: single-slot
A plugin registers via `deepsec.config.ts`:

    export default defineConfig({
      plugins: [vercel(), myPlugin()],
    });

The CLI calls `loadConfig()` before parsing args, builds a `PluginRegistry` from the active plugins, and stashes it on a module-level singleton (`getRegistry()`). All internal code consults the registry rather than hard-coding integrations.
See docs/plugins.md for the full plugin authoring guide.
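To make the additive versus single-slot distinction concrete, here is a rough sketch of how a registry could fold plugin contributions together. It is not the code in `packages/core/src/plugin.ts`: the `PluginContribution` and `RegistrySketch` types, the `buildRegistry` helper, and the plugin names are hypothetical, and plain strings stand in for the real matcher, notifier, and provider objects.

```typescript
// Hypothetical contribution shape; strings are placeholders for real objects.
interface PluginContribution {
  name: string;
  matchers?: string[];
  notifiers?: string[];
  agents?: string[];
  ownership?: string;
  people?: string;
  executor?: string;
}

interface RegistrySketch {
  matchers: string[];
  notifiers: string[];
  agents: string[];
  ownership?: string;
  people?: string;
  executor?: string;
}

function buildRegistry(plugins: PluginContribution[]): RegistrySketch {
  const registry: RegistrySketch = { matchers: [], notifiers: [], agents: [] };
  for (const plugin of plugins) {
    // Additive points: every plugin's contributions are kept.
    registry.matchers.push(...(plugin.matchers ?? []));
    registry.notifiers.push(...(plugin.notifiers ?? []));
    registry.agents.push(...(plugin.agents ?? []));
    // Single-slot points: a later plugin replaces an earlier one.
    if (plugin.ownership !== undefined) registry.ownership = plugin.ownership;
    if (plugin.people !== undefined) registry.people = plugin.people;
    if (plugin.executor !== undefined) registry.executor = plugin.executor;
  }
  return registry;
}

const sketchRegistry = buildRegistry([
  { name: "first-plugin", matchers: ["first-matcher"], ownership: "first-owners" },
  { name: "second-plugin", matchers: ["second-matcher"], ownership: "second-owners" },
]);
console.log(sketchRegistry.matchers, sketchRegistry.ownership);
```

Under this model the order of the `plugins` array matters only for single-slot points, so a plugin meant to supply ownership or the executor should be listed last.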
- One file = one FileRecord. The unit of work is a source file, not a finding. Scanner, processor, and revalidator all operate on files, so atomic per-file locking and idempotent merges fall out naturally.
- Append-only analysis history. Re-running the processor doesn't overwrite past findings. It appends a new entry to `analysisHistory` and merges new findings (deduped by slug + title) into `findings`. You can re-run with a different agent, prompt, or model and gain additional findings without losing earlier ones.
- Plugin-mediated integrations. Matchers, notifiers, ownership sources, and the remote executor all sit behind plugin contracts. The open-source release ships with a generic core; organization-specific matchers, notifiers, ownership oracles, and people directories slot in as external plugins.
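The append-only principle can be sketched as a pure merge function: history always grows by one entry, and a finding is added only if its slug + title pair has not been seen. The types and the `mergeAnalysis` helper below are hypothetical simplifications, and keeping the earlier copy of a duplicate is an assumption, since the text does not say which copy wins.

```typescript
// Hypothetical, trimmed-down shapes for illustration.
interface FindingSketch {
  slug: string;
  title: string;
  agent: string;
}

interface HistoryEntry {
  runId: string;
  agent: string;
  findingCount: number;
}

interface RecordSketch {
  status: "pending" | "analyzed";
  findings: FindingSketch[];
  analysisHistory: HistoryEntry[];
}

// Dedup key: slug + title, as stated in the text.
function findingKey(f: FindingSketch): string {
  return JSON.stringify([f.slug, f.title]);
}

// Returns a new record; the input record is left untouched.
function mergeAnalysis(
  record: RecordSketch,
  runId: string,
  agent: string,
  incoming: FindingSketch[],
): RecordSketch {
  const seen = new Set(record.findings.map(findingKey));
  const findings = record.findings.slice();
  for (const f of incoming) {
    const key = findingKey(f);
    if (seen.has(key)) continue; // assumption: the earlier copy is kept
    seen.add(key);
    findings.push(f);
  }
  const entry: HistoryEntry = { runId, agent, findingCount: incoming.length };
  return { status: "analyzed", findings, analysisHistory: record.analysisHistory.concat(entry) };
}

// Illustrative slugs and titles; the second run id is made up.
const initial: RecordSketch = { status: "pending", findings: [], analysisHistory: [] };
const afterCodex = mergeAnalysis(initial, "20260429-abcd", "codex", [
  { slug: "xss", title: "Unescaped output", agent: "codex" },
]);
const afterClaude = mergeAnalysis(afterCodex, "20260430-ef01", "claude", [
  { slug: "xss", title: "Unescaped output", agent: "claude" },
  { slug: "ssrf", title: "Unvalidated fetch URL", agent: "claude" },
]);
console.log(afterClaude.findings.length, afterClaude.analysisHistory.length);
```

Running the same merge twice with identical input adds a history entry but no findings, which is what lets a re-process with a different backend be layered on safely.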