data/ is deepsec's on-disk state. Each project owns a subdirectory; the
files inside are append-only across runs.
data/<projectId>/
├── project.json # rootPath, githubUrl (auto-managed)
├── INFO.md # repo context injected into AI prompts
├── config.json # priorityPaths, promptAppend, ignorePaths (optional)
├── files/ # one JSON per scanned source file (FileRecord)
│ └── path/to/source.ts.json
├── runs/ # one JSON per run (RunMeta)
│ └── 20260429215021-19ac.json
└── reports/ # generated markdown + JSON reports
data/ is gitignored by default. To version it (CI, sharing across
machines), commit it explicitly.
The schemas below are the source of truth for any tool that reads
data/ directly. They live in
packages/core/src/types.ts.
Auto-written on first scan; safe to edit by hand.
| Field | Type | Purpose |
|---|---|---|
projectId |
string |
Matches the directory name. |
rootPath |
string |
Absolute path to the codebase. Updated each scan with the most recent --root. |
createdAt |
string (ISO) |
Project init time. |
githubUrl |
string? |
https://github.com/owner/repo/blob/branch — used for clickable links in exports. Auto-detected from git remote if not set. |
Optional. Read by scan and the AI agents.
| Field | Type | Purpose |
|---|---|---|
priorityPaths |
string[] |
Path prefixes processed first. |
promptAppend |
string |
Free-form text appended to the system prompt for this project. |
ignorePaths |
string[] |
Glob patterns to skip during scan. |
Free-form markdown injected into the AI prompt for process,
triage, and revalidate. See
getting-started.md for the agent prompt that
writes a good one.
The core per-file accumulator. Every stage adds to this record;
nothing is overwritten. Re-scanning merges new candidates.
Re-processing appends to analysisHistory. Revalidation annotates
findings rather than replacing them.
The on-disk path mirrors the source path under <rootPath> plus a
.json suffix (src/api/auth.ts → files/src/api/auth.ts.json).
| Field | Type | Purpose |
|---|---|---|
filePath |
string |
Path relative to rootPath. |
projectId |
string |
The owning project. |
candidates |
CandidateMatch[] |
Regex matcher hits — see below. |
lastScannedAt |
string (ISO) |
Most recent scan timestamp. |
lastScannedRunId |
string |
runId of the scan that last touched this file. |
fileHash |
string (sha-256) |
Source content hash at last scan. |
findings |
Finding[] |
Latest set of AI-produced findings. |
analysisHistory |
AnalysisEntry[] |
Append-only log of every AI investigation. |
gitInfo |
object? |
Git committer info + ownership data, written by enrich. |
status |
"pending" | "processing" | "analyzed" | "error" |
Lifecycle state. |
lockedByRunId |
string? |
When non-empty, a run holds this file. Cleared on completion. |
| Field | Type | Purpose |
|---|---|---|
vulnSlug |
string |
Matcher slug that fired. |
lineNumbers |
number[] |
1-indexed source lines. |
snippet |
string |
Short excerpt around the first match. |
matchedPattern |
string |
Human-readable label of the regex (the matcher's label). |
| Field | Type | Purpose |
|---|---|---|
severity |
"CRITICAL" | "HIGH" | "MEDIUM" | "HIGH_BUG" | "BUG" | "LOW" |
See README severity table. |
vulnSlug |
string |
Matcher slug or other-<topic> if no matcher fits. |
title |
string |
One-sentence summary. |
description |
string |
Full explanation. |
lineNumbers |
number[] |
1-indexed lines. |
recommendation |
string |
Suggested fix. |
confidence |
"high" | "medium" | "low" |
The agent's self-rated confidence. |
triage |
Triage? |
Set by triage. |
revalidation |
Revalidation? |
Set by revalidate. |
| Field | Type | Purpose |
|---|---|---|
priority |
"P0" | "P1" | "P2" | "skip" |
Recommended action bucket. |
exploitability |
"trivial" | "moderate" | "difficult" |
Effort to weaponize. |
impact |
"critical" | "high" | "medium" | "low" |
Blast radius if exploited. |
reasoning |
string |
Why this priority. |
triagedAt |
string (ISO) |
Timestamp. |
model |
string |
Model used for triage. |
| Field | Type | Purpose |
|---|---|---|
verdict |
"true-positive" | "false-positive" | "fixed" | "uncertain" |
Re-checked verdict. |
reasoning |
string |
Why this verdict. Includes git-history evidence if fixed. |
adjustedSeverity |
Severity? |
Set if revalidation re-rates the finding. |
revalidatedAt |
string (ISO) |
Timestamp. |
runId |
string |
runId of the revalidate run. |
model |
string |
Model used. |
One per AI investigation of this file. Append-only — nothing is ever deleted.
| Field | Type | Purpose |
|---|---|---|
runId |
string |
The producing run. |
investigatedAt |
string (ISO) |
Timestamp. |
durationMs |
number |
Wall-clock total. |
durationApiMs |
number? |
API time only (excludes process orchestration). |
agentType |
string |
claude-agent-sdk or codex. |
model |
string |
Model identifier. |
modelConfig |
Record<string, unknown> |
Provider-specific settings echoed back. |
agentSessionId |
string? |
The agent's session/thread id, for reproducing or replaying. |
findingCount |
number |
Findings produced in this entry. |
numTurns |
number? |
Conversation turn count. |
costUsd |
number? |
Estimated USD cost. |
usage |
{ inputTokens, outputTokens, cacheReadInputTokens, cacheCreationInputTokens }? |
Token accounting. |
refusal |
RefusalReport? |
See models.md. |
codexStderr |
string? |
Captured codex stderr when an investigation produced 0 output tokens (forensic only). |
reinvestigateMarker |
number? |
Wave marker from --reinvestigate <N>. |
| Field | Type | Purpose |
|---|---|---|
refused |
boolean |
True if the agent skipped or declined any part of the investigation. |
reason |
string? |
Free-form reason if refused. |
skipped |
Array<{ filePath?: string; reason: string }>? |
Per-file skip reasons. |
raw |
string? |
Trimmed raw model response to the follow-up question, for debugging. |
| Field | Type | Purpose |
|---|---|---|
recentCommitters |
Array<{ name, email, date }> |
Top contributors over the file's recent history. |
enrichedAt |
string (ISO) |
Last enrich timestamp. |
ownership |
OwnershipData? |
If an ownership plugin is active, structured ownership/escalation data. |
pending -- scan finished, awaits AI
↓
processing -- a run is currently investigating (lockedByRunId set)
↓
analyzed -- AnalysisEntry appended; findings updated
error is set if the agent crashed mid-investigation. Re-running
process will retry error and pending files.
One per scan / process / revalidate invocation. Used for status
reporting (deepsec status) and for filtering exports by run.
| Field | Type | Purpose |
|---|---|---|
runId |
string |
<YYYYMMDDHHMMSS>-<rand4>. Sortable. |
projectId |
string |
Owning project. |
rootPath |
string |
Resolved root for the run. |
createdAt |
string (ISO) |
Run start. |
completedAt |
string? (ISO) |
Run end (absent while running). |
type |
"scan" | "process" | "revalidate" |
Stage. |
phase |
"running" | "done" | "error" |
Terminal status. |
scannerConfig |
{ matcherSlugs }? |
Set on scan runs. |
processorConfig |
{ agentType, model, modelConfig }? |
Set on process / revalidate runs. |
stats |
object |
Counters: filesScanned, candidatesFound, findingsCount, totalCostUsd, truePositives, falsePositives, … |
Generated by deepsec report. One markdown per project plus a JSON
summary. Re-running report overwrites; nothing here is incremental.
The append-only model means a few patterns work well:
- Find every TP HIGH+ finding across a project:
jq -r '. as $r | $r.findings[] | select(.revalidation.verdict=="true-positive") | select(.severity=="HIGH" or .severity=="CRITICAL") | [$r.filePath, .severity, .title] | @tsv' data/<id>/files/**/*.json
- Total spend on a project:
jq -s 'map(.analysisHistory[].costUsd // 0) | add' data/<id>/files/**/*.json
- Files still pending after a run:
jq -r 'select(.status=="pending") | .filePath' data/<id>/files/**/*.json
For richer queries, prefer deepsec export --format json — it applies
filters consistently with the rest of the CLI.