Accenture
diff --git a/README.md b/README.md
Lines changed: 19 additions & 8 deletions
diff --git a/docs/ai/confidence.md b/docs/ai/confidence.md
Lines changed: 5 additions & 5 deletions
diff --git a/docs/ai/interaction-score.md b/docs/ai/interaction-score.md
Lines changed: 4 additions & 4 deletions
diff --git a/docs/ai/overrides.md b/docs/ai/overrides.md
Lines changed: 17 additions & 0 deletions
diff --git a/docs/cli-examples.md b/docs/cli-examples.md
Lines changed: 6 additions & 6 deletions
diff --git a/docs/cli-reference.md b/docs/cli-reference.md
Lines changed: 14 additions & 3 deletions
diff --git a/docs/compliance.md b/docs/compliance.md
Lines changed: 16 additions & 0 deletions
@@ -25,18 +25,25 @@ MethodAtlas addresses this by turning an existing Java test suite (JUnit 5, JUni
 
 ## Key capabilities
 
-- **Deterministic test discovery** — JavaParser AST analysis; no inference, no false positives on method existence
+- **Deterministic test discovery** — JavaParser AST analysis; no inference, no false positives on method existence; JUnit 5, JUnit 4, and TestNG detected automatically from import declarations
 - **SARIF 2.1.0 output** — first-class integration with static analysis platforms and IDE tooling
 - **AI security classification** — classifies each test method against a closed security taxonomy; supports Ollama, OpenAI, Anthropic, Azure OpenAI, Groq, xAI, GitHub Models, Mistral, and OpenRouter
 - **Confidence scoring** — per-method decimal score (`-ai-confidence`); filter by threshold for audit packages
 - **Content hash fingerprints** — SHA-256 of the class AST text (`-content-hash`); all methods in the same class share the same hash; enables incremental scanning and change detection
+- **AI result cache** — reuse previous AI classifications by content hash (`-ai-cache`); unchanged classes cost zero API calls
 - **Tag vs AI drift detection** — `-drift-detect` flags methods where `@Tag("security")` in source disagrees with the AI classification
+- **Classification overrides** — `-override-file` records human-reviewed corrections; overrides persist across re-runs and set confidence to `1.0` or `0.0`
+- **Delta report** — `-diff` compares two CSV scans and emits a change report: methods added, removed, or modified between runs; useful for CI regression gates
+- **Security-only filter** — `-security-only` suppresses non-security methods from CSV/plain output; applied automatically in SARIF mode
+- **Mismatch limit** — `-mismatch-limit` sets a safety threshold for `-apply-tags-from-csv`; aborts without touching source files when the number of mismatches between the CSV and the current codebase reaches the limit
 - **GitHub Actions annotations** — `-github-annotations` emits inline PR annotations for security-relevant methods without requiring a GitHub Advanced Security licence
 - **Apply-tags** — writes AI-suggested `@DisplayName` and `@Tag` annotations back into source files; idempotent
+- **Apply-tags-from-csv** — applies human-reviewed annotation decisions from a CSV back to source; separates the review step from the write-back
 - **Manual AI workflow** — two-phase prepare/consume workflow for environments where API access is blocked
 - **Local inference** — Ollama support keeps source code entirely within your network
 - **YAML configuration** — share scan settings across a team or CI pipeline without repeating CLI flags
 - **Custom taxonomy** — supply an external taxonomy file aligned to ISO 27001, NIST SP 800-53, PCI DSS, or your own controls framework
+- **Scan provenance** — `-emit-metadata` prepends the tool version and a timestamp to the CSV for inclusion in evidence packages
 - **Multiple output modes** — CSV (default), plain text, SARIF, and GitHub Actions annotations
 
 ## Quick start
@@ -101,9 +108,9 @@ For each discovered JUnit test method, MethodAtlas emits one record.
 ### CSV (default)
 
 ```csv
-fqcn,method,loc,tags,ai_security_relevant,ai_display_name,ai_tags,ai_reason,ai_interaction_score
-com.acme.auth.LoginTest,testLoginWithValidCredentials,12,,true,SECURITY: auth - validates session token,security;auth,Verifies session token is issued on successful login.,0.0
-com.acme.util.DateTest,format_returnsIso8601,5,,false,,,,0.1
+fqcn,method,loc,tags,display_name,ai_security_relevant,ai_display_name,ai_tags,ai_reason,ai_interaction_score
+com.acme.auth.LoginTest,testLoginWithValidCredentials,12,,,true,SECURITY: auth - validates session token,security;auth,Verifies session token is issued on successful login.,0.0
+com.acme.util.DateTest,format_returnsIso8601,5,,,false,,,,0.1
 ```
 
 ### SARIF 2.1.0
@@ -182,10 +189,10 @@ Pass `-ai-confidence` to add a `0.0–1.0` confidence score per method:
 
 ```bash
 ./methodatlas -ai -ai-confidence /path/to/tests | \
-  awk -F',' 'NR==1 || ($10+0) >= 0.7'   # keep only high-confidence findings
+  awk -F',' 'NR==1 || ($11+0) >= 0.7'   # keep only high-confidence findings
 ```
 
-`ai_confidence` is column 10 in standard output (column 11 when `-content-hash` is also passed).
+`ai_confidence` is column 11 in standard output (column 12 when `-content-hash` is also passed).
 
 | Score | Meaning |
 | --- | --- |
@@ -302,12 +309,16 @@ Full documentation is available at [accenture.github.io/MethodAtlas](https://acc
 
 | Document | Contents |
 | --- | --- |
-| [docs/cli-reference.md](docs/cli-reference.md) | Complete option reference, YAML schema, and example commands |
+| [docs/cli-reference.md](docs/cli-reference.md) | Complete option reference, YAML schema, exit codes, and example commands |
 | [docs/output-formats.md](docs/output-formats.md) | CSV, plain text, SARIF, and GitHub Annotations format descriptions |
 | [docs/usage-modes/](docs/usage-modes/index.md) | All operating modes: static inventory, API AI, manual workflow, apply-tags, apply-tags-from-csv, delta, security-only |
 | [docs/ai/providers.md](docs/ai/providers.md) | Per-provider setup: Ollama, OpenAI, Anthropic, Azure OpenAI, Groq, xAI, GitHub Models, Mistral, OpenRouter |
+| [docs/ai/overrides.md](docs/ai/overrides.md) | Classification override file: format, governance, and CI integration |
 | [docs/ai/confidence.md](docs/ai/confidence.md) | Confidence scoring: interpretation and threshold guidance |
 | [docs/ai/caching.md](docs/ai/caching.md) | AI result caching: skip unchanged classes, two-pass SARIF pattern, CI cache key strategy |
 | [docs/ai/drift-detection.md](docs/ai/drift-detection.md) | Tag vs AI drift detection: detecting stale `@Tag("security")` annotations |
+| [docs/ai/interaction-score.md](docs/ai/interaction-score.md) | Placebo-test detection: interaction-score semantics and CI thresholds |
+| [docs/compliance.md](docs/compliance.md) | Compliance framework mapping: OWASP SAMM, NIST SSDF, ISO 27001, DORA; reproducibility statement |
 | [docs/deployment/](docs/deployment/index.md) | Regulated environment guidance: PCI-DSS, ISO 27001, NIST SSDF, DORA, SOC 2, air-gapped |
-| [docs/concepts/data-governance.md](docs/concepts/data-governance.md) | What data is submitted to AI providers and data residency options |
+| [docs/deployment/onboarding.md](docs/deployment/onboarding.md) | Onboarding a brownfield codebase: six-phase progression from static scan to CI gate |
+| [docs/concepts/data-governance.md](docs/concepts/data-governance.md) | What data is submitted to AI providers, data residency options, enterprise secret management |
@@ -34,10 +34,10 @@ In regulated environments, test coverage evidence submitted to auditors needs to
 Output (CSV excerpt):
 
 ```csv
-fqcn,method,loc,tags,ai_security_relevant,ai_display_name,ai_tags,ai_reason,ai_confidence
-com.acme.crypto.AesGcmTest,roundTrip_encryptDecrypt,18,,true,SECURITY: crypto - AES-GCM round-trip,security;crypto,Verifies ciphertext and plaintext integrity under AES-GCM.,1.0
-com.acme.auth.SessionTest,sessionToken_isRotatedAfterLogin,12,,true,SECURITY: auth - session token rotation after login,security;auth,Session token is replaced on successful login to prevent fixation.,0.7
-com.acme.util.DateFormatterTest,format_returnsIso8601,5,,false,,,Test verifies date formatting output only.,0.0
+fqcn,method,loc,tags,display_name,ai_security_relevant,ai_display_name,ai_tags,ai_reason,ai_interaction_score,ai_confidence
+com.acme.crypto.AesGcmTest,roundTrip_encryptDecrypt,18,,,true,SECURITY: crypto - AES-GCM round-trip,security;crypto,Verifies ciphertext and plaintext integrity under AES-GCM.,0.0,1.0
+com.acme.auth.SessionTest,sessionToken_isRotatedAfterLogin,12,,,true,SECURITY: auth - session token rotation after login,security;auth,Session token is replaced on successful login to prevent fixation.,0.0,0.7
+com.acme.util.DateFormatterTest,format_returnsIso8601,5,,,false,,,Test verifies date formatting output only.,0.0,0.0
 ```
 
 ## Filtering high-confidence findings
@@ -47,5 +47,5 @@ Because the output is plain CSV, standard shell tools work:
 ```bash
 # Keep only rows where ai_confidence >= 0.7
 ./methodatlas -ai -ai-confidence /src | \
-  awk -F',' 'NR==1 || ($9+0) >= 0.7'
+  awk -F',' 'NR==1 || ($11+0) >= 0.7'
 ```
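Because `ai_confidence` moves from column 11 to column 12 when `-content-hash` is added, a hard-coded `$11` silently filters on the wrong field if the flags change. The following is a minimal sketch that looks the column up by header name instead. It assumes the header is the first line of the CSV (no `-emit-metadata` preamble) and that no field contains an embedded comma; `filter_by_column` is a hypothetical helper name.

```bash
# Hypothetical helper: keep the header plus every row whose named numeric
# column is >= a threshold. Assumes the header is on line 1 and that no field
# contains an embedded comma (quoted CSV fields are not handled).
filter_by_column() {
  local column="$1" threshold="$2"
  awk -F',' -v col="$column" -v min="$threshold" '
    NR == 1 {
      # Find the position of the requested column in the header row.
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 2 }
      print
      next
    }
    ($idx + 0) >= min
  '
}
```

Usage: `./methodatlas -ai -ai-confidence /src | filter_by_column ai_confidence 0.7`. The same call keeps working when `-content-hash` is added, because the position is resolved at run time.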
@@ -57,9 +57,9 @@ When AI enrichment is enabled, every row in CSV and plain-text output carries th
 
 **CSV:**
 ```
-fqcn,method,loc,tags,ai_security_relevant,ai_display_name,ai_tags,ai_reason,ai_interaction_score
-com.acme.AuthTest,shouldValidatePassword,8,security,true,SECURITY: ...,security;auth,Validates...,0.0
-com.acme.AuthTest,shouldInvokeEncoder,5,security,true,SECURITY: ...,security;auth,Calls encoder.,1.0
+fqcn,method,loc,tags,display_name,ai_security_relevant,ai_display_name,ai_tags,ai_reason,ai_interaction_score
+com.acme.AuthTest,shouldValidatePassword,8,security,,true,SECURITY: ...,security;auth,Validates...,0.0
+com.acme.AuthTest,shouldInvokeEncoder,5,security,,true,SECURITY: ...,security;auth,Calls encoder.,1.0
 ```
 
 **Plain text:**
@@ -81,7 +81,7 @@ gate without tuning.
 ```bash
 # Print security-relevant tests with interaction score ≥ 0.8
 ./methodatlas -ai -security-only src/test/java \
-  | awk -F',' 'NR==1 || $9+0 >= 0.8' \
+  | awk -F',' 'NR==1 || $10+0 >= 0.8' \
   > weak-security-tests.csv
 ```
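To turn the threshold above into a CI gate rather than a report, the same condition can drive the exit status. A minimal sketch, assuming the column layout shown above (`ai_security_relevant` in column 6, `ai_interaction_score` in column 10) and no embedded commas; `weak_security_gate` is a hypothetical helper, not a MethodAtlas option.

```bash
# Hypothetical CI gate: exit 1 if any security-relevant test has an
# interaction score at or above the threshold (default 0.8).
# Assumes ai_security_relevant is column 6, ai_interaction_score is column 10,
# and no field contains an embedded comma.
weak_security_gate() {
  local threshold="${1:-0.8}"
  awk -F',' -v min="$threshold" '
    NR > 1 && $6 == "true" && ($10 + 0) >= min {
      # Report each offending method on stderr so stdout stays clean.
      print "weak security test: " $1 "#" $2 > "/dev/stderr"
      weak++
    }
    END { if (weak > 0) exit 1 }
  '
}
```

Usage: `./methodatlas -ai -security-only src/test/java | weak_security_gate 0.8`. The `$6 == "true"` check is redundant under `-security-only` but lets the gate also run on unfiltered output.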
 
 
@@ -218,3 +218,20 @@ applied on top.
 
 See [CLI reference — `-override-file`](../cli-reference.md#-override-file) for
 the full flag description.
+
+## Governance and review cadence
+
+The override file is a living document that records human classification decisions. Without a defined review cadence, entries can become stale — referencing methods that were renamed, removed, or whose security relevance changed as the codebase evolved.
+
+Recommended practices:
+
+**Trigger-based review (minimum):** review the override file whenever:
+- A class named in an override entry is renamed or moved
+- A sprint introduces new test methods in a class that has class-level overrides
+- A security review flags a method that is currently overridden to `securityRelevant: false`
+
+**Time-based review (regulated environments):** review the entire file at each release candidate or at a fixed calendar interval (e.g. quarterly). The review should confirm that each entry's `note` field describes a rationale that still applies.
+
+**Process:** store the override file in version control alongside the source. Make every change to the file through a PR; the PR description and approval record serve as the audit trail. In regulated environments, require approval from at least one security team reviewer who is not the developer who made the change.
+
+For organisations where the override file is owned by a dedicated security team and delivered from a separate repository, see [Remote Override Sources](remote-overrides.md).
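Trigger-based review is easier to schedule when stale entries are detected mechanically. The sketch below reports class names that appear in the override file but not in the latest scan CSV. It assumes the class names have already been extracted from the override file into a plain list with one FQCN per line (the override file's own schema is not reproduced here), and that the scan CSV has `fqcn` in column 1; `stale_override_classes` and the file names are hypothetical.

```bash
# Hypothetical helper: print class names from a plain FQCN list (one per line)
# that do not appear in column 1 of a MethodAtlas scan CSV.
stale_override_classes() {
  local class_list="$1" scan_csv="$2"
  awk -F',' '
    # First file (scan CSV): remember every FQCN, skipping the header row.
    NR == FNR { if (FNR > 1) seen[$1] = 1; next }
    # Second file (class list): report non-empty names that were not seen.
    NF && !($1 in seen) { print $1 }
  ' "$scan_csv" "$class_list"
}
```

Usage: `stale_override_classes override-classes.txt scan.csv`. This catches renamed or moved classes; renamed methods inside a surviving class still need a manual check.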
@@ -86,7 +86,7 @@ export ANTHROPIC_API_KEY=sk-ant-...
 
 # Filter high-confidence findings (requires -ai-confidence)
 ./methodatlas -ai -ai-confidence /path/to/tests | \
-  awk -F',' 'NR==1 || ($9+0) >= 0.7'
+  awk -F',' 'NR==1 || ($11+0) >= 0.7'
 ```
 
 ## Source write-back
@@ -127,11 +127,11 @@ Running against a mix of functional and cryptographic test classes:
 Produces output such as:
 
 ```csv
-fqcn,method,loc,tags,ai_security_relevant,ai_display_name,ai_tags,ai_reason
-org.egothor.methodatlas.MethodAtlasAppTest,csvMode_detectsMethodsLocAndTags,22,,false,,,Test verifies functional output format only.
-zeroecho.core.alg.aes.AesGcmCrossCheckTest,aesGcm_stream_vs_jca_ctxOnly_crosscheck,52,,true,SECURITY: crypto - cross-check AES-GCM stream encryption with JCA reference,security;crypto,Verifies custom AES-GCM matches JCA output — ensures cryptographic correctness.
-zeroecho.core.alg.aes.AesLargeDataTest,aesGcmLargeData_ctxOnly,27,,true,SECURITY: crypto - AES-GCM round-trip with context-only parameters,security;crypto,Tests encryption and decryption correctness for large data using AES-GCM.
-zeroecho.core.alg.mldsa.MldsaLargeDataTest,mldsa_complete_suite_streaming_sign_verify_large_data,24,,true,SECURITY: crypto - ML-DSA streaming signature and verification for large data,security;crypto;owasp,Validates ML-DSA signature creation and verification including tamper detection.
+fqcn,method,loc,tags,display_name,ai_security_relevant,ai_display_name,ai_tags,ai_reason,ai_interaction_score
+org.egothor.methodatlas.MethodAtlasAppTest,csvMode_detectsMethodsLocAndTags,22,,,false,,,Test verifies functional output format only.,0.0
+zeroecho.core.alg.aes.AesGcmCrossCheckTest,aesGcm_stream_vs_jca_ctxOnly_crosscheck,52,,,true,SECURITY: crypto - cross-check AES-GCM stream encryption with JCA reference,security;crypto,Verifies custom AES-GCM matches JCA output — ensures cryptographic correctness.,0.0
+zeroecho.core.alg.aes.AesLargeDataTest,aesGcmLargeData_ctxOnly,27,,,true,SECURITY: crypto - AES-GCM round-trip with context-only parameters,security;crypto,Tests encryption and decryption correctness for large data using AES-GCM.,0.0
+zeroecho.core.alg.mldsa.MldsaLargeDataTest,mldsa_complete_suite_streaming_sign_verify_large_data,24,,,true,SECURITY: crypto - ML-DSA streaming signature and verification for large data,security;crypto;owasp,Validates ML-DSA signature creation and verification including tamper detection.,0.0
 ```
 
 Observations:
 
@@ -187,9 +187,9 @@ Appends a SHA-256 content fingerprint to every emitted record. The hash is compu
 In CSV output, a `content_hash` column is appended immediately after `tags`:
 
 ```text
-fqcn,method,loc,tags,content_hash
-com.acme.tests.SampleOneTest,alpha,8,fast;crypto,3a7f9b...
-com.acme.tests.SampleOneTest,beta,6,param,3a7f9b...
+fqcn,method,loc,tags,display_name,content_hash
+com.acme.tests.SampleOneTest,alpha,8,fast;crypto,,3a7f9b...
+com.acme.tests.SampleOneTest,beta,6,param,,3a7f9b...
 ```
 
 In plain-text output, a `HASH=<value>` token is appended to each line. In SARIF output, the hash is stored as `properties.contentHash`.
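Since all methods in a class share one hash, comparing hashes per class between two scans is enough to find classes that were added, removed, or modified. A minimal sketch, assuming both files were produced with `-content-hash` using the static layout shown above (`content_hash` in column 6), a header on the first line, and no embedded commas; `changed_classes` is a hypothetical helper.

```bash
# Hypothetical helper: compare two -content-hash scans class by class.
# Assumes fqcn in column 1, content_hash in column 6, a header on line 1,
# and no embedded commas.
changed_classes() {
  local old_csv="$1" new_csv="$2"
  awk -F',' '
    FNR == 1 { next }                            # skip each file header row
    NR == FNR { old[$1] = $6 ""; next }          # old scan: one hash per class
    { seen[$1] = 1 }                             # new scan: record the class
    !($1 in old) { print "added: " $1; next }
    old[$1] != ($6 "") { print "changed: " $1 }  # force string comparison
    END { for (c in old) if (!(c in seen)) print "removed: " c }
  ' "$old_csv" "$new_csv" | sort -u
}
```

Usage: `changed_classes previous.csv current.csv`. Hashes are compared as strings so that a hex digest is never coerced to a number.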
@@ -499,3 +499,14 @@ Runs the prepare phase of the manual AI workflow. For each test class MethodAtla
 Runs the consume phase. MethodAtlas reads operator-filled response files and merges the AI JSON into the output CSV. Missing or empty response files are treated as absent AI data; the scan continues.
 
 For practical examples grouped by use case, see [CLI Examples](cli-examples.md).
+
+## Exit codes
+
+| Code | Condition |
+|---|---|
+| `0` | Scan completed successfully; all source files processed |
+| `1` | `-apply-tags-from-csv` aborted because the mismatch count reached or exceeded `-mismatch-limit` |
+| `1` | A source file could not be read or written during `-apply-tags-from-csv` |
+| `1` | A required argument value is missing or malformed (printed to stderr before exit) |
+
+Note: AI classification failures for individual classes (provider timeout, parse error in the AI response) do not cause a non-zero exit. The affected rows are emitted with blank AI columns and the scan continues. Only structural errors — bad arguments, mismatch-limit violations, and I/O failures during source write-back — produce exit code `1`.
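Because a per-class AI failure still exits with `0`, a CI step that relies only on the exit code can pass with incomplete AI coverage. A minimal sketch of a follow-up check, assuming the AI column layout used elsewhere in these docs (`ai_security_relevant` in column 6), a header on the first line, and no embedded commas; `check_ai_coverage` is a hypothetical helper name.

```bash
# Hypothetical post-scan check: fail if any data row has a blank
# ai_security_relevant value (column 6), which is how rows affected by a
# per-class AI failure are emitted. Assumes a header on line 1 and no
# embedded commas.
check_ai_coverage() {
  local csv="$1" blank
  blank=$(awk -F',' 'NR > 1 && $6 == "" { n++ } END { print n + 0 }' "$csv")
  if [ "$blank" -gt 0 ]; then
    echo "$blank row(s) have no AI classification" >&2
    return 1
  fi
}
```

Usage: `./methodatlas -ai /path/to/tests > scan.csv && check_ai_coverage scan.csv`. A structural error (exit code `1`) skips the coverage check and fails the step with that status.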
@@ -94,6 +94,22 @@ security test coverage is maintained and repeated across development cycles.
 The SARIF output integrates with code scanning dashboards that provide the
 timestamped, per-commit audit trail supervisors may request.
 
+## Reproducibility and AI non-determinism
+
+MethodAtlas separates two distinct layers with different reproducibility properties.
+
+**The structural layer is fully deterministic.** Method discovery (FQCN, method name, LOC, source-level `@Tag` values, content hash) is driven entirely by JavaParser AST analysis of the source files. Given the same source revision and the same MethodAtlas version, this layer always produces identical output, regardless of AI provider, model, or time.
+
+**The AI layer is non-deterministic by nature.** Language models generate output by probabilistic sampling. Even with the same model, same source, and same prompt, a different run may produce a slightly different `ai_reason`, a different `ai_confidence` value, or, rarely, a different `securityRelevant` verdict. This is inherent to sampling-based language model inference, not a defect in MethodAtlas.
+
+Two mechanisms mitigate AI non-determinism for compliance purposes:
+
+1. **`-ai-cache`** — once a class has been classified, its result is stored in a CSV indexed by SHA-256 content hash. Subsequent runs reuse the stored result without calling the provider. The scan output is therefore reproducible for all unchanged classes.
+
+2. **`-override-file`** — human-reviewed corrections are applied deterministically on every run and take precedence over AI output. An override entry sets confidence to `1.0` or `0.0`, reflecting the higher certainty of a human decision.
+
+For evidence packages submitted to assessors, the recommended practice is to treat the classified CSV (produced with `-ai -content-hash`) as the authoritative record after human review. Re-running the scan on the same commit using the same cache produces output identical to the reviewed artefact for all unchanged classes; any new or changed classes are the only source of variance.
+
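The reproducibility statement above can be checked directly: for every method whose `content_hash` is unchanged between the reviewed CSV and a re-run, the two rows should be identical. A minimal sketch, assuming both CSVs were produced with the same flags including `-content-hash`, a header on the first line, `fqcn` and `method` in columns 1 and 2, and no embedded commas; `unexpected_variance` is a hypothetical helper that locates `content_hash` by header name.

```bash
# Hypothetical check: print fqcn#method for every row whose content_hash is the
# same in both CSVs but whose full row differs. Assumes identical flags for
# both runs, fqcn/method in columns 1-2, a header on line 1, no embedded commas.
unexpected_variance() {
  local reviewed="$1" rerun="$2"
  awk -F',' '
    FNR == 1 {
      # Locate content_hash in each file header row.
      h = 0
      for (i = 1; i <= NF; i++) if ($i == "content_hash") h = i
      if (!h) { print "no content_hash column in " FILENAME > "/dev/stderr"; exit 2 }
      next
    }
    NR == FNR { row[$1 "#" $2] = $0; hash[$1 "#" $2] = $h ""; next }
    {
      key = $1 "#" $2
      # Only methods in unchanged classes are expected to reproduce exactly.
      if ((key in hash) && hash[key] == ($h "") && row[key] != $0) print key
    }
  ' "$reviewed" "$rerun"
}
```

Empty output means every unchanged class reproduced the reviewed artefact; each printed `fqcn#method` key marks a row to investigate before the evidence package is submitted.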
 ## Further reading
 
 - [OWASP SAMM v2 — Security Testing practice](https://owaspsamm.org/model/verification/security-testing/)