Confidence scoring

When -ai-confidence is passed alongside -ai, MethodAtlas instructs the model to attach a numeric confidence score to each method classification, enabling downstream filtering and audit-evidence quality control.

When to use

Enable confidence scoring when you need to distinguish high-certainty security findings from borderline classifications — for example, when assembling an evidence package for a compliance audit or when gating a CI pipeline on only well-supported security test coverage.

See AI Providers for provider options and the -ai-confidence flag description in CLI reference.

What the score means

The score is a decimal in the range 0.0–1.0. It reflects how certain the model is that a method genuinely tests a security property — not simply how confident it is that its answer is "yes".

A score of 0.0 does not indicate uncertainty. It means the model is confident the method is not security-relevant. The absence of a high score is itself information: the model read the test body and concluded it exercises no security property.

The model is instructed to prefer securityRelevant=false over returning a score below 0.5. Low-confidence hits are suppressed rather than surfaced with a low score. In practice you will see scores clustered at 1.0, ~0.7, and 0.0.

Score reference

Score range	Meaning	Example method
`1.0`	The method name and body explicitly and unambiguously test a named security property.	`testSQLInjectionBlocked` — asserts parameterised query rejects `' OR 1=1 --` with an exception
`~0.7`	The method clearly tests a security-adjacent concern, but mapping to a specific tag requires inference from class name or surrounding tests.	`testUserInputHandling` — sanitises input but the class context is needed to confirm the injection-prevention intent
`~0.5`	The classification is plausible but ambiguous — the method name or body is equally consistent with a non-security interpretation.	`testFormSubmit` — exercises a form handler that happens to include CSRF token validation alongside unrelated UI logic
`0.0`	The method is not security-relevant — the model is confident it tests no security property.	`testNullInput` — verifies that `format(null)` returns an empty string with no security implication

Why confidence matters

In regulated environments, test coverage evidence submitted to auditors needs to be defensible. Confidence scores let you:

Filter high-confidence findings — export only rows where ai_confidence >= 0.7 for your audit evidence package.
Flag borderline classifications for human review — rows around 0.5 are candidates for manual inspection with override files.
Track classification quality over time — a sudden drop in average confidence across a module may indicate that test naming conventions have become unclear.

Example

./methodatlas -ai -ai-confidence -ai-provider ollama -ai-model qwen2.5-coder:7b /path/to/tests

Output (CSV excerpt):

fqcn,method,loc,tags,display_name,ai_security_relevant,ai_display_name,ai_tags,ai_reason,ai_interaction_score,ai_confidence
com.acme.crypto.AesGcmTest,roundTrip_encryptDecrypt,18,,,true,SECURITY: crypto - AES-GCM round-trip,security;crypto,Verifies ciphertext and plaintext integrity under AES-GCM.,0.0,1.0
com.acme.auth.SessionTest,sessionToken_isRotatedAfterLogin,12,,,true,SECURITY: auth - session token rotation after login,security;auth,Session token is replaced on successful login to prevent fixation.,0.0,0.7
com.acme.util.DateFormatterTest,format_returnsIso8601,5,,,false,,,Test verifies date formatting output only.,0.0,0.0

The third row (format_returnsIso8601) has ai_confidence=0.0 because the model is confident this method tests no security property — not because the model is uncertain.

Filtering high-confidence findings

Native threshold with `-min-confidence`

The cleanest approach is to let MethodAtlas apply the threshold during the scan, so the output only ever contains qualifying records:

# Keep only methods with ai_confidence >= 0.7 — no post-processing needed
./methodatlas -ai -ai-confidence -min-confidence 0.7 /path/to/tests

# Combine with -security-only for a compact high-confidence security inventory
./methodatlas -ai -ai-confidence -security-only -min-confidence 0.7 /path/to/tests > audit.csv

# Same threshold applied to JSON output
./methodatlas -ai -ai-confidence -json -min-confidence 0.7 /path/to/tests

The flag can also be set in the YAML configuration file so all team members and CI pipelines share the same threshold without needing to pass the flag explicitly:

minConfidence: 0.7
ai:
  enabled: true
  confidence: true
  provider: ollama
  model: qwen2.5-coder:7b

A command-line -min-confidence value always overrides the YAML setting.

Important: -min-confidence only takes effect when -ai-confidence is also enabled. Without confidence scoring every method has an implicit score of 0.0, so applying a threshold would silently drop all output. MethodAtlas therefore ignores -min-confidence unless -ai-confidence is active.

Post-processing with shell tools

When you need to filter an existing CSV file (for example, a scan result cached from a previous run), standard shell tools work:

# Keep only rows where ai_confidence >= 0.7
./methodatlas -ai -ai-confidence /src | \
  awk -F',' 'NR==1 || ($11+0) >= 0.7'

Note that the column index varies when other optional columns are active. The Python alternative using csv.DictReader (shown in CLI Examples) is more robust because it selects the column by name rather than position.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Confidence scoring

When to use

What the score means

Score reference

Why confidence matters

Example

Filtering high-confidence findings

Native threshold with `-min-confidence`

Post-processing with shell tools

FilesExpand file tree

confidence.md

Latest commit

History

confidence.md

File metadata and controls

Confidence scoring

When to use

What the score means

Score reference

Why confidence matters

Example

Filtering high-confidence findings

Native threshold with -min-confidence

Post-processing with shell tools

Native threshold with `-min-confidence`